EmbeddedRelated.com

How to Deploy Local LLMs for Embedded Software Development: Terminology and Motivation

Mohammed Billoo May 12, 2026

In this blog post series, I walk you through creating a fully local, offline AI pipeline. In this first post, I outline the motivation and relevant terminology that are important before we dive into hardware selection and implementation of the pipeline.


Embedded Development Is Broken. Here's the Strategy I'm Betting My Company On.

Joe Schneider April 21, 2026

Embedded software complexity is growing about 4x per decade while developer productivity grows 1.5x, and regulations like the EU CRA are widening the gap further. After running a firmware services company through this shift, I've come to see three things separating the teams that are pulling ahead: using AI where the work is actually hard, designing security in from day one, and reading the standards that govern their market (62304, 26262, CRA) before writing code, not after.


Small Language Models (SLMs): The Future of AI is Smaller, Faster, and Closer to the Edge

Rohit Gupta March 30, 2026

The AI industry is shifting from a "bigger is better" mentality and massive, cloud-bound models toward Small Language Models (SLMs) designed for efficiency, speed, and edge deployment. Driven by the need to overcome cloud-centric hurdles like high latency, bandwidth costs, and privacy risks, SLMs (ranging from 100M to 14B parameters) leverage techniques such as quantization, sparse attention, and high-quality synthetic training data to deliver specialized intelligence on local hardware. Rather than replacing large models, SLMs represent a shift toward a hybrid intelligence future where the cloud provides depth while the edge provides real-time, sustainable action, moving the focus of AI progress from raw parameter count to practical, real-world utility.


Always-On Intelligence Without the Cloud: Why it matters more than you think

Shivangi Agrawal February 5, 2026

Much of the AI conversation today is still focused on scale: larger models, more data, more compute. Embedded systems live in a different reality, where constraints are unavoidable and efficiency is the priority. What’s emerging is not a smaller version of cloud AI but a different approach altogether, one that values locality, predictability, resilience, and trust. Always-on intelligence without the cloud isn’t just a technical milestone. It’s a change in how we think about where intelligence belongs.


3 Tips for Developing Embedded Systems with AI

Jacob Beningo March 1, 2025

Explore how to leverage AI in developing embedded systems with three practical tips. Learn why documenting your workflows, supercharging testing and debugging, and adopting AI-assisted code generation can save time, reduce errors, and boost performance in your projects, and discover actionable insights to streamline development in resource-constrained environments. This blog explains how to prepare for AI integration while keeping the expertise of experienced engineers intact, offering real-world examples that show how even incremental AI adoption can revolutionize your development process. Whether you’re new to AI or seeking to enhance existing practices, these strategies provide a clear roadmap to build smarter, more efficient embedded systems using AI.


Getting Started With CUDA C on an Nvidia Jetson: GPU Architecture

Mohammed Billoo March 28, 2024

In the previous blog post (Getting Started With CUDA C on Jetson Nvidia: Hello CUDA World!) I showed how to develop applications targeted at a GPU on an Nvidia Jetson Nano. As we observed in that blog post, performing a calculation on a 1-D array on a GPU had no performance benefit compared to a traditional CPU implementation, even on an array with many elements. In this blog post, we will learn about the GPU architecture to explain that behavior and to understand the applications where a GPU shines (hint: it has to do with graphics).


Unraveling the Enigma: Object Detection in the World of Pixels

Charu Pande February 8, 2024

Exploring the realm of embedded systems co-design for object recognition, this blog navigates the convergence of hardware and software in revolutionizing industries. Delving into real-time image analysis and environmental sensing, the discussion highlights advanced object detection and image segmentation techniques. With insights into Convolutional Neural Networks (CNNs) decoding pixel data and autonomously extracting features, the blog emphasizes their pivotal role in modern computer vision. Practical examples, including digit classification using TensorFlow and Keras on the MNIST dataset, underscore the power of CNNs. Through industry insights and visualization aids, the blog unveils a tapestry of innovation, charting a course towards seamless interaction between intelligent embedded systems and the world.


How to Implement Image Processing Algorithms in FPGA Hardware

Lance Harvie December 17, 2023

Recognized for their parallelism and reconfigurability, FPGAs prove ideal for real-time processing in medical imaging and computer vision. This post's step-by-step approach starts with understanding FPGA basics, emphasizing their reconfigurable nature and parallel processing. It guides users in algorithm selection based on factors like processing speed, resource utilization, and adaptability, then highlights designing modular and scalable algorithms. The process includes simulation for verification, synthesis using tools like Xilinx Vivado and Intel Quartus Prime, interfacing with image sensors, and testing on real hardware. The conclusion underscores the advantages of FPGAs in image processing, presenting ongoing opportunities for innovation in diverse industries.


Embedded Systems Co-design for Object Recognition: A Synergistic Approach

Charu Pande November 4, 2023

Embedded systems co-design for object recognition is essential for real-time image analysis and environmental sensing across various sectors. This methodology harmonizes hardware and software to optimize efficiency and performance. It relies on hardware accelerators, customized neural network architectures, memory hierarchy optimization, and power management to achieve benefits like enhanced performance, lower latency, energy efficiency, real-time responsiveness, and resource optimization. While challenges exist, co-designed systems find applications in consumer electronics, smart cameras, industrial automation, healthcare, and autonomous vehicles, revolutionizing these industries. As technology advances, co-design will continue to shape the future of intelligent embedded systems, making the world safer and more efficient.


An Iterative Approach to USART HAL Design using ChatGPT

Jacob Beningo June 19, 2023, 11 comments

Discover how to leverage ChatGPT and an iterative process to design and generate a USART Hardware Abstraction Layer (HAL) for embedded systems, enhancing code reusability and scalability. Learn the step-by-step journey, improvements made, and the potential for generating HALs for other peripherals.


Shibboleths: The Perils of Voiceless Sibilant Fricatives, Idiot Lights, and Other Binary-Outcome Tests

Jason Sachs September 29, 2019

Binary tests look simple until you try to pick a threshold, because false positives, false negatives, and base rate all collide. Jason Sachs uses a deliberately absurd detective story, then walks through the math of expected value, medical screening tradeoffs, idiot lights, and even a triage-style three-way decision. The payoff is a practical way to think about when a pass/fail signal helps, and when raw data or a second test is worth the extra complexity.


AI at the Edge - Can I run a neural network in a resource-constrained device?

Stephen Martin March 11, 2019, 2 comments

AI at the edge is no longer science fiction: it can run on tiny, resource-constrained devices like Arm Cortex-M4 and M7 microcontrollers. This post introduces inference-only neural networks on MCUs, explains why edge AI matters for power, latency, and privacy, and points to practical toolchains such as STM32Cube.AI, Arm NN, and AWS Greengrass to get started quickly.


STM32 B-CAMS-OMV Walkthrough

Peter McLaughlin April 30, 2023, 1 comment

Want to prototype embedded vision quickly? This walkthrough shows how the STM32 B-CAMS-OMV camera module pairs with the STM32H747I-DISCO discovery kit and the FP-AI-VISION1 function pack to get you running in minutes. The video covers the camera connection interface, key software functions to control and process data, and the ISP features that let image processing run inside the camera. The STM32 H7 project with B-CAMS-OMV drivers is available on GitHub.


Debugging DSP code.

Mark Browne May 1, 2019

Strange sinusoidal confidence scores from an HTM neural model revealed a familiar class of DSP bugs. Drawing from forum troubleshooting, the post maps common root causes: signed versus absolute value errors, wrong intermediate references, scaling mistakes, and sampling/stride problems in integer math. Embedded engineers will recognize the diagnostic clues and practical suspects to check first when DSP outputs vary with the input.


“Smarter” cars, unintended acceleration – and unintended consequences

Michael J. Pont October 20, 2015

Smarter cars are arriving fast, but the software tricks behind them may be creating new safety and compliance risks. This post connects Tesla’s autopilot, the VW emissions scandal, and a reported Porsche throttle-delay case to ask whether automotive standards and regulations are keeping pace with increasingly intelligent vehicle control systems.

