Exploring and Characterizing Large Language Models for Embedded System Development and Debugging

Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayanswamy, Joseph Breda, Xin Liu, Shwetak Patel, Vikram Iyer
PDF Video Code
Illustration of human AI co-design for embedded system development where human developers collaborate with LLM-based AI agents to generate code and iteratively debug system behavior.

Abstract

Large language models (LLMs) have shown remarkable abilities to generate code. However, their ability to develop software for physical computing and embedded systems, which requires cross-domain hardware and software knowledge, has not been thoroughly studied. We observe through our experiments and a 15-user pilot study that even when LLMs fail to produce working code, they can generate helpful reasoning about embedded design tasks, as well as specific debugging suggestions for both novice and expert developers. These results highlight the potential to develop AI assistants to dramatically lower the barrier to entry for working with hardware. To evaluate the capabilities and limitations of LLMs, we develop an automated testbench to quantify LLM performance on embedded programming tasks and perform 450 trials. We leverage these findings to analyze how programmers interact with these tools including their productivity and sense of fulfillment and outline a human-AI collaborative workflow for developing and debugging embedded systems.