Revolutionizing products by embedding machine learning directly in the device

By Zibo Su, Product Manager at Rutronik, and Daniel Fisher, Senior FAE EMEA at Gowin Semiconductor.

Creating machine learning (ML) applications involves coordinating various technical fields, yet many companies lack comprehensive in-house expertise. To address this, they recruit data scientists, ML engineers, and software developers to craft, refine, and evaluate ML models. A common hurdle is that these models are typically not designed for embedded hardware or mobile devices, because the engineers are unfamiliar with the constraints of such platforms. Optimization and quantization are therefore necessary before models can be deployed on mobile Systems on Chip (SoCs), Microcontroller Units (MCUs), and Field-Programmable Gate Arrays (FPGAs).
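
A rough back-of-the-envelope calculation illustrates why quantization matters on such platforms. The sketch below compares the storage needed for a model's weights in 32-bit floating point and in 8-bit integer form. The parameter count and the flash budget are hypothetical figures chosen purely for illustration; they are not taken from any datasheet, and the helper names are invented for this example.

```python
# Hypothetical figures for illustration only; not taken from any datasheet.
PARAM_COUNT = 250_000             # assumed number of weights in a small model
FLASH_BUDGET_BYTES = 256 * 1024   # assumed flash available for the model

BYTES_PER_WEIGHT = {"float32": 4, "int8": 1}


def weight_storage_bytes(param_count: int, dtype: str) -> int:
    """Return the bytes needed to store the weights alone (no metadata)."""
    return param_count * BYTES_PER_WEIGHT[dtype]


def fits(param_count: int, dtype: str, budget_bytes: int) -> bool:
    """Check whether the weights alone fit into the given memory budget."""
    return weight_storage_bytes(param_count, dtype) <= budget_bytes


if __name__ == "__main__":
    for dtype in ("float32", "int8"):
        size = weight_storage_bytes(PARAM_COUNT, dtype)
        verdict = "fits" if fits(PARAM_COUNT, dtype, FLASH_BUDGET_BYTES) else "does not fit"
        print(f"{dtype}: {size / 1024:.0f} KiB, {verdict}")
```

Under these assumptions, the float32 weights need roughly 977 KiB and exceed the budget, while the int8 weights need about 244 KiB and fit. Real deployments also have to account for activations, code, and file metadata, which this estimate ignores.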

Semiconductor manufacturers are tasked with designing products that meet evolving demands for performance, cost, and size, all within tight market deadlines. These products must be versatile in terms of interfaces, inputs, outputs, and memory to serve diverse applications.

TensorFlow Lite

Things have become somewhat easier in recent years thanks to Google’s TensorFlow Lite. This open-source machine learning platform now includes scripts that optimize and quantize ML models and store them in a “FlatBuffers” file (*.tflite), using parameters configured for a particular application environment. Ideally, embedded hardware should import these FlatBuffers files directly from TensorFlow, without resorting to non-standard optimization methods. Engineers could then use the quantized and optimized FlatBuffers files efficiently on SoCs, MCUs, and FPGAs.
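
To make the quantization step more concrete, the following minimal sketch shows the affine 8-bit mapping on which TensorFlow Lite's quantization scheme is based, where a real value is approximated as scale × (q − zero_point). It is a plain-Python illustration, not the TensorFlow Lite converter itself; the function names and the sample weights are hypothetical and exist only in this example.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Derive scale and zero point so that [rmin, rmax] maps onto [qmin, qmax]."""
    # The real-valued range is widened to include 0.0 so zero is encoded exactly.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, 0  # degenerate range: every value is zero
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to an int8 code, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from an int8 code."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical sample weights
    scale, zero_point = quant_params(min(weights), max(weights))
    for w in weights:
        q = quantize(w, scale, zero_point)
        print(f"{w:+.4f} -> {q:4d} -> {dequantize(q, scale, zero_point):+.4f}")
```

Each weight is stored in a single byte, and the round trip introduces only a small error on the order of the scale. In practice, the converter derives these parameters per tensor or per channel from the trained model and representative data.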

SoCs, MCUs, and FPGAs

Embedded hardware platforms have limited resources, are poorly suited to development work, and are complicated to use. In return, they offer low power consumption, low cost, and small module dimensions. How do SoCs, MCUs, and FPGAs compare? SoCs offer the highest performance and many standard interfaces, but they are power-hungry and costly because of the chip area they require. MCUs offer low power consumption and compact size but have limited ML capabilities and specialized interfaces.

FPGAs strike a balance: they come in a range of packages with flexible inputs and outputs and support various interfaces without wasting chip area. Their configuration options also allow cost and power consumption to be scaled with performance and additional functions to be integrated. However, FPGAs lack support for SDK platforms such as TensorFlow Lite.

ML FPGAs

To close this gap, Gowin Semiconductor provides an SDK on its GoAI 2.0 platform that extracts models and coefficients, generates C code for the ARM Cortex-M processors integrated in the FPGAs, and generates bitstreams and firmware for the FPGAs. Another challenge is the substantial flash memory and RAM that ML models require. Gowin’s hybrid μSoC FPGAs, such as the GW1NSR4P, embed additional PSRAM to meet these demands.

The GW1NSR4P features a GoAI 2.0 coprocessor that accelerates the processing and storage of convolution and pooling layers. It works alongside the device’s Cortex-M IP core, which manages layer parameters and model processing. Gowin’s GoAI design services program supports users who are looking for a one-chip classification solution, or who have tested, trained “off-the-shelf” models and need help implementing them, but who do not know how to communicate with the embedded hardware.
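
For readers less familiar with the two layer types that such a coprocessor accelerates, the sketch below shows what a convolution layer and a pooling layer compute on a small two-dimensional input. It is a plain-Python illustration only and says nothing about how Gowin's hardware implements these operations; the function names, the sample image, and the kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNNs (strictly, a cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


if __name__ == "__main__":
    # Hypothetical 5x5 input and 2x2 kernel, chosen only for illustration.
    image = [
        [1, 2, 0, 1, 3],
        [0, 1, 2, 3, 1],
        [4, 0, 1, 2, 2],
        [1, 3, 0, 1, 0],
        [2, 1, 1, 0, 4],
    ]
    kernel = [[1, 0], [0, 1]]
    feature_map = conv2d_valid(image, kernel)   # 4x4 feature map
    pooled = max_pool2x2(feature_map)           # 2x2 after pooling
    print(feature_map)
    print(pooled)
```

Both operations consist of many small, regular multiply-accumulate and compare steps over neighboring values, which is why they lend themselves to parallel hardware, while a processor core handles sequencing and parameters.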

Conclusion

Local, embedded ML is a popular and steadily growing field for product developers. However, challenges remain, because developing these solutions demands engineers with expertise in both ML and embedded hardware. Some providers of programmable semiconductors have responded to this need by supporting popular ecosystem tools on embedded hardware and by offering devices with flexible interfaces, expanded memory, new software tools, and design services.

www.rutronik.com