by Elia Shenberger, Senior Director Product Marketing at Ceva.

There’s lots to do, but with the right platform much of the work may already be done for you.

Like it or not, this is the age of AI; any product lacking an AI angle is doomed to be ignored. Yet AI isn’t just a buzzword. It is now playing an important role in improving the user experience in many products, none more so than hearables: earbuds, headphones, and hearing aids.

“How hard can this be?” you might think—just add a neural processing unit (NPU) and a model to run on it, and you’re done, right? If only it were that simple. AI now plays diverse roles in hearables, across audio, sensing, and communication, demanding a more careful approach to designing competitive systems for today’s and tomorrow’s markets.

Use case and model complexity

Part of the problem comes from the growing number of use cases in hearables that require AI support, and part comes from the growing complexity of the AI models. We’ll get to model complexity later. Let’s start with audio quality. In hearables this is heavily dependent on the ability to suppress background noise, especially for earbuds and hearing aids, which lack the big ear cushions we get with headphones. Conventional noise cancellation techniques can suppress steady levels of noise but not varying street (or home) noise or background conversation babble.

Adaptive noise cancellation is more intelligent, using microphones at the ear, both inside and outside the device, to support some level of adaptation. AI can be used purely to enhance that objective; more importantly, it is essential for transparency mode, allowing a single speaker to break through when they address you directly. AI can separate speech from noise, passing only the speech signal through, which offers a huge quality boost for both phone calls and hearing aids.
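To make the idea concrete, here is a minimal Python sketch of mask-based speech enhancement: a per-bin gain mask is estimated over the STFT of the microphone signal and used to resynthesize cleaner audio. In a real hearable the mask would come from a trained DNN or RNN; the simple noise-floor rule below is only a stand-in to show the signal path.

```python
# Minimal sketch of mask-based speech enhancement: estimate a per-bin gain
# mask over the STFT of the microphone signal, then resynthesize. In a real
# device a trained DNN/RNN produces the mask; the noise-floor rule below is
# only a stand-in to show the signal path.
import numpy as np
from scipy.signal import stft, istft

FS = 16000  # 16 kHz is a common rate for speech paths in hearables

def enhance(mic: np.ndarray) -> np.ndarray:
    _, _, spec = stft(mic, fs=FS, nperseg=512)
    mag = np.abs(spec)
    # Placeholder "model": estimate a per-bin noise floor and attenuate
    # bins close to it. A trained network would replace this line.
    noise_floor = np.percentile(mag, 20, axis=1, keepdims=True)
    mask = np.clip((mag - noise_floor) / (mag + 1e-8), 0.0, 1.0)
    _, clean = istft(spec * mask, fs=FS, nperseg=512)
    return clean[:len(mic)]

noisy = np.random.randn(FS)        # stand-in for one second of noisy audio
print(enhance(noisy).shape)        # same length as the input
```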

AI demands in hearables

A second application supports audio personalization. Everyone’s hearing is unique—even your left and right ears are different. For hearing aid users, audiologists run calibration tests with a patient to optimize device settings for that individual user. However, relatively cheap over-the-counter hearing aids become less attractive if you must also pay an audiologist to tune them to your needs. Similarly, if you don’t have a hearing impairment and just want to wear earbuds, personalization can enhance your listening experience, but a visit to an audiologist is not an attractive option. The trend now is toward self-assisted calibration—a user tuning their experience through an app or gestures, driving an AI-based learning system within the earpieces.
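What that self-fitting loop might look like is sketched below, assuming a hypothetical app that reports simple per-band "louder"/"softer" feedback; the band edges, step size, and gain limits are illustrative only.

```python
# Minimal sketch of self-assisted fitting: per-band gains nudged by simple
# "louder"/"softer" feedback collected through an app or gestures. The band
# edges, step size, and gain limits are illustrative only.
import numpy as np

BANDS_HZ = [(125, 500), (500, 2000), (2000, 8000)]   # low / mid / high

class SelfFit:
    def __init__(self):
        self.gains_db = np.zeros(len(BANDS_HZ))       # start from a flat fit

    def feedback(self, band: int, direction: str, step_db: float = 1.5):
        """Apply one round of user feedback to one band."""
        delta = step_db if direction == "louder" else -step_db
        # Clamp to a safe range so self-fitting cannot produce harmful gain.
        self.gains_db[band] = np.clip(self.gains_db[band] + delta, -12, 12)

fit = SelfFit()
fit.feedback(2, "louder")    # user wants more treble
fit.feedback(0, "softer")    # and a little less bass
print(fit.gains_db)          # [-1.5  0.   1.5]
```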

Meanwhile, there is growing awareness that our ears may be better health and fitness sensing points than our wrists: for measuring temperature and heart rate, counting steps, and detecting balance issues. All of these capabilities require sensing along with AI to intelligently process what is sensed.
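As one small illustration of the sensing side, here is a sketch of step counting from a 3-axis accelerometer using simple peak detection; the sample rate and thresholds are assumptions, and a production design would tune or learn them per user and wearing position.

```python
# Minimal sketch of ear-worn step counting: peak detection on the magnitude
# of a 3-axis accelerometer stream. Sample rate and thresholds are assumed
# values; a real design would tune (or learn) them per user and position.
import numpy as np
from scipy.signal import find_peaks

def count_steps(accel_xyz: np.ndarray, fs_hz: float = 50.0) -> int:
    mag = np.linalg.norm(accel_xyz, axis=1)       # combine the three axes
    mag = mag - mag.mean()                        # remove the gravity offset
    # Steps rarely exceed ~3 Hz, so enforce a minimum spacing between peaks.
    peaks, _ = find_peaks(mag, height=0.8, distance=int(fs_hz / 3))
    return len(peaks)

accel = np.random.randn(500, 3) * 0.3             # stand-in for 10 s of data
print(count_steps(accel), "steps detected")
```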

In gaming, wireless latency must be controlled to ensure seamless audio response, synchronized with video. In electromagnetically noisy or congested environments, AI systems can adaptively switch Wi-Fi traffic between different bands (2.4GHz, 5GHz, 6GHz) as the environment changes and the user moves around. Imagine the benefit of this kind of support in gaming headphones.
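A minimal sketch of the decision logic such a system might use is shown below, assuming per-band latency and packet-loss measurements are already available from the radio stack; the scoring weights are illustrative only.

```python
# Minimal sketch of adaptive band selection for a low-latency audio link.
# Per-band latency and packet-loss measurements are assumed to be supplied
# by the radio stack; the scoring weights are illustrative only.
BANDS = ["2.4GHz", "5GHz", "6GHz"]

def pick_band(latency_ms: dict, loss_pct: dict, target_ms: float = 20.0) -> str:
    """Score each band, penalizing a missed latency budget most heavily."""
    def score(band: str) -> float:
        budget_miss = max(0.0, latency_ms[band] - target_ms)
        return 10.0 * budget_miss + latency_ms[band] + 2.0 * loss_pct[band]
    return min(BANDS, key=score)

print(pick_band({"2.4GHz": 35, "5GHz": 12, "6GHz": 9},
                {"2.4GHz": 1.0, "5GHz": 0.5, "6GHz": 2.5}))   # -> 5GHz
```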

The list goes on. These examples already make clear that multiple AI models may need to be supported in hearable devices.

Figure: Representative AI subsystem

Software/model support

Long ago in AI terms, an edge AI application could be supported by a relatively simple signal processing flow, plus a CNN to handle noise suppression and some feature detection. Models today span a much wider range. Some sensing may work well enough with a CNN, but more complex sensing (detecting balance issues, for example) requires sensor fusion, while adaptive noise cancellation and speech filtering require a DNN or RNN at minimum, perhaps even a transformer in some of the more advanced models. Personalization and communication latency optimization may also require in-device learning.
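For a sense of what that looks like in practice, here is a toy Keras definition of the kind of RNN mask estimator used for noise suppression; the feature size and layer widths are illustrative assumptions, not a production configuration.

```python
# Toy Keras definition of the kind of RNN mask estimator used for noise
# suppression: a sequence of audio feature frames in, a per-band gain mask
# out. The feature size and layer widths are illustrative, not production.
import tensorflow as tf

N_FEATURES = 64   # feature bins per frame (assumption)

def build_mask_estimator() -> tf.keras.Model:
    frames = tf.keras.Input(shape=(None, N_FEATURES))        # time x bins
    x = tf.keras.layers.GRU(96, return_sequences=True)(frames)
    x = tf.keras.layers.GRU(96, return_sequences=True)(x)
    mask = tf.keras.layers.Dense(N_FEATURES, activation="sigmoid")(x)
    return tf.keras.Model(frames, mask)

build_mask_estimator().summary()
```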

The point is that devices may need to support multiple models, maybe concurrently (noise cancellation and transparency, for example). Model training may start in one of several possible frameworks (e.g., LiteRT or Keras), while recognizing that the hardware may need to allow for parallel model processing later. Next, models must be translated to the target hearable hardware. A good starting point here is to look for an embedded platform that already supports a rich ModelZoo for the framework and target application. Think of it as the AI equivalent of optimized software libraries. Developing these models yourself will add cost and push out release schedules.
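As a sketch of that hand-off, converting a Keras model to a LiteRT/TFLite flatbuffer looks roughly like this; a trivial dense model and the output filename are placeholders, and a vendor toolchain would then map the flatbuffer onto the target NPU.

```python
# Sketch of the hand-off from a Keras model to a LiteRT/TFLite flatbuffer,
# the usual starting point before a vendor toolchain maps the model onto a
# target NPU. A trivial dense model stands in for the real network here.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(64, activation="sigmoid"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
flatbuffer = converter.convert()              # still float32, no quantization

with open("mask_estimator_fp32.tflite", "wb") as f:
    f.write(flatbuffer)
print(f"model size: {len(flatbuffer)} bytes")
```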

Next, you quantize each model, converting bfloat16 to int8 for example, and build a runtime. The runtime also depends on a rich set of libraries for operators (e.g., ONNX operators) and, in this context, for feature extraction from audio streams. Here you might also want to add a custom C/C++ function for a specialized operation in the model. Now you are ready to compile to a virtual simulation model representing your AI model plus a virtualized instance of your target hardware.
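The quantization step, sketched below with the standard TFLite/LiteRT converter flow for post-training int8 quantization, is representative of what most toolchains do; the random calibration data and toy model are placeholders for real captured audio features and the real network.

```python
# Sketch of post-training int8 quantization with the standard TFLite/LiteRT
# converter flow. The random calibration data is a placeholder for real
# captured audio features, and the toy model stands in for the real one.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(64, activation="sigmoid"),
])

def representative_data():
    for _ in range(100):                        # calibration samples
        yield [np.random.rand(1, 64).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8        # fully integer in and out
converter.inference_output_type = tf.int8
with open("mask_estimator_int8.tflite", "wb") as f:
    f.write(converter.convert())
```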

This compile-to-simulation flow is the same flow you will use to compile to the ultimate silicon target; importantly, it allows you to experiment with choices: quantization (including mixed quantization options), different codec options, even different architecture choices in your source network model. The simulator should give reliable estimates of performance and latency, ensuring faithful equivalence with the final silicon.
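A rough host-side illustration of that measure-and-compare loop is shown below, timing the quantized model from the previous sketch with the TFLite/LiteRT interpreter; it only shows the flow, since cycle-accurate numbers for the NPU target come from the vendor's simulator, not from host timing.

```python
# Sketch of a host-side timing loop around the quantized model using the
# TFLite/LiteRT interpreter. It only illustrates the measure-and-compare
# step; cycle-accurate numbers for the NPU come from the vendor simulator.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mask_estimator_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])
times_ms = []
for _ in range(200):
    interpreter.set_tensor(inp["index"], frame)
    start = time.perf_counter()
    interpreter.invoke()
    times_ms.append((time.perf_counter() - start) * 1e3)

print(f"median per-frame latency: {np.median(times_ms):.3f} ms")
```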

Hardware support

The first and most obvious constraint here is that the signal processing/AI hardware core must be both very small and very power efficient. Ideally it should be able to operate standalone, without the need for a supporting MCU or DSP, in which case it must manage both control and signal processing functions within the core. Especially in the earbud and hearing aid markets, time between recharges is critical to product acceptance.

Things to look for are inherently low standby power, of course, but also aggressive power saving through power switching and dynamic voltage and frequency scaling (DVFS). Another factor that may be less familiar to non-AI experts is sparsity in inference matrices: taking full advantage of this sparsity can have a significant influence on power consumption (and performance).
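A small numpy sketch of why sparsity matters: zero-valued weights contribute nothing, so hardware that skips them saves those multiply-accumulates and the memory traffic to fetch them. The pruning threshold and matrix size below are arbitrary.

```python
# Small numpy sketch of why sparsity matters: zero-valued weights contribute
# nothing, so hardware that skips them saves those multiply-accumulates and
# the memory traffic to fetch them. The pruning threshold here is arbitrary.
import numpy as np

weights = np.random.default_rng(0).standard_normal((64, 64))
weights[np.abs(weights) < 1.0] = 0            # prune small weights to zero

sparsity = np.mean(weights == 0)
skipped = weights.size - np.count_nonzero(weights)
print(f"sparsity: {sparsity:.0%}, MACs skipped per input vector: {skipped}")
```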

A big part of the power-saving game in inference is minimizing accesses to DRAM. This demands sufficiently large (yet still cost effective) data memory/cache to handle the audio stream, feature processing, and model processing. In addition, parameter compression has become a standard expectation. Parameters are compressed offline during the compile step, then decompressed in the core before calculation. As parameter counts continue to grow, this compression paradigm has become unavoidable.
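The sketch below shows the offline-compress / on-core-decompress idea in miniature, using zlib purely as a stand-in for whatever compression scheme a given NPU toolchain actually implements.

```python
# Sketch of the offline-compress / on-core-decompress idea for model
# parameters, using zlib purely as a stand-in for whatever compression
# scheme a given NPU toolchain actually implements.
import zlib
import numpy as np

params = (np.random.default_rng(1).standard_normal(100_000) * 4).astype(np.int8)
blob = zlib.compress(params.tobytes(), level=9)          # done once, offline
print(f"compressed to {len(blob) / params.nbytes:.0%} of the original size")

restored = np.frombuffer(zlib.decompress(blob), dtype=np.int8)  # on-core step
assert np.array_equal(restored, params)                  # lossless round trip
```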

Other things to look for: if multiple models must run concurrently, the accelerator will need to support multi-core instances for signal processing and neural net processing. For future-proofing, all data inputs, models, and libraries should be based on widely accepted standards and should be extensible, as needed, for your applications. Models will continue to evolve; you need to be sure that what you build today can be upgraded to the latest and greatest models in the future.

There’s lots to do, but with the right platform much of the work may already be done for you. At Ceva, we provide a range of NPU IPs for embedded AI processing along with our popular wireless IoT solutions. We would be happy to discuss how we can help meet your hearables product goals.

www.ceva-ip.com