New digital microphones are ideal for use with GenAI system designs.
By Uday Mudoi, Vice President & General Manager, Microphone Business Unit at InvenSense, a TDK Group Company.
Advancements in artificial intelligence (AI) technology, especially in generative AI (GenAI), are driving the need for voice interfaces to become more capable, natural, and user-friendly. As voice input becomes an increasingly frequent human-machine interface (HMI), there’s a growing need for high-quality MEMS microphones.
Microphone innovations drive higher quality
Today’s electronic devices need microphones that can capture sounds and voices with a high level of accuracy in any environment. With advances in micro-electromechanical systems (MEMS), we now have tiny microphones that come closer than ever to delivering studio recording quality while operating at ultra-low power levels, which is essential for battery-powered consumer electronics.
These MEMS microphones deliver high signal-to-noise ratio (SNR) for better sound quality, and high acoustic overload point (AOP) to operate in noisy environments. Digital MEMS microphones enable simpler designs because they incorporate codecs (coder/decoders), rather than requiring placement of a separate codec. Additionally, these digital microphones are more programmable, delivering a more customizable user experience.
The newest MEMS microphones are available with a choice of digital interfaces, such as PDM and I2S, giving system designers flexibility in choosing an application processor. For example, a microphone with an I2S interface (such as the TDK ICS-43434 or T5848) reduces processing requirements, because the system hardware or software no longer has to filter the microphone output.
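To make the filtering point concrete, here is a minimal sketch of the work a host has to do when a microphone delivers a raw 1-bit PDM stream instead of ready-made samples over I2S. It uses a simple boxcar average followed by decimation. The decimation factor of 64, the function names, and the small modulator used to generate test input are all assumptions for illustration; production designs typically use multi-stage CIC and FIR filters instead.

```python
def pdm_to_pcm(bits, decimation=64):
    """Convert a 1-bit PDM stream (0/1 values) to PCM samples in [-1.0, 1.0].

    Each output sample is the average of `decimation` consecutive bits,
    i.e. a boxcar low-pass filter followed by downsampling.
    """
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        ones = sum(bits[start:start + decimation])
        # Map the density of ones (0..1) onto a signed amplitude (-1..+1).
        pcm.append(2.0 * ones / decimation - 1.0)
    return pcm


def sigma_delta_bits(samples, oversample=64):
    """Hypothetical first-order sigma-delta modulator, used only to
    generate a PDM-like test stream from samples in (-1.0, 1.0)."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for s in samples:
        for _ in range(oversample):
            integrator += s - feedback
            bit = 1 if integrator >= 0 else 0
            feedback = 1.0 if bit else -1.0
            bits.append(bit)
    return bits


if __name__ == "__main__":
    stream = sigma_delta_bits([0.5, 0.5, -0.25, -0.25])
    print(pdm_to_pcm(stream))
```

With an I2S microphone, this conversion step happens inside the microphone, so the processor receives PCM samples directly.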
New digital microphones (such as the TDK T5838 and T5848) also integrate innovative features like acoustic activity detect (AAD) to enable more power-efficient always-on edge technology, which makes them ideal for use with GenAI system designs.
Acoustic activity detect: voice activation at lower power
The problem with many sound- and voice-activated devices is that the events they aim to respond to occur only occasionally, yet devices must remain on and operating at full power to detect these events. This is a waste of power under the best of circumstances, but for battery-powered products, including most IoT and edge AI systems, it can deplete the batteries quickly.
The always-on nature of IoT devices is supported by low-power, sound-activated MEMS microphones. TDK’s AAD technology allows microphones to rest in an ultra-low-power standby mode that draws as little as 20μA, from which a sound can trigger them to wake.
AAD supports multiple modes of operation, with current consumption of 20μA or 137μA depending on the mode. While the analog mode operates at the lowest current (20μA), the digital modes (including the 137μA mode) offer more filter options to reduce false positives. Design engineers can program applications to toggle between analog and digital modes, as well as to wake on sound with the clock off.
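As a rough illustration of what the gap between the two standby currents means in practice, the sketch below estimates ideal standby time from a small battery. The 220 mAh capacity (roughly a CR2032-class coin cell) is an assumption, not a figure from TDK, and the calculation ignores self-discharge and every other load in the system.

```python
HOURS_PER_YEAR = 24 * 365


def standby_years(capacity_mah, current_ua):
    """Ideal runtime in years for a constant current draw (no other loads)."""
    hours = capacity_mah * 1000.0 / current_ua  # mAh -> uAh, then uAh / uA = h
    return hours / HOURS_PER_YEAR


ASSUMED_CAPACITY_MAH = 220.0  # assumption: CR2032-class coin cell

if __name__ == "__main__":
    for label, current_ua in (("analog AAD mode", 20.0),
                              ("digital AAD mode", 137.0)):
        years = standby_years(ASSUMED_CAPACITY_MAH, current_ua)
        print(f"{label}: {years:.2f} years")
```

Under these assumptions, the 20μA mode lasts a little over a year on the microphone's draw alone, while the 137μA mode lasts roughly two months, which is the trade-off designers weigh against the extra filtering in the digital modes.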
High SNR and AOP offer exceptional noise-cancelling performance
New IoT products also demand excellent performance in noisy conditions. If a microphone-enabled system is engineered to cancel loud, transient noises, it must first be able to capture the noise accurately, with low distortion. The acoustic overload point (AOP) defines the maximum sound level that a microphone can faithfully capture, with ranges up to 133dB SPL.
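To give a feel for what 133dB SPL means physically, the following sketch converts between sound pressure level and pressure in pascals using the standard 20 micropascal reference. The function names are illustrative.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL (20 micropascals)


def spl_to_pascal(spl_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10 ** (spl_db / 20.0)


def pascal_to_spl(pressure_pa):
    """Convert an RMS pressure in pascals to dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


if __name__ == "__main__":
    for level in (94.0, 133.0):
        print(f"{level:.0f} dB SPL = {spl_to_pascal(level):.2f} Pa")
```

A level of 94dB SPL corresponds to about 1 Pa, while 133dB SPL is roughly 89 Pa, so a microphone with that AOP handles pressures almost 90 times larger than the common 94dB test level without clipping.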
High SNR is essential for noise-cancelling headphones, but high SNR combined with high AOP enables robots and other AI-enabled products to operate in outdoor or noisy environments. Whether for a factory floor or an urban street, high SNR and high AOP microphones make it possible to distinguish relevant sounds from background noise.
With an innovative AAD feature, MEMS microphones are poised to respond quickly and correctly, even on the merest trickle of power. The processor, which can be anything from a simple microcontroller up to an AI processor, can rest in a powered-down state, ready to immediately power back up and respond based on triggers from the always-on microphone. This configuration enables any voice-activated product to rest at minimal power levels, yet quickly and correctly respond to voice commands even in noisy environments. Beyond the microphone’s standby current, power is consumed only if a sound worth processing is detected.
Applications for voice- or audio-activated MEMS microphones
Examples of the growing number of voice-activated products include TV remotes, cameras (including everything from security cameras to video doorbells), connected home devices, smartphones, true wireless stereo (TWS) earbuds, tablets, Bluetooth headsets, smart speakers, and notebook PCs. For all these use cases, the microphone must always be ready to accurately capture audio. By offering digital microphones with AAD at current levels as low as 330μA, TDK allows customers to achieve the lowest system-level power consumption along with clearer audio capture.
A high SNR is essential for accurate voice or sound recognition and processing in edge AI systems, whether assisted by cloud processing or not. A high SNR of 68dB, for example, enables identification of particular sounds, such as a specific individual’s voice. More accurate voice recognition leads to fewer errors in AI interpretations and responses.
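For a sense of scale, microphone SNR is conventionally specified against a 94dB SPL, 1 kHz reference tone, so an SNR figure maps directly to the microphone's own noise floor. The sketch below works through that arithmetic. Pairing the 68dB SNR with the 133dB SPL AOP mentioned earlier is an assumption made only for illustration, since the two figures are not tied to the same part here.

```python
REFERENCE_SPL_DB = 94.0  # conventional reference level for microphone SNR


def equivalent_input_noise_db(snr_db):
    """Self-noise of the microphone expressed as an equivalent sound level."""
    return REFERENCE_SPL_DB - snr_db


def dynamic_range_db(aop_db_spl, snr_db):
    """Span between the noise floor and the acoustic overload point."""
    return aop_db_spl - equivalent_input_noise_db(snr_db)


if __name__ == "__main__":
    print(equivalent_input_noise_db(68.0))  # noise floor in dB SPL
    print(dynamic_range_db(133.0, 68.0))    # dynamic range in dB
```

A 68dB SNR corresponds to a noise floor of 26dB SPL, about the level of a very quiet room. Combined with a 133dB SPL AOP, that would give a dynamic range of 107dB.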
Accuracy in sound capture is also improved by installing an array of microphones. The direction of a source of sound can be determined by calculating the differences between the times of arrival and the strengths of the sounds received by the respective microphones. This technology, called beamforming, provides a useful adjunct to noise cancellation. Beamforming can help distinguish target sounds from background noises coming from various directions.
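The time-of-arrival part of this idea can be sketched for the simplest case of two microphones. The example below finds the delay between two signals with a brute-force cross-correlation and converts it to an arrival angle under a far-field assumption. The 48 kHz sample rate, 5 cm microphone spacing, and function names are assumptions for illustration. Practical arrays use more microphones and more robust estimators, and this sketch ignores the differences in signal strength.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C


def best_lag(a, b, max_lag):
    """Return the lag in samples by which signal b trails signal a,
    found by brute-force cross-correlation over [-max_lag, max_lag]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle from broadside for a two-microphone pair."""
    delay_s = lag_samples / sample_rate_hz
    # Clamp to the physically valid range before taking the arcsine.
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Synthetic pulse that reaches the second microphone 3 samples later.
    mic_a = [0.0] * 10 + [1.0, 2.0, 1.0] + [0.0] * 10
    mic_b = [0.0] * 13 + [1.0, 2.0, 1.0] + [0.0] * 7
    lag = best_lag(mic_a, mic_b, max_lag=8)
    print(lag, arrival_angle_deg(lag, 48000.0, 0.05))
```

In this synthetic case the 3-sample delay corresponds to a source about 25 degrees off broadside. Once the direction is known, an array can weight and combine its microphone signals to favor sound from that direction.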
In combination, the capabilities of wide dynamic range, high SNR, and beamforming can deliver a better user experience for systems such as conference room speakers, active noise-cancellation headphones, or any IoT device that needs to cancel or ignore loud extraneous sounds like doors slamming, wind noise, and alerts.
Future AI-enabled devices
As AI makes real-time voice transcription and real-time translation more useful and common, microphone performance and programmability become fundamental to audio- or voice-enabled AI applications. While microphones are already an essential part of consumer electronic devices, we’ll see an increasing number of products that embed multiple microphones.
To aid this transition, TDK supports a rich ecosystem of AI processor partners who have reference designs with ready-to-use software to enable quick time-to-market for system designers. Innovations like AAD, along with developer ecosystem support, will enable new user experiences that we can only begin to imagine.