How Acoustic Echo Cancellers (AEC) Work


Acoustic Echo Cancellers (AECs) are integral parts of our teleconferencing systems and are installed every day in conference rooms around the globe. An AEC adapts to changes in the room to improve the audio experience for remote participants: it prevents echoes of the remote talker’s voice from being sent back to the remote talkers by way of the local room’s loudspeakers and microphones. In this post we’ll explore how AECs work and highlight the primary rules to keep in mind when working with AEC systems.

As the remote talker’s audio comes out of the loudspeaker in the local room, it bounces off the room’s surfaces and the local participants and is then picked up by the microphones. The goal of the local room’s AEC is to model that echo and subtract the modeled echo from the microphone signal. Any residual echo that is not subtracted is run through a non-linear processor that decides whether to suppress the residual signal or let it through to the remote participants.

To make the AEC problem a bit easier to analyze, we consider acoustic echoes to be linear and time-invariant.

Linear means that the echoes that are heard by the local microphones are sums of individual echoes from the room. Think of an acoustic echo as a ray that reflects off the walls, surfaces, local participants, and bounces into the microphone. The sum of all these rays is the acoustic echo heard by the microphone.
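
As a rough illustration of the "sum of rays" idea, the echo at the microphone can be written as a weighted sum of delayed copies of the loudspeaker signal. The symbols here are assumptions introduced for this sketch and do not come from any particular product: x(n) is the loudspeaker (remote) signal, d(n) is the echo at the microphone, and a_k and τ_k are the strength and delay of the k-th reflection path.

```latex
d(n) = \sum_{k} a_k \, x(n - \tau_k)
```

Doubling the loudspeaker signal doubles every term in the sum, and the echo of two remote signals played together is the sum of their individual echoes, which is what makes the model linear.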

Time-invariant means that the echoes will be the same whether a remote user talks today or two weeks from now; basically, the structure of the room and its surfaces don’t change. This assumption isn’t entirely true when there are local participants, because people move around, the temperature in the room changes, and drapes may be open or closed. Typically, though, those effects vary slowly over time, so the acoustic echo is still modeled as a linear time-invariant system, albeit a slowly varying one.

Because of these assumptions, most, if not all, implementations of an AEC create a linear model of the room and continue to adapt the model until the residual signal is small, at which point the model closely approximates the room reflections, as shown in the following figure and described next.

In a typical system, remote audio is received from remote participants by teleconference, videoconference, or some other means. Local talkers in the room want to speak freely with the remote participants, and the local microphones pick up the local talkers’ audio as well as the remote talkers’ echo. The AEC is the shaded area in the figure and is described below.

The ‘h’ block is the filter model of the room. It represents the room conceptually as a finite impulse response (FIR) filter, which is a sequence of amplitudes (i.e., echo strengths) and time delays. The receive audio from the remote participants feeds the amplifier in the room and also feeds the AEC as the ‘reference signal’. The output of the ‘h’ filter is subtracted from the audio picked up by the microphone, the residual signal is run through a non-linear processing (NLP) block, and the result is transmitted back to the remote participants. A performance measurement block gauges how well the system is performing and drives the adaptation algorithm to perform better.
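
To make the ‘h’ filter and the subtraction step concrete, here is a minimal sketch in Python. It assumes the normalized least-mean-squares (NLMS) update as the adaptation algorithm. NLMS is a common textbook choice, but this post does not say which algorithm any particular product uses. The function name, the tap count, the step size, and the simulated three-reflection "room" are all hypothetical values chosen for illustration, and the NLP and performance measurement blocks are left out.

```python
import numpy as np

def nlms_echo_canceller(reference, mic, num_taps=64, step=0.5, eps=1e-8):
    """Subtract a modeled echo of `reference` from `mic` while adapting an FIR model.

    NLMS is an assumed adaptation rule for this sketch, not a specific vendor's method.
    """
    h = np.zeros(num_taps)        # current estimate of the room's impulse response
    x_buf = np.zeros(num_taps)    # most recent reference samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        # Shift the newest reference sample into the delay line.
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        # Modeled echo = FIR filter output; residual = mic minus modeled echo.
        echo_estimate = h @ x_buf
        e = mic[n] - echo_estimate
        residual[n] = e
        # NLMS update: nudge h in the direction that shrinks the residual,
        # normalized by the energy currently in the delay line.
        h += step * e * x_buf / (x_buf @ x_buf + eps)
    return residual, h

# Simulated room (hypothetical): three reflections with different delays and strengths.
rng = np.random.default_rng(0)
true_room = np.zeros(64)
true_room[[5, 20, 41]] = [0.6, -0.3, 0.15]

reference = rng.standard_normal(8000)                     # stand-in for remote audio
mic = np.convolve(reference, true_room)[:len(reference)]  # echo only, no local talker

residual, h = nlms_echo_canceller(reference, mic)
```

In this idealized setup there is no local talker and no noise, so the adapted `h` should end up very close to `true_room` and the residual should shrink toward zero. A real room needs far more taps, and a real AEC has to decide when it is safe to adapt at all.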

The first rule of AECs is that all the remote audio that will be played into the local room should be part of the local room’s AEC reference, so the AEC knows what audio it should remove from the microphone signal and what it should keep. Usually this is an easy rule to follow, as most DSP products allow you to build the AEC reference through a matrix where you have access to all the appropriate input signals from remote participants.

The (mostly) slow variation of echo paths in the room requires an AEC to adapt to changes in the room. As people move in the room, shades are opened, new participants enter a room, etc., the residual error will increase and that will cause the AEC to adapt to the new conditions. During this time, there may be more residual echo that is either suppressed by the NLP box or sent to the remote participants as a blast of echo when the remote participants are talking.

The AEC control block helps the AEC determine when it should adapt (i.e., when remote audio is played into the room while the local talkers are quiet) and when it should not adapt (there is no remote audio played in the room and there is only local audio in the microphone signal). Much of the performance difference between AECs lies in the magic of deciding when to adapt and how quickly the system can adapt to accurately model the room and successfully remove the echo from the transmit signal to the remote participants.
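
The adapt/don't-adapt decision can be sketched as a small gate that looks at one short frame of reference and microphone samples. This is only an illustration: the function name and both thresholds are hypothetical, and the local-talker check is a simple Geigel-style peak comparison, a classic textbook detector, rather than whatever logic a given commercial AEC actually uses.

```python
import numpy as np

def should_adapt(ref_frame, mic_frame, ref_floor=1e-4, geigel_ratio=0.5):
    """Decide whether the room model may adapt during this frame.

    ref_floor and geigel_ratio are illustrative assumptions, not tuned values.
    """
    ref_frame = np.asarray(ref_frame, dtype=float)
    mic_frame = np.asarray(mic_frame, dtype=float)

    # No remote audio in the room: there is nothing to learn the echo path from.
    if np.mean(ref_frame ** 2) < ref_floor:
        return False

    # Geigel-style check: an echo should be quieter than the signal that caused it.
    # A microphone peak well above that suggests a local talker is also speaking.
    if np.max(np.abs(mic_frame)) > geigel_ratio * np.max(np.abs(ref_frame)):
        return False

    return True

# Hypothetical frames for illustration.
remote = np.array([0.5, -0.4, 0.3, -0.2])
echo_only = 0.2 * remote           # remote audio with a quiet room echo
with_local_talker = echo_only + 0.4

remote_only_ok = should_adapt(remote, echo_only)
double_talk_ok = should_adapt(remote, with_local_talker)
silence_ok = should_adapt(np.zeros(4), with_local_talker)
```

Only the first case, remote audio with quiet local talkers, allows adaptation. Freezing the model in the other two cases keeps the local talker's voice from corrupting the room estimate.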

The second rule of AECs is that a microphone should not be in its own reference signal. If it is, then the resulting output of that microphone, after the AEC processing, will either be nothing (if the AEC is doing its job well) or some clipped and suppressed version of the microphone signal. Basically, if a local microphone is in its own reference, the remote participants will not be able to hear and understand the local participants, because the local AEC will be working to remove the local talkers’ audio from the transmitted signal. Usually this rule is easy to meet, except when there is also sound reinforcement going on in the local room. In that case, care must be taken to ensure the reference meets both rule number 1 and rule number 2, which usually entails creating a special mix of audio that is used as the reference for the microphones.
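
Both rules can be expressed as a simple check on what is routed into a microphone's reference. The sketch below is hypothetical: the function and the signal names (`codec_rx`, `phone_rx`, `mic1`) are made up for illustration and do not correspond to any real DSP product's matrix or API.

```python
def check_reference(reference_inputs, remote_inputs, mic):
    """Return a list of rule violations for one microphone's AEC reference.

    All names here are hypothetical labels for signals in a mixing matrix.
    """
    problems = []

    # Rule 1: every remote signal played into the room must be in the reference.
    missing = set(remote_inputs) - set(reference_inputs)
    if missing:
        problems.append(f"rule 1: remote inputs missing from reference: {sorted(missing)}")

    # Rule 2: the microphone must not appear in its own reference.
    if mic in reference_inputs:
        problems.append(f"rule 2: {mic} is in its own reference")

    return problems

remote_inputs = {"codec_rx", "phone_rx"}

good = check_reference({"codec_rx", "phone_rx"}, remote_inputs, "mic1")
missing_remote = check_reference({"codec_rx"}, remote_inputs, "mic1")
self_reference = check_reference({"codec_rx", "phone_rx", "mic1"}, remote_inputs, "mic1")
```

Here `good` is an empty list, while the other two results each report the rule that the routing breaks.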

Acoustic echo cancellers are adaptive systems that can work very well if set up properly. Understanding the principles of how AECs work can help you better understand more complicated systems.