Speech codecs in VoIP

One of the many advantages of VoIP, and of digital communication technologies in general, is having complete control over the audio quality and fidelity of the transmitted signals. This post summarizes some of the commonly used speech compression techniques in VoIP, provides audio examples, and offers some tips for integrators.

Toll Quality

Traditional analog voice telephony networks use the equivalent of 64kbps of data transmission to carry 3.3kHz bandwidth audio in each direction of the call. VoIP networks can use bit rates ranging from 5.3kbps to 64kbps, or higher if desired. Thanks to tremendous advances in digital signal processing, speech and audio compression technologies can now deliver better than traditional ‘toll’ quality fidelity at bit rates of 64kbps and lower. Common examples include G.722 7kHz audio compression and newer standards such as G.719 and the Polycom Siren 22 and Siren 14 codecs. There’s a significant body of literature espousing the benefits to listeners of higher fidelity audio compression schemes for collaboration: higher fidelity audio improves intelligibility and lowers listener fatigue.
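As a rough illustration of where the 64kbps figure comes from, G.711 samples audio at 8kHz with 8 bits per sample. The Python sketch below works through that arithmetic and the resulting compression ratios for the lower-rate codecs mentioned in this post. The function names are illustrative only, not part of any library.

```python
def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw bit rate of a single-channel PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(reference_bps: float, codec_bps: float) -> float:
    """How many times smaller the codec's bit rate is than the reference."""
    return reference_bps / codec_bps


# G.711: 8 kHz sampling, 8 bits per sample
g711_bps = pcm_bit_rate_bps(8000, 8)

if __name__ == "__main__":
    print("G.711 bit rate:", g711_bps, "bps")
    # Bit rates below are the ones quoted in this post.
    for name, bps in [("G.729", 8000), ("G.723.1", 5300)]:
        ratio = compression_ratio(g711_bps, bps)
        print(f"{name}: about {ratio:.1f}x smaller than G.711")
```

Against a 64kbps reference, G.729 at 8kbps works out to an 8x reduction and G.723.1 at 5.3kbps to roughly 12x, which is why some loss of fidelity at those rates is unsurprising.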

Advances in speech compression have also made possible very low bit rate transmissions that preserve much, but not all, of the narrowband 3.3kHz audio fidelity. Low bit rate connections are used to reduce telecommunication costs or to make services available that weren’t possible before. The speech quality trade-off can be significant or modest depending on the bit rate and speech coding technology.

For integrators installing VoIP solutions, it’s important to understand that the codec used on a call is negotiated between the two endpoints when the call is set up. If one end of the call only supports low bit rate transmissions, then that’s the quality of the call. If that quality isn’t sufficient, the only recourse is to work with the local telecom teams to select a higher quality speech compression system. The few times I’ve seen this really be an issue, G.723.1 at 5.3kbps was the required speech compression system and the resulting fidelity did not sound good to the remote participants. When that happens, the integrator has to work with the IT team, because at that point it’s likely not the room generating the quality issues; it’s the speech compression scheme introducing artifacts into the system.
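To make the negotiation point concrete, here is a deliberately simplified Python sketch. It assumes each endpoint advertises a list of codecs and that the call uses the first codec in the caller's preference order that the far end also supports. Real signaling protocols are more involved, and the endpoint names and codec lists are hypothetical.

```python
from typing import Optional, Sequence


def negotiate_codec(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the first codec in the offerer's preference order that the
    answering endpoint also supports, or None if there is no overlap."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


# Hypothetical endpoints: a conference room that prefers wideband audio,
# and a far end that only supports a low bit rate codec.
room_system = ["G.722", "G.711", "G.729", "G.723.1"]
remote_gateway = ["G.723.1"]

if __name__ == "__main__":
    print(negotiate_codec(room_system, remote_gateway))
```

Even though the room system prefers G.722, the call in this sketch ends up on G.723.1, because that is the only codec both sides have in common. No amount of tuning in the room changes that outcome.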

Audio Examples

Here are some examples of the resulting audio fidelity at various bit rates and speech compression systems.

G.722 (7kHz bandwidth, 64kbps bit rate)

G.711 (3.3kHz bandwidth, 64kbps bit rate)

G.729 (3.3kHz bandwidth, 8 kbps bit rate)

G.723.1 (3.3kHz bandwidth, 5.3 kbps bit rate)

The fidelity is very high with G.722, and the bandwidth is noticeably higher than in the other samples. Subjective quality drops with each subsequent example as the bit rate falls from 64kbps to 5.3kbps. Finally, with G.723.1, you can hear audible artifacts.

What’s important to remember?

Understand that there are different types of audio codecs that may be negotiated between VoIP endpoints. For the products you use, it’s helpful to know how to determine which codecs are available and which were used in a given call, in case you need to troubleshoot audio issues.

If you find that the codecs selected are always very low bit rate codecs such as G.723.1 and users are complaining of audio issues, ask if a higher bit rate codec such as G.729 (8kbps) or G.711 μ-law (64kbps) is an option.

For a higher quality experience, G.722 at 7kHz bandwidth and 64kbps provides outstanding intelligibility. The only problem with bandwidth higher than 3.3kHz is that if your calls are routed onto the PSTN, any fidelity above 3.3kHz is lost.
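One way to think about the PSTN caveat is that the narrowest leg of a call sets the audio bandwidth the listener actually hears. The Python sketch below models that with a simple minimum over the legs, using the bandwidth figures quoted in this post. The table and function names are illustrative assumptions, not a real API.

```python
from typing import Iterable

# Audio bandwidth in kHz, as quoted in this post.
CODEC_BANDWIDTH_KHZ = {
    "G.722": 7.0,
    "G.711": 3.3,
    "G.729": 3.3,
    "G.723.1": 3.3,
}
PSTN_BANDWIDTH_KHZ = 3.3


def end_to_end_bandwidth_khz(leg_bandwidths_khz: Iterable[float]) -> float:
    """The narrowest leg limits the audio bandwidth of the whole call."""
    return min(leg_bandwidths_khz)


# Both ends on G.722: the call stays wideband.
wideband_call = end_to_end_bandwidth_khz(
    [CODEC_BANDWIDTH_KHZ["G.722"], CODEC_BANDWIDTH_KHZ["G.722"]]
)

# G.722 in the room, but the call is routed onto the PSTN.
pstn_call = end_to_end_bandwidth_khz(
    [CODEC_BANDWIDTH_KHZ["G.722"], PSTN_BANDWIDTH_KHZ]
)

if __name__ == "__main__":
    print("G.722 end to end:", wideband_call, "kHz")
    print("G.722 routed via PSTN:", pstn_call, "kHz")
```

In this model, a G.722 call that crosses the PSTN is no better than a narrowband call for the far-end listener, so wideband codecs pay off mainly on calls that stay on IP from end to end.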