AC-3 Audio Codec

	Introduction
	Psycho-Acoustic Model
	Subband Filter Bank in AC-3
	AC-3 Audio Codec

Introduction

　　　 The AC-3 audio coding scheme was developed by Dolby Laboratories. Dolby Digital (the audio standard used in the film industry, DVD, multimedia, and HDTV), Dolby Surround Digital (the audio standard used in Home Theater Systems (HTS)), and Dolby Net (used in the Internet environment) all share the same kernel: the AC-3 audio coding technology. To meet the quite different demands of these areas, the target bit rates of AC-3 range from 32 kbps to 640 kbps.

　　　 AC-3 allows the encoder to include special bit stream information that lets the decoder reconstruct the original "audio environment" of the audio sequence. For example, a dialogue parameter keeps the sound level uniform when several different AC-3 sequences are played back one after another. A room type parameter allows the decoder to simulate the surround effect of the original space. Parameters for dynamic range control and sound level compression are also provided for specific playback conditions.

　　　 AC-3 is a perceptual audio coder (PAC); that is, it uses features of human psycho-acoustics to identify audio components that are masked and therefore inaudible, so that bits are saved for the signals that really matter. An AC-3 bitstream can carry up to five full-bandwidth channels and one subwoofer channel (cutoff at 120 Hz), the so-called 5.1 channels. The time-frequency transform used in AC-3 is an analysis/synthesis filter bank with Time Domain Aliasing Cancellation (TDAC). AC-3 applies a parametric bit allocation process, so flexibility and efficiency are achieved at the same time. With a sophisticated psycho-acoustic model and decorrelation between neighboring frequencies and between channels, AC-3 delivers high-quality perceptual audio at low bandwidth.

Psycho-Acoustic Model

　　　 AC-3 uses a parametric psycho-acoustic model so that flexibility, efficiency, and a low bandwidth requirement can all be achieved in the bit allocation process. For the masking effect, AC-3 adopts only upward spectral masking, because the spectral spreading function is skewed toward the high-frequency side (Fig. 1). Temporal masking is less important and is handled implicitly in other processes of the coder.
　

Figure 1: The spectral masking curve (1kHz, 60dB)
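
　　　 The upward-only spreading described above can be sketched in a few lines. This is a minimal illustration, not the AC-3 algorithm itself: the function name `upward_mask` and the values for the masking offset (10 dB) and the decay per band (6 dB) are assumptions chosen only to make the skew visible.

```python
def upward_mask(psd_db, offset_db=10, decay_db_per_band=6, floor_db=0):
    """Masking threshold that spreads only toward higher bands.

    psd_db: integer signal levels per band, ordered low to high frequency.
    offset_db, decay_db_per_band, floor_db: illustrative assumptions,
    not values taken from the AC-3 specification.
    """
    mask = []
    spread = floor_db
    for level in psd_db:
        # Threshold produced by the band itself.
        own = level - offset_db
        # Threshold inherited from lower bands, decaying as it moves upward.
        spread = max(own, spread - decay_db_per_band, floor_db)
        mask.append(spread)
    return mask


# A single 60 dB masker in band 3: bands below it are unaffected,
# bands above it inherit a decaying threshold.
psd = [0, 0, 0, 60, 0, 0, 0, 0]
mask = upward_mask(psd)
```

　　　 Because the loop runs only from low to high frequency, a masker never raises the threshold of the bands below it, which is the asymmetry shown in Fig. 1.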

　　　 For simplicity and efficiency, tonal and non-tonal signals are not treated separately; instead, AC-3 compares signals with a given threshold, and signals above the threshold are given more bits. AC-3 takes quantization into account so that the distortion due to signal level loss is inaudible. A coupling strategy takes advantage of the fact that localization in the human auditory system (HAS) is insensitive at high frequencies. Coupling is not part of the psycho-acoustic model in AC-3.
　
　　　 A different banding structure and psycho-acoustic model can also be used, at the cost of some overhead in the encoder. The encoder calculates both the original masking curve and the new masking curve, and packs the difference into the bit stream. The decoder decodes the difference and adds it to the original masking curve, which it calculates itself.
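
　　　 The exchange of the masking-curve difference can be sketched as follows. The function names and the per-band integer representation are hypothetical; the way the real bit stream packs the difference is not shown here.

```python
def encode_delta(original_mask, new_mask):
    """Encoder side: per-band difference between the new and original curves."""
    return [new - orig for orig, new in zip(original_mask, new_mask)]


def decode_mask(original_mask, delta):
    """Decoder side: add the transmitted difference to the curve it computed."""
    return [orig + d for orig, d in zip(original_mask, delta)]


original = [50, 44, 38]    # curve both sides can compute on their own
improved = [50, 40, 41]    # curve from the encoder's alternative model
delta = encode_delta(original, improved)
decoded = decode_mask(original, delta)
```

　　　 Only `delta` has to be transmitted, so the decoder stays simple while the encoder remains free to use a more elaborate model.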
　
　　　 The bit allocation process consists of six steps. All six steps need only addition, maximum, shift, and table look-up operations.

Figure 2: The block diagram of the AC-3 bit allocation process
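
　　　 To give a feel for how far such a restricted set of integer operations can go, the sketch below turns a signal level and a masking threshold into a number of bits. It illustrates only one conceivable final step, not the six steps of Fig. 2. The table `BITS_FOR_HEADROOM`, the unit of 0.75 dB per integer step, and the function name are all assumptions.

```python
# Hypothetical table: index = signal-to-mask headroom in 6 dB steps,
# value = bits given to the coefficient. The real AC-3 tables differ.
BITS_FOR_HEADROOM = (0, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16)


def allocate_bits(psd, mask):
    """psd and mask are integers in assumed units of 0.75 dB (8 units = 6 dB)."""
    bits = []
    last = len(BITS_FOR_HEADROOM) - 1
    for level, threshold in zip(psd, mask):
        headroom = max(level - threshold, 0)    # addition and maximum
        index = min(headroom >> 3, last)        # shift, then clamp to the table
        bits.append(BITS_FOR_HEADROOM[index])   # table look-up
    return bits


example_bits = allocate_bits([80, 40, 16, 8, 200], [16, 16, 16, 16, 16])
```

　　　 Coefficients at or below the threshold receive no bits at all, and no multiplication or division is needed anywhere, which keeps both encoder and decoder cheap.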

Subband Filter Bank in AC-3

　　　 In PAC encoders, a "block" of audio samples is windowed and passed through an analysis filter. From the filtered data, the bit allocation process learns the frequency behavior of the audio block and decides how precisely each coefficient should be coded. On the decoder side, the frequency data is passed through a synthesis filter and windowed again. The time-domain data is then reconstructed by the final overlap-and-add process.

　　　 In this analysis/synthesis filter bank structure, errors arise because the filters are not perfect: sidelobes always introduce aliasing. Errors are also introduced by the quantization that follows bit allocation. If quantization error is ignored, Perfect Reconstruction (PR) can be achieved. The Frequency Domain Aliasing Cancellation (FDAC) and Time Domain Aliasing Cancellation (TDAC) filter bank structures are the most commonly used. In both FDAC and TDAC, the aliasing is canceled after the final overlap-and-add process. In FDAC, the aliasing due to the analysis filter is compensated in the frequency domain, i.e., the high-frequency component of the low band is canceled by the low-frequency component of the high band. TDAC compensates the aliasing in the time domain, i.e., the aliasing of the previous block is canceled by the aliasing of the current block.
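
　　　 TDAC can be demonstrated with a small MDCT filter bank. The sketch below is a toy under stated assumptions: blocks of 16 samples with 50% overlap, a sine window (chosen for simplicity, not necessarily the window AC-3 specifies), a direct and slow transform, and no quantization. All function names are illustrative.

```python
import math


def sine_window(length):
    # Satisfies w[n]^2 + w[n + length/2]^2 = 1, the condition TDAC needs.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(block, window):
    """Analysis: window 2*half samples and produce half coefficients."""
    half = len(block) // 2
    return [sum(block[n] * window[n] * _basis(n, k, half)
                for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs, window):
    """Synthesis: 2*half windowed samples that still contain aliasing."""
    half = len(coeffs)
    return [window[n] * (2.0 / half)
            * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def analyse_synthesise(signal, half):
    """Run every overlapping block through MDCT and IMDCT, then overlap-add."""
    window = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    padded += [0.0] * (-len(padded) % half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = padded[start:start + 2 * half]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += value
    return out[half:half + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
rebuilt = analyse_synthesise(signal, 8)
reconstruction_error = max(abs(a - b) for a, b in zip(signal, rebuilt))

# A single block on its own keeps its aliasing: compare it with the
# block after the two windowing passes alone.
window = sine_window(16)
lone = imdct(mdct(signal[:16], window), window)
alias_error = max(abs(window[n] ** 2 * signal[n] - lone[n]) for n in range(16))
```

　　　 Here `alias_error` is clearly nonzero, because each block yields only half as many coefficients as it has samples, whereas `reconstruction_error` is at the level of floating-point rounding: the aliasing in the second half of one block is canceled by the aliasing in the first half of the next.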

AC-3 Audio Codec

Flow charts of AC-3 encoder and decoder

Digital Signal Processing Group