| Appendix B
Important!
Due to a perceptual coder's reliance on precisely modeling principles of human perception, audio
to be coded should not be processed with any non-linear dynamics processing, such as clipping,
multi-band compression, or hard limiting. Wideband compression, or AGC, is acceptable, and may
be desirable if consistent level cannot otherwise be achieved.
The same is true of audio that has been decoded, after passing through a perceptual coding cycle,
but to a much lesser degree.
For more information, see the paper on this topic that Frank Foti delivered at the AES, available
from our Omnia Audio website: http://omniaaudio.com/techinfo/default.htm
The steps involved in the perceptual coding process are shown below:
The components work as follows:
♦ The analysis filter bank divides the audio into spectral components. Its frequency resolution must be
  fine enough to resolve the ear's critical bands, which are 100 Hz wide below 500 Hz and 20% of the
  center frequency wide at higher frequencies.
♦ The estimation of masked threshold section is where the human ear/brain system is modeled. This
  determines the masking curve, under which noise must fall.
♦ The audio is reduced to a lower bit rate in the quantization and coding section. On the one hand, the
  quantization must be sufficiently coarse in order to not exceed the target bit rate. On the other hand, the
  error must be shaped to be under the limits set by the masking curve.
♦ The quantized values are joined in the bit stream multiplex, along with any side information.
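The critical-band rule and the quantization constraint in the list above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not the coder's actual implementation. The function names are hypothetical, and the second function assumes a plain uniform quantizer whose error is evenly distributed, so that its noise power is the step size squared divided by 12; real coders shape the noise per band with more elaborate models.

```python
import math


def critical_bandwidth_hz(center_hz: float) -> float:
    """Rule of thumb from the text: 100 Hz below 500 Hz,
    otherwise 20% of the center frequency."""
    if center_hz < 500.0:
        return 100.0
    return 0.2 * center_hz


def max_step_for_mask(mask_power: float) -> float:
    """Largest uniform quantizer step whose noise power (step**2 / 12,
    an assumption about the quantizer) stays at or below mask_power."""
    return math.sqrt(12.0 * mask_power)


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0, 8000.0):
        print(f"{f:7.0f} Hz center -> {critical_bandwidth_hz(f):6.1f} Hz wide")
    # A higher masking threshold permits a coarser step, hence fewer bits.
    for mask in (0.01, 1.0):
        print(f"mask power {mask} -> step up to {max_step_for_mask(mask):.3f}")
```

The two pieces of the rule meet at 500 Hz, where 20% of the center frequency is exactly 100 Hz. The second function shows the tension described in the quantization bullet: the step can only grow, and the bit rate only fall, as far as the masking curve allows.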
Effective audio coding means managing several tradeoffs. The most important is the number of samples coded
together in one frame. Long frames have high delay, but are more efficient because the header and side
information are transmitted less frequently. Longer frames also make it possible to use filter banks with better
frequency resolution. A fundamental principle in signal processing is that spectral splitting filters may have either
good time resolution, or good frequency resolution, but not both. This makes sense when you consider that a longer
time window means that the analyzer has more complete information – more full audio cycles to work with.
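The time/frequency tradeoff can be illustrated with a short Python sketch. It uses the nominal spacing of an N-point transform (sample rate divided by frame length) as a rough stand-in for frequency resolution; that is a simplifying assumption, since the real resolution of a filter bank also depends on its window and structure. The function name and the frame lengths are illustrative only.

```python
def frame_tradeoff(frame_len: int, sample_rate_hz: float):
    """Return (frame duration in ms, nominal frequency spacing in Hz)
    for a frame of frame_len samples."""
    duration_ms = 1000.0 * frame_len / sample_rate_hz
    spacing_hz = sample_rate_hz / frame_len
    return duration_ms, spacing_hz


if __name__ == "__main__":
    for n in (256, 2048):
        duration_ms, spacing_hz = frame_tradeoff(n, 48000.0)
        print(f"{n:5d} samples: {duration_ms:6.2f} ms, {spacing_hz:8.3f} Hz")
```

At 48 kHz, a 256-sample frame lasts about 5.3 ms with 187.5 Hz spacing, while a 2048-sample frame lasts about 42.7 ms with roughly 23.4 Hz spacing. The product of duration and spacing is constant, which is why one cannot be improved without giving up the other.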
In the case of rapidly changing input signals (transients), long frames perform worse than short ones because the
coding error is spread in time across the whole frame, which leads to so-called "pre-echoes." For such signals, the
size of the frame should correspond to the temporal resolution of the human ear. This can be achieved by using short
frames or by changing the frame length according to the immediate characteristics of the signal.
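Changing the frame length according to the signal can be sketched as a simple energy-based transient detector. This is a hypothetical illustration in Python, not the method used by any particular codec: the function name, the frame lengths, the energy ratio threshold, and the noise floor are all assumed values chosen for the example.

```python
def choose_frame_length(samples, long_len=2048, short_len=256,
                        ratio_threshold=8.0, floor=1e-9):
    """Return short_len if the energy jumps sharply between consecutive
    short blocks (a likely transient), otherwise long_len."""
    # Energy of each consecutive short block within the frame.
    energies = []
    for start in range(0, len(samples) - short_len + 1, short_len):
        block = samples[start:start + short_len]
        energies.append(sum(x * x for x in block))

    # A sudden rise in energy suggests an attack that a long frame
    # would smear backwards in time as pre-echo.
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio_threshold * max(prev, floor):
            return short_len
    return long_len


if __name__ == "__main__":
    import math
    steady = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(2048)]
    attack = [0.0] * 1024 + [1.0] * 1024
    print(choose_frame_length(steady))
    print(choose_frame_length(attack))
```

A steady tone keeps the long frame and its better frequency resolution, while silence followed by a sudden burst switches to short frames so that the coding error stays close to the attack.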