Audio Wave and Spectral Examples

These images are waveform and spectral examples to go along with the audio encoding papers. Your "scopes" (graphs) are your friends. Learn how to use them properly and it will be hard not to produce good sounding audio.

These pictures are generally too big to show in line, so view each one as you read each section.


Generally speaking, this is what you should be striving for when doing a final encode. The levels are high enough to use the full sample range (but not clipping) and there is moderate dynamic range (not too much, but not too little).
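
If you would rather have numbers than eyeball the scope, here is a rough Python sketch of the two checks described above: how close the peaks get to full scale, and whether anything is actually clipping. It assumes the audio is already decoded to a plain list of signed 16bit sample values. The function names and the "three pinned samples in a row" rule for clipping are my own illustrative choices, not a standard.

```python
import math

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def peak_dbfs(samples):
    """Return the peak level in dB relative to 16-bit full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure silence has no meaningful level
    return 20 * math.log10(peak / FULL_SCALE)


def count_clipped(samples, run_length=3):
    """Count runs of at least `run_length` consecutive samples pinned at full scale.

    The run_length of 3 is an assumed rule of thumb: a single sample that
    touches full scale is not necessarily clipping, a flat top usually is.
    """
    runs = 0
    current = 0
    for s in samples:
        if abs(s) >= FULL_SCALE:
            current += 1
            if current == run_length:
                runs += 1  # count each flat-topped run once
        else:
            current = 0
    return runs
```

A peak within a decibel or so of 0 dBFS with a clipped-run count of zero is roughly what the waveform described here looks like.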


This came from a long news-type documentary program that is mostly narration. Notice how flat the levels are from the compression (probably from the source)? This is correct for this type of program and will not sound bad at all.


This came from a drama episode and is mostly correct. This clip shows about the limit of what the dynamic range should be. The peaks are high but not clipping. The quieter general dialogue is still high enough to be heard, but it could be a little higher to keep it out of the mud. It is desirable to have dynamic range, but not too much nor too little. Too much goes from screaming to unheard whispers, and too little just sounds flat all the time. Few producers seem to be smart enough to get the balance right.


This is a funky anomaly dating back to the "old days" of television where part of the radio frequency broadcast would be smashed so the pencil necks could sell a small sub channel to who knows where. Defective audio from those days typically has a waveform like this, as the TV or VCR was "supposed" to automatically fix it. Inertia from moving speaker cones will sort of hide this, but the audio will still sound a little funny.


While on the subject of old, high fidelity systems just weren't around in the days of black and white video. This is actually a pretty clean capture, but it is lacking quite a bit in the way of high frequencies (as noted from the green bar). While the audio sounds OK, it isn't overly clear, since the higher frequencies are what make for clarity.


This is part of the reason I despise AC3 so much. The format isn't bad but the people who encode it are morons. This is a raw capture from digital cable and is typical of nearly all the channels and many DVDs I've tried to play back (some are even worse).

Here I start my rant. The sample range here is about -10,000 to +10,000. True 16bit audio (as heard on CDs) has a range of roughly -32,000 to +32,000 (-32,768 to +32,767 to be exact). This encode is just above 14bit audio...and I'm greatly offended. These people (and many others) are wasting about 2/3 of the available range and the clarity that goes with it. The audio is also in the mud. This means more hiss from the signal and more hiss from having to crank up the amplifier so loud just to hear it.
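
The arithmetic behind those figures can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and the 10,000 peak is the eyeballed value from the waveform, not a measurement.

```python
import math


def bits_used(peak):
    """Bits needed to cover the signed range -peak..+peak."""
    return math.log2(2 * peak)


def headroom_db(peak, full_scale=32768):
    """Unused level between the observed peak and 16-bit full scale, in dB."""
    return 20 * math.log10(full_scale / peak)


def unused_fraction(peak, full_scale=32768):
    """Fraction of the available sample range that is never touched."""
    return 1 - peak / full_scale


print(round(bits_used(10_000), 2))        # about 14.29 bits
print(round(headroom_db(10_000), 1))      # about 10.3 dB of wasted headroom
print(round(unused_fraction(10_000), 2))  # about 0.69, i.e. roughly 2/3
```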

NORMALIZE THE DAMN AUDIO BEFORE FINAL ENCODE. It's trivial to do and doesn't take that long. Why is this so hard for people to understand and do??? Do these people even understand how to be a professional and do their job??? (ok... breathing and popping a happy pill now...)

On a side note, there is a blip in this capture that has to be zeroed out first before the audio can be normalized. On a second note, the stupid digital cable box which insists on using compression will still play this audio too low, hit the blip, then nearly turn the audio off for a couple seconds until it recovers from the mathematical weighting. If the audio were full to begin with, the "turn off" would have never happened.
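
To show why the blip has to go first, here is a minimal Python sketch of zeroing a span and then peak normalizing. It assumes the audio is a plain list of signed 16bit sample values and that you found the blip's position by eye on the waveform. The function names and the 32,000 target peak (just under full scale) are illustrative choices, not anything a particular editor uses.

```python
def zero_span(samples, start, end):
    """Return a copy with samples[start:end] silenced (the hand-picked blip)."""
    out = list(samples)
    for i in range(max(start, 0), min(end, len(out))):
        out[i] = 0
    return out


def normalize_peak(samples, target_peak=32000):
    """Scale every sample so the largest absolute value lands on target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence (or empty input) stays as it is
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]


# Toy data: real audio peaks at 10,000, with one stray blip at 30,000.
audio = [100, -5000, 10000, 30000, -8000]

with_blip = normalize_peak(audio)                     # gain limited by the blip
without_blip = normalize_peak(zero_span(audio, 3, 4))  # gain set by real audio
```

With the blip left in, the normalizer sees 30,000 as the peak and barely raises anything. With the blip zeroed, the real audio gets the full gain.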


This is very similar to above but more zoomed in and the blip is far easier to see.


Coming back to frequencies, this is about what an audio spectrogram should look like. This one is still clipped a little at the top, but is generally pretty good for a compressed audio stream.

Also note the horizontal line at 16kHz. This is a high frequency buzzing noise somewhere in the original analog to digital capture. Unfortunately, these are surprisingly common. Thankfully they are usually very quiet and generally unheard.
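
If you want to confirm a suspected buzz without squinting at the spectrogram, the Goertzel algorithm measures the energy at one frequency cheaply. The sketch below is a plain Python version. It assumes a list of samples and a known sampling rate, and the function name is my own. The result is scaled so that a steady sine of amplitude A sitting exactly on a DFT bin reads as roughly A squared over 4.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Return the signal power at the DFT bin nearest `freq` (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq / sample_rate)   # nearest DFT bin to the target frequency
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of that single DFT bin
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return max(power, 0.0) / (n * n)    # clamp tiny negative rounding errors
```

Comparing the reading at 16kHz against a neighbor such as 15kHz over a quiet passage is one way to tell a narrow buzz from ordinary program material, which would show up at both.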

Between the vertical peaks is some mild noise (in dark blues). If there were a lot of hiss and such, the darker background would be turning to yellow and red carpets in this example.

When archiving this to the MP3 format, 128kbit stereo is a common setting for 44.1kHz sampling and 160kbit stereo is a common setting for 48kHz sampling. These rates tend to chop everything above 16kHz using the default settings. Vorbis, MP4/AAC, and AC3 are more efficient at encoding high frequencies and could use a lower bit rate. If in doubt, encode the audio, then decode it and compare its spectrogram to the original spectrogram. Make adjustments as necessary.
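
The encode, decode, and compare check can also be put into rough numbers by estimating where the high frequencies stop. The Python sketch below is a deliberately naive DFT over a single short frame, with no windowing, just to show the idea. A real tool would use an FFT with a window and average many frames. The function names and the -60 dB floor are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (slow, fine for a short frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def cutoff_hz(frame, sample_rate, floor_db=-60.0):
    """Highest frequency whose magnitude is within floor_db of the strongest bin."""
    mags = magnitude_spectrum(frame)
    peak = max(mags)
    if peak == 0:
        return 0.0
    threshold = peak * 10 ** (floor_db / 20)
    n = len(frame)
    for k in range(len(mags) - 1, -1, -1):  # scan down from the top bin
        if mags[k] >= threshold:
            return k * sample_rate / n
    return 0.0
```

Run it on a frame from the original and on the same frame from the decoded file. If the decoded cutoff sits well below the original's, the encoder is chopping the top end and the bit rate or lowpass setting needs adjusting.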


This probably came from an MP2 file as the MP2 format tends to chop everything above 16kHz. I sometimes see this in AC3 encodes which most likely came from an MP2 source (Video CD or some DVDs).


This is the same as above but brutally chopped much harder than normal.

Over the air radio broadcasts often look similar to this. Since broadcast TV audio is also a modified form of FM radio, it will look similar.


I've never seen frequency intensity stair stepping until I saw this one. I don't know what to make of it. Perhaps it was encoded multiple times using different settings?


This shows some stair stepping and higher peaks that happen with VBR encoding. Also note the blue speckled pattern of low level hiss during the fade out.


Notice how this CODEC encodes less and less of the higher frequencies as the bit bucket runs out.


This one has the 16kHz buzz but it subtracts out of the signal. This also seems to be somewhat common.


This also has the subtracting 16kHz buzz along with a lot of hiss. This came from an analog cable source into a capture card.