MP3 Encoding Test Page

Setting up a good encode involves understanding how the audio works and why it works. This is a moderately in-depth study of MP3 audio encoding for final distribution, focusing on the LAME encoder (one of the best free ones for MP3).

The concepts mentioned here also apply to other CODEC formats (m4a, vorbis, mp2, flac). The command syntax will be different for the others, but the concepts remain the same. For mass distribution, pick the CODEC(s) most easily played by EVERYONE. Worldwide, this is still generally MP3. Offering multiple CODECs and encoding levels is nice for the end consumer if the servers have the space. Keep in mind that not everyone has the technical skill set to install plug-ins and play some of the more exotic CODECs. Most portable music players can only handle MP3 and WAV files.

While this study is mainly geared towards distribution of voice recordings, many of these steps and concepts apply equally well to music.

Nearly all of the techniques and concepts here can be applied to live broadcasts, too. The main issue is that the difficulty goes up for real time. Just because something gets hard doesn't mean that the rules should be broken.

Following these steps and recommendations may seem like a lot of work (which it can be), but by understanding and following them, it will be very hard to produce something that is trash (unless the source audio is trash). These steps are what separate the amateurs from the professionals.


Example Test Files


Quick Reference: Program Install

main site:  http://lame.sf.net
windoze site:  http://mitiok.cjb.net/ or http://jthz.com/~lame/

Linux Compiling:
tar zxvf lame*.tgz
cd lame*/
./configure --build=i686 --prefix=/usr --disable-debug --disable-efence --without-dmalloc
make
make install

windoze install:
make a directory.
copy lame.exe and lame_enc.dll to it.
open a DOS window and execute the command accordingly.

Quick Reference: Encoding

Better than Radio Quality but lower than CD quality:
lame --abr 192 -q0 -p -m s sample.wav sample.mp3

Slightly better than Radio Quality (or Radio Quality if the sample rate is 48kHz):
lame --abr 160 -q0 -p -m s sample.wav sample.mp3

Radio Quality:  (common industry encoding level)
lame --abr 128 -q0 -p -m s sample.wav sample.mp3

Modem Rate:  (roughly AM radio quality, suitable for speech)
lame --resample 24 --abr 32 -B 48 -q0 -p -m m --lowpass 8 sample.wav sample.mp3
With ID3 tags:
lame --resample 24 --abr 32 -B 48 -q0 -p -m m --lowpass 8 --add-id3v2 --tt "Sermon Title" --ta "Preacher Dude" --tl "My Church" --ty "2004" --tc "2004-06-18 Morning Service" test-sermon.wav test-sermon.mp3


Batch Encoding Under Unix:
find /path/to/dir -maxdepth 1 -type f -name '*.wav' -print0 | xargs -0 -I FILENAME nice lame --abr 128 -q0 -p -m s FILENAME FILENAME.mp3

Constant bit rate for AVI muxing (with mediocre resampling):
lame --resample 44.1 --cbr -b 128 -q0 -p -m s sample.wav sample.mp3

Medium quality resample (to 48kHz):
sox -VS infile.wav -r 48000 outfile.wav resample -ql

High quality resample (to 48kHz):
sox -VS infile.wav -r 48000 outfile.wav polyphase

Light compression for variable music (adjust the -61 and -60 values to match the noise floor):
sox -VS infile.wav outfile.wav compand 0.05,5 -61,-61,-60,-15,-15,-15,0,0 0 0 0.05

MP2 file decoding:
lame -q0 --decode file1.mp2 file1.wav

MP4 quick 96kbit decoding and encoding:
faad --outfile=myfile.out.wav myfile.m4a
faac -q 100 -c 22000 -w -b 96 myfile.wav

OGG/Vorbis quick decoding and encoding:
oggdec --output=myfile.out.wav myfile.ogg
oggenc --bitrate=64 --output=myfile.ogg myfile.wav

FLAC quick decoding and encoding:
flac --decode --decode-through-errors --force --output-name=myfile.out.wav myfile.flac
flac -8 --output-name=myfile.flac myfile.wav


Encoding Explained

Properly prepare the raw audio. This typically involves normalizing (both channels), compression, manual amplitude envelopes, and so on. (A free program that can do most of this is at http://audacity.sf.net.) The wave file should be consistently hitting very near 100% (0dB), essentially being at full volume. None of the "quiet" sections should be below -10dB. Below that, the audio starts getting in the mud of both the transmission format and the sound system used to play it back. If most of the audio is below -10dB with only periodic 0dB sections, it needs to have the quiet sections normalized separately. I also get highly irritated when listening to something that gets so soft I can't hear it, so I turn up the volume, then it gets so loud it makes me deaf and damages my speakers. Something like that is just flat out sloppy and incompetent, and it hurts. It's a great way of being negative towards the target audience.

Don't be like AC3 files where everything is encoded in the mud. Prepare the encode for 100% volume and let the end user control how loud it is. That's the end user's job to begin with. Keeping all encodes at 100% also keeps volume levels consistent between tracks/files. It's really amateurish to have one track at 100%, the following track at 60%, then the next track at 80%. As you can see, this creates problems when the three of these are played back to back.

By this point some will argue that the final audio should always have some headroom and never be at 100% volume. This was true back in the old analog broadcast days. Analog pre-amps, amps, recorders, players, transmitters, repeaters, and receivers often had poor settings and tolerances and needed the extra headroom to avoid clipping. This is not true for digital. By definition, digital transmission signals are either "on" or "off" and are precisely defined. Within the digital transmission is the encoded digital audio (essentially a data layer above the hardware layer). Since all this is digital, the receiver gets an exact copy of what gets sent out from the source (barring any transmission corruption). Even if there is corruption, volume level will not matter, as the digital packet will get thoroughly trashed and scrambled. It will play back as a sudden and loud hiss if the player doesn't catch it and mute it.

If the audio contains blips, pops, coughs, or sneezes, those can be cut out if the source does not need to be synchronized with something else (like video). If the source does need to be synchronized, zero out or silence the blip doing a short fade out/fade in at the edges of the forced silence.

If there is supposed to be a "shot" or "cracking" sound that pegs the wave, use the manual compression trick to bring it down. Zoom in to the sound, highlight the highest parts of the wave, then do something like a -3dB drop on the highlight. This will often produce better normalization since the excessive peaks are removed.

If the entire wave file will be looped, BE SURE to fade in and fade out the beginning and end of the wave (this is a good idea regardless). The fade doesn't have to be long. This will prevent popping sounds when the wave loops.

If the original audio contains a lot of "hiss" (like an audio cassette tape), use high quality noise subtraction to remove it. Remember that noise is still sound and the encoder will waste bits trying to encode the noise along with the audio content. Excessive hissing noise often produces noticeable and distracting high frequency aliasing on low bit rate encodes.

Side Note Rant. If audio is left at such a low level as to be in the mud, then that audio is actually using less than the normal 16bits commonly available (COUGH, COUGH, TV Broadcasts, MP2, AC3, GASP). It is very possible (and unfortunately common) to have 14bit audio in a 16bit container. How do you know what it really is? If the audio editor shows the peaks hitting sample level 8,191 or less, then the audio only needs 14bits (2^14 levels, a signed range of -8,192 to 8,191) to represent itself. The same goes for 15bit audio at 16,383 peaks. Most TV and AC3 audio I've (unfortunately) seen averages around 10,000 on the peaks, putting it in the low 15bit range. That is roughly 1/3 of the 32,767 full scale, so roughly 1/3 the potential clarity of 16bit audio (wannabe audiophiles should freak at this). This is a massive waste of potential clarity (as digital audio is OFTEN SOLD TO BE) and significantly increases the noise floor, as the analog amplifiers for the speakers have to be cranked up so high. If I offered a reasonable person a simple solution that would triple their audio clarity for essentially nothing in cost, don't you think they would rabidly jump on it??? It's not the job of poorly standardized players to do what the lazy engineer should have properly done in the first place. To freak out people even more, even DDD recorded CDs have a very audible noise floor when soft recordings are brought up to more normal levels (try using the sox compression command above on a classical CD). (rant mode off)
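The bit counting in the rant can be checked with a few lines of shell. This is only a sketch: bits_needed is a made-up helper name, and it assumes signed integer samples where the peak is given as a positive number read off the audio editor.

```sh
# Hypothetical helper: smallest signed bit depth that can hold a given
# peak sample value (the peak is the largest absolute value in the wave).
bits_needed() {
    bn_peak=$1
    bn_bits=1                        # start with one bit for the sign
    while [ "$bn_peak" -gt 0 ]; do
        bn_peak=$((bn_peak / 2))     # each pass accounts for one magnitude bit
        bn_bits=$((bn_bits + 1))
    done
    echo "$bn_bits"
}

bits_needed 8191     # prints 14
bits_needed 10000    # prints 15
bits_needed 16383    # prints 15
bits_needed 32767    # prints 16
```

Anything that prints less than 16 for a 16bit wave file means part of the container is going unused.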

MP3 encoding. NEVER, EVER use "joint stereo" to encode an MP3 (or even another CODEC). This mode tries to mix common parts of both audio channels to mono and ends up creating more of a mess than saving space (aliasing and phasing problems). Use "stereo" mode for encoding. This mode will share bandwidth between channels if one channel is lower and the other channel needs more bits. For hard bit rate limiting for both channels, use "dual" mode. Each channel will get a fixed half of the encode rate.

Encoding bit rate is dictated by distribution medium. If the encode will go out over Internet, it must be lower to account for bandwidth transfer limitations. A modem is fixed at a very limited rate while most broadband connections are several times that rate. If the encode is distributed on CDROM, it could be quite high. If many hours are supposed to fit on the single CDROM, then the bit rate may have to be lowered so all the files will fit. If a lot of media needs to be served off of an Internet site with limited space, then the bit rate may have to be lowered.

Lower bit rates will transparently do more high frequency chopping than higher bit rates (frequency mowing in the spectral view). The automated frequency cuts may or may not work cleanly, and a manual setting may have to be used. Frequency cuts are used to limit the amount of data going into the encoded file. A certain bit rate can only hold so much data. If too much is smashed into it, it will have bad aliasing and artifacting and generally be of very poor quality. Most videos (AVI files) typically are encoded at 128kbit or 160kbit for audio, but don't be surprised at other rates. I've heard digital satellite TV with encoding rates that sounded like 96kbit (this is absurd and insulting). Radio quality MP3s are considered to be 128kbit. This rate is also the "standard" encoding rate for the industry. High quality audio encodes typically start at 192kbit or higher. 320kbit is the current limit for MP3 encodes and is between 1/4 and 1/5 the size of raw CD audio (which runs at about 1411kbit). Keep in mind that MP3 is a "lossy" compression format and will ALWAYS cut something out or change the original wave file (even at 320kbit encodes).
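For planning disk space or a CDROM layout, the size of an encode follows directly from the bit rate. The sketch below uses two made-up helper names (encode_bytes and cd_ratio) and assumes raw CD audio at 1411.2kbit (44100 samples x 16 bits x 2 channels). ABR encodes only land near the computed size, not exactly on it.

```sh
# Hypothetical helper: approximate size in bytes of an encode.
# Usage: encode_bytes KBIT_PER_SECOND SECONDS
encode_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

# Hypothetical helper: how many times smaller than raw CD audio
# (1411.2 kbit/s) an encode is, truncated to one decimal place.
cd_ratio() {
    cr_tenths=$(( 14112 / $1 ))
    echo "$(( cr_tenths / 10 )).$(( cr_tenths % 10 ))"
}

encode_bytes 128 3600    # one hour at 128kbit: prints 57600000
encode_bytes 32 3600     # one hour at 32kbit:  prints 14400000
cd_ratio 320             # prints 4.4
cd_ratio 128             # prints 11.0
```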

Constant Bit Rate (CBR) Vs. Average Bit Rate (ABR) Vs. Variable Bit Rate (VBR). Constant Bit Rate (CBR) is as it sounds and forces the entire encode to one locked bit rate regardless of signal complexity. This is required for inclusion with normal video files, but does not always produce the best quality. The key with constant is that it is easily predictable and will stay in sync.

Average Bit Rate (ABR) is the highest quality and will encode some pieces that need it at higher levels and other lesser pieces at lower levels. This can allow for increased clarity where it is needed. The encoder will keep track of what has changed and make the ending file size very close to the average number encoded at.

Variable Bit Rate (VBR) is very similar to Average Bit Rate except that it is more convoluted and often has problems. Variable Bit Rate can be encoded along with video files but must have a special synchronization table created that indexes the audio stream. Even with this table, it can still have synchronization troubles so it should never be used.

ABR and VBR have encoding rate limiting options on the high and low end (LAME -b and -B options). On lower bit rate encodes, these numbers shouldn't be too high or too low (keep them at a nice balance). Some portable MP3 players may not be able to handle the lowest of the bit rate options (where the -b option comes in). If someone is streaming by modem, a suddenly high bit rate would cause the audio to stop while the modem struggled to catch up (where the -B option comes in). Obviously pure silence will allow an encoding rate of 0, and this is fine. An encoding rate too high will produce sudden "bright" or "clear" spots that may be distracting.

On a side note, Average Bit Rate functions can display a histogram of encoding rates used as it encodes. This is useful in determining if a bit rate is suitable for a particular source file. If the histogram shows a small bell curve, then the encoding rate is good (there is working space above and below). If the histogram shows a smashed curve, something is wrong and encoding methods need to be reevaluated.

Quality level of the encode. Nearly all the time you should use -q0 (best) or -q1 (next to best) for the encode. This forces the encoding program to take extra time to make sure what it is producing is as accurate as possible to the original. This will often help cut out aliasing and various artifacts (lowers distortion). Some people will try to sacrifice quality for speed. This is generally a terrible idea and falls under the Moron Multiplication Law: If a person produces a sloppy encode and distributes it to 1000 people, those 1000 people will either be forced to fix it, find something else, or just live with the incompetence. They each have to spend the same amount of time fixing it as the original encoder would have. This is a waste of time multiplied out to a collectively grand scale. On the flip side, if a person produces a superior encode and distributes it to 1000 people, those people will enjoy it without having to be frustrated or waste their time. Consider that very carefully and be excellent in your encodes.

Protecting the bit stream. The -p option adds CRC error detection to the encode. Should the encode become corrupted by some means, the player can detect this and handle it accordingly. If the player is intelligent, it will skip over the bad frame instead of playing sudden trash. Some people argue that this takes up too much space in the bit stream and reduces encode quality. In reality the CRC is only 2 bytes per frame, which is about 0.5% of the frame at 128kbit and 44.1kHz, and it greatly increases playback capabilities during bad times or corruption.
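As a rough check on what the CRC costs, the overhead can be worked out from the frame size. This is a sketch under stated assumptions: crc_overhead is a made-up helper name, the CRC is taken as 2 bytes per frame, and the frame size is estimated as 144 x bit rate / sample rate for 32kHz and up, or 72 x bit rate / sample rate for the lower sampling rates. Padding bytes are ignored and the result is truncated, not rounded.

```sh
# Hypothetical helper: CRC overhead as a percentage of one MP3 frame.
# Usage: crc_overhead BITS_PER_SECOND SAMPLE_RATE_HZ
crc_overhead() {
    # Frames at 32kHz and above hold 1152 samples, lower rates hold 576.
    if [ "$2" -ge 32000 ]; then co_factor=144; else co_factor=72; fi
    co_frame=$(( co_factor * $1 / $2 ))       # frame size in bytes
    co_pct=$(( 2 * 10000 / co_frame ))        # hundredths of a percent
    printf '%d.%02d\n' $(( co_pct / 100 )) $(( co_pct % 100 ))
}

crc_overhead 128000 44100    # prints 0.47
crc_overhead 32000 24000     # prints 2.08
```

The cost is larger on the modem rate encodes because the frames are smaller, which is worth knowing before deciding the trade is worth it there.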

Stereo Vs. Mono (and modems). For any high bit rate encode, stereo should be used almost without question (unless the source audio is mono). For a 128kbit encode, each channel will get roughly 64kbits of bandwidth (half each). This means a mono radio quality encode requires 64kbit of audio. For slow Internet connections (modems), stereo provides more data than the available bandwidth allows. Many people have trouble understanding this, but if you squash too much into a limited bit stream, it will sound BAD, DISTORTED, and HARD TO UNDERSTAND. There is no exception to this rule. Part of the way of transferring clean audio over modem rates is to convert it to mono. This essentially cuts the original data stream in half, which is a huge savings. I would by far rather hear a clean mono file than a smashed stereo file that is aliased and artifacted to the point where it is beyond distracting and difficult to understand. Many people listening to an Internet broadcast will be doing so over mediocre tiny PC speakers and won't be able to tell stereo from mono to begin with. Many other people often listen in a car, bus, or train on the way to work and also will not be able to tell the difference.

Sampling Rates of 44.1kHz Vs. 48kHz. Many people are under the mistaken impression that encoding 48kHz audio into a 128kbit MP3 will produce higher quality as opposed to the usual 44.1kHz sampling rate. This is flat out wrong, as the container (the 128kbit bit rate) stays the same size while the amount of data increases. This means that the encoder has to smash even harder to fit it all in. 48kHz audio is about 9% larger than 44.1kHz audio. 9% is a significant size increase when it comes to MP3 encoding. Most humans can't hear past 20kHz (44.1kHz sampling) anyways. The only time when the higher sampling rates are really useful is with very low level and quiet stuff. If you are encoding a lot of this, you need to go re-read my clean up section. Also, the really quiet stuff still has a potential to be cut and dropped by the encoder depending on the settings (this saves space by dropping stuff that cannot be heard). MP3 is lossy and may also run the risk of cutting extra high frequencies to force the higher sampling rate into a fixed encoding rate (less quality and more distortion). Think of all this as trying to put 9% more stuff into a cardboard box without breaking it.
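One way to see the squeeze is to work out how many encoded bits are available for each input sample. This is just arithmetic, not anything the encoder reports, and bits_per_sample is a made-up helper name. The result is truncated to two decimal places.

```sh
# Hypothetical helper: encoded bits available per input sample.
# Usage: bits_per_sample BITS_PER_SECOND SAMPLE_RATE_HZ CHANNELS
bits_per_sample() {
    bp_hundredths=$(( $1 * 100 / ($2 * $3) ))
    printf '%d.%02d\n' $(( bp_hundredths / 100 )) $(( bp_hundredths % 100 ))
}

bits_per_sample 128000 44100 2    # prints 1.45
bits_per_sample 128000 48000 2    # prints 1.33
```

Same box, more stuff: at 48kHz every sample gets a smaller share of the same 128kbit.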

Sampling and frequency cutting for low bit rates. The basic human voice range is about 100-6000Hz. This is a fairly small range in the spectrum and telephones tend to cut this even more ("Can you hear me now?" - "Yes." ... "Can you understand me?" - "Absolutely not"). The full human voice range is about 50-13000Hz. The Nyquist Theorem states that the sampling rate should be at least two times the maximum frequency that you will be recording. For a maximum voice frequency of 13kHz, this means the sampling rate can be 26kHz (as opposed to 44.1kHz on a CD). MP3 only supports a fixed set of sampling rates, so in practice this means picking the next supported rate at or above that number (32kHz here), or cutting the highest frequencies and dropping to 24kHz as in the modem example above. For MP3 voice encoding, this is a significant reduction in data and can be an actual increase in clarity because of that. It may seem odd for low bit rates that a reduction is an actual increase in quality, but if you think about it from the data stream point of view, it makes perfect sense.
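The Nyquist arithmetic can be turned into a small lookup. This is a sketch: mp3_rate_for is a made-up helper name, and the list of rates is the set of sampling rates MP3 files can carry (8, 11.025, 12, 16, 22.05, 24, 32, 44.1, and 48kHz).

```sh
# Hypothetical helper: lowest MP3 sampling rate that is at least twice
# the highest frequency to keep. Usage: mp3_rate_for MAX_FREQUENCY_HZ
mp3_rate_for() {
    mr_need=$(( $1 * 2 ))            # Nyquist: at least 2x the top frequency
    for mr_rate in 8000 11025 12000 16000 22050 24000 32000 44100 48000; do
        if [ "$mr_rate" -ge "$mr_need" ]; then
            echo "$mr_rate"
            return 0
        fi
    done
    echo 48000                       # nothing higher exists, report the maximum
    return 1
}

mp3_rate_for 13000    # full voice range:   prints 32000
mp3_rate_for 8000     # 8kHz lowpass:       prints 16000
mp3_rate_for 6000     # basic voice range:  prints 12000
```

The number printed is only the minimum. Going one step higher than the minimum leaves some room for the filter at the cutoff.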

When sampling rates are changed, high frequency aliasing and artifacting may take place at the cutoff point. This is because the encode wasn't given a high enough bit rate to handle the extra audio data. For non-modem distributions just increase the bit rate and it should go away. For modem and low bandwidth distributions, this becomes a little more of a problem. The best way to handle it is to use a low pass filter (the LAME --lowpass option) and manually cut out the problem frequencies in question. You want to cut out the high frequency trash but still have it clear at the same time. This often takes some guess work and a few tries to get something that is acceptable to listen to. Remember that added noise in the encode DOES NOT increase clarity. Also keep in mind that aliasing and artifacting take away needed bits (as trash noise is still sound) that could be better used on real audio data. This comes back to that "less is often more" thing.

Decode the final MP3 to see what is really happening. This is often a good idea to make sure real data agrees with what you think you are hearing. Remember that ears get fatigued and can play tricks. The first thing to check is the amplitude to make sure it is still at full volume. This is rarely a problem. The best way to check an encode is to use a high resolution spectrogram to see what is really happening. This will show what frequencies got cut. When comparing spectrograms of the original wave to the MP3, you will initially be shocked. Keep in mind that MP3 is a lossy format and there really isn't any other way to fit audio over slow connections. Another thing to remember when checking the encode is to make sure that your speakers are properly balanced and can accurately reproduce the full frequency spectrum. Tiny and mediocre PC speakers are not the way to go here. You might as well verify it over the telephone. Big speakers that cannot accurately balance high and low frequencies are just as bad. Headphones are no exception to any of this. If you just need to hear sound out of something, cheap speakers can be perfectly adequate. If you need to do real production work, get real speakers or headphones and learn how to use them. This comes back to that Moron Multiplication Law mentioned above: "It sounds great on my dinky speakers, why do yours have such a problem?"
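A minimal sketch of the decode and check step follows. It assumes lame is installed and that the local SoX build includes the "spectrogram" effect (not every build does). spectro_cmds is a made-up helper name, and it only prints the two commands so they can be looked over before being run. It also assumes web friendly file names with no spaces.

```sh
# Hypothetical helper: print the commands that decode an MP3 back to a
# wave file and then render a spectrogram image of the decoded audio.
# Usage: spectro_cmds FILE.mp3
spectro_cmds() {
    sc_base=${1%.mp3}                # strip the .mp3 extension
    echo "lame --decode $1 $sc_base.decoded.wav"
    echo "sox $sc_base.decoded.wav -n spectrogram -o $sc_base.png"
}

spectro_cmds sample.mp3
# prints:
#   lame --decode sample.mp3 sample.decoded.wav
#   sox sample.decoded.wav -n spectrogram -o sample.png
```

To actually run the commands, pipe the output to a shell: spectro_cmds sample.mp3 | sh. Do the same spectrogram on the original wave and compare the two images side by side.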

Dedicated hardware MP3 players (MP3 players that are not in a computer). These are often dominated by hardware manufacturers who are clueless and should be banned from their industry (read the Moron Multiplication Law above). While some manufacturers actually care and produce a very good product, many do not even have a clue about how to follow an established industry standard. All hardware players are NOT created equal. While a hardware player should play a particular MP3 encode, these players are far less flexible when compared to a software player on a computer. If the target market with these defective players is large enough, you must adapt so that the encode will play. In reality, this may or may not be possible. There are extended settings to help force "industry compatibility" in LAME. Depending on the option, these settings may or may not work very well with the other encoding options. Each encode will have to be tested out on a case by case basis to determine viability. Be warned that forcing a compatibility mode will often lower overall quality. This may not be desirable but may have to be traded off.

Option to force standard frame sizes:
--strictly-enforce-ISO
Option for constant bit rate (instead of average):
--cbr -b 128
Option for stereo encodes to force dual channel:
change "-m s" to "-m d"

File Naming for Distribution and Archival


The best file naming is something sortable by location, date, who did it, title, encoding bit rate, and CODEC extension. The thing to remember is to start with the least specific and most general piece of information and end with the most specific.

If a person has a lot of recordings, it may be desirable to create a directory using that person's name and move all those files into it. This creates a clean grouping for organization.

If the files are going to be distributed by Internet, stick with web friendly characters. This includes alpha, numeric, dash, underscore, and dot. This does NOT include spaces, quotes, slashes, parentheses, brackets, squigglies, or anything else above the number keys. Web friendly characters will prevent browser translation into the hard to read hex garbage like this: "often%20seen%20like%20this".

Example:

Somewhere_2004-07-11_SomeoneImportant_Title.128k.mp3
Somewhere_2004-07-11_SomeoneImportant_Title.32k.mp3
Somewhere_2004-07-18_SomeoneElse_Title.128k.mp3
Somewhere_2004-07-18_SomeoneElse_Title.32k.mp3

This will allow for hundreds of files to be easily sorted and viewed without any extra effort. This also allows for easier searching as all the names are standardized. For archival on CDROM, keep the file names to 64 characters or less or they might be chopped depending on the file system.
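The naming scheme is easy to automate. The sketch below uses two made-up helper names (web_safe and dist_name). It deletes every character outside the web friendly set and does not substitute anything for it, and it warns when the result goes over the 64 character limit mentioned above.

```sh
# Hypothetical helper: delete every character that is not alpha,
# numeric, dot, underscore, or dash.
web_safe() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9._-'
}

# Hypothetical helper: build a sortable distribution file name.
# Usage: dist_name LOCATION DATE WHO TITLE KBIT EXTENSION
dist_name() {
    dn_name="$(web_safe "$1")_$(web_safe "$2")_$(web_safe "$3")_$(web_safe "$4").${5}k.$6"
    if [ "${#dn_name}" -gt 64 ]; then
        echo "warning: $dn_name is longer than 64 characters" >&2
    fi
    echo "$dn_name"
}

dist_name "Somewhere" "2004-07-11" "Someone Important" "Title" 128 mp3
# prints Somewhere_2004-07-11_SomeoneImportant_Title.128k.mp3
```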