What we perceive as sound is vibration. The motion of air against our eardrums is interpreted by our brain as speech, music or just noise. Larger motions of air are perceived as louder; faster motions of air are perceived as higher pitched.
Audio engineers call these vibrations waveforms. In its simplest form, a waveform looks something like this:
This is what musicians call a pure tone, and those who remember their mathematics will recognise it as a sine wave. The waveform can be defined by two characteristics - the height of the peaks, known as the amplitude (measured in decibels (dB)), and the rate at which the peaks occur, known as the frequency (measured in Hertz (Hz)). From the listening perspective, the amplitude affects how loud the sound is and the frequency affects its pitch.
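As a rough sketch (in Python, using only the standard library - the function name here is mine, for illustration), a pure tone can be generated sample by sample directly from its amplitude and frequency:

```python
import math

def pure_tone(frequency, amplitude, sample_rate, num_samples):
    """Generate samples of a sine wave: amplitude * sin(2*pi*f*t),
    where t advances by 1/sample_rate per sample."""
    return [amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(num_samples)]

# One second of a 440Hz tone (concert A) at a 44100Hz sample rate.
tone = pure_tone(440, 1.0, 44100, 44100)
```

The amplitude scales the height of every peak, while the frequency controls how quickly the samples cycle from peak to trough and back.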
Most sounds aren't quite as simple as a pure tone. They look more complex, like this:
However, even something as complex as this can actually be viewed as lots of simple waveforms added together. When discussing these waveforms we speak in terms of frequency response, which is the range of frequencies that can be represented (from lowest to highest), and dynamic range, which is the difference (in decibels) between the loudest and quietest sounds.
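The "simple waveforms added together" idea can be sketched directly: summing a few pure tones, sample by sample, yields one complex waveform (again a Python illustration with made-up names; real instruments have many more components):

```python
import math

def tone_sample(freq, amp, sample_rate, n):
    """One sample of a pure tone at sample index n."""
    return amp * math.sin(2 * math.pi * freq * n / sample_rate)

def complex_wave(partials, sample_rate, num_samples):
    """Add several pure tones - (frequency, amplitude) pairs -
    into a single complex waveform."""
    return [sum(tone_sample(f, a, sample_rate, n) for f, a in partials)
            for n in range(num_samples)]

# A fundamental at 220Hz plus two progressively weaker harmonics.
wave = complex_wave([(220, 1.0), (440, 0.5), (660, 0.25)], 44100, 44100)
```

Reversing this process - splitting a recorded waveform back into its component frequencies - is exactly what a Fourier transform does.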
As we said back at the beginning, sound is basically vibration. This means that it can be recorded by transferring the vibration to some kind of storage device. This is known as analogue recording (analog if you're American). The classic example of this is the phonograph or record - ignoring the electrical side of things, essentially all this does is to record the vibrations by carving the waveform into a sheet of plastic. A record groove, viewed close-up and in cross-section, would look much like our waveform above.
Digital recording is completely different to its analogue counterpart. In digital recording the waveform is turned into an electrical voltage which is then sampled (essentially measured) at a fixed speed. The number of samples taken each second is known as the sample rate, and the dynamic range is restricted by the precision of the numbers used for each sample. Because of the way in which numbers are stored in a computer, we refer to this as the number of bits per sample. A 16-bit sample can represent sample values from -32768 to +32767, which gives a dynamic range of 20 * log10(32767) - approximately 90dB. A sample rate of 44.1kHz (as used on CDs) gives a frequency response of 0 - 22.05kHz. The sampling technique most frequently used in modern audio equipment is known as Pulse Code Modulation, or PCM. You may also hear of delta-sigma modulation, which works on the principle of using one bit and a very high sample rate, the theory being that this is more consistent from an electrical viewpoint. Sony's SA-CD discs use Direct Stream Digital (DSD), a format based on delta-sigma modulation.
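The relationship between bits per sample and dynamic range can be sketched like this (a Python illustration; the function names are mine):

```python
import math

def pcm_quantise(value, bits):
    """Map a sample in the range [-1.0, 1.0] to the nearest integer
    level a signed PCM sample of the given width can hold, clamping
    anything that falls outside the representable range."""
    max_level = 2 ** (bits - 1) - 1   # +32767 for 16 bits
    min_level = -(2 ** (bits - 1))    # -32768 for 16 bits
    return max(min_level, min(max_level, int(round(value * max_level))))

def dynamic_range_db(bits):
    """Approximate dynamic range: 20 * log10(largest level)."""
    return 20 * math.log10(2 ** (bits - 1) - 1)

# 16-bit audio: levels from -32768 to +32767, roughly 90dB of range.
range_16bit = dynamic_range_db(16)
```

Each extra bit doubles the number of levels, adding about 6dB of dynamic range - which is why 24-bit studio recordings have noticeably more headroom than 16-bit CDs.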
The advantages of digital and analogue recordings are something of a balancing act: analogue recordings can, in theory, store an absolutely perfect copy of a performance. In practice, this is limited by the quality and durability of the recording medium. Digital recordings, meanwhile, are guaranteed to be consistent and do not, like their analogue counterparts, deteriorate over time. They also have the advantage of being accessible for computer processing and storage. However, digital recordings are limited by the choices made for sample rate and bits per sample, which means that frequencies above the maximum supported frequency (known as the Nyquist frequency and equal to half the sample rate) are at best lost and at worst mis-sampled so that they corrupt the lower frequencies - an effect known as aliasing. Similarly, there are limitations in dynamic range which mean that very quiet sounds can be lost entirely and very loud ones can be cut off sharply - the effect known as clipping. For all these limitations, however, digital has proven to be the most versatile recording format for the modern world.
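The Nyquist limit can be demonstrated numerically: sampling a tone above the sample rate produces exactly the same samples as a much lower tone, so the two are indistinguishable once captured (a Python sketch with illustrative numbers):

```python
import math

def sample_tone(freq, sample_rate, num_samples):
    """Sample a sine wave at the discrete times n / sample_rate."""
    return [math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(num_samples)]

rate = 1000                            # Nyquist frequency: 500Hz
low = sample_tone(100, rate, 50)       # well below the Nyquist limit
aliased = sample_tone(1100, rate, 50)  # folds back onto 100Hz

# Once sampled, the 1100Hz tone is indistinguishable from 100Hz.
same = all(abs(a - b) < 1e-9 for a, b in zip(low, aliased))
```

This is why real-world converters put an analogue low-pass filter in front of the sampler: any energy above the Nyquist frequency must be removed before sampling, or it will masquerade as lower frequencies.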
Whether stored in analogue or digital format, audio is usually recorded separately for each speaker. These separate streams of audio are known as channels. A mono source uses one channel, stereo uses two and surround can use anything from 4 (for Dolby Pro-logic) to 8 (for MPEG2 7.1). Sounds can be made to appear to come from different positions by playing the same sound at different amplitudes from the different speakers - in a stereo system, if we play a sound only on the left it will appear to come from the left; if we play it only on the right, it will appear to come from the right; and if we play it equally from each speaker, the sound will appear to come from halfway between them.
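The amplitude-based positioning described above can be sketched as a simple linear pan (a Python illustration; real mixers often use a constant-power pan law instead, which keeps perceived loudness steady across the sweep):

```python
def pan(sample, position):
    """Place a mono sample in the stereo field by amplitude.
    position runs from 0.0 (hard left) to 1.0 (hard right);
    0.5 plays the sample equally from both speakers."""
    left = sample * (1.0 - position)
    right = sample * position
    return left, right

hard_left = pan(0.8, 0.0)   # all the signal in the left channel
centre = pan(1.0, 0.5)      # equal amplitude from both speakers
```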
In the internet age an important aspect of digital audio has become the amount of bandwidth needed to transmit it for real-time playback. For this reason, we usually talk about the bitrate of an audio file or stream. For uncompressed audio this can be calculated simply:
Bitrate = sample rate * bits per sample * channels
For CD audio (a 44.1kHz sample rate, 16 bits per sample and two channels) this works out at approximately 1411Kbps (kilobits per second).
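Plugging CD audio's figures into the formula (a one-line Python sketch; the function name is mine):

```python
def bitrate(sample_rate, bits_per_sample, channels):
    """Uncompressed bitrate in bits per second:
    sample rate * bits per sample * channels."""
    return sample_rate * bits_per_sample * channels

# CD audio: 44100 samples/s * 16 bits * 2 channels = 1411200 bits/s,
# i.e. about 1411 kilobits per second.
cd_bitrate = bitrate(44100, 16, 2)
```

Compressed formats such as MP3 manage the same perceived quality at a fraction of this figure, which is precisely why bitrate matters so much for streaming.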