What we perceive as sound is vibration. The motion of air against our eardrums is interpreted by our brain as speech, music or just noise. Larger motions of air are perceived as louder; faster motions of air are perceived as higher pitched.
Audio engineers call these vibrations waveforms. In its simplest form, a waveform looks something like this:
This is what musicians call a pure tone, and those who remember their mathematics will recognise it as a sine wave. The waveform can be defined by two characteristics - the height of the peaks, known as the amplitude (measured in decibels (dB)), and the rate at which the peaks occur, known as the frequency (measured in Hertz (Hz)). From the listening perspective, the amplitude affects how loud the sound is and the frequency affects its pitch.
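As a rough sketch (in Python, using only the standard library - the function name here is mine, for illustration), a pure tone can be generated sample by sample directly from its amplitude and frequency:

```python
import math

def pure_tone(frequency, amplitude, sample_rate, num_samples):
    """Generate samples of a sine wave: amplitude * sin(2*pi*f*t),
    where t advances by 1/sample_rate per sample."""
    return [amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(num_samples)]

# One second of a 440Hz tone (concert A) at a 44100Hz sample rate.
tone = pure_tone(440, 1.0, 44100, 44100)
```

The amplitude scales the height of every peak, while the frequency controls how quickly the samples cycle from peak to trough and back.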
Most sounds aren't quite as simple as a pure tone. They look more complex, like this:
However, even something as complex as this can actually be viewed as lots of simple waveforms added together. When discussing these waveforms we speak in terms of frequency response, which is the range of frequencies that can be represented (from lowest to highest), and dynamic range, which is the difference (in decibels) between the loudest and quietest sounds.
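The "simple waveforms added together" idea can be sketched directly: summing a few pure tones, sample by sample, yields one complex waveform (again a Python illustration with made-up names; real instruments have many more components):

```python
import math

def tone_sample(freq, amp, sample_rate, n):
    """One sample of a pure tone at sample index n."""
    return amp * math.sin(2 * math.pi * freq * n / sample_rate)

def complex_wave(partials, sample_rate, num_samples):
    """Add several pure tones - (frequency, amplitude) pairs -
    into a single complex waveform."""
    return [sum(tone_sample(f, a, sample_rate, n) for f, a in partials)
            for n in range(num_samples)]

# A fundamental at 220Hz plus two progressively weaker harmonics.
wave = complex_wave([(220, 1.0), (440, 0.5), (660, 0.25)], 44100, 44100)
```

Reversing this process - splitting a recorded waveform back into its component frequencies - is exactly what a Fourier transform does.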
As we said back at the beginning, sound is basically vibration. This means that it can be recorded by transferring the vibration to some kind of storage device. This is known as analogue recording (analog if you're American). The classic example of this is the phonograph or record - ignoring the electrical side of things, essentially all this does is to record the vibrations by carving the waveform into a sheet of plastic. A record groove, viewed close-up and in cross-section, would look much like our waveform above.
Digital recording is completely different to its analogue counterpart. In digital recording the waveform is turned into an electrical voltage which is then sampled (essentially measured) at a fixed speed. The number of samples taken each second is known as the sample rate, and the dynamic range is restricted by the precision of the numbers used for each sample. Because of the way in which numbers are stored in a computer, we refer to this as the number of bits per sample. A 16-bit sample can represent sample values from -32768 to +32767, which gives a dynamic range of 20 * log10(32767) - approximately 90dB. A sample rate of 44.1kHz (as used on CDs) gives a frequency response of 0 - 22.05kHz. The sampling technique most frequently used in modern audio equipment is known as Pulse Code Modulation, or PCM. You may also hear of delta-sigma modulation, which works on the principle of using one bit and a very high sample rate, the theory being that this is more consistent from an electrical viewpoint. Sony's SA-CD discs use Direct Stream Digital (DSD), a format based on delta-sigma modulation.
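The relationship between bits per sample and dynamic range can be sketched like this (a Python illustration; the function names are mine):

```python
import math

def pcm_quantise(value, bits):
    """Map a sample in the range [-1.0, 1.0] to the nearest integer
    level a signed PCM sample of the given width can hold, clamping
    anything that falls outside the representable range."""
    max_level = 2 ** (bits - 1) - 1   # +32767 for 16 bits
    min_level = -(2 ** (bits - 1))    # -32768 for 16 bits
    return max(min_level, min(max_level, int(round(value * max_level))))

def dynamic_range_db(bits):
    """Approximate dynamic range: 20 * log10(largest level)."""
    return 20 * math.log10(2 ** (bits - 1) - 1)

# 16-bit audio: levels from -32768 to +32767, roughly 90dB of range.
range_16bit = dynamic_range_db(16)
```

Each extra bit doubles the number of levels, adding about 6dB of dynamic range - which is why 24-bit studio recordings have noticeably more headroom than 16-bit CDs.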
The advantages of digital and analogue recordings are something of a balancing act: analogue recordings can, in theory, store an absolutely perfect copy of a performance. In practice, this is limited by the quality and durability of the recording medium. Digital recordings, meanwhile, are guaranteed to be consistent and do not, like their analogue counterparts, deteriorate over time. They also have the advantage of being accessible for computer processing and storage. However, digital recordings are limited by the choices made for sample rate and bits per sample, which means that frequencies above the maximum supported frequency (known as the Nyquist frequency and equal to half the sample rate) are at best lost and at worst mis-sampled so that they corrupt the lower frequencies - an effect known as aliasing. Similarly, there are limitations in dynamic range which mean that very quiet sounds can be lost entirely and very loud ones can be cut off sharply - the effect known as clipping. For all these limitations, however, digital has proven to be the most versatile recording format for the modern world.
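The Nyquist limit can be demonstrated numerically: sampling a tone above the sample rate produces exactly the same samples as a much lower tone, so the two are indistinguishable once captured (a Python sketch with illustrative numbers):

```python
import math

def sample_tone(freq, sample_rate, num_samples):
    """Sample a sine wave at the discrete times n / sample_rate."""
    return [math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(num_samples)]

rate = 1000                            # Nyquist frequency: 500Hz
low = sample_tone(100, rate, 50)       # well below the Nyquist limit
aliased = sample_tone(1100, rate, 50)  # folds back onto 100Hz

# Once sampled, the 1100Hz tone is indistinguishable from 100Hz.
same = all(abs(a - b) < 1e-9 for a, b in zip(low, aliased))
```

This is why real-world converters put an analogue low-pass filter in front of the sampler: any energy above the Nyquist frequency must be removed before sampling, or it will masquerade as lower frequencies.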
Whether stored in analogue or digital format, audio is usually recorded separately for each speaker. These separate streams of audio are known as channels. A mono source uses one channel, stereo uses two and surround can use anything from 4 (for Dolby Pro-logic) to 8 (for MPEG2 7.1). Sounds can be made to appear to come from different positions by playing the same sound at different amplitudes from the different speakers - in a stereo system, if we play a sound only on the left it will appear to come from the left; if we play it only on the right, it will appear to come from the right; and if we play it equally from each speaker, the sound will appear to come from halfway between them.
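The amplitude-based positioning described above can be sketched as a simple linear pan (a Python illustration; real mixers often use a constant-power pan law instead, which keeps perceived loudness steady across the sweep):

```python
def pan(sample, position):
    """Place a mono sample in the stereo field by amplitude.
    position runs from 0.0 (hard left) to 1.0 (hard right);
    0.5 plays the sample equally from both speakers."""
    left = sample * (1.0 - position)
    right = sample * position
    return left, right

hard_left = pan(0.8, 0.0)   # all the signal in the left channel
centre = pan(1.0, 0.5)      # equal amplitude from both speakers
```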
In the internet age an important aspect of digital audio has become the amount of bandwidth needed to transmit it for real-time playback. For this reason, we usually talk about the bitrate of an audio file or stream. For uncompressed audio this can be calculated simply:
Bitrate = sample rate * bits per sample * channels
For CD audio (a 44.1kHz sample rate, 16 bits per sample and two channels) this works out at approximately 1411Kbps (kilobits per second).
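Plugging CD audio's figures into the formula (a one-line Python sketch; the function name is mine):

```python
def bitrate(sample_rate, bits_per_sample, channels):
    """Uncompressed bitrate in bits per second:
    sample rate * bits per sample * channels."""
    return sample_rate * bits_per_sample * channels

# CD audio: 44100 samples/s * 16 bits * 2 channels = 1411200 bits/s,
# i.e. about 1411 kilobits per second.
cd_bitrate = bitrate(44100, 16, 2)
```

Compressed formats such as MP3 manage the same perceived quality at a fraction of this figure, which is precisely why bitrate matters so much for streaming.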