et40 - digital audio fundamentals - 2014 edition

David Javelosa

Copyright © 2003 - 2014 David Javelosa unless otherwise stated.

week 01 - introduction to digital audio and overview

What is digital audio?

We now come to the other side of the digital realm, perhaps the clearest definition of what audio can be. Digital audio or "sampling" technology, once again, differs from MIDI as a tape recording does from a player piano roll, as a photograph of a painting does from a "paint-by-numbers" sketch. If MIDI is a precise machine-language instruction for reproducing a musical performance, then a digital audio sample is an exact audio reproduction of that performance containing every acoustical nuance to the highest resolution available.

That availability of resolution is exactly the main issue in dealing with digital audio technology. As with graphics and photography, resolution means quality. How well does the digital grid represent a curved surface before getting the "jaggies"? How few colors can you get away with on a screen before it starts looking like a cartoon with the tint cranked up all the way? These are some of the visual analogies used to describe how sound and music are handled in the digital domain. We are all used to the wonderfully crisp, clear sound of an audio CD on our home stereo, unaware of how much data resolution is required to keep it just within the perceptual range of sounding good.

Degrading the dynamic, or "volume", resolution, or lowering the sample rate, or "frequency" resolution, results in just one thing in the audio world: noise. Raising these levels of resolution, as with graphic and photographic quality, results in another thing: size. Unfortunately, in producing for digital media size IS everything, everything you want to avoid as much as possible and for as long as possible. Data equals size and audio data is BIG data. In the politics of assembling a game or software-based product there is always a shake-down for delivery real estate. Being aware of that battle, along with the issues and limitations in down-sizing audio data, is the main task of the digital sound designer, the software audio engineer, and the interactive media composer.

Describing Audio In A Grid

Like all things mathematical, digital audio is described on the X-Y axes of a grid, with the X axis being time and the Y axis being amplitude. To capture a given frequency, the sample rate must be at least twice that frequency; the CD standard of 44.1 kHz is a little more than twice the upper limit of normal human hearing, around 20 kHz. Data is conserved by reducing the sample rate, which lowers the highest frequency that can be captured and thereby reduces the sound quality. The amplitude data is typically stored as 16 bit values, but data can also be conserved by capturing it as 8 bit values, again greatly reducing the quality. The more professional digital audio editing tools provide a graphic interface showing the data as wave shapes with cut and paste functions.
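To put numbers on how sample rate and bit depth translate into data, here is a minimal Python sketch. The function name pcm_size_bytes and the example settings are illustrative assumptions, not part of any particular tool; it simply multiplies out the grid for uncompressed audio.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed PCM audio data (file headers not counted)."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


# One minute of CD-quality stereo: 44.1 kHz, 16 bit, 2 channels.
cd_minute = pcm_size_bytes(44100, 16, 2, 60)

# One minute at half the sample rate, 8 bit, mono.
lofi_minute = pcm_size_bytes(22050, 8, 1, 60)

print(cd_minute)    # 10584000 bytes, roughly 10 MB per minute
print(lofi_minute)  # 1323000 bytes, one eighth of the CD figure
```

Halving the sample rate, the bit depth and the channel count each cut the data in half, which is why the three together give an eightfold saving, at an obvious cost in quality.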

Fig. 09.01 The X-Y Grid of Digital Audio

Referring to Fig. 09.01, the vertical axis representing dynamics can be thought of as a volume knob that is quantized in incremental clicks, like those on some high-end stereo amplifiers. I always found these annoying myself for the very reason that you could not set the knob exactly in between the set volumes of one click and another. At any rate, imagine describing the dynamic activity of a sound by manually moving this knob from low to high. The more clicks there are (and so the less noticeable each one is), the smoother the dynamic activity. The fewer the clicks, the clunkier it is to twiddle. Now imagine this being automated and moving as fast as audio normally travels. The clicks, if there are few enough, will become quite evident in a stair-stepping effect that contributes residual noise to the overall effect.

16 bit audio, or a dynamic range that is represented by 65,536 clicks of volume, is a high enough resolution for us not to be able to hear the changes as they happen from one split-second to another. 8 bit audio, which takes up half the data space, delivers only a small fraction of that resolution at 256 clicks. In the multiplicative world of binary logic, the number of values grows exponentially with the number of bits: each added bit doubles the number of clicks, so the more you give, the more you get. The point of all this is that if you are dealing with lower end hardware that can only play 8 bits (increasingly rare), the quality will only be that of a somewhat noisy radio, even if the audio is converted from a higher quality source.
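The click counts above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, the 20 * log10 figure is the usual rule-of-thumb estimate of dynamic range, and the quantize function assumes samples scaled to the range -1.0 to 1.0.

```python
import math


def quantization_levels(bits):
    """Number of distinct amplitude values ("clicks") at a given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Rule-of-thumb dynamic range in decibels, about 6 dB per bit."""
    return 20 * math.log10(quantization_levels(bits))


def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest step of a signed grid."""
    max_code = 2 ** (bits - 1) - 1
    return round(sample * max_code) / max_code


def worst_error(bits, points=1000):
    """Largest rounding error over one cycle of a full-scale sine wave."""
    cycle = (math.sin(2 * math.pi * i / points) for i in range(points))
    return max(abs(s - quantize(s, bits)) for s in cycle)


print(quantization_levels(8))    # 256
print(quantization_levels(16))   # 65536
print(dynamic_range_db(8))       # about 48 dB
print(dynamic_range_db(16))      # about 96 dB
print(worst_error(8) > worst_error(16))  # True: fewer clicks, bigger steps
```

The rounding error measured by worst_error is the stair-stepping described earlier; at 8 bits each step is large enough to be heard as noise.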

If you have the hardware playback capability AND the storage space for twice the data at a 16 bit depth, you gain quite a bit more than twice the resolution. Common practice is to sample at the highest bit depth available and digitally reduce the bit depth as necessary for each application. It is also suggested that you "normalize" your data, that is, scale the audio so that the loudest point in the file uses the entire available dynamic range. This allows your sound to take advantage of all available resolution, with the actual playback volume adjusted on the playback system. At this stage of the game, practically all new desktop systems and dedicated game players support 16 bit sound, but there are still enough applications for the noisier 8 bit stuff.
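The benefit of normalizing before reducing bit depth can be seen in a small Python sketch. The helper names and the four sample values are made up for illustration, and the reduction shown is plain truncation of signed 16 bit values with no dithering, which is only one of several ways an editor might do it.

```python
def normalize_16bit(samples):
    """Scale signed 16 bit samples so the loudest one reaches full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [round(s * 32767 / peak) for s in samples]


def to_8bit(samples):
    """Keep only the top 8 bits of each signed 16 bit sample (-128..127)."""
    return [s >> 8 for s in samples]


# A very quiet recording: the peak is only 200 out of a possible 32767.
quiet = [0, 50, -200, 120]

print(to_8bit(quiet))                   # [0, 0, -1, 0]
print(to_8bit(normalize_16bit(quiet)))  # [0, 32, -128, 76]
```

Without normalizing, the quiet signal collapses into almost nothing at 8 bits; after normalizing, the same shape is spread across the full 8 bit range.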

Fig. 09.02 Manipulating Dynamic Ranges For Bit Resolution Reduction

On the horizontal axis of our grid we have that ol' demon Time. With the clocks of most computers well past a hundred megahertz (millions of clicks per second), it would seem that there is surely plenty of resolution for the frequencies that we can hear. The normal human ear will only hear up to about 20 kHz, whether or not there are perceptual artifacts beyond that. The rule of sampling at a minimum of twice the highest frequency to be captured exists to avoid aliasing between the two frequencies, the sample rate and that which is being sampled: any frequency above half the sample rate is recorded as a false, lower frequency. A simple analogy would be the effect of videotaping a computer screen. The video camera is capturing a series of still frames at one rate and the computer is refreshing its screen at another. The two are basically in the same ballpark but have no synchronous ties to each other. The result is a distortion of lines moving down the screen as the two rates phase in and out of each other.
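Aliasing can be demonstrated numerically. In this Python sketch (the function names and the 30 kHz test tone are illustrative assumptions), a tone above half the sample rate produces exactly the same sample values as a lower tone, so once sampled the two cannot be told apart.

```python
import math


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency that a sampled tone appears to have after sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_cosine(tone_hz, sample_rate_hz, count):
    """First `count` samples of a cosine wave at the given frequency."""
    return [math.cos(2 * math.pi * tone_hz * n / sample_rate_hz)
            for n in range(count)]


rate = 44100
print(alias_frequency(10000, rate))  # 10000: below half the rate, unchanged
print(alias_frequency(30000, rate))  # 14100: above half the rate, folded down

# The 30 kHz tone and its 14.1 kHz alias yield matching samples.
high = sample_cosine(30000, rate, 100)
low = sample_cosine(14100, rate, 100)
print(all(abs(a - b) < 1e-9 for a, b in zip(high, low)))  # True
```

This is why material is normally filtered to remove content above half the sample rate before it is sampled or down-sampled.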

ProTools, Multi-Track And Hybrid-Tools

Leading the pack, Digidesign came out with ProTools in 1991. ProTools is the multi-track version of the older SoundDesigner, with an open-ended design that once again makes Digidesign products the standard against which all others, on both platforms, are compared. ProTools takes the graphic interface of SoundDesigner and creates a multi-level time line of several sound files that can be cross-faded, edited and repeated in a non-redundant and non-destructive environment. Later features include a virtual mixer and the ability to plug in several DSP functions such as reverb and 3D spatialization. Originally the system was based on the ProTools hardware (a couple of different compatible models) and a hard drive bus based on the faster SCSI-2 protocol. This acceleration allows for the streaming of multiple audio channels simultaneously from the hard drive. Currently the hardware has been upgraded to a FireWire or mLAN connection, allowing for a faster and smoother transfer of a broader stream of data. ProTools is also available in a software only version, meaning it is hardware independent.

In addition to the ProTools hardware and SCSI accelerator, there are provisions for hardware co-processing dedicated to supporting the higher-end DSP functions. These add-on cards are known as "DSP farms". This array of recording, editing, processing and mixing, all in the digital domain, is part of Digidesign's TDM specification. (TDM stands for Time Division Multiplexing, although one rep was reported to say that it stood for "totally digital, man".) With the advent of G3 or better processors on the Mac, much of this DSP processing is now handled entirely on the main CPU.

Other developments in this area have included OSC's Deck for the Macintosh, which not only supports the lower end AudioMedia multi-channel card but will also drive multiple channels of audio on the native 16 bit hardware built into the new PowerPC Macs. The number of channels available in this scenario depends on the power of the main CPU. This trend of "native signal processing" has led to increased development of hardware independent software in both the 2 channel and multi-channel digital audio editing markets. One example of this is Digidesign's Session, originally released as a Windows based multi-track system with its own dedicated hardware. With the competition from lower cost sound cards and faster PC processors supporting other multi-track programs, Session is now a software only application that can run on a number of different hardware configurations.

Multi-track digital audio is now becoming a common tool for composing as well as sound design. A new generation of musicians is emerging who have never recorded or edited a MIDI file, but rather assemble their music as individual files and samples in a multi-track editor. This is also becoming the standard environment for industrial and commercial audio post-production, bringing all the elements of dialog, sound effects and score together in a non-linear, non-destructive environment linked to video playback. Other variations on this technology are featured in the growing area of Windows-based programs that include both multi-track digital audio and MIDI sequencing. Tools such as Logic, Cubase, and several others are moving the center of gravity to Windows as the platform of choice for the digital audio composer. Leading examples of this technology for the Mac can be found in Opcode's Studio Vision and Mark of the Unicorn's Digital Performer.

There is an ongoing debate over whether digital audio and the various sample based technologies will replace MIDI as the way sound and music are done in digital entertainment products. Taking a close look at the features and functionality of both, it is hard to see one existing without the other. Both have their strengths and weaknesses as far as delivery, feasibility and performance. Both are mandatory tools for the computer musician involved in developing for interactive entertainment. Of course, with the advent of MP3 compression and similar technologies, the use of digital audio is becoming more and more integrated into all forms of digital media, from hand held devices to the Internet.

Reading Assignment

Review: Pro Tools 8 for Macintosh & Windows OR Complete Pro Tools Handbook

  • Software Basics
  • The Mix and Edit Windows
