Many law enforcement laboratories are equipped with at least one
sound spectrograph, although there are several types to choose from.
This machine plots the frequency of a complex sound according to
time and intensity. Its function is based on the idea that the
human voice is produced by a combination of physiological structures
and harmonics.
The vocal column begins in the vocal folds and ends at the lips.
The vocal folds function acoustically as a closed end so that the
vocal column becomes a closed-tube resonator. The tension of
the vocal folds determines the vibrational frequency. When a
sound is produced, those harmonics nearest the resonant frequency of
the vocal column increase in amplitude. If the shape of the
mouth, throat, or lips changes, the frequencies vary with the
change.
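As a rough illustration of the closed-tube idea, the resonances of a uniform tube closed at one end fall at odd multiples of c / 4L. The sketch below assumes a uniform tube, a length of 0.17 m, and a sound speed of 343 m/s. None of these figures come from the text, and a real vocal column changes shape constantly, so the numbers are only indicative.

```python
# Assumed values for illustration only; they are not taken from the article.
SPEED_OF_SOUND_M_PER_S = 343.0   # air at roughly room temperature
VOCAL_COLUMN_LENGTH_M = 0.17     # rough adult vocal-column length


def closed_tube_resonances(length_m, count, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the first `count` resonances (Hz) of a tube closed at one end.

    Such a tube resonates at odd multiples of its quarter-wavelength
    frequency: f_n = (2n - 1) * c / (4L).
    """
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


def nearest_resonance(harmonic_hz, resonances):
    """Return the resonance closest to a given harmonic of the voice."""
    return min(resonances, key=lambda r: abs(r - harmonic_hz))


resonances = closed_tube_resonances(VOCAL_COLUMN_LENGTH_M, 3)
print([round(f) for f in resonances])  # [504, 1513, 2522]
```

Shortening or lengthening the assumed tube shifts every resonance, which is a simplified version of what happens when the mouth, throat, or lips change shape.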
The sound spectrograph converts the sound of a voice into a
visual graphic display known as a voiceprint. The analog
spectrograph has four parts: a magnetic tape recorder unit, a tape
scanning device, a filter, and an electronic stylus that writes the
information onto electrically sensitive paper.
A high-quality tape is fastened to the scanning drum, which holds
a 2.5-second segment of tape. The process takes about eighty to
ninety seconds to complete. As the drum revolves, an
electronic filter is activated that allows only a certain band of
frequencies to get through to the recorder. These frequencies
are translated into electrical energy that gets recorded by the
stylus. As the process continues, the filter moves into
increasingly higher frequencies and the stylus records the intensity
levels of each defined range. The final print shows a pattern
of closely spaced lines that represent 2.5 seconds' worth of all of
the distinguishable frequencies of that person's voice as it was
taped.
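The sweep described above can be imitated digitally. The sketch below is only an illustration, not the circuitry of the analog machine: a single-bin discrete Fourier transform stands in for the band filter, and the sample rate, slice length, band centers, and synthetic test tone are all assumed values chosen for the example.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME = 200          # samples per time slice, i.e. 25 ms (assumed)


def band_intensity(frame, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of one frequency within a frame, via a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(frame))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(frame))
    return 2.0 * math.hypot(re, im) / len(frame)


def sweep(samples, band_centers_hz, frame=FRAME):
    """Return one row of intensities per band, one column per time slice."""
    frames = [samples[i:i + frame]
              for i in range(0, len(samples) - frame + 1, frame)]
    # The outer loop plays the role of the filter stepping upward in
    # frequency; each pass over the frames is like one drum revolution.
    return [[band_intensity(f, hz) for f in frames] for hz in band_centers_hz]


# Synthetic test tone: 400 Hz for the first half, 1200 Hz for the second.
n = 2000
signal = [math.sin(2 * math.pi * (400 if i < n // 2 else 1200) * i / SAMPLE_RATE)
          for i in range(n)]
bands = [400, 800, 1200]
grid = sweep(signal, bands)
for hz, row in zip(bands, grid):
    print(hz, " ".join(f"{v:.1f}" for v in row))
```

Each printed row corresponds to one band and each column to one slice of time, so the 400 Hz row is strong only in the first half and the 1200 Hz row only in the second. A real voice would fill many bands at once.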
The horizontal axis on a voiceprint represents time. The vertical
axis is the frequency, registering how high or low a voice is. The
degree of darkness within each region on the graph illustrates the
degree of intensity, or the voice's volume.
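The link between intensity and darkness can be sketched as a simple quantization. The example below assumes a linear mapping onto 256 gray levels, with 0 as blank paper and 255 as the darkest mark. Both the linear scale and the function name are assumptions made for illustration; actual instruments may scale intensity differently, for instance logarithmically.

```python
def to_gray_levels(intensity_rows, levels=256):
    """Map intensities to darkness values from 0 (blank) to levels - 1 (darkest).

    Assumes a linear scale relative to the loudest cell in the grid.
    Negative inputs are treated as zero.
    """
    peak = max(max(row) for row in intensity_rows)
    if peak <= 0:
        return [[0 for _ in row] for row in intensity_rows]
    return [[round((levels - 1) * max(v, 0.0) / peak) for v in row]
            for row in intensity_rows]


# Rows are frequency bands; columns are time slices.
example = [[0.0, 0.4, 1.0],
           [0.2, 0.0, 0.0]]
print(to_gray_levels(example))  # [[0, 102, 255], [51, 0, 0]]
```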
Two kinds of prints can be made: bar prints, which are utilized for
identification, and contour prints, which help to file the prints in
a computer.
Recent developments include digital spectrographs that can be used
with a computer for enhanced comparison and measurement, but some
specialists still prefer the older analog model.
Comparisons are made between voice samples, and when sufficient
similarity exists between one pattern and another, the voices are
believed to have a high probability of originating from the same
person. For forensic purposes, the voiceprint interpreter
needs a recording of the suspect's voice (e.g., from an interview)
to compare to the sample made in the context of a crime, such as an
obscene phone call or taped conversation. Other people's
voices, unrelated to the crime, are used to establish elimination
factors (points of dissimilarity).
Interpreters use two methods of identification:
- Aural: listening to the voice on tape to compare single sounds
  and series of sounds for similarities and discrepancies; the
  examiner also listens for breath patterns, inflections, unusual
  speech habits, and accents.
- Visual: reading the voiceprints to compare their appearances.
First, the examiner evaluates the recording of the unknown
speaker to make sure it has sufficient quality and clarity for
analysis. Then the examiner turns to the recording of the known
person to ensure that it has similar clarity. The
best test cases have the suspect repeat what was said on the
"unknown voice" tape, or at least include as many of the
same words as possible.
The aural and visual methods are combined to come up with one of
five conclusions:
- positive identification
- probable identification
- positive elimination
- probable elimination
- no decision.
The highest standard, positive identification, requires twenty
speech sounds that possess similarities. "Positive
elimination" derives from twenty or more differences, and the
rest fall on a spectrum in between.
Some critics of this technology claim that it has never been
adequately developed to prove that voiceprints are as individual as
fingerprints. However, those who work closely with it on a
regular basis insist that the spectrograph is highly accurate.
Tom Owen, who runs Owl Investigations, Inc.,
has thirty-five years of experience in the recording arts and is a
certified Voice Identification Examiner. He teaches at the New
York Institute of Forensic Audio and offers specialized courses on
audio and video analysis. He also consults with law
enforcement agencies around the world on specific cases, and for
more than twenty years has served as an expert witness in both
criminal and civil proceedings. His agency has a fully
equipped processing laboratory, which includes five different types
of spectrograph machines for voice identification and speech
enhancement.
With Michael McDermott, Owen has written an extensive article on the
history, methods, and forensic applications of voice technology, and
he takes on fifty to sixty cases each year. "It's not
uncommon," he says, "that at a murder scene or shooting,
you have a tape made from a 911 call where the victim might have
been calling for help, or else the person might have been on the
phone talking to a relative. Someone shoots the person, the
victim dies and the shooter doesn't realize that the machine was
recording. I would get that tape and see if the intruder said
anything before he shot the person. Sometimes we get results.
Then there are civil incidents, like someone calling to threaten
you. If you don't pay this money, he's going to damage your
car or kill your pet. You also have divorce proceedings where
recordings get made, and you have people who keep calling to say
something and then hanging up. We can analyze those
calls."
In fact, in one case, a murderer himself called the police to offer
the location of the body. He said that he was an acquaintance
and used another man's name. That man was eliminated and the
murderer identified through voiceprint analysis.
Owen uses the full range of spectrograph analysis, but he admits
that the technology could still be better. "You can't
accurately print all the 256 shades on the gray scale," he
says. "The printers have gotten better, but only the most
expensive ones really get the full range of resolution, and it's
often not worth the cost of such a machine."
Recently he completed a study on twenty-five female voices of
varying races and ages, doing a one-to-one analysis to determine the
degree of error. The results were striking: "When
you're comparing a known and an unknown voice using a verbatim
exemplar [the samples contain the same verbal communication], there
are no errors. That's ninety-nine percent of what we do today.
We don't try to pick a voice out of a pack."
Because voiceprints are generally used in cases where the accuracy
rate is so high, Owen is confident that they make a real
contribution to the legal process. However, the history of
admissibility of voiceprints has mirrored what has happened in court
with other technologies in their early stages. Courts are
conservative and the sound spectrograph has had to prove itself.