How Audio Forensics Reveals Voices' Secrets
Audio forensics can check if a recording has been edited, pull up background conversations and ID people by their voices.
Some of the hottest pieces of evidence in the Trayvon Martin case are the 911 calls neighbors made during the Feb. 26 confrontation between Martin and George Zimmerman, who fatally shot him. The Orlando Sentinel asked an audio expert to analyze the recording of a call made to police to try to determine whether a scream in the background came from Zimmerman, who claimed he had fired in self-defense, or from Martin.
Audio forensics is one of the newest branches of the old science of analyzing crime scenes. It depends on a mix of high-tech software and human judgment. How exactly does the latest technology help analysts sharpen unintelligible recordings and peg voices to people?
Identifying people by their voices
According to two practicing audio forensics experts whom InnovationNewsDaily contacted, an audio examiner’s job involves three parts: He or she enhances tapes so they’re easier to hear, checks the tapes to make sure they haven’t been tampered with, and identifies the people speaking on them.
For the Orlando Sentinel’s audio analysis, voice identification was the most important part. Forty years ago, analysts would have had people recite the phrase that was spoken in the recording in question: “Don’t tell anyone, but I’m plotting to kill the president,” for example. Then they would use a computer to chart the voices as sound waves, and compare the graphs visually. That was the beginning of voice analysis, said Kent Gibson, a freelance forensic audio examiner in Los Angeles.
Now examiners use software that takes the recording in question, plus a recording of a known person’s voice, and compares the two using three tests, Gibson explained. Usually the recording in question has to be at least seven seconds long. The software does a spectrograph analysis, an average pitch analysis, and a statistical analysis involving a database of millions of voices.
“So you run the two samples through the program,” Gibson said, “and it gives you a percentage from 0 to 100 percent of the likelihood that they are the same.”
Typically, he looks for a match of at least 60 percent to say that the voice on the recording in question matches the known person’s voice. He then uses his judgment in comparing the accent, syntax and breathing patterns in the recordings, which the program does not analyze. (Not all audio forensics examiners do so; Gibson said he does because he earned his bachelor’s degree in linguistics. In other cases, a separate linguist may be hired.)
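Of the three tests Gibson described, the average pitch comparison is the simplest to illustrate. The Python sketch below is not the software examiners use; it is a minimal example of one common way to estimate pitch, autocorrelation, run on synthetic tones that stand in for voiced speech. The function names, the 80-to-400-hertz search range and the 8,000-sample-per-second rate are all assumptions made for illustration.

```python
import math

def estimate_pitch_hz(samples, sample_rate, min_hz=80.0, max_hz=400.0):
    """Estimate the dominant pitch of one short frame by autocorrelation."""
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the frame and a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(freq_hz, sample_rate=8000, n=400):
    """Synthetic tone standing in for a voiced frame (hypothetical input)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# Compare the pitch of a "known" voice with that of a "questioned" one.
known = estimate_pitch_hz(tone(200), 8000)
questioned = estimate_pitch_hz(tone(250), 8000)
print(known, questioned, abs(known - questioned))
```

Real speech is far messier than a pure tone, which is one reason a pitch figure alone would not settle whether two recordings match.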
In the Orlando Sentinel analysis, the recorded scream was a 48 percent match to Zimmerman’s voice. The Sentinel’s analyst, Tom Owen, said the scream could not have been made by Zimmerman.
Gibson, however, called that judgment difficult to make. The scream was picked up through a telephone far from the originator. The program Owen used would have taken pitch into account, but the screamer’s pitch may have been higher than usual, as he was under duress. No one has studied how duress affects voices, Gibson said.
“So there’s some issues,” he said. “I’m afraid it’ll never be a cut-and-dry case.” He said Owen is a reputable examiner and the burden of proof for “not a match,” which is the judgment Owen made, is lower than for a positive match.
A recording of Martin’s voice, recent enough to account for the changes of adolescence, would help bolster Owen’s findings, Gibson said. News reports haven’t mentioned whether Martin’s voice is available for analysis.
Gibson has been part of a trial where voice identification played a major role. He provided a declaration in the trial between actor Mel Gibson (no relation) and the actor's then-girlfriend, Oksana Grigorieva. After about a week of work on nine recordings, the analyst found that the recordings of the actor violently berating Grigorieva were authentic and hadn’t been tampered with.
The 911 calls, too, will be checked for their authenticity, said Marisa Dery. The Massachusetts resident, who appears in the American College of Forensic Examiners International’s specialist database, provides audio enhancement for criminal cases and the entertainment industry.
One way of checking the recordings is to look for the constant hum created by the electrical grid. “You might not hear it, but it’s deep inside the tape,” Dery said.
Power companies try to keep that hum at a steady 60 hertz, but it varies in smooth, steady waves over time. If someone edits a tape to try to hide something, however, there may be an abrupt change in the background hum. About 95 percent of recordings have that hum, called the electric network frequency, Gibson said.
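The idea of spotting an edit from a break in the hum can be sketched in a few lines of Python. This is a simplified illustration, not a forensic tool: it assumes the hum's frequency has already been measured once per short frame of the recording, and the function name and the 0.01-hertz jump threshold are hypothetical choices, not an accepted standard.

```python
import math

def find_abrupt_jumps(hum_hz, max_step_hz=0.01):
    """Return frame indices where the hum jumps more than max_step_hz
    from the previous frame (a possible splice point)."""
    return [i for i in range(1, len(hum_hz))
            if abs(hum_hz[i] - hum_hz[i - 1]) > max_step_hz]

# A slowly drifting hum around 60 hertz (synthetic, for illustration only).
smooth = [60.0 + 0.02 * math.sin(i / 20) for i in range(200)]

# Simulate an edit by cutting frames 80 through 149 out of the recording.
spliced = smooth[:80] + smooth[150:]

print(find_abrupt_jumps(smooth))   # the untouched hum drifts smoothly
print(find_abrupt_jumps(spliced))  # the cut shows up as a sudden step
```

A flagged frame is only a lead. A cut that happens to land where the hum lines up would pass unnoticed, and a noisy measurement could trip the threshold on an unedited tape.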
Dery also looks at the graphic representation of the wind in the background of recordings, to see if there are any abrupt changes that might indicate an edit.
In Europe, analysts have access to a database of continuous electric network frequency readings from power companies, so they can determine, from a background hum, exactly what day and what time a recording was made. That’s not available in the U.S., but it undoubtedly will be in the future, Gibson said.
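Dating a recording this way amounts to sliding the recording's hum pattern along the power company's log until the two line up. The Python sketch below shows that matching step under heavy simplification: the reference log is a made-up random drift rather than real grid data, the questioned hum is a clean copy of part of it, and the function name is hypothetical.

```python
import random

def best_offset(reference_hz, query_hz):
    """Return the position in reference_hz where query_hz fits with the
    smallest squared error."""
    def error_at(offset):
        return sum((reference_hz[offset + i] - q) ** 2
                   for i, q in enumerate(query_hz))
    last_start = len(reference_hz) - len(query_hz)
    return min(range(last_start + 1), key=error_at)

# Made-up reference log: one hum reading per second, drifting at random.
rng = random.Random(0)
reference = [60.0]
for _ in range(599):
    reference.append(reference[-1] + rng.uniform(-0.005, 0.005))

# Hum taken from a "recording" that actually started 120 seconds into the log.
query = reference[120:150]
print(best_offset(reference, query))
```

With a real recording the hum would be noisy and the fit imperfect, so the best-matching position would be a candidate time to weigh against other evidence rather than a certainty.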
Analysts use software that guides them to potential edits in a recording, though those may present false positives, Gibson said. So they look for agreement from several different methods to determine that a recording has been tampered with.
Once a recording has been declared authentic and unedited, it still may not be very useful if the judge and jury can’t hear what people in it are saying. Dery specializes in enhancing audio recordings for legal cases. For example, she may look for other voices recorded in a 911 call. People who call 911 are usually agitated and yelling, overwhelming any other sounds in the call.
“Our job is to bring down a bit the volume of the person on the phone and try to bring up what is in the background, which can be tricky,” Dery said. As she raises the volume of background speakers, she introduces meaningless noise into the recording, which she then has to erase without erasing evidence. She uses software specifically made for forensic analysis. “It takes longer than 'CSI' shows,” she said, referring to the television series.
In the last phase of an audio analysis, recordings may even be used to try to detect lies. The audio engineer measures the spacing between words in a person’s speech, then compares that rhythm to the person’s rhythm when he or she is answering basic questions such as, “What is your name?” and “Where do you live?”
Altered speech rhythm indicates stress, but it isn’t necessarily from lying, Dery said. People may alter their speech rhythm when they’re worried or when their blood sugar is low from not having eaten recently. This audio analysis usually works alongside an examination of the person’s body language, done by other experts.
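The rhythm comparison Dery described comes down to measuring the silences between words and setting them against a baseline. A minimal Python sketch follows; it assumes the start and end time of each word, in milliseconds, have already been marked by hand or by other software, and the timings and function name are invented for illustration.

```python
def mean_gap_ms(words):
    """Average silence between consecutive words, given (start_ms, end_ms) pairs."""
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    return sum(gaps) / len(gaps)

# Invented word timings: a baseline answer and an answer under questioning.
baseline = [(0, 300), (500, 800), (1000, 1400)]
questioned = [(0, 300), (700, 900), (1400, 1800)]

# A ratio well above 1 means longer pauses than in the baseline answer.
ratio = mean_gap_ms(questioned) / mean_gap_ms(baseline)
print(ratio)
```

As Dery cautioned, a change in this ratio points to stress at most. It cannot say what caused the stress.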
“Scientifically we can do a lot. But we always have to remember the human element in things,” she said.
Toward scientific standards
As with all evidence presented in court, any audio analysis used in a trial about the shooting of Martin would be open to challenges about its methods and reliability.
Not all analysts follow exactly the same procedures, Dery said. The Scientific Working Group on Digital Evidence, led by representatives from the U.S. Secret Service and the FBI, provides peer-reviewed guidelines for audio and video forensics engineers, and the Secret Service is calling for standards that all audio forensics labs must meet.