The Human Voice and the Frequency Range
The voice can be one of the most complicated parts of audio to get right. Our hearing system has evolved to keep us alive and thriving, and while listening out for predators and prey has been essential, hearing and understanding our fellow human beings is absolutely crucial too.
Because of this, our hearing system is so expert at paying attention to the human voice that it’s quite obvious to us when something’s amiss. If you’re tampering with voice audio as part of a video project and you still want it to sound realistic to people’s ears, you have to proceed with care.
Audio frequencies are a measure of how low or high the pitch of a sound is. A bass sound or a rumble is described as having a low frequency, whereas a flute or a scrape sits much higher. All of these sounds contain acoustic energy at many frequencies, rather than at just one, but the ‘dominant’ and ‘important’ frequencies usually determine whether we describe a sound as low- or high-pitched.
When applied to speech, the human voice spans a range from about 125Hz to 8kHz. It’s no coincidence that this range puts the voice in the center of the range of human hearing – early humans who were more attuned to this frequency range were more able to understand others and more likely to survive and thrive.
Different areas of this frequency spectrum fulfil different functions in our speech: the lowest parts help us to identify pitch and therefore speech intonation; the consonants change the meaning of a word (‘cat’ and ‘bat’ are different things); and the vowels help us recognise one speaker from another, as well as changing the sounds of letters, of course.
The lowest frequency of any voice signal is called the Fundamental Frequency. As well as being the lowest, it also conveys a lot of the information about the pitch of the voice at any given point, and therefore about the overall intonation of speech.
The average fundamental frequency for a male voice is 125Hz; for a female voice it’s 200Hz; and for a child’s voice, 300Hz. Remember, as speakers change the pitch of their words, this fundamental frequency changes too, so think of it as a range rather than an absolute value. These differences come down to the rate at which a speaker’s vocal folds (aka vocal cords) vibrate, which in turn depends on their size and how they’re used.
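To make this a little more concrete, here’s a minimal Python sketch that estimates the fundamental frequency of a short clip of samples using a simple autocorrelation peak search. Everything here – the 8kHz sample rate, the 75–400Hz search range, the synthetic three-harmonic ‘voice’ – is an illustrative assumption, not how a production pitch tracker works.

```python
import math

SAMPLE_RATE = 8000  # Hz; an arbitrary sample rate chosen for this sketch

def estimate_f0(samples, sample_rate, f_min=75, f_max=400):
    """Estimate the fundamental frequency with a naive autocorrelation
    peak search: the signal lines up best with itself when shifted by
    exactly one period."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# A synthetic 'voice': a 125Hz fundamental plus two quieter harmonics
tone = [math.sin(2 * math.pi * 125 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
        + 0.25 * math.sin(2 * math.pi * 375 * t / SAMPLE_RATE)
        for t in range(2048)]

print(round(estimate_f0(tone, SAMPLE_RATE)))  # prints 125
```

Even though the tone contains energy at 250Hz and 375Hz too, the search correctly finds the 125Hz fundamental – the lowest frequency, just as described above.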
Consonants take up the space between 2kHz and 5kHz. These sounds pass quickly and do a lot to make speech intelligible – think of a bad public address system that makes it hard to tell the difference between S, T and F sounds, for example.
Consonants are made in a variety of ways, but usually involve sudden air movements from the mouth and through the teeth. If consonants are too loud, they can become sharp and grating; if they’re too quiet, the speaker can sound like they’re lisping. To get consonants right, you could try a helping hand from the ERA De-Esser plugin.
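A real de-esser like ERA De-Esser works by dynamically compressing just the sibilant band, which is well beyond a few lines of code. Purely as a toy illustration of the idea, the Python sketch below flags frames whose high-frequency energy dominates and turns them down; the first-difference ‘high-pass’, the threshold and the gain are all arbitrary choices for this sketch.

```python
import math

def sibilance_ratio(frame):
    """Crude brightness measure: energy of the first difference
    (a rough high-pass filter) relative to the frame's total energy."""
    hp = [frame[i] - frame[i - 1] for i in range(1, len(frame))]
    hp_energy = sum(x * x for x in hp)
    total = sum(x * x for x in frame) or 1e-12  # avoid dividing by zero
    return hp_energy / total

def naive_de_ess(frame, threshold=0.5, gain=0.5):
    """Attenuate a whole frame when it looks sibilant.
    A real de-esser would compress only the 2-5kHz band instead."""
    if sibilance_ratio(frame) > threshold:
        return [s * gain for s in frame]
    return frame

# A smooth 200Hz sine (low-frequency, 'voiced') passes through untouched...
voiced = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(256)]
# ...while a harsh, rapidly alternating frame gets turned down.
hissy = [1.0 if t % 2 == 0 else -1.0 for t in range(256)]
```

The point of the sketch is the detection-then-attenuation logic: too much attenuation and you get the lisping effect described above, too little and the S sounds stay sharp and grating.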
Vowel Sounds sit in the middle of the voice’s frequency spectrum. They’re most prominent between 500Hz and 2kHz – above the fundamental, but below most consonant energy – though they tend to affect the whole voice.
Each vowel is characterized by its own resonance pattern, created by the speaker positioning their mouth and tongue. Because vowels are defined by sound resonance rather than sound creation, they will remain the same while the fundamental frequency changes. Vowel resonances do vary from person to person, though.
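This source-filter idea – the vocal folds set the fundamental, the mouth shape sets the resonances – can be sketched in a few lines of Python. The formant (resonance peak) frequencies below are approximate textbook values for an ‘ah’-like vowel, and the resonance weighting and bandwidth are arbitrary choices for the sketch, not a real vocal tract model.

```python
# Approximate first two formants (resonant peaks) for two vowels, in Hz.
# Real values vary from speaker to speaker, as noted above.
FORMANTS = {"ah": (700, 1200), "ee": (300, 2300)}

def harmonic_amplitudes(f0, formants, n_harmonics=16, bandwidth=150.0):
    """Weight each harmonic of the fundamental f0 by its proximity to
    the formant peaks -- a toy source-filter model of vowel production."""
    amps = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        # Simple resonance curve: tallest at each formant centre fc
        gain = sum(1.0 / (1.0 + ((f - fc) / bandwidth) ** 2)
                   for fc in formants)
        amps.append(gain)
    return amps

# Whether the 'ah' is spoken at a 125Hz or a 200Hz fundamental,
# the loudest harmonic lands near the same formant frequencies.
for f0 in (125, 200):
    amps = harmonic_amplitudes(f0, FORMANTS["ah"])
    peak = (amps.index(max(amps)) + 1) * f0
    print(f0, peak)
```

Run it and the spectral peak lands near 700Hz or 1200Hz – the ‘ah’ resonances – for both fundamentals, which is exactly why the vowel stays recognisable while the pitch of the voice moves around.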