Using a De-Esser to Remove Harshness and Sibilance from Voice and Dialogue Audio
Tame those harsh “Ess” sounds with this guide to combating sibilance with a de-esser.
Have you ever listened to a recording or voiceover of someone saying...
“Silly Susan sat steadily while she slurped her supper of seasoned squash she summoned from the store salesman standing stupidly in the scrumptious surroundings.”
...and noticed an unpleasant, harsh sound every time the ‘s’s are pronounced?
No? Never heard that sentence before? Well you’ve probably noticed it in other places.
Sentences sometimes sound sharp when spoken if they’re scattered with ‘s’, ‘sh’ and ‘z’ sounds; this is called sibilance and can be very unpleasant for listeners. Sibilance occurs in the upper mid frequencies, roughly between 4 kHz and 10 kHz, an area of the frequency spectrum that can sound particularly harsh.
Luckily for us there’s a specific tool to counter this unpleasant sound: a De-Esser.
Sibilance is the term used to describe a phonetic characteristic of speech. Sibilants are consonants with high amplitude and pitch, produced when the tongue directs air towards the teeth.
You can hear it clearly in words such as ‘sip’, ‘zot’ and ‘shard’. These words all begin with sibilants – if you sustain the first consonant of each word the sibilance will be emphasized. Notice the varying sonic qualities of each letter, how consonants differ in pitch and frequency content. To get a clearer look, try recording yourself and using a frequency spectrum visualiser to analyse the frequencies for each consonant.
The sibilant frequencies are generally between 4 kHz and 10 kHz, but they vary from speaker to speaker. Since the sound is caused by air moving through the teeth and lips, guided by the tongue, the varying shapes and sizes of people’s mouths will have an impact on the exact nature of the sibilance – factors such as teeth gaps, respiratory health, and posture can have an effect too.
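To put a rough number on how much of a sound sits in that range, the sketch below estimates the share of a short frame's energy that falls between 4 kHz and 10 kHz. It is only an illustration: the function name `band_energy_fraction`, the sample rate, the frame size and the pure test tones standing in for a vowel and an 's' are all assumptions made for the example, and a real analyser would run an FFT on recorded speech.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, low_hz=4000.0, high_hz=10000.0):
    """Return the fraction of spectral energy between low_hz and high_hz."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    # Naive DFT over the positive-frequency bins (DC skipped).
    # Slow, but fine for one short frame and needs no extra libraries.
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0.0 else 0.0

# Assumed values for the demo only.
SAMPLE_RATE = 32000
FRAME = 256

def tone(freq_hz):
    """Pure sine tone, used here as a crude stand-in for a speech sound."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(FRAME)]

vowel_like = tone(250.0)   # low-pitched, vowel-like stand-in
ess_like = tone(6000.0)    # high-pitched, 's'-like stand-in

print(band_energy_fraction(vowel_like, SAMPLE_RATE))  # close to 0
print(band_energy_fraction(ess_like, SAMPLE_RATE))    # close to 1
```

Running the same measurement on your own recordings, frame by frame, would show the fraction jumping up on each sibilant consonant.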
Sibilance appears when recording
In day-to-day life, sibilance isn’t that noticeable. When you are listening to people in normal, unamplified conversation (e.g. friends sitting around a table), these high frequencies don’t seem as harsh as when recorded into a microphone. Why?
In the real world, harsh high frequencies will have died down by the time they reach your ear, having actually been absorbed a little by the air itself. Sibilance arises in recorded audio and voiceovers because the microphone is so close to the mouth. When you're listening to people talk, you don't put your ear right next to their mouth – but that’s exactly where you put the microphone.
If you have a friend with you, get them to say the word “sentence” repeatedly. Start by standing on opposite sides of the room, and slowly move closer together until your ears are as far from their mouth as you would place a microphone. Notice how, as you get closer, the sibilant letters become more noticeable. Eventually you’ll feel a burst of air on your face with each sibilant consonant.
The reason you need to address sibilance is that it can be really annoying or even painful to listen to, making your projects sound unprofessional. Taking the time to learn the de-essing process, and good vocal treatment techniques, will make your voiceover audio sound smooth and polished.
Now we understand what sibilance is, we can start looking at tools to correct it.
The most effective tool to remove sibilance in audio is a De-Esser. A De-esser is a dynamic audio effect similar to a compressor. However, instead of the gain reduction being triggered by the whole frequency spectrum, De-essers have controls for targeting specific frequency bands. This means you can choose which frequencies trigger the compressor and which frequencies it attenuates, allowing you to reduce only the sibilant parts.
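The idea can be sketched in a few lines of code. What follows is a minimal split-band de-esser, written under stated assumptions rather than taken from any particular plugin: a one-pole filter splits the signal at an assumed 4 kHz crossover, an envelope follower measures only the high band, and gain reduction is applied only to that band. The function name `de_ess` and every default value are hypothetical choices for the example.

```python
import math

def de_ess(samples, sample_rate, crossover_hz=4000.0, threshold_db=-24.0,
           ratio=4.0, attack_ms=1.0, release_ms=50.0):
    """Split-band de-esser sketch: compress only the band above crossover_hz.

    Returns (output, gains), where gains is the linear gain applied to the
    high band at each sample. All defaults are illustrative assumptions.
    """
    lp_coeff = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    attack = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))

    low = 0.0        # state of the one-pole low-pass filter
    envelope = 0.0   # smoothed level of the high band
    output, gains = [], []
    for x in samples:
        # Split the signal: low band from the filter, high band is the rest.
        low += lp_coeff * (x - low)
        high = x - low

        # Envelope follower driven by the high band only.
        level = abs(high)
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level

        # Compressor-style gain computer, working in decibels.
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = level_db - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        gain = 10.0 ** (gain_db / 20.0)

        # Only the high band is turned down; the low band passes untouched.
        output.append(low + high * gain)
        gains.append(gain)
    return output, gains

# Demo with synthetic tones (assumed stand-ins for voice body and an 's').
FS = 44100
body = [0.5 * math.sin(2 * math.pi * 200.0 * i / FS) for i in range(4410)]
ess = [0.5 * math.sin(2 * math.pi * 7000.0 * i / FS) for i in range(4410)]

_, body_gains = de_ess(body, FS)
ess_out, ess_gains = de_ess(ess, FS)
print(min(body_gains))  # stays at 1.0: the low tone never triggers reduction
print(min(ess_gains))   # drops well below 1.0 on the 's'-like tone
```

A commercial de-esser will use better filters and detection than this, but the structure is the same: the detector listens to the sibilant band, so the rest of the voice is left alone.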
De-essers are great for working with voices, but you can also use them in many other applications. Audio Engineers have been known to use them for taming high frequencies in drum and cymbal recordings, reverbs and even in the mastering stage.
To understand how de-essers work, it’s worth looking at the controls they usually have, and how de-esser settings can be used to reduce sibilance.
This is our own ERA 4 De-Esser Pro. The green parts of the waveform show the sibilant points in the audio.
FOCUS: Frequency control. This lets you select which frequencies the de-esser is targeting. Getting this right will ensure that the de-esser is only targeting the sibilant consonants.
PROCESSING: Reduction Control. How much de-essing is applied, similar to ratio on compressors. Increasing this value will decrease the amount of sibilance.
SHAPING: Tone shaping of the processed esses. This will result in tighter or softer processing with a smooth or sharp outcome.
DIFF: Allows you to hear only the removed sibilance. This is useful to check you’re targeting the correct frequencies by isolating the targeted sound.
Most professional De-essers will have these controls, but different de-essers’ settings may be labelled differently.
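A "difference" monitor of this kind can be thought of as the original signal minus the processed one. The sketch below shows that idea only; it is an assumption about the general principle, not a description of how any specific plugin computes its DIFF output, and `removed_signal` is a hypothetical helper name.

```python
def removed_signal(original, processed):
    """Return what the processing took away: original minus processed."""
    if len(original) != len(processed):
        raise ValueError("signals must have the same length")
    return [o - p for o, p in zip(original, processed)]

# Toy sample values: only the two middle samples were turned down.
original = [0.0, 0.5, -0.5, 0.25]
processed = [0.0, 0.25, -0.25, 0.25]
print(removed_signal(original, processed))  # [0.0, 0.25, -0.25, 0.0]
```

If the difference contains anything other than esses, the frequency targeting is probably too wide or too low.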
Some compressors with a built-in frequency-dependent sidechain can be turned into De-essers by using the sidechain to target only the high frequencies. Ableton Live’s compressor is a good example of this.
When you’re trying to de-ess, it’s best to monitor your audio on the most accurate speakers or headphones you have access to. Listening on low quality speakers (phones, TVs, hi-fi or laptop) can make sibilants sound louder due to imbalances in frequency representation. This can make it harder for you to make out what is actually happening with your audio.
Step 1: Begin by inserting a De-esser on your vocal channel.
Step 2: Put the De-esser’s gain reduction/processing control to the maximum so you can hear it working excessively, producing lisping, under-sibilant audio.
Step 3: Next, experiment with the frequency targeting control to work out where the sibilance is occurring. The exaggeration of the max reduction will help you find the sweet spot: not cutting too low, and not reducing more frequencies than necessary.
Step 4: Once you’ve found the rough frequency area to target, turn the gain reduction/processing back down.
Step 5: Now you’ve worked out where the problem frequencies are, it's time to set the amount of reduction using the ratio, threshold, attack and release times. Your De-Esser may not have all of these options available but you should still be able to effectively curtail the harshness of the sibilance.
To get this right, listen to the sibilants as the reduction is applied. Are they still too noticeable? Apply more reduction. Does your voice suddenly have a lisp? Back off the reduction.
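For a feel of how threshold and ratio interact, here is the textbook compressor relationship worked through in code. It is offered as an assumption about how a typical gain computer behaves; your de-esser may map its controls differently, and `reduction_db` is a name invented for this example.

```python
def reduction_db(level_db, threshold_db, ratio):
    """Gain reduction in dB for a detected level, using the textbook formula.

    Below the threshold nothing happens. Above it, every `ratio` dB of input
    over the threshold becomes 1 dB of output over the threshold.
    """
    over_db = level_db - threshold_db
    if over_db <= 0.0:
        return 0.0
    return over_db * (1.0 - 1.0 / ratio)

# A sibilant peaking at -10 dB against a -24 dB threshold (14 dB over).
for ratio in (2.0, 4.0, 8.0):
    print(ratio, reduction_db(-10.0, -24.0, ratio))
# 2.0 7.0
# 4.0 10.5
# 8.0 12.25

# Quieter material below the threshold is left alone.
print(reduction_db(-30.0, -24.0, 4.0))  # 0.0
```

Raising the ratio or lowering the threshold both increase the reduction, which is why small moves on either control can tip a voice from "tamed" into "lisping".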
Throughout the process, you should A/B bypass the de-esser to compare it to the original signal. Tweak the controls whilst comparing until it sounds right. Make sure it's not reducing too much or hitting unnecessary frequencies, creating a strange vocal tone.
After some tweaking, your vocals should now be balanced, with any sibilants being tamed by the De-esser.
Remember you will need to use a different de-esser setting for each unique voice – one size is unlikely to fit all. If you are working on a recorded interview where multiple voices are recorded to one audio track you should separate each voice to its own track and have a different instance of the de-esser plugin for each track. It won’t work well having everything run through the same settings.
Below we have some audio examples to demonstrate how to find the sweet spot when de-essing.
Here is our original audio recording:
And here are just the sibilant frequencies that we want to remove:
Here’s an over-processed version of the audio. Notice how it sounds as if the speaker has a lisp:
And now here’s our audio de-essed to a pleasant level. We’ve settled on processing at 50%:
If you don’t have a de-esser, there are some other solutions you can try.
You can use EQs to counter sibilance, but this isn't an ideal solution as EQs are static whereas de-essers are dynamic. This means that when you cut the sibilant frequencies, the reduction will be applied throughout the whole recording, making it sound muffled. Really you only need to cut those frequencies when the sibilant sounds are being made, not all the time. This is why a de-esser beats an EQ for correcting sibilance problems: it only processes when needed, not continuously.
Spectral editing is a function built into software like Reaper, which allows you to turn down the problem frequency areas. However, just like an EQ, the processing will be static unless you automate the regions, which is very time consuming.
That being said, spectral analysis can be useful for seeing where the sibilant frequencies occur in the first place, helping you to target them better.
Manual Gain Control
You can also manually reduce the gain at the points where sibilance occurs. This is the same as manually riding the mixing desk faders over problem areas. Again, this is time consuming, and it turns down the whole signal rather than only the problem frequencies.
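In code terms, manual gain riding amounts to turning the level down over hand-marked time ranges. The sketch below assumes you have already noted the start and end time of each sibilant; the function name `duck_regions`, the region format and the -6 dB default are all hypothetical choices for illustration.

```python
def duck_regions(samples, sample_rate, regions, reduction_db=-6.0):
    """Turn the whole signal down by reduction_db inside each (start, end) region.

    regions is a list of (start_seconds, end_seconds) pairs marked by hand.
    The gain is applied to the original samples, so overlapping regions are
    not reduced twice.
    """
    gain = 10.0 ** (reduction_db / 20.0)
    out = list(samples)
    for start_s, end_s in regions:
        start = max(0, int(round(start_s * sample_rate)))
        end = min(len(out), int(round(end_s * sample_rate)))
        for i in range(start, end):
            out[i] = samples[i] * gain
    return out

# Toy example: 1 second of constant signal at 10 samples per second,
# with an 's' marked between 0.2 s and 0.5 s.
ducked = duck_regions([1.0] * 10, 10, [(0.2, 0.5)])
print(ducked)  # samples 2, 3 and 4 are reduced to roughly 0.5
```

Note that the gain here changes abruptly at the region edges; in a real edit you would ramp it in and out to avoid audible jumps, much as you would with fader moves.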
In an ideal world, you wouldn’t need to use a De-esser, as your recording would be perfect in the first place (obviously this isn't always possible).
Here are some things to consider to reduce esses while recording:
Distance: Don’t place the microphone too close to the mouth. Try to leave a gap of 8 to 12 inches (roughly 20 to 30 cm).
Angle: Try angling the microphone down 10 to 15 degrees to align the 0-degree axis toward the throat instead of the mouth.
Warm Up: If possible try to get your speaker to warm up. Vocal exercises can help with speech control and balancing essyness. Excessive saliva in the mouth can cause more problems, so vocalists should occasionally rinse their mouth out with water (or spit… gross) to reduce saliva.
Disruption: Whilst pop-shields won’t do much for sibilance, strapping a pencil across the centre of the mic cartridge will disrupt the essy high frequencies, reducing sibilance.