How to Record, Mix, and Master the Vocals of a Song

by albertogalrue in Circuits > Audio



Presentación Dossier Musical moderno en turquesa y negro (2).png

In this project, we'll look at how to record, mix, and master the vocals of a Rap/Pop song. We'll study the project for the song HUNDIDO by singer AGRAV, so we can hear the results of what we've explained at the end of the process.

First of all, I'd like to clarify that this is my personal approach to this process, a completely self-taught experience, and that there are certainly many, much more professional ways to do it.

This project can be considered a continuation of a project done by another colleague called "CREATING a SONG IN FL STUDIO," which focuses on how to create the instrumental for a song.

In this project, we'll start from the point where the song's instrumental is already complete and bounced down to a single track, so we can focus exclusively on the vocals.

Supplies

In this project, we'll be using the DAW Ableton, but everything I'm about to explain can be done in any other DAW, just as the plugins used can be from any brand. For example, I'm using FabFilter for the EQ and saturator, Waves for the compressor, and Valhalla for the reverb and delay, but you could use any other brand, or even the plugins built into your DAW. In the end, they all have very similar parameters that let us obtain the same results.

Screenshot (2).png

The first steps to follow are the following:

-Add the instrumental to the project, as seen in the image.

-Set up the recording tracks in the DAW.

-Configure the microphone gain so that the input signal is high enough that we don't have to boost it digitally too much later (which also boosts any noise that crept into the recording), but not so high that the signal saturates and the input peaks clip.


Clipping must be avoided at all costs, as the clipped peaks are information we cannot recover.

It's preferable to have a low gain and then increase it digitally.

As you can see in the image, this is roughly the gain we're looking for, with the waveform sitting around the middle of the track display.
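To see why clipped peaks can't be recovered, here's a minimal sketch in plain Python, treating audio as a list of floats where ±1.0 is the recorder's full scale:

```python
# A digital recorder stores samples in the range -1.0..1.0 (full scale);
# anything beyond that range is flattened to the limit, and the original
# peak shape is lost for good.

def record(samples):
    """Clamp incoming samples to the recorder's full-scale range."""
    return [max(-1.0, min(1.0, s)) for s in samples]

hot = [0.5, 1.4, -1.2]          # mic gain too high: peaks past full scale
safe = [s * 0.5 for s in hot]   # half the gain: everything fits

clipped = record(hot)           # peaks flattened to 1.0 / -1.0, unrecoverable
clean = record(safe)            # identical to the input: can be boosted later
```

This is exactly why it's better to record a little quiet and raise the gain digitally afterwards: the quiet take still contains the full peak shape.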


Screenshot (5).png
Screenshot (6).png
Screenshot (4).png

For the next step, we have two options that depend on the singer: whether they want to hear themselves in real time while recording the song or not.

This option in Ableton is activated by pressing the IN button indicated in the image.

Once activated, the singer will hear themselves back with all the effects applied to the track.


If the artist wants to use pitch-correction plugins like Autotune, whether because they can't sing in tune or for artistic reasons, then before we start recording, we should set the key of the instrumental in Autotune.

This can be done by ear, if someone is able to identify the key, or with plugins that detect the note the bass is playing.

For example, I use an Antares plugin that detects the key and sends it directly to the Autotune plugin.

Here, we could also take the opportunity to configure the autotune to our liking.


In this case, we've set the Autotune retune speed to maximum. This parameter controls how quickly Autotune corrects the artist's pitch deviations.

We'll also select an option that counteracts the vibrato in the singer's voice, and activate another that makes the voice sound less robotic and artificial.

We'll see later that we'll use a plugin to correct autotune artifacts if, for artistic reasons, we don't want them in our recording.
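This isn't Antares' algorithm, but the idea behind the retune-speed control can be sketched: each detected pitch gets pulled toward the nearest equal-tempered note, either all the way (maximum speed, the robotic sound) or only partly (more natural). All numbers here are illustrative:

```python
import math

A4 = 440.0  # reference tuning

def snap_pitch(freq_hz, speed=1.0):
    """Pull a detected frequency toward the nearest semitone.
    speed=1.0 corrects instantly; lower values move only part of the way."""
    steps = round(12 * math.log2(freq_hz / A4))   # nearest note, in semitones from A4
    target = A4 * 2 ** (steps / 12)
    # interpolate geometrically, i.e. linearly in musical pitch
    return freq_hz * (target / freq_hz) ** speed

slightly_flat = 430.0                     # a bit under A4
corrected = snap_pitch(slightly_flat, 1.0)   # fully snapped to 440 Hz
natural = snap_pitch(slightly_flat, 0.5)     # only halfway there
```

Knowing the key matters because it restricts which target notes are allowed; this sketch snaps to the full chromatic scale for simplicity.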


Screenshot (7).png
Screenshot (8).png
Screenshot (9).png
Screenshot (3).png
Screenshot (10).png

Once this is done, we can record the vocal tracks.

In this case, we'll make the following recordings:

-A main vocal recording, which will have the greatest presence and body in the mix.

-Another secondary vocal recording, which will be at a lower volume level to accompany and add body.

-A third recording, the so-called chorus, which will reinforce certain words or syllables and give more emphasis to certain points of the song.


Once everything is recorded the way we like it, the first thing I do is clean up the tracks, deleting the gaps where nothing is being said and where ambient noise may creep in. This keeps everything cleaner and clearer and avoids layering background noise.

The next thing I do is set standard volumes for the different tracks in advance.

For example, I'll set the main vocals to the most noticeable volume (0 dB), and the secondary and backing tracks (which are there for reinforcement) lower, at around -8 dB.

Later, once the effects have been added and everything is being listened to together, we can adjust these volumes to find the mix we like best. I usually use a limiter to play with the track gains, but it could be done from the DAW fader or in many other ways.
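Under the hood, a fader or limiter gain change is just a multiplication: a change of G dB scales every sample by 10^(G/20). A quick sketch:

```python
# What a DAW fader or limiter gain control does to the samples:
# a level change of G dB multiplies each sample by 10**(G/20).

def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def apply_gain(samples, db):
    """Scale audio samples by a gain given in dB."""
    g = db_to_linear(db)
    return [s * g for s in samples]

vocals = [0.5, -0.3, 0.8]
main = apply_gain(vocals, 0)      # 0 dB: unchanged
backing = apply_gain(vocals, -8)  # -8 dB: about 40% of the amplitude
```

This also explains why a quiet-but-clean recording is fine: raising it later is just a multiplication, whereas a clipped one is permanently damaged.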


Screenshot (13).png
Screenshot (14).png

Once I've recorded everything, with roughly set volumes and the tracks clean, I proceed with all the digital effects processing.

I like to start with the tonal aspect, that is, the part that has to do with tuning.

Since the artist wanted to hear themselves, we already have the autotune set up beforehand.

In this case, this song isn't intended to have the electronic artifacts of autotune that some artists crave, where it seems like the voice is slipping away and sounds electronic.

Since these artifacts occur at specific times or on specific syllables, we need a tool that allows us to surgically correct these off-key sounds.

This can be done in many ways.

Autotune has a version that allows you to fix this manually, but there are other plugins, like the one I use, Melodyne, that also allow you to do this.

What we need to do is add Melodyne to our effects rack and import our vocal track into it.

Once we've done that, what I do is see where the vocals are out of tune, and since Melodyne breaks the vocal track down into words and even syllables, I take the specific word or syllable that's out of tune and move it up or down, looking for the note where it sounds correct.

I do this with all the vocal tracks until I achieve the most natural sounding vocals with the fewest possible pitch differences.

Melodyne is a very powerful and complex tool that has a ton of features I don't explore, and potential functionalities that someone with more knowledge could surely get much more out of. So I invite you to explore this tool on your own, which also works for tuning instruments (not just voices).


Screenshot (16).png

The next thing I like to do is equalize the vocals.


This equalization is never one-size-fits-all. It depends on the microphone (each microphone captures frequencies differently), the voice (each person's voice distributes energy across frequencies differently and needs its own equalization), and the artistic intention of the mix.

For example, if it's an acoustic composition, I wouldn't try to remove too much bass, because it adds a lot of body and sits well with the arrangement. But if it's a more electronic composition, I would cut some of the vocal's low end so it sits better against the beat.

In this composition, which is neither one nor the other, I'm looking for something in between, leaving the bass as is, lowering the midrange a bit (since this particular singer sounds better that way), and highlighting the upper midrange and high frequencies to brighten the vocals.


Screenshot (17).png

The next thing I do is compress the vocals.


Like EQ, compression depends on the type of song we're mixing.

For example, if it's a rap song, I look for more aggressive compression, since emphasis isn't usually placed on different points of the melody, such as choruses, and a more consistent presence is sought.

However, if it's a mix of a more melodic song, I try to use less aggressive compression to highlight certain words or moments in the song, such as choruses, that the artist wants to emphasize in their performance.

The compressor's aggressiveness is controlled with the Peak Reduction knob.
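A Peak Reduction knob on this style of compressor drives a program-dependent optical gain reduction, which is hard to reduce to a formula. As a hedged stand-in, here is the textbook static version with an explicit threshold and ratio (both assumed values, not taken from the project), which shows the core idea: above the threshold, the output level rises more slowly than the input:

```python
import math

def compress_sample(x, threshold_db=-18.0, ratio=4.0):
    """Simplified static compressor: reduce level above the threshold."""
    level_db = 20 * math.log10(abs(x) + 1e-12)     # sample level in dBFS
    if level_db <= threshold_db:
        return x                                    # below threshold: untouched
    # above the threshold, the level grows at 1/ratio of the input rate
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return x * 10 ** ((out_db - level_db) / 20)

quiet = compress_sample(0.05)   # about -26 dBFS: passes through unchanged
loud = compress_sample(1.0)     # 0 dBFS: pulled down to about -13.5 dBFS
```

A higher ratio or lower threshold is the "more aggressive" compression described for rap vocals; a gentler setting leaves more of the performance's dynamics intact.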


Screenshot (18).png

The next step is a de-esser, which is basically a frequency-selective compressor: it reduces the frequencies around the letter S, which can be very piercing in some voices.


We simply set a threshold above which the sibilance is reduced, and the plugin does the rest of the work for us.
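A crude sketch of that idea, with every number (threshold, reduction, filter coefficient) chosen purely for illustration: a one-pole high-pass stands in for the sibilance detector, and samples are turned down whenever the detected high-frequency energy crosses the threshold. Real de-essers work on a narrower band with much smoother gain changes:

```python
def deess(samples, threshold=0.3, reduction=0.5, alpha=0.6):
    """Attenuate samples whose high-frequency content exceeds the threshold."""
    out = []
    prev_x = samples[0] if samples else 0.0
    hp = 0.0
    for x in samples:
        hp = alpha * (hp + x - prev_x)    # one-pole high-pass: the "sss" estimate
        prev_x = x
        out.append(x * reduction if abs(hp) > threshold else x)
    return out

vowel = [0.8] * 8            # slow, low-frequency content: left alone
sss = [0.8, -0.8] * 4        # fast alternation, standing in for sibilance
```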

Screenshot (19).png

That's all for the dynamics processing. This, while subjective and dependent on the genre and editing style we prefer, is usually more or less standard.

Now we'll move on to a more creative approach involving delay, reverb, and other types of effects.


In this mix, for creative reasons, I decided to add a saturator, as I like how it sounds in certain places.

Although it's added to the main track as a whole, we'll see later that, through automation, we can choose in which parts of the track it's active.

For example, in this composition, on the main track, the saturator is only applied at the beginning of the track. However, in the choruses, the saturator is continuously applied.

To set up the saturator, what I do is slightly boost the frequency band where I want it to have the greatest effect and then turn up the amount of saturation.

In this case, I left it around 60% so that it's subtly noticeable and doesn't completely overpower the human voice.
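What a saturator does under the hood can be sketched with a soft-clipping curve (tanh here, as one common choice); the amount knob blends the saturated copy with the dry signal, which is why leaving it around 60% keeps the voice recognizable:

```python
import math

def saturate(samples, drive=2.0, amount=0.6):
    """Blend each sample with a tanh-softened copy.
    drive is an assumed value; amount=0.6 mirrors the ~60% used here."""
    return [(1 - amount) * x + amount * math.tanh(drive * x) for x in samples]

out = saturate([0.1, 0.9])   # quiet samples pass almost linearly;
                             # loud ones are rounded off, adding harmonics
```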


Screenshot (20).png

The next thing I like to add is the delay.

In this case, we'll use, as I mentioned earlier, the Valhalla Supermassive plugin, and we've given it a fairly low presence, with a wet/dry mix of only 10%.

I've also set the delay time to 1/4 note (which controls how long it takes for a repeat to sound) and the feedback (which controls how long the repetitions keep going) to a low value (10%).

This will help accompany the main track and give it a little more warmth, so that the spaces where the singer's main voice isn't heard don't feel empty.
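Those two parameters map directly onto a classic feedback delay line, sketched below with per-sample units (a 1/4-note delay would be converted to samples from the project tempo; the values are the 10% used above):

```python
def feedback_delay(dry, delay_samples, feedback=0.1, mix=0.1):
    """Feedback delay: each echo is 'feedback' times weaker than the last,
    and 'mix' sets how loud the echoes are relative to the dry signal."""
    line = [0.0] * len(dry)   # the delay line: what was written in the past
    out = []
    for i, x in enumerate(dry):
        echoed = line[i - delay_samples] if i >= delay_samples else 0.0
        line[i] = x + feedback * echoed     # feed part of the echo back in
        out.append((1 - mix) * x + mix * echoed)
    return out

# an impulse shows the echo train: repeats at 0.1, 0.01, 0.001, ...
echoes = feedback_delay([1.0] + [0.0] * 7, delay_samples=2)
```

With feedback this low, the echoes die out almost immediately, which is why the delay only warms up the gaps rather than being heard as a distinct effect.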


Screenshot (21).png

The next thing I add is reverb. This will mainly serve to blend the singer's voice with the instrumental so that it doesn't seem like each track is going in a different direction.

Typically, reverbs are more subtle, with a mix of around 20% and a decay time (which controls how long the reverberation lasts) of about 2 seconds; at least, that's what would be done in genres like purer rap.

But for artistic reasons, in this mix, I've decided to give the reverb a lot of presence, as I'm looking for a more ambient and melodic sound.

As you can see, I've set the mix to 50% and the decay time to 4 seconds on the main track.

You can see that on other tracks, such as the secondary track or the chorus, the mix is slightly lowered so that the tail isn't excessive, strange resonances don't start to appear, and the reverb neither swamps the silences nor takes up too much presence in the mix.

And finally, as I mentioned at the beginning, I like to add a limiter to control the gain of each track. It also ensures that if the recording exceeds the recommended threshold for any reason, the limiter will catch the peaks and keep the track from sounding saturated.


image_2025-11-17_125349317.png

A very important part of mixing is automation, since the effects we've added won't always be the same at every point in the mix, and we'll want them to change.

For example, as I mentioned earlier, you can see that on the main track the saturator is applied only in the opening chorus and then switched off for the rest of the mix.

Or, for example, on the main track the gain is lowered in the first chorus, and when the bass drops, it's raised to the more present volume that remains throughout the rest of the mix.


Screenshot (22).png

The next thing I do is widen the secondary vocal and chorus tracks into stereo to give the mix more body.

I do this with the so-called Haas effect: duplicate the track we want to widen and delay the copy by about 30 milliseconds. Since humans can't separate two sounds that arrive less than about 50 ms apart, we won't notice the duplicate; the two tracks fuse into one wider sound.

Once the Haas effect is applied, we pan one track hard left and the other hard right using Ableton's pan control, located in the area highlighted in the image.

In this case, we've created the effect for the secondary track and the chorus.

This way, the main track plays in the center of the mix with greater volume and body, while the widened tracks accompany it in stereo, their extra stereo width balanced by a lower gain than the main track.
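The Haas trick described above can be sketched in a few lines (assuming a 44.1 kHz sample rate): duplicate the mono track, delay the copy by 30 ms, and route the original and the copy to opposite channels:

```python
SAMPLE_RATE = 44100  # assumed project sample rate

def haas_widen(mono, delay_ms=30, sr=SAMPLE_RATE):
    """Return (left, right) channels: the dry track hard left,
    a ~30 ms delayed copy hard right. Zero-padding keeps lengths equal."""
    d = int(sr * delay_ms / 1000)     # 30 ms -> 1323 samples at 44.1 kHz
    left = mono + [0.0] * d           # panned hard left: unchanged
    right = [0.0] * d + mono          # panned hard right: delayed copy
    return left, right

voice = [0.2, 0.5, -0.4]
left, right = haas_widen(voice)
```

One caveat worth knowing: summed back to mono, the two copies can comb-filter, which is one reason the technique is usually kept off the main vocal.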

This is how I roughly perform the mix.

The next step is to do the mastering, which I personally do as follows:


-On the master fader, I set the gain to -8 dB; the level will be brought back up to where I need it during mastering.

-Once the master is at -8 dB, I export the mix.

-Once I have the mix exported, I load it into another program, in this case Ozone 9 by iZotope.


Screenshot (23).png
Screenshot (24).png
Screenshot (25).png

This program is a tool that helps make mastering easier, and I use it as follows:

- Drag the mix into the program.

- Click on Master Assistant to analyze our mix and automatically set a series of effects to master our composition.


Of the added effects, the most important are the equalizer and the maximizer.


Screenshot (27).png

The equalizer will try to create the flattest possible spectrum.


This is good so that our mix sounds more or less the same on all devices. For example, a cell phone speaker reproduces far less bass than studio monitors do, so a flatter mix translates better from one to the other.

Therefore, we're interested in the flattest possible mix.

That said, we don't want to hand the program complete control and let it override our personal, artistic touch.

For example, if it determines that the bass should be two decibels lower, I leave it just half a decibel lower, since we've presumably already been shaping the bass throughout the mixing process, and it was also taken into account when the instrumental was created.

And so it is with the rest of the frequencies.

Also, if for whatever reason we see that the program automatically makes a very exaggerated decision, it's always better to trust our creative intuition.


Screenshot (28).png

And the other very important and final effect is the maximizer.

This is great because it already has the industry-standard LUFS presets built in.

LUFS is basically a loudness standard that most platforms use to make all songs play back at more or less the same level.

On Spotify, for example, the standard is -14 LUFS.

What this maximizer effect does is take the gain of our mix and increase it to this threshold without going over it.

That's why, when exporting the mix, we lower the gain to -8 dB: the maximizer can boost the mix up to the correct LUFS level, but it can't reduce our gain if we've already gone too far.
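Real LUFS metering (defined in ITU-R BS.1770) applies K-weighting filters and gating, which this sketch deliberately skips; plain RMS stands in for loudness here. But the maximizer's core move is the same: measure the loudness, then apply the single gain that reaches the target (-14 in the Spotify example):

```python
import math

def rms_db(samples):
    """RMS level of the signal in dB relative to full scale
    (a simplified stand-in for a true LUFS measurement)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def normalize_loudness(samples, target_db=-14.0):
    """Apply the one gain that brings the RMS level to the target."""
    gain = 10 ** ((target_db - rms_db(samples)) / 20)
    return [s * gain for s in samples]

quiet_mix = [0.05, -0.03, 0.04, -0.05]     # exported with plenty of headroom
mastered = normalize_loudness(quiet_mix)    # RMS now sits at -14 dB
```

Note the asymmetry the article points out: multiplying up a quiet mix is lossless, but a mix that was already pushed past the target has no headroom left to work with.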

The "True Peak ceiling" on the maximizer refers to the actual maximum output level the audio signal will reach, precisely measured to avoid distortion.

Unlike traditional peak meters (sample peaks) that only measure the levels of individual digital samples, a True Peak meter predicts the peaks that will occur when converting the digital signal to analog (intersample peaks). These intersample peaks can exceed the maximum digital level (0 dBFS) if not controlled, causing clipping or distortion in various playback systems, such as those used by streaming services.
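The intersample-peak idea can be demonstrated numerically. A sine at a quarter of the sample rate, phased so every sample lands at about ±0.71, has a sample peak of 0.71 but a reconstructed waveform that swings to 1.0 between the samples. This sketch estimates that with truncated-sinc interpolation; real meters use efficient polyphase oversampling filters instead:

```python
import math

def _sinc(x):
    """Normalized sinc, the ideal reconstruction kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak(samples, oversample=4, taps=16):
    """Estimate the peak of the reconstructed (analog) waveform by
    evaluating a truncated sinc interpolation at 4x oversampled points."""
    n, peak = len(samples), 0.0
    for i in range(n * oversample):
        t = i / oversample                      # position between samples
        value = sum(samples[k] * _sinc(t - k)
                    for k in range(max(0, int(t) - taps),
                                   min(n, int(t) + taps + 1)))
        peak = max(peak, abs(value))
    return peak

# a sine at fs/4, phased so every sample lands at +/-0.707
tone = [math.sin(math.pi * k / 2 + math.pi / 4) for k in range(64)]
sample_peak = max(abs(s) for s in tone)   # ~0.71: what a sample meter reports
inter_peak = true_peak(tone)              # ~1.0: the actual waveform peak
```

A True Peak ceiling set at, say, -1 dBTP limits `inter_peak`, not `sample_peak`, which is why it catches clipping that a plain sample meter would miss.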


AGRAV - HUNDIDO (VIDEOCLIP)

Once this is done, our mix and master are ready: we export from Ozone, and the process is complete.

You can listen to the song from this project by searching YouTube or Spotify for "HUNDIDO" by the singer AGRAV, to hear the results of everything explained here, or by watching the video linked to this post.


As I emphasized at the beginning of the explanation, this is how I learned to perform these processes, completely self-taught.

I'm sure there are much more methodical and professional ways to carry them out.

But even so, I think it's worth explaining and teaching how I do it, because you can get decent sound quality this way, and I, at least, would have loved to be taught this kind of thing when I started out and couldn't find information or tools anywhere.