Jay Taylor's notes

Understanding the Mel Spectrogram | by Leland Roberts | Analytics Vidhya | Medium

Tags: audio fft fast-fourier-transform speech-recognition spectrogram mel-spectrogram medium.com

Clipped on: 2024-02-06

import librosa.display
import matplotlib.pyplot as plty, sr = librosa.load('./example_data/blues.00000.wav')plt.plot(y);
plt.title('Signal');
plt.xlabel('Time (samples)');
plt.ylabel('Amplitude');

ft = np.abs(librosa.stft(y[:n_fft], hop_length = n_fft+1))plt.plot(ft);
plt.title('Spectrum');
plt.xlabel('Frequency Bin');
plt.ylabel('Amplitude');

spec = librosa.amplitude_to_db(spec, ref=np.max)librosa.display.specshow(spec, sr=sr, x_axis='time', y_axis='log');
plt.colorbar(format='%+2.0f dB');
plt.title('Spectrogram');

mel_spect = librosa.power_to_db(spect, ref=np.max)librosa.display.specshow(mel_spect, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram');
plt.colorbar(format='%+2.0f dB');

In Summary

That was a lot of information to take in, especially if you are new to signal processing like myself. However, if you continue to review the concepts laid out in this post (and spend enough time staring at the corner of a wall thinking about them), it’ll begin to make sense! Let’s briefly review what we have done.

We took samples of air pressure over time to digitally represent an audio signal
We mapped the audio signal from the time domain to the frequency domain using the fast Fourier transform, and we performed this on overlapping windowed segments of the audio signal.
We converted the y-axis (frequency) to a log scale and the color dimension (amplitude) to decibels to form the spectrogram.
We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.

That’s it! Sounds easy, right? Well, not quite, but I hope this post made the mel spectrogram a little less intimidating. It took me quite a while to understand it. At the end of the day though, I found out that Mel wasn’t so standoffish.

If you would like to see a cool application of this topic, check out my post on musical genre classification where I use mel spectrograms to train a convolutional neural network to predict the genre of a song. How does it do? Find out here!