Jay Taylor's notes
Understanding the Mel Spectrogram | by Leland Roberts | Analytics Vidhya | Medium
import librosa
import matplotlib.pyplot as plt
# Load the clip as a 1-D array of samples (y) plus its sample rate (sr)
y, sr = librosa.load('./example_data/blues.00000.wav')
plt.plot(y);
plt.title('Signal');
plt.xlabel('Time (samples)');
plt.ylabel('Amplitude');
plt.title('Spectrum');
plt.xlabel('Frequency Bin');
plt.ylabel('Amplitude');
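The labels above belong to a spectrum plot, but the code that computes the spectrum is not shown here. As a rough sketch only (not necessarily the code used for the original figure), this is one way to get a spectrum from a single window of audio with NumPy's FFT. The synthetic two-tone signal, the 22,050 Hz sample rate, the 2,048-sample window, and the Hann taper are all assumptions chosen for illustration.

```python
import numpy as np

sr = 22050      # assumed sample rate in Hz
n_fft = 2048    # assumed window length in samples

# Synthetic stand-in for the audio: a 440 Hz tone plus a quieter 1,000 Hz tone
t = np.arange(n_fft) / sr
y = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Taper the window to reduce spectral leakage, then take the one-sided FFT
window = np.hanning(n_fft)
spectrum = np.abs(np.fft.rfft(y * window))

# Centre frequency (Hz) of each frequency bin
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)

# The strongest bin should be the one closest to the 440 Hz tone
peak_bin = int(np.argmax(spectrum))
print(peak_bin, freqs[peak_bin])
```

Plotting `spectrum` against its bin index (or against `freqs`) would give a figure with the axes named above: frequency bin on the x-axis and amplitude on the y-axis.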
plt.colorbar(format='%+2.0f dB');
plt.title('Spectrogram');
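The colorbar and title above are what remains of the spectrogram plot. As a hedged sketch of the idea, the spectrogram can be built by hand: slice the signal into overlapping windows, take the FFT of each one, and convert the amplitudes to decibels. The function name `stft_db`, the window length of 2,048, the hop length of 512, and the synthetic test tone are illustrative assumptions. In practice, `librosa.stft` and `librosa.amplitude_to_db` cover the same two steps.

```python
import numpy as np

def stft_db(y, n_fft=2048, hop_length=512):
    """Magnitude spectrogram in decibels, relative to the loudest bin."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping frames, one frame per row
    frames = np.stack([y[i * hop_length : i * hop_length + n_fft]
                       for i in range(n_frames)])
    # FFT of each windowed frame, transposed to (frequency bins, frames)
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1)).T
    # Amplitude to decibels; the small floor avoids taking log10 of zero
    return 20 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())

sr = 22050
t = np.arange(sr) / sr               # one second of audio
y = np.sin(2 * np.pi * 440 * t)      # synthetic 440 Hz tone (placeholder signal)
spec_db = stft_db(y)
print(spec_db.shape)                 # (1025, 40)
```

Each column of `spec_db` is the spectrum of one window, so the horizontal axis is time and the vertical axis is frequency. The loudest bin sits at 0 dB and everything else is negative.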
In Summary
That was a lot of information to take in, especially if you are new to signal processing like me. However, if you continue to review the concepts laid out in this post (and spend enough time staring at the corner of a wall thinking about them), it’ll begin to make sense! Let’s briefly review what we have done.
- We took samples of air pressure over time to digitally represent an audio signal.
- We mapped the audio signal from the time domain to the frequency domain using the fast Fourier transform, and we performed this on overlapping windowed segments of the audio signal.
- We converted the y-axis (frequency) to a log scale and the color dimension (amplitude) to decibels to form the spectrogram.
- We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.
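The last step in the list, mapping frequency onto the mel scale, can be sketched as a bank of triangular filters spaced evenly in mels and applied to a power spectrogram. This is only an illustration under stated assumptions: it uses the common HTK-style formula m = 2595 · log10(1 + f / 700), which differs slightly from librosa's default mel scale, and the helper names, the choice of 128 mel bands, and the random placeholder spectrogram are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style conversion (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=2048, n_mels=128):
    """Triangular filters evenly spaced in mels: shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    # n_mels + 2 edge points, evenly spaced on the mel scale, converted back to Hz
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)     # ramps up from lo to mid
        falling = (hi - fft_freqs) / (hi - mid)    # ramps down from mid to hi
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank()
# Placeholder power spectrogram with shape (frequency bins, frames)
power_spec = np.random.default_rng(0).random((1025, 40))
mel_spec = bank @ power_spec
print(mel_spec.shape)    # (128, 40)
```

Because the filters are narrow at low frequencies and wide at high frequencies, the 1,025 linear frequency bins collapse into 128 mel bands that roughly follow how we perceive pitch. `librosa.feature.melspectrogram` wraps the windowing, FFT, and mel mapping in a single call.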
That’s it! Sounds easy, right? Well, not quite, but I hope this post made the mel spectrogram a little less intimidating. It took me quite a while to understand it. At the end of the day though, I found out that Mel wasn’t so standoffish.
If you would like to see a cool application of this topic, check out my post on musical genre classification where I use mel spectrograms to train a convolutional neural network to predict the genre of a song. How does it do? Find out here!