In my previous blog posts, I explained that most models that work with music use spectrograms as their vehicle of comparison. CNNs and RNNs are popular choices for these models for a couple of reasons, the most obvious being that both can take a spectrogram as input: a CNN treats it as an image, while an RNN typically reads it one time slice at a time. Beyond that, both are well suited to pattern matching, because each can relate a value to its surroundings: CNNs through kernel convolutions, which link each pixel to its neighbors, and RNNs through their memory of previous inputs and calculations.
First, the most salient feature of a CNN (which stands for Convolutional Neural Network and is the model I used for an Inspirit AI Emotion Detection project) is its convolutional layers. A CNN takes a kernel, essentially a “frame” or “window” of a certain dimension, and moves it x pixels at a time across the image, where x is the “stride”; when the kernel reaches the end of a row, it moves down by a predetermined amount and starts again. This enables the model to see the bigger picture by analyzing the image in chunks rather than single pixels, usually noting edges and colors. A perk of this model is its limited pre-processing compared to other neural networks, since images can be fed in without much data manipulation.
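To make the sliding window concrete, here is a minimal sketch in plain Python. It is not the code from my project: the function name `convolve2d`, the toy 4x4 image, and the hand-picked `edge_kernel` are all made up for illustration, and a real CNN would learn its kernel values during training instead.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` across `image` (both lists of lists) using the given stride."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    output = []
    # Move the window down by `stride` rows at a time...
    for top in range(0, img_h - k_h + 1, stride):
        row = []
        # ...and across by `stride` columns at a time.
        for left in range(0, img_w - k_w + 1, stride):
            total = 0
            # Multiply each pixel under the window by the matching kernel value.
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image": the left half is dark (0), the right half is bright (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel, stride=1)
for row in feature_map:
    print(row)
```

With a stride of 1, the middle column of the output lights up exactly where the window straddles the edge. A larger stride produces a smaller output because the window takes bigger steps, which is the tradeoff the stride controls.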
The second model is an RNN, which stands for Recurrent Neural Network. These models are really good at recognition tasks (like speech and handwriting recognition), a subset of pattern matching. An RNN processes a sequence one step at a time, and its hidden layer feeds back into itself: at each step, the nodes combine the current input with the hidden state left over from the previous step, weight the result, and pass it on to the next step. That feedback loop is what “recurrent” means, and it is what gives the model its memory. One successful variant of this model is the LSTM (long short-term memory), which is commonly used in music-related AI because it improves on a normal RNN’s pattern-matching skills over long sequences.
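The “memory” of an RNN can be sketched in a few lines. This is a simplified illustration, assuming a single hidden unit and hand-picked weights; the function name `rnn_forward` and the parameters `w_input`, `w_hidden`, and `bias` are hypothetical, and a real RNN would use weight matrices learned from data.

```python
import math


def rnn_forward(inputs, w_input, w_hidden, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    hidden = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# Hypothetical weights, chosen by hand rather than learned.
states = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5)
print(states)
```

Only the first input here is nonzero, yet the later states stay above zero because each step carries part of the previous state forward. Setting `w_hidden` to 0 removes that carry-over, and the cell forgets the first input immediately.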
Stay tuned for my next blog post!
Ally Bush is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at