In my previous blog posts, I explained that most models that work with music use spectrograms as their basis for comparison. CNNs and RNNs are popular choices for these models for a couple of reasons, the most obvious being that both can take a spectrogram as input: a CNN treats it as an image, while an RNN reads it as a sequence of time frames. Beyond that, both are well suited to pattern matching, because each can relate a point in the spectrogram to its surroundings: CNNs through kernel convolutions over neighboring pixels, and RNNs through a hidden state that remembers previous inputs and calculations.
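
To make those two mechanisms a bit more concrete, here is a minimal NumPy sketch of each. It is only an illustration, not how any particular music model is built: the random "spectrogram", its 8 x 10 shape, the averaging kernel, the hidden size of 4, and the helper names `conv2d_valid` and `rnn_forward` are all assumptions of mine.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D array with no padding.

    Each output value is the sum of a small patch multiplied elementwise
    by the kernel. (Strictly this is cross-correlation, which is what
    deep learning libraries usually call convolution.)
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def rnn_forward(frames, w_in, w_rec):
    """Run a plain tanh RNN over a sequence of frames.

    The hidden state is fed back in at every step, so each state depends
    on the current frame and on everything seen before it.
    """
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for frame in frames:
        hidden = np.tanh(w_in @ frame + w_rec @ hidden)
        states.append(hidden)
    return np.stack(states)


rng = np.random.default_rng(0)

# Hypothetical spectrogram: 8 frequency bins x 10 time frames of random values.
spectrogram = rng.random((8, 10))

# CNN view: a 3x3 averaging kernel mixes each pixel with its neighbors.
kernel = np.ones((3, 3)) / 9.0
feature_map = conv2d_valid(spectrogram, kernel)

# RNN view: read the spectrogram one time frame (column) at a time.
w_in = rng.normal(size=(4, 8)) * 0.1   # input-to-hidden weights (assumed sizes)
w_rec = rng.normal(size=(4, 4)) * 0.1  # hidden-to-hidden weights (the "memory")
states = rnn_forward(spectrogram.T, w_in, w_rec)

print(feature_map.shape)  # one value per 3x3 neighborhood
print(states.shape)       # one hidden state per time frame
```

In a real model the kernel and the weight matrices would be learned during training instead of being fixed or random, and you would normally use a library such as PyTorch or TensorFlow instead of writing the loops by hand.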

