Automated Music Playlist Generator

by Arka Sarkar, Pankil Kalra and Daksh Thapar, Machine Learning (CSE343, ECE343) from Indraprastha Institute of Information Technology, Delhi.

With the growing popularity of music streaming services like Spotify, Apple Music and Wynk, the number of songs have skyrocketed globally. Creating personalized playlists for users has become tedious and challenging as it involves individual listening to various songs and categorizing them based on their audio features. Our objective is to sort songs with similar musical characteristics into playlists automatically. Modern machine learning techniques and visualization tools enable us find accurate models that categorize millions of songs into user playlists based on song choices. Related works on this problem did not consider Lyrical Analysis while making playlists. We are using Topic Modelling techniques on lyrics, and will be using the extracted topics as a feature for generating playlists.

Source Code :

1. State of the Art

1. Automated Playlist generation from Personal Music Libraries by Diana Lin and Sampath Jayarathna [Link]

  • The paper uses multiple Machine Learning and clustering tools to create playlists on the basis of similar song audio features which include tempo, acousticness, danceability and energy.
  • The authors used K Means, Affinity Propagation and DBSCAN clustering algorithms to generate playlists based on similar audio features.
  • A questionnaire was asked to be filled by different people to help determine the general sentiment about manual playlist creation. It was found out that majority of respondents did not like creating their own playlists. Time was the primary reason they did not like to do so. Also, majority of the participants rated the automated playlists 3 or 4 (on scale of 5).

2. Music Playlist Generation based on Community Detection and Personalized PageRank by Bangzheng He, Yandi Li and Bobby Nguy. [Link]

  • This paper tackles this as a graph based problem. A directed graph was constructed with the different songs as nodes. The direction of the edges was based on two parts : Acoustic analysis and User listening data. For example, users who listen to song A often listen to song B, but users who listen to song B never listens to song A. Coupled with the level of acoustic (dis)similarity, song A would point to song B but not the other way round.
  • Community detection algorithm was applied on this graph and the nodes(songs) were divided into 90,000+ different communities.
  • A personalized PageRank algorithm for sequencing the playlists. Based on the communities that were found out and the PageRank algorithm, an algorithm was defined to build a diversified and user-custom playlist.

2. Dataset Description

We have picked the Million Song Dataset (MSD lyrical) which consists of 2,33,662 songs with lyrics. We collected the Top Search Results of playlists using phrases like ’Summer’, ’Love,Party’, ’Breakup’, ’Rock’, ’Country’, ’Romance’, ’Rap’, ’Metal’, ’Hip-Hop’, ’Latin’, ’Blues’, ’Soul’, ’Classic’, ’Pop’, ’Jazz’, ’Folk’, ’RB’ from Spotify API which have a significant song overlap with the MSD dataset. In total, we obtained 169 playlists amounting to 11,159 unique songs.

We extracted 14 audio features for each song using Spotify API as shown below :

Dataset Features

The class distribution of playlists is shown below :

We performed Topic Modelling and extracted additional 20 features using Latent Dirichlet Allocation (LDA) on the lyrics which will be described in the subsection below.

The lyrics obtained from Million Song Dataset were already preprocessed to some extent. The words had been stemmed and lyrics were converted to Bag of Words format. We further removed Stopwords and words with less than 3 characters for better topic creation.

Word Cloud on Lyrics

2.1.1 Topic Modelling using Latent Dirichet Allocation (LDA)

We preprocessed our lyrics, applied topic modelling using Latent Dirichlet Allocation (LDA) and extracted the probabilities of each topic in every song. Latent Dirichlet allocation is a generative statistical model which provides us with a fixed number of unobserved topics which would help in analyzing some similarity between playlists. Latent Dirichlet Allocation (LDA) was applied on the lyrical corpus, we tried different values for the number of topics ranging from 3 to 30. The topics obtained were evaluated using Covariance ‘c v’ score. “20 number of topics” gave the best c v score of 0.58 and we extracted the probabilities of each topic in every song for the same.

20 new columns were added to our dataset containing the topic wise probabilities for each song.

A small visualization of the LDA topics are shown below :

  • The demo is an interactive visualisation of the topics obtained from topic modeling. The diagram denotes the importance of each topic which is represented in the size of their bubbles. It also gives the 30 most salient terms in the overall lyrics corpus. The topics that are closer in the 2d space (in the visualisation) have words which are usually closely associated together(in real life songs).
  • For example, in our visualisation the topics associated with the dark genres are close to each other. Topics 7 and 14 look are topics of songs about satanic evil, power (dark metal songs) whereas topic 9 which is close to these topics in this space is about killing, death, blood, fight (is another dark topic). Topic 6 is about love and is far away from these three topics. Topic 12 and Topic 18 are about non-english language songs and are far away from the english topics.
  • If we click on any topic, it shows the words in each topic that are ranked. There is a slider in the interactive visualisation which can be adjusted. When the labda is closer to zero, the words are ranked according to how exclusive a word is in a given topic. When the lambda is closer to one, the words are ranked according to how probable the word is to appear in the given topic(Lift).

2.1.2 Dimensionality Reduction using Principal Axis Component (PCA)

Principal Component Analysis is a method of reducing dimensions of a dataset by transforming a large set of variables into a smaller set of variables that still contains most of the information in the larger data set. The various steps in PCA include standardization, computation of covariance matrix and eigenvectors to identify principle components. PCA can be thought of as fitting a “ p-dimensional ellipsoid” to the data; every axis of the ellipsoid means a single principal component. For our project, we are working with 30 principle components.

Figure below shows the explained variance of the principle components.

PCA explained variance

2.1.3 Visualizing High Dimensional Data using t-SNE

t-Distributed Stochastic Neighbour Embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and visualizing high-dimensional data. tSNE, unlike PCA, is not a linear projection. It uses the local relationships between points to create a low-dimensional mapping. This allows it to capture non-linear structure. tSNE creates a probability distribution using the Gaussian distribution that defines the relationships between the points in high-dimensional space.

2.1.4 Data Standardization

Data Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation. Here’s the formula for standardization:

x¯ is the mean of the feature values, σ is the standard deviation of the feature values.

3. Methodologies

Our objective is to generate automated song playlists for users using Label Classification and Graph-based techniques. For the Classification problem, two different methodologies were used. We tried different ML based classification algorithms both for a single-class prediction for 169 playlist and multi-class prediction for 13 playlists. For playlist generation, we used clustering and graph-based approaches to construct playlists based on the feature similarity of songs from certain seed songs.

We applied binary classification on a total of 169 playlists. We used a One vs All approach where the target playlist was considered positive and all other playlists were considered negative and randomly undersampled. We performed 60:20:20 train-validation-test split and stratified sampling on the dataframe constructed. For multi-class classification, we took all the non overlapping songs among various playlists; the total songs reduced from 11159 to 6120 for 169 playlists. We filtered out playlists having more than 100 songs for our model training.

We used the following classification models: Logistic Regression, Decision Trees, Random Forests, Linear SVC, XGBoost Ensemble technique, K-Nearest Neighbours Classifier and Artificial Neural Network. To optimise various parameters in the aforementioned models, we applied GridSearchCV and 10-fold cross validation.

We performed data standardization, dimensionality reduction using Singular Value Decomposition (SVD) and used the following graph- based techniques for Playlist visualization and generation.

3.2.1 K Nearest Neighbours

Each song was represented in an N-dimensional space according to our features. The we selected the next K songs for that playlist based on the Euclidean distances measured from the average(centroid) of all of the seed songs and returned these nearest songs as a playlist collection to the user with song and artist names.

3.2.2 Clustering

We are running unsupervised clustering techniques to segregate songs into different clusters, which will act as playlists.

K-Means Clustering : We implemented K-Means Clustering algorithm, one of the most common unsupervised Machine Learning algorithm. K-Means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. The algorithm identifies k number of centroids, and then each data point is assigned to its nearest cluster, while keeping the centroids as small as possible.

Agglomerative Clustering: The agglomerative clustering is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. The algorithm starts by treating each object as a singleton cluster. Next, pairs of clusters are successively merged until all clusters have been merged into one big cluster containing all objects. The result is a tree-based representation of the objects, named dendrogram.

4. Results and Analysis

The visualization below shows the importance of different topics, their separation in space in our lyrics corpus. The size of each bubble represents the importance of that 3 topic for our LDA model. The visualizations shows 20 clearly separated topics.

Visualization of topics extracted from lyrics

4.2.1 Binary Classification

For Binary Classification we used all 169 playlists.We used a One vs All approach where the target playlist was considered positive and all other playlists were considered negative and randomly undersampled. The XG Boost algorithm outperforms all the models with an Average F1 score of 81.0% across all playlists. Linear SVC and Logistic Regression also gave very good results posing F1 (macro) scores 78.6% and 79.2% respectively. Metrics for each model are shown in the table below:

Binary Classification Results

The training and Validation learning curve for SVC is shown in Figure below.

Learning Curve for SVC

The ROC curve is also plotted in figure below and shows an excellent area under the curve.

ROC curve for SVC

4.2.2 Multi Class Classification

For multi-class classification 5.2.2 we took 13 nonoverlapping playlists each having more than 100 songs. XGB in this case also got the best performance posing a F1 score of 58.7%.

Multi Classification Results

XBG performed well in our case as it is an ensemble method and XGBoost improves upon the basic Gradient Boosting Method framework through systems optimization and algorithmic enhancements.

We also computed the feature importance of the features which are shown below :

Feature Importance for Random Forest multi classification

As we can see from the above figure the topics obtained from the lyrics data had the highest importance (by a significant margin) for song classification. This implies that lyrics of a song are an important attribute which an user takes into account while creating a playlist. Also ”release date”, ”acousticness” and ”energy” are important attributes of a song.

4.3.1 Clustering

We took the song corpus consisting of 11159 songs and applied SVD to get 15 component features for each song. We ran K-Means clustering algorithm for k ranging from 2 to 10. We would report the average distances of the songs in each playlist .

As a baseline we would be using the average distance of songs in our training playlist. The results are as follows :

Clustering Results

We can observe lower mean Euclidean distance between the song nodes of our clustered playlists which implies we have achieved really good quality playlists comparable to actual top playlists on Spotify.

The playlist generated from can be visualised using t-SNE as shown in the figure below :

Cluster for k = 8 for K-Means

The above Figure cluster diagram shows us a beautiful distribution of playlist clusters and shows clear distinction between the K playlists. We can clearly distinguish individual playlists with decision boundaries. The above diagram signifies the success of our clustering algorithms and feature extraction processes; we are able to distinguish songs into playlists on the basis of our extracted features from Spotify.

To find the optimum number of clusters in both K-means and Agglomeration, we plotted the silhouette scores for clusters ranging from 1 to 30 and found out that k = 8 was optimum for both K-means and Agglomeration.

Figure below shows the plot for K-Means :

Silhouette Scores for K-Means

4.3.2 K Nearest Neighbours

For playlist generation, we sent as an input the information of a country genre song: ’On the Road Again’, Bob Dylan For this corresponding input, the automatic playlist generator based on K Nearest Neighbours, where K=50 returns the following generated playlist as output (10 songs here):

Songs Generated

We can clearly observe that our output generated playlist contains songs that are very similar to the input songs. On closer manual observation for this example, we see that the output playlist’s songs are majorly of the ’country’ genre, have slower rhythmic melodies, similar classic instrumen5 tals and having similar lyrics and wordings

5. Conclusions

We are pleased to inform that our initial hypothesis that lyrics play an significant role in curating playlists was in fact true. This can be clearly observed by looking at the feature importance graphic shown in the results section. The metric scores obtained on our binary and multi classification were very promising and posed a higher or similar score with respect to the previous work done on this topic. One of the limitations of the MSD dataset is that these contain information including audio features and song lyrics before 2011, and it does not contain all the trending songs from this era. These datasets mostly contains songs of the genres ’Rock’ and ’Metal’. The scores obtained by us shows the importance of audio features and lyrics in classifying playlists. Future improvements on the approach can be made by gathering more features and even creating derived features to improve the performance.

Future general improvements could focused on extracting more song characteristics along with deriving more generated features from them to improve upon the performance of the current models. This process could be achieved by gathering more data from various providers especially some of the genres that are under represented in the MSD dataset. This entire idea could be extended to be implemented in current streaming softwares with interactive UI for musical professionals and enthusiasts. The idea of topic modelling from lyrical corpus can be extended to various text based services such as twitter, reddit and facebook data to extract topics from them and provide a similar implementation for such systems as well.


[1] Lin, D. and Jayarathna, S. (2018). Conferences 2018 IEEE International Confe… Automated Playlist Generation from Personal Music Libraries. IEEE International Conference on Information Reuse and Integration (IRI). [online] Available at:

[2] Bonnin, G. and Jannach, D. (2013). AAAI 2013 Workshop. [online] Available at:

[3] Music Playlist Generation based on Community Detection and Personalized PageRank. Stanford University Social and Information Network Analysis Autumn 2015. [online] Available at:

[4] Pichl, M., Zangerle, E. and Specht, G. (2017). Understanding Playlist Creation on Music Streaming Platforms. IEEE International Symposium on Multimedia (ISM). [online] Available at:

[5] Pampalk, E. and Gasser, M. (2006). An Implementation of a Simple Playlist Generator Based on Audio Similarity Measures and User Feedback. ISMIR 2006, 7th International Conference on Music Information Retrieval. [online] Available at:

UG CSAM | IIIT-Delhi | Foodie | Travel the world one day | tech-savvy