Automated Music Playlist Generator

by Arka Sarkar, Pankil Kalra and Daksh Thapar, Machine Learning (CSE343, ECE343) from Indraprastha Institute of Information Technology, Delhi.

With the growing popularity of music streaming services like Spotify, Apple Music and Wynk, the number of songs available has skyrocketed globally. Creating personalized playlists for users has become tedious and challenging, as it involves listening to songs individually and categorizing them based on their audio features. Our objective is to automatically sort songs with similar musical characteristics into playlists. Modern machine learning techniques and visualization tools enable us to find accurate models that categorize millions of songs into user playlists based on song choices. Related works on this problem did not consider lyrical analysis while making playlists. We use topic modelling techniques on the lyrics and use the extracted topics as features for generating playlists.

Source Code :

1. State of the Art

  • The paper uses multiple Machine Learning and clustering tools to create playlists on the basis of similar song audio features which include tempo, acousticness, danceability and energy.
  • The authors used K Means, Affinity Propagation and DBSCAN clustering algorithms to generate playlists based on similar audio features.
  • A questionnaire was circulated to gauge the general sentiment about manual playlist creation. The majority of respondents did not like creating their own playlists, citing the time required as the primary reason. Also, the majority of participants rated the automated playlists 3 or 4 (on a scale of 5).

2. Music Playlist Generation based on Community Detection and Personalized PageRank by Bangzheng He, Yandi Li and Bobby Nguy. [Link]

  • This paper treats playlist generation as a graph-based problem. A directed graph was constructed with the different songs as nodes. The direction of each edge was based on two components: acoustic analysis and user listening data. For example, if users who listen to song A often listen to song B, but users who listen to song B never listen to song A, then, coupled with the level of acoustic (dis)similarity, song A would point to song B but not the other way round.
  • A community detection algorithm was applied to this graph, and the nodes (songs) were divided into 90,000+ different communities.
  • A personalized PageRank algorithm was used for sequencing the playlists. Based on the detected communities and the PageRank scores, an algorithm was defined to build a diversified, user-customized playlist.

2. Dataset Description

We extracted 14 audio features for each song using the Spotify API, as shown below:

Dataset Features

The class distribution of playlists is shown below :

We performed topic modelling on the lyrics using Latent Dirichlet Allocation (LDA) and extracted an additional 20 features, which are described in the subsection below.

2.1 Preprocessing, Feature Extraction

Word Cloud on Lyrics

2.1.1 Topic Modelling using Latent Dirichlet Allocation (LDA)

We preprocessed our lyrics, applied topic modelling using Latent Dirichlet Allocation (LDA) and extracted the probability of each topic for every song. Latent Dirichlet Allocation is a generative statistical model that yields a fixed number of unobserved topics, which help in analysing similarity between playlists. We applied LDA to the lyrical corpus and tried different values for the number of topics, ranging from 3 to 30, evaluating the resulting topics with the coherence ('c_v') score. 20 topics gave the best c_v score of 0.58, and we extracted the per-topic probabilities for each song from that model.

20 new columns were added to our dataset containing the topic wise probabilities for each song.
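To make the feature-extraction step concrete, here is an illustrative sketch of producing per-song topic probabilities. The library choice (scikit-learn's LatentDirichletAllocation) and the toy four-line corpus are assumptions for illustration; they stand in for the real preprocessed lyrics pipeline, and coherence-based model selection is not shown.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy lyric corpus standing in for the real preprocessed lyrics (hypothetical).
lyrics = [
    "love heart night dance love",
    "dark blood fight death power",
    "road home country guitar song",
    "love night kiss heart dance",
]

# Bag-of-words representation of the lyrics.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lyrics)

# LDA with a chosen number of topics; the project swept 3..30 and kept 20.
n_topics = 3  # small here only because the toy corpus is tiny
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)

# Per-song topic probabilities: one row per song, one column per topic.
# Each row sums to 1, so the columns can be appended as new dataset features.
topic_probs = lda.fit_transform(X)
print(topic_probs.shape)  # (4, 3)
```

With the real corpus and 20 topics, each row of `topic_probs` supplies the 20 new columns described above.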

A small visualization of the LDA topics are shown below :

  • The demo is an interactive visualisation of the topics obtained from topic modelling. The diagram denotes the importance of each topic, represented by the size of its bubble. It also gives the 30 most salient terms in the overall lyrics corpus. Topics that are closer together in the 2-D space of the visualisation have words that are usually closely associated (in real-life songs).
  • For example, in our visualisation the topics associated with darker genres are close to each other. Topics 7 and 14 appear to be about satanic evil and power (dark metal songs), whereas topic 9, which lies close to them in this space, is about killing, death, blood and fighting (another dark topic). Topic 6 is about love and is far away from these three topics. Topics 12 and 18 are about non-English songs and are far away from the English topics.
  • Clicking on any topic shows a ranking of the words within it. The interactive visualisation has an adjustable slider for the relevance parameter lambda. When lambda is closer to zero, words are ranked by how exclusive they are to the given topic (lift); when lambda is closer to one, words are ranked by how probable they are to appear within the given topic.

2.1.2 Dimensionality Reduction using Principal Component Analysis (PCA)

Principal Component Analysis (PCA) reduces the dimensionality of a dataset by transforming a large set of variables into a smaller set that still contains most of the information in the original data. The main steps in PCA are standardization, computation of the covariance matrix, and extraction of its eigenvectors to identify the principal components. PCA can be thought of as fitting a p-dimensional ellipsoid to the data; each axis of the ellipsoid corresponds to a single principal component. For our project, we work with 30 principal components.
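The steps above can be sketched with scikit-learn (an assumed library choice; the random matrix stands in for the real song features, here sized as the 14 audio features plus 20 LDA topic columns):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for the song feature matrix: 200 songs x 34 features
# (14 audio features + 20 LDA topic probabilities).
X = rng.normal(size=(200, 34))

# Standardize first: PCA is sensitive to the scale of each feature.
X_std = StandardScaler().fit_transform(X)

# Keep 30 principal components, as in the project.
pca = PCA(n_components=30)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                           # (200, 30)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```

`explained_variance_ratio_` is what the explained-variance figure below is plotting, one bar per component.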

Figure below shows the explained variance of the principal components.

PCA explained variance

2.1.3 Visualizing High Dimensional Data using t-SNE

t-Distributed Stochastic Neighbour Embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and visualizing high-dimensional data. Unlike PCA, t-SNE is not a linear projection. It uses the local relationships between points to create a low-dimensional mapping, which allows it to capture non-linear structure. t-SNE creates a probability distribution using a Gaussian over pairwise relationships in the high-dimensional space, and matches it with a similar distribution over the points in the low-dimensional embedding.
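A minimal t-SNE sketch with scikit-learn (assumed library; the random matrix stands in for the reduced song features, and the perplexity value is an illustrative choice):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the (already reduced) song feature matrix.
X = rng.normal(size=(50, 10))

# Project to 2-D for visualisation; perplexity must be smaller
# than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (50, 2)
```

The two columns of `X_2d` are the x/y coordinates used in the cluster plots later in this post.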

2.1.4 Data Standardization

Data standardization is another scaling technique, where the values are centred around the mean with unit standard deviation: the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation. The formula for standardization is:

z = (x − x̄) / σ

where x̄ is the mean of the feature values and σ is the standard deviation of the feature values.
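The standardization formula can be checked numerically with a quick NumPy sketch (toy values, illustrative only):

```python
import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])  # one feature column

# z = (x - mean) / std : centre on the mean, scale to unit variance
z = (x - x.mean()) / x.std()

print(z.mean())  # ~0.0
print(z.std())   # 1.0
```

In practice the same transform is applied column-wise to every feature (e.g. via scikit-learn's StandardScaler).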

3. Methodologies

3.1 Classification

We used the following classification models: Logistic Regression, Decision Trees, Random Forests, Linear SVC, the XGBoost ensemble technique, K-Nearest Neighbours Classifier and an Artificial Neural Network. To tune the hyperparameters of the aforementioned models, we applied GridSearchCV with 10-fold cross-validation.
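The tuning procedure looks roughly like this sketch (scikit-learn assumed; synthetic data and the Logistic Regression grid are stand-ins for the real features and the full model zoo):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the song feature matrix and playlist labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Grid-search the regularization strength with 10-fold cross-validation,
# mirroring the tuning procedure described above.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=10)
search.fit(X, y)

print(search.best_params_)  # best C found on this data
```

The same pattern applies to the other models, with each model's own parameter grid.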

3.2. Playlist Generation

3.2.1 K Nearest Neighbours

Each song was represented in an N-dimensional space according to our features. We then selected the K nearest songs based on the Euclidean distance from the average (centroid) of all the seed songs, and returned them to the user as a playlist with song and artist names.
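The centroid-then-neighbours step can be sketched as follows (scikit-learn assumed; the random feature matrix and seed indices are hypothetical stand-ins):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
songs = rng.normal(size=(100, 8))   # stand-in song feature vectors
seed_idx = [3, 17, 42]              # indices of the user's seed songs

# Centroid of the seed songs in feature space.
centroid = songs[seed_idx].mean(axis=0)

# The K songs nearest to the centroid (Euclidean distance) form the playlist.
k = 10
nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(songs)
distances, playlist_idx = nn.kneighbors(centroid.reshape(1, -1))

print(playlist_idx.shape)  # (1, 10) — indices of the recommended songs
```

In the real system these indices are mapped back to song and artist names before being shown to the user.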

3.2.2 Clustering

We are running unsupervised clustering techniques to segregate songs into different clusters, which will act as playlists.

K-Means Clustering : We implemented the K-Means clustering algorithm, one of the most common unsupervised machine learning algorithms. K-Means is a centroid-based, distance-based algorithm, where we calculate distances to assign points to clusters. The algorithm identifies k centroids and assigns each data point to its nearest centroid, while keeping the clusters as compact as possible.

Agglomerative Clustering: The agglomerative clustering is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. The algorithm starts by treating each object as a singleton cluster. Next, pairs of clusters are successively merged until all clusters have been merged into one big cluster containing all objects. The result is a tree-based representation of the objects, named dendrogram.
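Both clustering approaches can be sketched in a few lines (scikit-learn assumed; the random matrix stands in for the reduced song features, and k = 8 matches the value selected later in the post):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))  # stand-in for the reduced song features

# K-Means: assign each song to the nearest of k centroids.
kmeans_labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Agglomerative: merge singleton clusters bottom-up until k remain.
agglo_labels = AgglomerativeClustering(n_clusters=8).fit_predict(X)

# Each cluster of songs acts as one generated playlist.
print(len(set(kmeans_labels)), len(set(agglo_labels)))  # 8 8
```

Each label array partitions the songs into playlists; the quality of those partitions is what the Results section evaluates.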

4. Results and Analysis

4.1. Latent Dirichet Allocation (LDA)

Visualization of topics extracted from lyrics

4.2. Classification

4.2.1 Binary Classification

For binary classification we used all 169 playlists. We used a one-vs-all approach, where the target playlist was considered positive and all other playlists were considered negative and randomly undersampled. XGBoost outperformed all the other models, with an average F1 score of 81.0% across all playlists. Linear SVC and Logistic Regression also gave very good results, with F1 (macro) scores of 78.6% and 79.2% respectively. Metrics for each model are shown in the table below:

Binary Classification Results
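The one-vs-all setup with random undersampling can be sketched as follows (NumPy only; the playlist labels and the chosen target are hypothetical stand-ins for the 169 real playlists):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in labels: a playlist id per song (5 hypothetical playlists).
playlist_ids = rng.integers(0, 5, size=500)
target = 2  # the playlist treated as the positive class

pos_idx = np.where(playlist_ids == target)[0]
neg_idx = np.where(playlist_ids != target)[0]

# Randomly undersample the negatives to match the positive count,
# so the binary problem is balanced.
neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)
idx = np.concatenate([pos_idx, neg_sample])
y = (playlist_ids[idx] == target).astype(int)

print(y.mean())  # 0.5 — balanced binary labels
```

Repeating this for each playlist in turn yields the 169 balanced binary problems whose averaged metrics appear in the table above.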

The training and Validation learning curve for SVC is shown in Figure below.

Learning Curve for SVC

The ROC curve is also plotted in figure below and shows an excellent area under the curve.

ROC curve for SVC

4.2.2 Multi-Class Classification

For multi-class classification we took 13 non-overlapping playlists, each having more than 100 songs. XGBoost again achieved the best performance, with an F1 score of 58.7%.

Multi Classification Results

XGBoost performed well in our case as it is an ensemble method; it improves upon the basic gradient boosting framework through systems optimizations and algorithmic enhancements.

We also computed the feature importance of the features which are shown below :

Feature Importance for Random Forest multi classification

As we can see from the figure above, the topics obtained from the lyrics data had the highest importance (by a significant margin) for song classification. This implies that the lyrics of a song are an important attribute that a user takes into account while creating a playlist. "Release date", "acousticness" and "energy" are also important attributes of a song.

4.3. Playlist Generation

We took the song corpus of 11,159 songs and applied SVD to get 15 component features for each song. We ran the K-Means clustering algorithm for k ranging from 2 to 10 and report the average distance between the songs in each playlist.
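This evaluation can be sketched as follows (scikit-learn assumed; a small random matrix stands in for the 11,159-song corpus):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 34))  # stand-in for the full song feature matrix

# Reduce to 15 component features per song, as described above.
X_svd = TruncatedSVD(n_components=15, random_state=0).fit_transform(X)

# Cluster, then report the mean pairwise Euclidean distance per playlist.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X_svd)
for k in range(8):
    cluster = X_svd[labels == k]
    if len(cluster) < 2:
        continue  # distance is undefined for a single-song playlist
    d = pairwise_distances(cluster)
    # mean over the off-diagonal entries of the distance matrix
    mean_dist = d.sum() / (len(cluster) * (len(cluster) - 1))
    print(f"playlist {k}: mean distance {mean_dist:.3f}")
```

A lower mean distance indicates a tighter, more homogeneous playlist, which is the quantity compared against the baseline below.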

As a baseline, we use the average distance between songs in our training playlists. The results are as follows:

Clustering Results

We observe a lower mean Euclidean distance between the songs of our clustered playlists, which implies we have achieved good-quality playlists comparable to actual top playlists on Spotify.

The playlists generated can be visualised using t-SNE, as shown in the figure below:

Cluster for k = 8 for K-Means

The cluster diagram above shows a clean distribution of the playlist clusters, with clear distinctions between the K playlists; individual playlists can be separated by decision boundaries. This signifies the success of our clustering algorithms and feature extraction processes: we are able to separate songs into playlists on the basis of the features extracted from Spotify.

To find the optimal number of clusters for both K-Means and agglomerative clustering, we plotted the silhouette scores for cluster counts ranging from 1 to 30 and found that k = 8 was optimal for both.
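A sketch of the silhouette sweep (scikit-learn assumed; random data stands in for the song features, and the sweep starts at k = 2 because the silhouette score is undefined for a single cluster):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))  # stand-in for the song features

# Compute the silhouette score for each candidate cluster count.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is taken as optimal.
best_k = max(scores, key=scores.get)
print(best_k)
```

The silhouette score lies in [−1, 1]; higher values mean points sit closer to their own cluster than to neighbouring ones, which is what the plot below visualises across k.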

Figure below shows the plot for K-Means :

Silhouette Scores for K-Means

4.3.2 K Nearest Neighbours

For playlist generation, we gave as input a country-genre song: 'On the Road Again' by Bob Dylan. For this input, the automatic playlist generator based on K Nearest Neighbours with K = 50 returned the following generated playlist as output (10 songs shown here):

Songs Generated

We can clearly observe that the generated playlist contains songs that are very similar to the input song. On closer manual inspection of this example, we see that the output playlist's songs are mostly of the country genre, with slower rhythmic melodies, similar classic instrumentals, and similar lyrics and wording.

5. Conclusions

5.1. Outcomes

5.2. Future Work


[2] Bonnin, G. and Jannach, D. (2013). AAAI 2013 Workshop. [online] Available at:

[3] Music Playlist Generation based on Community Detection and Personalized PageRank. Stanford University Social and Information Network Analysis Autumn 2015. [online] Available at:

[4] Pichl, M., Zangerle, E. and Specht, G. (2017). Understanding Playlist Creation on Music Streaming Platforms. IEEE International Symposium on Multimedia (ISM). [online] Available at:

[5] Pampalk, E. and Gasser, M. (2006). An Implementation of a Simple Playlist Generator Based on Audio Similarity Measures and User Feedback. ISMIR 2006, 7th International Conference on Music Information Retrieval. [online] Available at:
