My Spotify Journey towards a Recommendation System

There’s one particular event that cheers my Mondays. I’m talking about the amazing Spotify ‘Weekly Discovery’ Playlist that is updated. It’s quite amazing the combination of songs that you never heard but still love.

There are several articles online explaining the AI behind this beloved Weekly Discovery technique used by Spotify. Here’s one of my favourites:

Putting together my recent programming skills with my not so recent Spotify love and addiction, I tried to build my own recommendation system, as well as explore some fun Data Analysis along the way.

Here is my Project git folder in case you’d like to check the code. It is divided into several notebooks.

Also, my Spotify Profile.

Setting up the Spotify API

The witchery starts with the use of Spotify API to create an app within the Spotify developer environment. After creating a developer’s account and an application environment I’m able to access data on any public playlist out there, following this tutorial.

My first access to the API involved getting two playlists: liked and disliked songs. (Spoiler alert: these were then used for supervised learning)

I build a master function to be able to convert any playlist into a dataframe:

def master_function(uri):
uri = playlist_uri
username = uri.split(':')[2]
playlist_id = uri.split(':')[4]
results = {'items':[]}

for n in range(0,3000,100):
new = sp.user_playlist_tracks(username, playlist_id, offset = n)
results['items'] += new['items']

playlist_tracks_data = results
playlist_tracks_id = []
playlist_tracks_titles = []
playlist_tracks_artists = []

for track in playlist_tracks_data['items']:

#adds a list of all artists involved in the song to the list of artists for the playlist
for artist in track['track']['artists']:
artist_list = []

df = pd.DataFrame([])
for i in range(0, len(playlist_tracks_id)):
features = sp.audio_features(playlist_tracks_id[i])
features_df = pd.DataFrame(features)
df = df.append(features_df)

df['title'] = playlist_tracks_titles
#features_df['first_artist'] = playlist_tracks_first_artists
df['main_artist'] = playlist_tracks_artists
#features_df = features_df.set_index('id')
df = df[['id', 'title', 'main_artist',
'danceability', 'energy', 'key', 'loudness',
'mode', 'acousticness', 'instrumentalness',
'liveness', 'valence', 'tempo',
'duration_ms', 'time_signature']]

return df

Exploratory Data Analytics

Luckily for us, the API provides access to statistics and details on each song — the Audio Feature Object aka Features!!

Let’s take a look at each feature.

TEMPO. Simply put, how many beats per minute (BPM) does each song have?

Tempos are also related to different Genres: Hip Hop 85–95 BPM, Techno 120–125 BPM, House & Pop 115–130 BPM, Electro 128 BPM, Reggaeton >130 BPM, Dubstep 140 BPM

0-Disliked; 1- Liked Songs

ENERGY. A measure of intensity and activity.

  • This is the first of Spotify’s more subjective metrics.
  • Energy represents a perceptual measure of intensity and activity.
  • Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.

What artists are driving my Energy taste down?

Thank’s Nick fka Chet Faker


Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.

Looks like while my ‘disliked’ songs follow a skewed distribution towards higher levels of danceability, my loved songs follow a supernormal distribution on this feature showing that I enjoy a wide range of danceability level.

Checking what artists are driving the lower levels of danceability.
we found the guilty one.


From the graph, we can see that I do enjoy songs with a normal level of danceability but low energy. Two very distinct clusters.

What is driving High Danceability but Low Energy?

LO-FI music! Makes sense — I love this type of music.


The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

I mapped the keys to have the real notation.
We can see key A is used more often both on liked and disliked songs.


A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

Very skewed towards more acoustic songs.


This is one of the most interesting metrics that Spotify produces: A measure describing the musical positiveness conveyed by a track.

  • Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric).
  • Tracks with low valence sound more negative (e.g. sad, depressed, angry).
Why do I like sad songs? Probably just LDR again….

LOUDNESS. The overall loudness of a track in decibels (dB).

Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 db.

Fyi. Did you know Spotify adjusts for loudness?

Supervised Learning

You can find the full code here.

The first step is to split our data into a training and testing set. I used sklearn function called train_test_split() which splits the data according to a test_size percent specified in the method. The code below breaks up the data into 85% train, 10% test.

from sklearn.model_selection import train_test_splitfeatures = ['danceability', 'energy', 'key','loudness', 'mode', 'acousticness', 'instrumentalness', 'liveness','valence', 'tempo', 'duration_ms', 'time_signature']X = data[features]
y = data['Like']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.1)

Logistic Regression

The first model that I tried was logistic regression. I got 80% accuracy.

from sklearn.linear_model import LogisticRegressionlr_model = LogisticRegression(), y_train)
lr_pred = lr_model.predict(X_test)
score = metrics.accuracy_score(y_test, lr_pred)*100
print("Accuracy using Logistic Regression: ", round(score, 3), "%")

Decision Tree Classifier

A decision tree classifier is often the easiest to visualize. All it is is pretty much a decision tree based on the features so you can trace the path down and visually see how it makes decisions.

model = DecisionTreeClassifier(max_depth = 8), y_train)
score = model.score(X_test,y_test)*100
print("Accuracy using Decision Tree: ", round(score, 2), "%")
#visualize the tree
export_graphviz(model, out_file="", class_names=["malignant", "benign"],
feature_names = data[features].columns, impurity=False, filled=True)

import graphviz
with open("") as f:
dot_graph =
Tree visualization
#feature importance
def plot_feature_importances(model):
n_features = data[features].shape[1]
plt.barh(range(n_features), model.feature_importances_, align='center')
plt.yticks(np.arange(n_features), features)
plt.xlabel("Feature importance")
plt.title("Spotify Features Importance - Decision Tree")

K-Nearest Neighbors (KNN)

The K-Nearest Neighbors classifier looks at the neighbours of a data point in order to determine what the output is. This approach gave a slightly better accuracy than the previous classifiers.

knn = KNeighborsClassifier(n_neighbors = 2), y_train)
knn_pred = c.predict(X_test)
score_test_knn = accuracy_score(y_test, knn_pred) * 100

print("Accuracy using Knn Tree: ", round(score_test_knn, 3), "%")

Random Forest

The random forest is a model made up of many decision trees. Rather than just simply averaging the prediction of trees (which we could call a “forest”), this model uses sampling and random subsets to build trees and split nodes, respectively. Accuracy of 87%!

model =,y_train)

score = metrics.accuracy_score(y_test, y_pred)*100

print("Accuracy with Random Forest:", round(score, 4), "%")

Test Algorithm with external playlists

After checking within my test datasets which songs had been incorrectly classified, I decided to compare against external playlists.

To do so, I used the best classifier (Random Forest) with predict_proba instead of predicting only. This allowed having a probability as dependent variable instead of a simple binary output.

pred = clf.predict_proba(felipe_df[features])[:,1]
felipe_df['prediction'] = pred
print("How similar is it to my taste?", round(felipe_df['prediction'].mean()*100,3), "%")

Unsupervised Learning

Recall that in supervised machine learning you have independent/input variables (X) and dependent/output variable (Y ) and you use an algorithm to learn the mapping function from the input to the output.

In contrast, in unsupervised machine learning, you only have input data (X) and no corresponding output variables.

The goal here is to extract knowledge based on any patterns we’ll be able to find — no labels will be addressed on liked and disliked. Unsupervised learning problems can be further grouped into clustering and association problems.

PCA: Principal Component Analysis

Principal Component Analysis (PCA) emphasizes variation and brings out strong patterns in a dataset. In other words, it takes all the variables then represents it in a smaller space while keeping the nature of the original data as much as possible.

  • The first principal component will encompass as much of the dataset variation as possible in 1 dimension
  • The second component will encompass as much as possible of the remaining variation as possible while remaining orthogonal to the first, and so on
from sklearn.decomposition import PCApca = PCA(n_components=3, random_state=42)
df_pca = pd.DataFrame(data=pca.fit_transform(features_scaled), columns=[‘PC1’,’PC2',’PC3'])
plt.matshow(pca.components_, cmap='viridis')
plt.yticks([0, 1, 2], ["First component", "Second component", "Third component"])
plt.xticks(range(len(data[features].columns)),data[features], rotation=60, ha='left')
plt.ylabel("Principal components")
PCA with 3 components

Plotting in 3D: I chose danceability for colour differentiation since, as per above, we can conclude that it is an important feature.

# Plot the PCA
title='Principal Component Analysis Projection (3-D)',

We can see each song position and its distance to other songs based on the audio features that have been transformed. Most points are concentrated on the green-eish areas. The mapping also confirms that danceability does correlate with PC2 to some extent. ‘Am I boy? Am I a girl? Do I really care’ (fyi it’s a liked song) is on the opposite side to the way way less dance level with Hellowen’s “Hallowen”.

Clustering with K-Means

The main idea behind k-means clustering is that we choose how many clusters we would like to create (typically we call that number k). We choose this based on domain knowledge, a ‘best-guess’, or randomly.

In the end you are left with areas that identify in which a cluster a newly assigned point would be classified.

# Let’s start with 2 clusters= 2 features
kmeans = KMeans(n_clusters=2)
model =
data_2 = data.copy()
data_2[‘labels’] = model.labels_
data_2['labels'].value_counts() #check how many obs in each cluster
Plotting Clusters

What’s the optimal number of clusters? While having too many clusters might mean that we haven’t actually learned much about the data — the whole point of clustering is to identify a relatively small number of similarities that exist in the dataset. Too few clusters might mean that we are grouping unlike samples together artificially.

There are many different methods for choosing the appropriate number of clusters, but one common method is calculating a metric for each number of clusters, then plotting the error function vs the number of clusters.

  • Yellowbrick’s KElbowVisualizer: implements the “elbow” method of selecting the optimal number of clusters by fitting the K-Means model with a range of values for K.
X = features_scaled
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,10)) # Fit the data to the visualizer # Finalize and render the figure
we see that the model is fitted with 3 clusters — the “elbow” in the graph

I used two other methodologies for unsupervised learning: Gaussian Mixture Models and HAC (Hierarchical Agglomerative Clustering). Please refer to the notebook.


Throughout this project, I have investigated the Spotify API, done Exploratory Data Analysis on disliked vs liked songs, as well as with my Musical Journey, and even dug into several supervised and unsupervised ML techniques.

Finally, produced a final product where the user can input his (public) playlist URI and get an Exploratory Data Analysis on the songs’ features, as well as new song recommendations. This will be produced based on Unsupervised Learning techniques since we can assume the shared playlist will be a collection of liked songs and therefore we can’t apply labelling techniques.

I’ve put together a super big function that can be found here. Let’s investigate the most important parts.

1- Use the master function that takes the playlist URI and transforms it into a dataframe.

2- Create an enormous playlist with musical diversity that will work as a library for music recommendations (we can think of this as the music universe).

3- Perform PCA technique with 3 components on the two playlists.

Output example for friend’s playlist

4- Get recommendations with PCA and Nearest Neighbors. Export to PDF.

from scipy.spatial import KDTree
columns = [‘PC1’, ‘PC2’, ‘PC3’]
kdB = KDTree(df_pca_all_songs[columns].values)
neighbours = kdB.query(df_pca[columns].values, k=1)[-1]
#recomendations output: 30 songs you might like
recomendations = all_songs[all_songs.index.isin(neighbours[:31])]
recomendations_output = recomendations[['title', 'main_artist']]
recomendations_output.columns = ['Song Title', 'Artist']
Example of Recommendations (30 songs)

5- Data Analysis with the Obamas playlist, Pitchfork top albums and songs and Billboard Top 100.

from sklearn.preprocessing import MinMaxScaler  #scaledata_scaled = pd.DataFrame(MinMaxScaler().fit_transform(data[features]), 
data_scaled[‘Playlist’] = data[‘Playlist’]
df_radar = data_scaled.groupby(‘Playlist’).mean().reset_index() \
.melt(id_vars=’Playlist’, var_name=”features”, value_name=”avg”) \
fig = px.line_polar(df_radar,
title=’Mean Values of Each Playlist Features’,
range_r=[0, 0.9],
Polar Graph: Your taste vs Billboard, Pitchfork or The Obamas

Hipster or Mainstream? Compare your taste to Billboard and Pitchfork

High Fidelity TV show. “Making a playlist is a delicate art. You get to use someone else’s poetry to express how you feel”
def big_graph(feature, label1="", label2="", label3 = ""):
sns.kdeplot(data[data['Playlist']=='Your Songs'][feature],label=label1)
sns.kdeplot(data[data['Playlist']=='Billboard Top 100'][feature],label=label3)

plots =[]plt.figure(figsize=(16,16))
plt.suptitle("Hipster or Mainstream?", fontsize="x-large")
for i, f in enumerate(features):
if ((i+1)% 4 == 0) or (i+1==len(features)):
big_graph(f,label1="Your Songs", label2="Pitchfork", label3="Billboard Top 100")
plots.append(plt.gca())#the code is the same for the Obamas
You vs Pitchfork vs Billboard Top 100

You an Obama? Compare your taste with the Obamas.

You vs The Obamas

6- Put together a Final Report PDF. et voilá!


If you liked the article make sure to check the full Github with all the code that you can grab and explore! Also, if you have any suggestions regarding the project workflow or if you notice I did something wrong please be sure to let me know in the comments. Thanks!

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store