Issue #40 - All I Want for Christmas is SARIMA
💊 Pill of the week
Welcome to a Christmas Special issue! Numerous countries across the globe gear up for Christmas celebrations, and what better way to celebrate it than with a festive Data Science project?
Let’s forecast how popular YouTube searches for Mariah Carey’s “All I Want for Christmas” will be in the upcoming weeks.
We can get the data from Google Trends. We will use data from the last 5 years. This data comes in weekly periods, so a year of data will consist of 52 samples.
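If you want to follow along, here is a minimal sketch of loading such an export into a dataframe called df. The file layout (a category line, a blank line, then a header row) and the sample rows are assumptions used for illustration, as are the column names week and popularity, which match the ones used in the rest of this issue.

```python
import io
import pandas as pd

# Made-up rows in the layout a Google Trends CSV export is assumed to have
raw = """Category: All categories

Week,all i want for christmas: (Worldwide)
2018-12-16,71
2018-12-23,100
2018-12-30,12
2019-01-06,<1
"""

# Skip the category line; the blank line is dropped automatically.
# header=0 together with names= replaces the long original header.
df = pd.read_csv(io.StringIO(raw), skiprows=1, header=0,
                 names=["week", "popularity"])
print(df)
```

With a real export you would pass the path of the downloaded file instead of the in-memory string.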
This is what the data looks like. The first thing we observe is that the popularity column contains some non-numeric values: '<1' marks a week in which the value was greater than zero but lower than one. We can simply replace those values with zero:
df['popularity'] = df['popularity'].replace('<1', 0)
Let’s check for missing values:
df.isna().sum()
We have no missing values! That’s great, we can continue.
Let’s check for the data types of our features:
df.dtypes
Both of our features are object types… We need to convert week to datetime and popularity to numeric (integer or float):
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')
df['popularity'] = df['popularity'].astype(int)
We will also set the index of the dataframe as the week:
df = df.set_index('week')
Let’s finally visualize the data:
import matplotlib.pyplot as plt

df.popularity.plot(figsize=(12, 5))
plt.grid(True, alpha=0.5)
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.show()
We will use autoARIMA to train our SARIMA model. Why SARIMA? We can see that there is a clear seasonal component in our data!
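One rough way to back up that visual impression is to compare the autocorrelation at a lag of 52 weeks with the one at 26 weeks. This is only a sketch on a synthetic series (one sharp peak per year, standing in for the real data), not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series: five years, with a peak in the last four weeks of each year
weeks = pd.date_range("2019-01-06", periods=5 * 52, freq="W")
week_of_year = np.arange(len(weeks)) % 52
popularity = pd.Series(np.where(week_of_year >= 48, 80, 5),
                       index=weeks, name="popularity")

# Correlation of the series with itself shifted by one year and by half a year
acf_52 = popularity.autocorr(lag=52)
acf_26 = popularity.autocorr(lag=26)
print(f"lag 52: {acf_52:.2f}, lag 26: {acf_26:.2f}")
```

A value close to 1 at lag 52 together with a much lower value at lag 26 points to a yearly pattern. On the real data you would call df['popularity'].autocorr(lag=52) instead.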
# Install pmdarima if you don't have it
!pip install pmdarima==2.0.3
# Import the library
from pmdarima.arima import auto_arima
Before proceeding let’s split our data into train and test sets:
samples_train = int(df.shape[0] * 0.9)
train = df.iloc[:samples_train]
test = df.iloc[samples_train:]
Why do we need to split the data? Check my 𝕏 thread about this:
We have weekly data. From the previous graph, we can observe annual seasonality. Since a year has 52 weeks, we will select the seasonal period m as 52. Let’s train a SARIMA model using the train set:
# Build and fit the AutoARIMA model
model = auto_arima(train, seasonal=True, m=52, suppress_warnings=True)
# Check the model summary
model.summary()
The optimal model seems to be: SARIMAX(1, 0, 0)x(1, 0, [1, 2], 52)
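As a reading aid, the notation is (p, d, q)x(P, D, Q, m): one non-seasonal AR term, no differencing, no non-seasonal MA terms, one seasonal AR term, and seasonal MA terms at seasonal lags 1 and 2, with a period of 52 weeks. Written with the backshift operator B (where B y_t = y_{t-1}), and assuming the fit includes an intercept c (the summary lists it if it does), that corresponds to:

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{52})\, y_t
  = c + \left(1 + \Theta_1 B^{52} + \Theta_2 B^{104}\right) \varepsilon_t
```

In words: this week’s popularity depends on last week’s value, on the value from the same week one year ago, and on the forecast errors made one and two years ago.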
We can plot the residual diagnostics to see how well the model fits:
model.plot_diagnostics(figsize=(12,8))
plt.show()
It is not perfect, but we can consider it acceptable for our case. There might be some inadequacies or misspecifications in the model. We could potentially check for outliers, consider exogenous variables, apply transformations…
This is how it should look ideally. Check my 𝕏 thread here.
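One part of "ideal" can also be checked numerically: the residuals should look like white noise, so their autocorrelations should mostly stay within roughly ±1.96/√n. Here is a small sketch of that check. The helper names are made up for illustration, and the input is a deliberately bad set of "residuals" that still contains a trend.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def lags_outside_band(residuals, max_lag=10):
    """Lags whose autocorrelation falls outside the approximate 95% white-noise band."""
    acf = sample_acf(residuals, max_lag)
    bound = 1.96 / np.sqrt(len(residuals))
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]

# Hypothetical residuals with a leftover trend: many lags get flagged
trending = np.arange(100)
print(lags_outside_band(trending))
```

For the model in this issue you could pass model.resid(), which is how pmdarima exposes the in-sample residuals. An empty or nearly empty list is what you hope to see.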
Let’s now see how well it predicts our test set:
# Make predictions
predictions = model.predict(n_periods=df.shape[0]-samples_train)
# Format as dataframe
predictions = predictions.to_frame(name='predictions')
Not bad, right?
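To put a number on that impression, a common choice is the mean absolute error (MAE) together with the root mean squared error (RMSE). The sketch below uses made-up values, and forecast_errors is a hypothetical helper rather than something from pmdarima.

```python
import numpy as np
import pandas as pd

def forecast_errors(actual, predicted):
    """Return MAE and RMSE between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }

# Made-up test values and predictions, for illustration only
actual = pd.Series([10, 20, 60, 100])
predicted = pd.Series([12, 18, 50, 90])
metrics = forecast_errors(actual, predicted)
print(metrics)
```

With the variables from this issue, the call would be forecast_errors(test['popularity'], predictions['predictions']). RMSE penalizes large misses more heavily, which matters here because the Christmas peak is where the forecast is most likely to be off.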
Let’s retrain the model using the entire data and make our predictions for this festive period!
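A minimal sketch of what that retraining step could look like, assuming the same auto_arima settings as before. The helper names and the 8-week horizon are choices made for this example, not necessarily what was used for the result discussed in this issue.

```python
import numpy as np
import pandas as pd

def future_weeks(last_week, n_periods):
    """Weekly dates that follow last_week, spaced 7 days apart."""
    start = last_week + pd.Timedelta(weeks=1)
    return pd.date_range(start=start, periods=n_periods, freq="7D")

def refit_and_forecast(df, n_periods=8):
    """Refit on the full history and forecast n_periods weeks ahead."""
    # Imported here so future_weeks stays usable without pmdarima installed
    from pmdarima.arima import auto_arima

    # Same search settings as for the train set, now on all available weeks
    model = auto_arima(df["popularity"], seasonal=True, m=52,
                       suppress_warnings=True)
    forecast = np.asarray(model.predict(n_periods=n_periods))
    # Attach explicit dates so the forecast lines up with the history when plotted
    return pd.Series(forecast, index=future_weeks(df.index[-1], n_periods),
                     name="forecast")
```

Calling refit_and_forecast(df) would return a series indexed by the upcoming weeks, ready to plot next to df.popularity. Note that this reruns the full order search, which can take a while with m=52.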
That means that during this Christmas we can expect a popularity peak of 58 (Google Trends scores interest on a scale from 0 to 100)! It seems like the popularity of “All I Want for Christmas” has been decreasing each year, right?
I hope this was useful! I must say that this is just a simplified version of the process of training your Time Series model. Some steps are missing such as addressing the non-normal residuals, properly evaluating your model and comparing it with other models, etc.
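For the model comparison mentioned above, a simple reference point is a seasonal naive forecast: predict each future week with the value observed 52 weeks earlier. The sketch below is one way to write it. The function name is hypothetical, and the toy example uses a season length of 4 to keep the numbers readable.

```python
import numpy as np

def seasonal_naive(train, horizon, m=52):
    """Forecast each future step with the value observed m steps earlier."""
    history = np.asarray(train, dtype=float)
    if len(history) < m:
        raise ValueError("need at least one full season of history")
    last_season = history[-m:]
    # Repeat the last season as often as needed to cover the horizon
    repeats = int(np.ceil(horizon / m))
    return np.tile(last_season, repeats)[:horizon]

# Toy example: two "years" of four periods each, forecasting six periods ahead
baseline = seasonal_naive([1, 2, 3, 4, 5, 6, 7, 8], horizon=6, m=4)
print(baseline)
```

If SARIMA does not clearly beat seasonal_naive(train['popularity'], len(test)) on the test set, the extra complexity is hard to justify.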
And as a Christmas gift, you can find all the code at the end of the newsletter! For everyone! 🎁
🤖 Tech Round-Up
This week's TechRoundUp comes full of AI news. From Tesla’s new toy to Meta's AI art leap, the future is zooming towards us! 🚀
1️⃣ 𝗚𝗼𝗼𝗴𝗹𝗲'𝘀 𝗩𝗶𝗱𝗲𝗼𝗣𝗼𝗲𝘁: 𝗔 𝗹𝗲𝗮𝗽 𝗶𝗻 𝗔𝗜!
🤖 Google introduces VideoPoet, an LLM capable of zero-shot video generation.
Imagine AI crafting unique videos for tasks it was never explicitly trained on!
2️⃣ 𝗦𝗽𝗼𝘁𝗶𝗳𝘆 𝘁𝗲𝘀𝘁𝘀 𝗔𝗜 𝗳𝗼𝗿 𝗺𝘂𝘀𝗶𝗰 𝗺𝗮𝗴𝗶𝗰! 🎵
Spotify's new AI-powered feature creates playlists based on your prompts.
Get ready for a more personalized music experience!
3️⃣ 𝗠𝗶𝘀𝘁𝗿𝗮𝗹 𝗔𝗜'𝘀 𝗯𝗼𝗹𝗱 𝗺𝗼𝘃𝗲! 🌐
Mistral AI aims for open-source supremacy with its new language model.
A big step towards accessible AI for everyone!
4️⃣ 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁'𝘀 𝗖𝗼𝗽𝗶𝗹𝗼𝘁 𝘁𝘂𝗻𝗲𝘀 𝗶𝗻𝘁𝗼 𝗺𝘂𝘀𝗶𝗰! 🎶
Discover how Microsoft's Copilot integrates with Suno for music creation.
AI isn't just for code; it's for creativity too!
5️⃣ 𝗕𝗶𝗱𝗲𝗻 𝗔𝗱𝗺𝗶𝗻𝗶𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻'𝘀 𝗔𝗜 𝘀𝘁𝗲𝗽! 📜
First steps toward key AI standards by the Biden administration.
This move could shape the future of AI regulation and innovation!
🎓 Learn Advanced Machine Learning Concepts!*
Have you outgrown introductory courses? Ready for a deeper dive?
Explore feature engineering and feature selection methods
Discover tactics for optimizing hyperparameters and addressing imbalanced data
Master fundamental machine learning methods and their Python application
Enroll today and take the next step in mastering the world of data science!
*Sponsored: by purchasing any of their courses you would also be supporting MLPills.
🪐 Get the code!
And as promised, here you have the code! For free for everyone!
Merry Christmas! 🎅