💊 Pill of the week
Feature engineering is crucial for improving the predictive power of time series models. It is all about extracting useful information from time-based data. Time series naturally contain patterns and dependencies, so crafting effective features is key to accurate predictions. There are multiple ways of generating these features, each capturing a specific aspect of the data. Today we will share the most common ones.
Are you already familiar with this topic and you just want to practice? Check the notebook at the end of this newsletter 👇👇
Date/Time-Related Features: These features are derived from the date-time value of each observation. They can include the year, month, day, hour, minute, second, day of the week, and whether the day is a weekend or a holiday.
These features can be useful when there is a trend or seasonality in your data that corresponds to these periods.
For example, retail sales might increase during weekends or holidays or website traffic might peak during certain hours.
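As a minimal sketch (with made-up daily sales data), pandas makes these calendar features easy to extract from a datetime index:

```python
import pandas as pd

# Hypothetical daily sales series (illustrative data only)
idx = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"sales": range(10)}, index=idx)

# Derive calendar features from the datetime index
df["year"] = df.index.year
df["month"] = df.index.month
df["dayofweek"] = df.index.dayofweek          # Monday=0 ... Sunday=6
df["is_weekend"] = df.index.dayofweek >= 5    # Saturday or Sunday
```

These columns can then be fed to the model alongside the raw series.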
Lag Features: Lag features are values at prior time steps. They can help capture the temporal dependencies in the data.
These features can be useful when past values of a series are useful for predicting its future values.
For example, in stock price prediction, the price of a stock in the past few days can be useful for predicting its price tomorrow.
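A quick sketch of lag features with `shift` (illustrative prices, not real data):

```python
import pandas as pd

# Hypothetical closing prices (illustrative data only)
prices = pd.Series([100.0, 101.0, 103.0, 102.0, 105.0],
                   index=pd.date_range("2024-01-01", periods=5, freq="D"),
                   name="close")
df = prices.to_frame()

# Lag features: the value 1 and 2 steps in the past
df["lag_1"] = df["close"].shift(1)
df["lag_2"] = df["close"].shift(2)
```

Note that the first rows contain NaN, since there are no earlier values to look back on; those rows are typically dropped before training.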
Rolling Window Features: These features are statistical measures like mean, median, standard deviation, etc., over a sliding or rolling window of time periods.
These features can be useful when you want to capture local trends and patterns in your data.
For example, a rolling mean of temperature readings can smooth out daily fluctuations and highlight longer-term trends; similarly, a rolling mean of sales can capture the inertia of the recent period.
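In pandas this is a one-liner with `rolling` (a sketch with made-up temperature readings):

```python
import pandas as pd

# Hypothetical temperature readings (illustrative data only)
temps = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])

# Rolling statistics over a 3-observation window
rolling_mean = temps.rolling(window=3).mean()
rolling_std = temps.rolling(window=3).std()
```

The first `window - 1` entries are NaN because the window is not yet full.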
Expanding Window Features: These are similar to rolling window features but the window size increases with time.
These features can be useful when you want to capture all the past information up to the current point.
For example, the expanding mean of a stock’s price can give you its average price since the beginning of the time series, i.e. the historic price average.
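The expanding counterpart uses `expanding` instead of `rolling` (again with illustrative prices):

```python
import pandas as pd

# Hypothetical prices (illustrative data only)
prices = pd.Series([100.0, 102.0, 104.0, 103.0])

# Expanding mean: average of all values from the start up to each point
expanding_mean = prices.expanding().mean()
```

Unlike a rolling window, every observation since the beginning of the series contributes to each value.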
Domain-Specific Features: These are features that are specific to the problem at hand.
These features can be useful when you have domain knowledge that can help you create informative features.
For example, in a stock price prediction problem, features like the company’s earnings, the sector’s performance, etc., can be used.
Time Since an Event: This feature measures the time that has passed since a particular event occurred.
This can be useful in scenarios where the occurrence of an event significantly impacts the time series data.
For example, in predicting website traffic, an event could be a marketing campaign, and the feature could be the time since the campaign started.
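A minimal sketch of this idea, assuming a hypothetical campaign start date and made-up traffic numbers:

```python
import pandas as pd

# Hypothetical daily website visits (illustrative data only)
idx = pd.date_range("2024-01-01", periods=7, freq="D")
df = pd.DataFrame({"visits": [50, 52, 51, 90, 85, 80, 78]}, index=idx)

# Days elapsed since a (hypothetical) marketing campaign started
campaign_start = pd.Timestamp("2024-01-04")
df["days_since_campaign"] = (df.index - campaign_start).days
# Before the event, clip to 0 (no elapsed time yet)
df["days_since_campaign"] = df["days_since_campaign"].clip(lower=0)
```

Clipping to zero before the event is one convention; another is to keep negative values or add a separate "event has occurred" indicator.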
Autoregressive Features: These are based on the idea that past values influence current values. An autoregressive feature of order p would use the last p values. This is similar to lag features, but instead of using the raw values, we use the values predicted by an autoregressive model.
These features can be useful when modeling time series data with dependencies on its own past, such as predicting stock prices where historical trends impact future values.
For example, when predicting stock prices, an autoregressive feature of order 3 would consider the last three days' predicted values to capture short-term trends.
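As a rough numpy-only sketch (illustrative data; in practice you would typically use a library such as statsmodels), an AR(3) can be fitted by least squares on the three previous values, and its in-sample predictions used as a feature:

```python
import numpy as np

# Hypothetical price series (illustrative data only)
y = np.array([100., 101., 103., 102., 105., 107., 106., 108., 110., 109.])
p = 3

# Regressors: y[t-1], y[t-2], y[t-3] for each target y[t], t = p .. n-1
X = np.column_stack([y[p - 1 - k : len(y) - 1 - k] for k in range(p)])
target = y[p:]

# Fit AR(3) with an intercept via ordinary least squares
design = np.column_stack([np.ones(len(target)), X])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)

# In-sample AR(3) predictions, aligned with y[p:] -- usable as a feature
ar3_fitted = design @ coef
```

The fitted values are aligned with positions `p` onward; the first `p` positions have no prediction and would be NaN in a feature column.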
Difference Features: These features represent the difference between consecutive values in the time series. Differences can highlight trends or abrupt changes in the data.
These features can be useful when identifying and analyzing patterns that emerge as changes or trends between consecutive observations.
For example, in weather forecasting, difference features might highlight sudden temperature changes, aiding in the prediction of weather patterns.
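Differencing is again a one-liner in pandas (sketch with made-up temperatures):

```python
import pandas as pd

# Hypothetical temperature readings (illustrative data only)
temps = pd.Series([20.0, 21.0, 19.0, 25.0, 24.0])

# First difference: change from the previous observation
diff_1 = temps.diff()
```

Higher-order differences (`temps.diff().diff()`) can be used when the first difference still shows a trend.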
Exponential Moving Averages: Similar to moving averages, exponential moving averages give more weight to recent observations.
These features can be useful when modeling data whose patterns or trends evolve rapidly over time.
For example, in predicting user engagement on a website, exponential moving averages can give more weight to recent user activity, providing insights into current trends.
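In pandas, `ewm` computes the exponential moving average; comparing it to a plain rolling mean on a sketch with made-up activity data shows how it reacts faster to recent changes:

```python
import pandas as pd

# Hypothetical user-activity counts (illustrative data only)
activity = pd.Series([10.0, 10.0, 10.0, 20.0])

# Exponential moving average (span=3 -> smoothing factor alpha = 0.5)
ema = activity.ewm(span=3, adjust=False).mean()

# Plain rolling mean over the same 3-observation horizon, for comparison
sma = activity.rolling(window=3).mean()
```

After the jump from 10 to 20, the EMA moves to 15 immediately, while the simple rolling mean only reaches about 13.3, illustrating the heavier weight on recent observations.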
Seasonal Features: Seasonal features capture recurring patterns in the data related to specific seasons or periods. You can create binary features indicating whether the observation falls within a particular season or period.
These features can be useful when predicting phenomena influenced by recurring cycles, like sales patterns associated with holidays or special events.
For example, in retail sales forecasting, seasonal features could help account for increased sales during holiday seasons, improving the accuracy of sales predictions.
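A simple binary seasonal indicator can be built from the calendar (a sketch using December as a stand-in for the holiday season; real holiday calendars are more nuanced):

```python
import pandas as pd

# Hypothetical daily sales, Nov 1 through Jan 31 (illustrative data only)
idx = pd.date_range("2024-11-01", periods=92, freq="D")
df = pd.DataFrame({"sales": 1.0}, index=idx)

# Binary indicator: 1 during December, 0 otherwise (illustrative rule)
df["is_holiday_season"] = (df.index.month == 12).astype(int)
```

In practice you might combine several such indicators, one per relevant season or event window.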
Cyclical Features: Some time series data exhibits cyclical patterns that may not align with standard date-related features. Creating cyclical features can help capture such patterns.
These features can be useful when modeling time series data with recurring but non-linear patterns, enabling the model to better capture cyclic variations.
For example, in predicting electricity consumption throughout the day, cyclical features involving sine and cosine transformations of the hour could account for the daily cycle, helping the model adapt to the fluctuating demand patterns.
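The hour-of-day example above can be sketched with sine and cosine transformations, which place hour 23 and hour 0 next to each other on a circle instead of 23 units apart:

```python
import numpy as np
import pandas as pd

# Hourly index over one day (illustrative)
idx = pd.date_range("2024-01-01", periods=24, freq="h")
df = pd.DataFrame(index=idx)

# Encode the hour on a circle: the cycle length is 24 hours
hour = df.index.hour
df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
```

The same trick works for day-of-week (cycle length 7), month (cycle length 12), or any other known period.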
Do you want to put all this into practice? Check the notebook at the end of the newsletter 👇👇
🤖 Tech Round-Up
No time to check the news this week?
This week's TechRoundUp comes full of AI news. From AI in literature to robotics in manufacturing, the future is zooming towards us! 🚀
Let's dive into the latest Tech highlights you shouldn’t miss this week 💥
1️⃣ 𝗖𝗵𝗮𝘁𝗚𝗣𝗧'𝘀 𝗟𝗶𝘁𝗲𝗿𝗮𝗿𝘆 𝗟𝗲𝗮𝗽 📚
Japan's literary world embraces AI!
A renowned author openly uses ChatGPT for writing assistance.
She says it's not just a tool; it's a creative partner! 🤖✍️
2️⃣ 𝗦𝗮𝗺𝘀𝘂𝗻𝗴 𝗦24: 𝗔𝗜-𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗱 𝗘𝗹𝗲𝗴𝗮𝗻𝗰𝗲 📱
The upcoming Galaxy S24 is all set to dazzle with Google Gemini's AI power.
Expect smarter features and seamless experiences!
Will it be the first phone with a native LLM? 🤔
3️⃣ 𝗣𝗶𝗻𝗲𝗰𝗼𝗻𝗲'𝘀 𝗣𝗶𝗼𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗣𝗮𝘁𝗵 🌲
Pinecone revamps its vector database with a serverless architecture, promising more efficient data management and scalability.
A game-changer in data processing!
4️⃣ 𝗕𝗠𝗪'𝘀 𝗥𝗼𝗯𝗼𝘁𝗶𝗰 𝗥𝗲𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻 🤖🚗
BMW's South Carolina plant gears up for a futuristic makeover with Figure's humanoid robots.
Say hello to a new era of automated efficiency!
5️⃣ 𝗗𝗲𝗲𝗽𝗠𝗶𝗻𝗱'𝘀 𝗚𝗲𝗼𝗺𝗲𝘁𝗿𝗶𝗰 𝗚𝗲𝗻𝗶𝘂𝘀 🔢
DeepMind's AlphaGeometry AI cracks complex geometry problems, rivaling Olympiad champs.
A breakthrough in AI problem-solving and mathematical reasoning!
Learn Advanced Machine Learning Concepts!*
Have you outgrown introductory courses? Ready for a deeper dive?
Explore feature engineering and feature selection methods
Discover tactics for optimizing hyperparameters and addressing imbalanced data
Master fundamental machine learning methods and their Python application
Enroll today and take the next step in mastering the world of data science!
*Sponsored: by purchasing any of their courses, you will also be supporting MLPills.
🛠️ Do It Yourself!
Time for you to play with the code!
I share with you a notebook containing almost everything you need. Your task is to make it work and reproduce the results shared in this newsletter. I provide some hints; the best way to learn is by checking the documentation.
Contact me at david@mlpills.dev if you need any help!
Keep reading with a 7-day free trial
Subscribe to Machine Learning Pills to keep reading this post and get 7 days of free access to the full post archives.