<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Machine Learning Pills: DIY]]></title><description><![CDATA[The main challenge when learning something new? Time. You can understand theory from books, podcasts, and videos, but without practice, true learning is limited. Finding time for projects is tough. MLPills DIY has a solution: a weekly step-by-step approach to completing a Data Science project. One week to grasp and achieve each step.]]></description><link>https://mlpills.substack.com/s/diy</link><image><url>https://substackcdn.com/image/fetch/$s_!yCAU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png</url><title>Machine Learning Pills: DIY</title><link>https://mlpills.substack.com/s/diy</link></image><generator>Substack</generator><lastBuildDate>Sat, 18 Apr 2026 13:04:14 GMT</lastBuildDate><atom:link href="https://mlpills.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[MLPills]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mlpills@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mlpills@substack.com]]></itunes:email><itunes:name><![CDATA[David Andrés]]></itunes:name></itunes:owner><itunes:author><![CDATA[David Andrés]]></itunes:author><googleplay:owner><![CDATA[mlpills@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mlpills@substack.com]]></googleplay:email><googleplay:author><![CDATA[David Andrés]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[DIY #21 - Step-by-Step Guide to Time Series Forecasting with 
RNN]]></title><description><![CDATA[In the last article, we introduced the theory behind Recurrent Neural Networks. This time, we will use a hands-on example to illustrate the process of training a Vanilla RNN to forecast time series data.]]></description><link>https://mlpills.substack.com/p/diy-21-step-by-step-guide-to-time</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-21-step-by-step-guide-to-time</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 21 Mar 2026 08:01:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d1414372-4fe9-4a4a-a6a7-8bde88f28949_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the week</h1><p>In the <a href="https://mlpills.substack.com/p/introduction-to-rnns">last article</a>, we introduced the theory behind Recurrent Neural Networks. This time, we will use a hands-on example to illustrate the process of training a Vanilla RNN to forecast time series data.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;198f0759-2e00-4b36-987b-e68d62d03004&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #124 - Introduction to RNNs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Senior AI Engineer / Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c3d8652-f999-47bb-a87d-c06608a921f1_1984x1984.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-15T08:31:08.748Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5d6d9b4-2a50-45e4-84e9-edce63da96fd_2752x1536.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/introduction-to-rnns&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:190936501,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!yCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>To ensure our model actually learns a pattern (rather than just guessing yesterday&#8217;s value), we are going to use a cyclical dataset. We will generate a synthetic sine wave with a bit of random noise. This mimics real-world seasonal data and is the perfect playground for an RNN.</p><blockquote><p><strong>&#128142;Full code at the end!&#128142;</strong></p></blockquote><div><hr></div><h1>&#128736;&#65039; DIY: Build your Vanilla RNN to forecast Time Series data</h1><p><em>Theory is essential, but code is where the magic happens. To build our forecasting model, we&#8217;ll use <strong>Keras</strong> for the neural network architecture and <strong>Scikit-Learn</strong> to handle the data preprocessing. 
This combination allows us to focus on the RNN logic without getting bogged down in manual matrix multiplication.</em></p><p>Let&#8217;s start by importing our essential Data Science libraries:</p><pre><code><code>import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error</code></code></pre><h2>1. Generating and Preparing the Data</h2><p>Instead of downloading an external dataset, we will generate our own noisy wave. Fixing the random seed means you get the same dataset on every run.</p><pre><code><code># Fix the seed so the injected noise is identical on every run
np.random.seed(42)

# Generate 1000 data points of a sine wave with some random noise
time_steps = np.arange(0, 100, 0.1)
data = np.sin(time_steps) + np.random.normal(scale=0.1, size=len(time_steps))

# Put it into a pandas DataFrame
df = pd.DataFrame(data, columns=['Value'])

# Plot the first 200 points to see our pattern
plt.figure(figsize=(10, 4))
plt.plot(df['Value'][:200])
plt.title("Noisy Cyclical Data")
plt.show()</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mw4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mw4m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 424w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 848w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 1272w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mw4m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png" width="1012" height="393" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:393,&quot;width&quot;:1012,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mw4m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 424w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 848w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 1272w, https://substackcdn.com/image/fetch/$s_!Mw4m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe15b527-dff3-4b25-80ce-426f9f8e7f31_1012x393.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This plot helps verify that our data generation worked. You'll see a clear, repeating wave pattern, but with jagged, irregular edges caused by the <code>np.random.normal</code> noise we injected.</p><h3>Scaling the Data</h3><p>Neural Networks are highly sensitive to the scale of the input data. Since we will be using the <code>tanh</code> activation function (which outputs values between -1 and 1), it is best practice to scale our data. Here, we will use <code>MinMaxScaler</code> to map everything into the range 0 to 1, with the smallest value becoming exactly 0 and the largest exactly 1. Note that the scaler below is fit on the whole series; on real data, fit it on the training portion only so that test-set statistics do not leak into training.</p><pre><code><code># Initialize the scaler
scaler = MinMaxScaler(feature_range=(0, 1))
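# Illustrative aside (not part of the original walkthrough): min-max scaling maps
# each value v to (v - min) / (max - min). The toy_* names are hypothetical.
toy_vals = [-1.0, 0.0, 1.0]
toy_min, toy_max = min(toy_vals), max(toy_vals)
toy_scaled = [(v - toy_min) / (toy_max - toy_min) for v in toy_vals]
print(toy_scaled)  # [0.0, 0.5, 1.0]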
scaled_data = scaler.fit_transform(df)</code></code></pre><div><hr></div><h2>2. Reframing as a Supervised Learning Problem</h2><p>RNNs cannot just ingest a long list of numbers. We need to create a &#8220;sliding window&#8221; of data. 
We will use a sequence of previous steps (e.g., the last 20 points) to predict the very next step.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hIlA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hIlA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hIlA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6337965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/191470445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hIlA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!hIlA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7bef9b0-6525-4110-bc63-5c9c3ba72d0c_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><pre><code><code># Initialize empty lists
X = []
y = []

# Set the number of historical steps the model will look at
timesteps = 20

# Iterate to populate the lists
for i in range(len(scaled_data) - timesteps):
    X.append(scaled_data[i : i + timesteps, 0])
    y.append(scaled_data[i + timesteps, 0])
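# Illustrative aside (not part of the original walkthrough): the same windowing on a
# tiny series, to show what each (input window, target) pair contains.
# The toy_* names and the window size of 3 are hypothetical, chosen only for this demo.
toy = [10, 11, 12, 13, 14]
toy_window = 3
toy_X = [toy[i : i + toy_window] for i in range(len(toy) - toy_window)]
toy_y = [toy[i + toy_window] for i in range(len(toy) - toy_window)]
print(toy_X)  # [[10, 11, 12], [11, 12, 13]]
print(toy_y)  # [13, 14]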

# Convert lists to numpy arrays
X = np.array(X)
y = np.array(y)</code></code></pre><p>Keras recurrent layers such as <code>SimpleRNN</code> expect a three-dimensional input tensor of shape <code>[number of samples, number of timesteps, number of features]</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N_2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N_2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N_2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1456547,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/191470445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N_2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!N_2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4217f03-95a5-42f6-acc1-1406182361d9_1376x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Currently, our <code>X</code> array is two-dimensional. We only have one feature (the value itself), so we need to reshape our array to explicitly include that feature dimension.</p><pre><code><code># Reshape X to [samples, timesteps, features]
X = X.reshape(X.shape[0], X.shape[1], 1)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")</code></code></pre><p><em>Output: </em></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;595a6df4-0e65-40b1-97a3-d42e2b300c5a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">X shape: (980, 20, 1)
y shape: (980,)</code></pre></div><p>We started with 1000 data points. Because our sliding window requires 20 historical points to make its first prediction, we "lose" the first 20 steps, leaving us with <code>980</code> usable samples. The <code>20</code> represents the historical timesteps our model looks at for each sample. The <code>1</code> is our single feature (the value of the sine wave). The <code>y</code> array has one target value per sample, so it also holds 980 entries.</p><h2>3. Train/Test Split</h2><p>Let&#8217;s select the first 80% of the samples for training and the remaining 20% to test our model&#8217;s performance. <strong>Crucial note:</strong> Because this is time-series data, we must split it sequentially. 
We cannot randomly shuffle the data, or we will leak the future into the past!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V913!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V913!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!V913!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!V913!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!V913!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V913!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6248485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/191470445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V913!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!V913!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!V913!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!V913!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f11582-7c81-46ff-b130-342d3c1499d3_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><pre><code><code>split_idx = int(X.shape[0] * 0.8)

X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]</code></code></pre><h2>4. Building the Vanilla RNN Model</h2><p>First, import the required Keras modules:</p><pre><code><code>from keras.models import Sequential
from keras.layers import SimpleRNN, Dense</code></code></pre><p>Let&#8217;s build the model. We will use a single <code>SimpleRNN</code> layer followed by a <code>Dense</code> layer to output our single prediction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AEZ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AEZ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AEZ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6045331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/191470445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AEZ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!AEZ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F311e124a-0ff1-4ba3-ab41-ca39aa22b858_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Units:</strong> We&#8217;ll use 16 neurons. Since our dataset has a clear pattern, we don&#8217;t need a massive network.</p></li><li><p><strong>Activation:</strong> We use <code>tanh</code>, the default activation for <code>SimpleRNN</code>. Its output is zero-centered and its gradient peaks at 1 (versus 0.25 for sigmoid), which makes it less prone to vanishing gradients.</p></li><li><p><strong>Dropout:</strong> <em>We are intentionally skipping the Dropout layer here.</em> Dropout helps prevent overfitting on complex, noisy data (like stock prices), but on a simple, deterministic pattern it often just injects artificial noise and keeps the model from learning the curve smoothly.</p></li></ul><pre><code><code># Initialize the Sequential model
model = Sequential()

# Add an RNN layer with 16 units
model.add(SimpleRNN(16, 
                    activation='tanh', 
                    input_shape=(X_train.shape[1], 1)))

# Add a Dense layer with 1 unit to output the final prediction
model.add(Dense(1))

# Compile the model using Mean Squared Error and the Adam optimizer
model.compile(loss='mean_squared_error', optimizer='adam')
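
# Optional sanity check (an added sketch, not part of the original walkthrough).
# A SimpleRNN layer holds input-to-hidden weights, hidden-to-hidden weights and
# biases, so the parameter counts reported by the summary can be worked out by
# hand. The variable names below are illustrative assumptions.
n_features, n_units = 1, 16
expected_rnn_params = n_features * n_units + n_units * n_units + n_units
expected_dense_params = n_units * 1 + 1  # 16 incoming weights + 1 bias
print(expected_rnn_params, expected_dense_params)  # 288 17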

model.summary()</code></code></pre><p>Output:</p><pre><code>&#9487;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9523;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9523;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9491;
&#9475;<strong> Layer (type)                    </strong>&#9475;<strong> Output Shape           </strong>&#9475;<strong>       Param # </strong>&#9475;
&#9505;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9543;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9543;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9473;&#9513;
&#9474; simple_rnn (SimpleRNN)          &#9474; (None, 16)             &#9474;           288 &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; dense (Dense)                   &#9474; (None, 1)              &#9474;            17 &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></pre><p>The <code>(None, 16)</code> output shape means the layer outputs 16 values (our 16 units), while <code>None</code> is just a placeholder representing a variable batch size. The <code>288</code> parameters in the RNN layer come from three sets of weights: the input-to-hidden weights (1 feature &#215; 16 units = 16), the hidden-to-hidden recurrent weights (16 units &#215; 16 units = 256), and the biases (16). Added together, 16 + 256 + 16 = 288. Finally, the Dense layer has <code>17</code> parameters (16 incoming weights from the RNN + 1 bias).</p><h2>5. Training the Model</h2><p>We will train the model for 30 epochs. We will also use a <code>validation_split</code> to monitor how well the model generalizes to unseen data during training.</p><pre><code><code># Fit the model
history = model.fit(X_train, y_train, 
                    epochs=30, 
                    batch_size=16, 
                    validation_split=0.1, 
                    shuffle=False, # Remember, no shuffling in time series!
                    verbose=1)</code></code></pre><p>Let&#8217;s plot the training history to check that the loss decreases smoothly:</p><pre><code><code>plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Time')
plt.legend()
plt.show()</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WKaD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WKaD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 424w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 848w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 1272w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WKaD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png" width="846" height="393" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:393,&quot;width&quot;:846,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WKaD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 424w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 848w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 1272w, https://substackcdn.com/image/fetch/$s_!WKaD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d2a39c3-d3c2-4070-8ee9-bdddb4510daa_846x393.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The plot shows two downward-trending curves. The training loss (how well the model fits the training data) and the validation loss (how well it performs on the 10% hold-out set, which Keras takes from the end of the training data) should both drop rapidly in the first few epochs and then level out smoothly. When the two curves track each other closely and the validation loss does not climb back up, the model is learning the pattern without overfitting.</p><h2>6. Evaluating the Results</h2><p>Now for the moment of truth. Let&#8217;s make predictions on our test set and see if the model successfully learned the cyclical pattern.</p><pre><code><code># Generate predictions
y_pred = model.predict(X_test)

# Inverse transform to get the values back to their original scale
y_test_scaled_back = scaler.inverse_transform(y_test.reshape(-1, 1))
y_pred_scaled_back = scaler.inverse_transform(y_pred)

# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_test_scaled_back, y_pred_scaled_back))
print(f"Test RMSE: {rmse:.4f}")</code></code></pre><p>Output:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b99f6936-2df5-4c96-bcd5-0e45092ff8ce&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Test RMSE: 0.1221</code></pre></div><p>RMSE (Root Mean Squared Error) is the square root of the average squared error, so it is expressed in the same units as the data and penalizes large misses more heavily than small ones. An RMSE of 0.1221 on a series that fluctuates between roughly -1.2 and 1.2 is low: about 5% of the full range of roughly 2.4. In other words, a typical prediction is off by about 0.12 units.</p><p>Your exact value will differ slightly between runs, but it should be similarly low. To truly appreciate the results, let&#8217;s plot the actual values against the model&#8217;s predictions.</p><pre><code><code>plt.figure(figsize=(12, 5))
plt.plot(y_test_scaled_back, label='Actual Data', color='blue')
plt.plot(y_pred_scaled_back, label='RNN Predictions', color='red', linestyle='--')
plt.title('Vanilla RNN: Actual vs Predicted')
plt.legend()
plt.show()</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sdIK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sdIK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 424w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 848w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 1272w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sdIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png" width="1167" height="470" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:1167,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sdIK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 424w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 848w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 1272w, https://substackcdn.com/image/fetch/$s_!sdIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc837ddb1-2e47-4e97-8a2a-e01005295275_1167x470.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Your plot should show the red dashed line (predictions) hugging the blue line (actual data) almost perfectly. The Vanilla RNN has successfully learned the underlying sequential logic of the dataset! Notice how the model doesn't just blindly copy the previous step; it actively anticipates the curve's peaks and valleys, demonstrating that it has captured the true temporal pattern.</p><div><hr></div><p>As an &#128142;<strong>Extra Issue&#128142;</strong>, next Wednesday we will share more details about this analysis (<strong>only for paid subscribers</strong>).</p><p>Here is the <strong>full code</strong>, as promised:</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-21-step-by-step-guide-to-time">
              Read more
          </a>
      </p>
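<p>One way to check the claim above that the model is not merely copying the previous step is to compare it against a naive persistence baseline, which predicts each value as the one before it. The following is a minimal, self-contained sketch rather than part of the original notebook: the <code>rmse</code> and <code>persistence_rmse</code> helpers and the synthetic sine wave are assumptions made for illustration, so the number it prints is not one of the article&#8217;s results. If the RNN&#8217;s test RMSE is clearly below the baseline computed on your own test series, the model has learned more than a one-step copy.</p><pre><code><code>import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_rmse(series):
    """RMSE of the naive forecast that repeats the previous value."""
    return rmse(series[1:], series[:-1])


# Hypothetical stand-in for the test series: a plain sine wave.
series = [math.sin(0.1 * t) for t in range(200)]
baseline = persistence_rmse(series)
print(f"Persistence baseline RMSE: {baseline:.4f}")</code></code></pre><p>To apply this to the walkthrough, you would pass your own flattened test values to <code>persistence_rmse</code> and compare the result with the test RMSE reported in section 6.</p>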
   ]]></content:encoded></item><item><title><![CDATA[DIY #20 - Routing LLM Agent with LangChain]]></title><description><![CDATA[Imagine walking into a massive hospital with a generic complaint like &#8220;I hurt&#8221;. You don&#8217;t walk straight into the operating room, and you certainly don&#8217;t ask the neurosurgeon to check your blood pressure. Instead, you stop at the front desk or the Triage Nurse.

The nurse assesses you in seconds: &#8220;Chest pain? Go to the ER immediately.&#8221; &#8220;Sprained ankle? Go to Urgent Care.&#8221; &#8220;Just a check-up? Go to Family Medicine.&#8221;

This triage process saves lives and resources. In the world of AI Agents, this is the Routing pattern.]]></description><link>https://mlpills.substack.com/p/diy-20-routing-llm-agent-with-langchain</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-20-routing-llm-agent-with-langchain</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sun, 15 Feb 2026 08:30:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Wj8i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine walking into a massive hospital with a generic complaint like &#8220;I hurt&#8221;. You don&#8217;t walk straight into the operating room, and you certainly don&#8217;t ask the neurosurgeon to check your blood pressure. Instead, you stop at the front desk or the Triage Nurse.</p><p>The nurse assesses you in seconds: &#8220;Chest pain? Go to the ER immediately.&#8221; &#8220;Sprained ankle? Go to Urgent Care.&#8221; &#8220;Just a check-up? Go to Family Medicine.&#8221;</p><p>This triage process saves lives and resources. 
In the world of AI Agents, this is the <strong>Routing</strong> pattern.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PG8u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PG8u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PG8u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6406870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/187874095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PG8u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!PG8u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc5a9be-4be0-47bc-8485-cef81479cd2b_2752x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In this issue we will:</p><ul><li><p>Define the <strong>Routing</strong> pattern and why &#8220;one prompt fits all&#8221; is a myth.</p></li><li><p>Explore how to use <strong>Structured Outputs</strong> to make reliable decisions.</p></li><li><p>Build a &#8220;Customer Support Triage&#8221; agent that routes queries to billing, technical support, or sales specialists. &#128142; <strong>With all the code + notebook!</strong> &#128142;</p></li></ul><p><em>Let&#8217;s begin!</em></p><div><hr></div><h1>&#128138; Pill of the week</h1><p>This is the essence of the <strong>Routing</strong> pattern. In simple applications, we often stuff a single prompt with instructions for every possible scenario: <em>&#8220;If the user asks about X, do Y. 
If they ask about A, do B...&#8221;</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wj8i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wj8i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wj8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6370242,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/187874095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wj8i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Wj8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1c97414-62e7-4b92-87a2-62800b00a861_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As your application grows, this &#8220;Mega-Prompt&#8221; becomes a nightmare. It becomes slow, expensive, and confusing for the LLM.</p><p><strong>Routing</strong> solves this by introducing a preliminary step. The LLM acts as a traffic controller. 
It classifies the intent of the input and then directs the flow to a <strong>specialized</strong> handler (which could be another prompt, a tool, or even a completely different model).</p><p>In <a href="https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent">a previous article</a>, we covered the <strong><a href="https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent">Orchestrator-Worker</a></strong> pattern, which breaks a complex task into subtasks and does them <em>all</em> (e.g., &#8220;Research, then Write, then Format&#8221;).</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;840067a7-96a8-4f6d-9a4c-44daf6c5457a&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DIY #18 - Orchestrator-Worker LLM Agent with LangChain&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-23T08:01:28.721Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afa1f589-acc5-4f3e-9f02-2b4dcbb60498_2048x2080.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent&quot;,&quot;section_name&quot;:&quot;DIY&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179157853,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!yCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Routing</strong> is different. It is about <strong>Selection</strong>. It is &#8220;Either/Or&#8221;, not &#8220;And/Then&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tW01!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tW01!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!tW01!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!tW01!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!tW01!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tW01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6216851,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/187874095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tW01!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!tW01!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!tW01!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!tW01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e68e5c-a3c1-4114-a993-3a5ef8374f71_2752x1536.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Why is Routing so Effective?</h4><ol><li><p><strong>Specialization:</strong> You can have a prompt specifically tuned for &#8220;Writing SQL&#8221; and another for &#8220;Writing Poems.&#8221; They don&#8217;t need to know about each other.</p></li><li><p><strong>Cost &amp; Speed:</strong> You can use a small, fast model (like GPT-4o-mini or a local Llama model) to do the routing, and only call the expensive &#8220;Smart&#8221; model when the task actually requires it.</p></li><li><p><strong>Safety:</strong> You can route &#8220;sensitive&#8221; topics to a hard-coded refusal response 
without ever engaging the generative engine.</p></li></ol><p>We have already covered other workflow patterns; here is a summary of them:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6d58e5f3-b54d-448b-a282-11d7a776819e&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #110 - LLM Workflow Patterns&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-25T07:31:26.656Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-110-llm-workflow-patterns&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:176993635,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!yCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h1>&#128214; Book of the Week</h1><p>I highly recommend 
checking out the new release, <em><strong>AI-Native LLM Security: Threats, Defenses, and Best Practices for Building Safe and Trustworthy AI</strong></em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NXTO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NXTO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 424w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 848w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 1272w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NXTO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png" width="616" height="765" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:616,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:279980,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/187874095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NXTO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 424w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 848w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 1272w, https://substackcdn.com/image/fetch/$s_!NXTO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03e5e89e-db7a-4558-b785-c7288c2b24a0_616x765.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>What makes this book stand out is the pedigree of its authors&#8212;<strong>Vaibhav Malik, Ken Huang, and Ads Dawson</strong> are pioneers involved in the <strong>OWASP Top 10 for LLM applications</strong>. 
They aren&#8217;t just theorizing; they are the ones defining the standards.</p><p><strong>What You&#8217;ll Gain:</strong> Rather than offering vague advice on &#8220;being careful,&#8221; this guide provides a technical blueprint for:</p><ul><li><p><strong>Secure-by-Design Architecture:</strong> Strategies for isolation and access control that are baked in from day one, not bolted on later.</p></li><li><p><strong>Operationalizing Security (MLSecOps):</strong> How to integrate security controls directly into your CI/CD and MLOps pipelines.</p></li><li><p><strong>Framework Integration:</strong> Practical ways to leverage established taxonomies from <strong>OWASP, NIST, and MITRE</strong> to identify vulnerabilities.</p></li><li><p><strong>The Full Lifecycle:</strong> Mitigating risk from data curation all the way to incident response in live deployments.</p></li></ul><p>Whether you are navigating the complex legal landscape of AI or trying to prevent prompt injections, this is the comprehensive manual the industry has been waiting for.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.packtpub.com/en-us/product/ai-native-llm-security-9781836203742?srsltid=AfmBOopVSJUG9MlCh4K67W4KLCFVku0ACosmAaV1BTF791he6AecJIr2&quot;,&quot;text&quot;:&quot;Get the book&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.packtpub.com/en-us/product/ai-native-llm-security-9781836203742?srsltid=AfmBOopVSJUG9MlCh4K67W4KLCFVku0ACosmAaV1BTF791he6AecJIr2"><span>Get the book</span></a></p><div><hr></div><h1>&#128736;&#65039; DIY: The Customer Support Triage Bot</h1><p>Imagine you are building a support bot for a SaaS company. 
You receive three types of emails:</p><ol><li><p><strong>Technical:</strong> &#8220;I can&#8217;t log in,&#8221; &#8220;The API is throwing a 500 error.&#8221;</p></li><li><p><strong>Billing:</strong> &#8220;I want a refund,&#8221; &#8220;Update my credit card.&#8221;</p></li><li><p><strong>General:</strong> &#8220;How are you?&#8221;, &#8220;What is your pricing?&#8221;</p></li></ol><p>We don&#8217;t want our expensive &#8220;Technical Expert&#8221; model wasting tokens processing a refund request. We want to <strong>Route</strong> them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lnG1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lnG1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!lnG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6659423,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/187874095?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lnG1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!lnG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a91816-230c-4509-8722-b513e4c31c64_2752x1536.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>The Challenge</h4><p>We will build a workflow that:</p><ol><li><p><strong>Classifies</strong> an incoming user query into a distinct category.</p></li><li><p><strong>Selects</strong> the appropriate &#8220;Expert Chain&#8221; based on that category.</p></li><li><p><strong>Executes</strong> only the selected chain.</p></li></ol><h4>Setting the Stage</h4><p>We will use LangChain&#8217;s structured output capabilities (Pydantic) to ensure our Router makes a firm decision, not a fuzzy text response.</p><pre><code><code># Install necessary libraries
# !pip install langchain-openai langchain pydantic

import os
from typing import Literal
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel, Field

# Set up your OpenAI API key
# os.environ["OPENAI_API_KEY"] = "your_api_key"

# Initialize our LLMs
# The router can be a faster model. The experts might be smarter models.
router_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) # Fast &amp; Deterministic
expert_llm = ChatOpenAI(model="gpt-4.1", temperature=0.7)   # Creative &amp; Capable
</code></code></pre><h4>Step 1: Building the Router</h4><p>The Router&#8217;s only job is to output a <strong>Category</strong>. We don&#8217;t want it to answer the question yet.</p>
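<p>The idea of a router that returns a label rather than an answer can be sketched without any LLM call. The snippet below is only an illustration under stated assumptions, not the implementation from the full article: the category names are taken from the three email types above, while <code>parse_category</code>, <code>route</code>, and the three stand-in handlers are hypothetical names. In the LangChain setup, the label would come from the router model instead of being passed in by hand.</p><pre><code><code>from typing import Literal, get_args

# Hypothetical labels mirroring the three email types described above.
Category = Literal["technical", "billing", "general"]
ALLOWED = get_args(Category)


def parse_category(raw_label: str):
    """Normalise the router's reply and reject anything outside the closed set."""
    label = raw_label.strip().lower()
    if label not in ALLOWED:
        raise ValueError(f"Unknown category: {raw_label!r}")
    return label


# Stand-in handlers: in a real workflow each would be a separate expert chain.
def technical_expert(query: str):
    return f"[technical] {query}"


def billing_expert(query: str):
    return f"[billing] {query}"


def general_expert(query: str):
    return f"[general] {query}"


HANDLERS = {
    "technical": technical_expert,
    "billing": billing_expert,
    "general": general_expert,
}


def route(raw_label: str, query: str):
    """Select exactly one handler from the label and run only that one."""
    return HANDLERS[parse_category(raw_label)](query)


print(route("Billing", "I want a refund"))
</code></code></pre><p>Because the label must belong to a closed set, an unexpected reply fails loudly instead of silently falling through to the wrong expert, and only the selected handler ever runs.</p>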
      <p>
          <a href="https://mlpills.substack.com/p/diy-20-routing-llm-agent-with-langchain">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #19 - Evaluator-Optimiser LLM Agent with LangChain]]></title><description><![CDATA[Imagine you are writing a critical novel. You don&#8217;t just type &#8220;The End&#8221; after the first draft and send it to the printer. Instead, you write a draft, hand it to a ruthless editor who covers it in red ink (&#8220;This character is weak,&#8221; &#8220;The pacing is slow here&#8221;), and then you rewrite it. You repeat this cycle until the manuscript is polished.]]></description><link>https://mlpills.substack.com/p/diy-19-evaluator-optimiser-llm-agent</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-19-evaluator-optimiser-llm-agent</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Fri, 23 Jan 2026 22:30:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f93846c6-6aa9-47a2-84eb-4e8c455ad222_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Imagine you are writing a critical novel. You don&#8217;t just type &#8220;The End&#8221; after the first draft and send it to the printer. Instead, you write a draft, hand it to a ruthless editor who covers it in red ink (&#8220;This character is weak,&#8221; &#8220;The pacing is slow here&#8221;), and then you rewrite it. You repeat this cycle until the manuscript is polished.</em></p><p>In this issue we will:</p><ul><li><p>Define the &#8220;Evaluator-Optimiser&#8221; pattern and its feedback loops.</p></li><li><p>Explore how to implement a feedback cycle using LangChain.</p></li><li><p>Build a step-by-step example that refines Python code until it is bug-free. &#128142; <strong>With all the code + notebook!</strong> &#128142;</p></li></ul><p><em>Let&#8217;s begin!</em></p><div><hr></div><h1>&#128138; Pill of the week</h1><p>This is the essence of the <strong>Evaluator&#8211;Optimiser</strong> pattern. 
Most LLM interactions are &#8220;one-shot&#8221;&#8212;you ask, and it answers. But for high-stakes tasks, the first answer is rarely the best one.</p><p>In this <a href="https://mlpills.substack.com/p/issue-110-llm-workflow-patterns?utm_source=publication-search">workflow</a>, we separate the &#8220;creative&#8221; process from the &#8220;critical&#8221; process. One LLM acts as the <strong>Generator</strong> (the writer), producing an initial attempt. A second LLM acts as the <strong>Evaluator</strong> (the editor), scoring the output against strict criteria and providing feedback. If the output isn&#8217;t good enough, the feedback is fed back into the Generator to improve the next version.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PuYG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PuYG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PuYG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!PuYG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PuYG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PuYG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6514574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/185411449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PuYG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PuYG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 848w, 
https://substackcdn.com/image/fetch/$s_!PuYG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!PuYG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c287ea9-f343-458c-8894-c7f245c27804_2752x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In <a href="https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent?utm_source=publication-search">a previous 
article</a>, we covered the <strong>Orchestrator-Worker</strong> pattern, where work is split horizontally across different specialists:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;04834e10-b80f-4da6-ac21-2654a1dfd405&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DIY #18 - Orchestrator-Worker LLM Agent with LangChain&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-23T08:01:28.721Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afa1f589-acc5-4f3e-9f02-2b4dcbb60498_2048x2080.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent&quot;,&quot;section_name&quot;:&quot;DIY&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179157853,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!yCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;8fd2f64f-0fdc-4130-b4e0-9534306279e4&quot;,&quot;caption&quot;:&quot;&#10024;Extra Pill of the Week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Extra #1 - Orchestrator-Worker Pattern in LangGraph (Parallel Execution)&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-26T12:03:21.919Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/548f2dc0-fbf2-4cf7-b5e0-95b58782e02e_2746x2080.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/extra-1-orchestrator-worker-pattern&quot;,&quot;section_name&quot;:&quot;Extra&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179664251,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!yCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8efe1d-e165-4098-9fcc-b465f7286f50_1063x1063.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>The key difference? <strong>Orchestrator-Worker</strong> is about <strong>breadth</strong> (coordinating many different subtasks). 
<strong>Evaluator-Optimiser</strong> is about <strong>depth</strong> (iterating on a single task until it meets the quality bar).</p><h3>Why is Evaluator-Optimiser so Effective?</h3><ul><li><p><strong>Self-Correction:</strong> LLMs often hallucinate or make small logic errors. An explicit evaluation step can catch many of these before they reach the user.</p></li><li><p><strong>Separation of Concerns:</strong> It is hard for a single prompt to be both &#8220;creative&#8221; and &#8220;critical&#8221; at the same time. This pattern lets one model focus on generating ideas and another on strict quality control.</p></li><li><p><strong>Higher Quality Ceiling:</strong> By allowing multiple attempts, you can move beyond the model&#8217;s first, &#8220;average&#8221; response towards a more refined output.</p></li><li><p><strong>Alignment:</strong> You can enforce specific business rules (e.g., &#8220;Must be under 280 chars&#8221;, &#8220;Must use JSON format&#8221;) more reliably by rejecting outputs that fail these checks.</p></li></ul><p>This pattern is well suited to code generation, legal document drafting, or complex math problems where &#8220;almost right&#8221; is effectively &#8220;wrong.&#8221;</p><h3>Under the Hood</h3><p>The Evaluator-Optimiser pattern involves a circular flow:</p><ol><li><p><strong>The Generator:</strong> Produces the initial response or a revised response based on feedback.</p></li><li><p><strong>The Evaluator:</strong> Analyzes the response against acceptance criteria. It outputs a decision (PASS/FAIL) and structured feedback.</p></li><li><p><strong>The Loop:</strong> If the result is a PASS (or the maximum number of attempts is reached), we exit. 
If FAIL, the feedback flows back to the Generator.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5JKz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5JKz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 424w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 848w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 1272w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5JKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png" width="1307" height="285" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:1307,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77545,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5JKz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 424w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 848w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 1272w, https://substackcdn.com/image/fetch/$s_!5JKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17b7d6bc-200f-4200-bba1-b854dccf7979_1307x285.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Let&#8217;s see this in action!</p><div><hr></div><h1>&#128214; Book of the Week</h1><p><strong>The AI Optimization Playbook</strong> by Dr. Chun Schiros, Supreet Kaur, Rajdeep Arora and Dr. 
Usha Jagannathan is a clear, execution-focused guide for leaders who want AI to deliver real business outcomes rather than endless pilots.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/Optimization-Playbook-strategies-responsible-innovation-ebook/dp/B0G4MYGSVW#detailBullets_feature_div" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xdoe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 424w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 848w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 1272w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xdoe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png" width="427" height="511.96795952782463" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:711,&quot;width&quot;:593,&quot;resizeWidth&quot;:427,&quot;bytes&quot;:250417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.amazon.com/Optimization-Playbook-strategies-responsible-innovation-ebook/dp/B0G4MYGSVW#detailBullets_feature_div&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/185411449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xdoe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 424w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 848w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 1272w, https://substackcdn.com/image/fetch/$s_!xdoe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af14955-64c9-4538-92d7-b0f3f9e88fa7_593x711.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>What will you learn?</strong><br>How to design AI strategies that are explicitly aligned with business goals and ROI<br>How to select, prioritise, and justify AI initiatives in an enterprise context<br>How to move from proofs of concept to production using MLOps and LLMOps practices<br>How to measure success beyond model accuracy, including adoption, experimentation, and causal impact<br>How to operationalise generative AI and AI agents responsibly at scale<br>How to embed explainability, fairness, and regulatory compliance into AI systems from day one</p><p><strong>Why should you read it?</strong><br>It treats AI as an organisational capability, not just a technical one<br>The focus is on decision-making, 
governance, and scale, which is where most AI efforts struggle<br>Responsible AI is positioned as a strategic enabler, not a box-ticking exercise<br>Real-world case studies help bridge the gap between executives, product leaders, and technical teams<br>It avoids hype and tool obsession, favouring durable frameworks that hold up over time</p><p>If you are responsible for turning AI investment into measurable impact across an organisation, this book is a practical and credible companion. Highly recommended for CTOs, CIOs, CDAOs, and senior AI leaders who want results, not demos.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/Optimization-Playbook-strategies-responsible-innovation-ebook/dp/B0G4MYGSVW#detailBullets_feature_div&quot;,&quot;text&quot;:&quot;Get it here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/Optimization-Playbook-strategies-responsible-innovation-ebook/dp/B0G4MYGSVW#detailBullets_feature_div"><span>Get it here</span></a></p><div><hr></div><h1>&#128736;&#65039; DIY: The Self-Healing Code Generator</h1><p>Imagine you are building an AI coding assistant. 
You don&#8217;t want it to just generate code that <em>looks</em> right; you want code that is actually correct and follows best practices.</p><h4>The Challenge</h4><p>We will build a workflow that:</p><ol><li><p>Takes a user request for a Python function.</p></li><li><p><strong>Generates</strong> a candidate solution.</p></li><li><p><strong>Evaluates</strong> the code (simulating a &#8220;senior dev&#8221; review) for correctness, efficiency, and style.</p></li><li><p><strong>Optimizes</strong> the code based on the review until it passes.</p></li></ol><p>As an example, we&#8217;ll ask it to: <strong>&#8220;Write a Python function to check if two strings are valid anagrams.&#8221;</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZGYn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZGYn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!ZGYn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!ZGYn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZGYn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZGYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6173012,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/185411449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZGYn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!ZGYn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 848w, 
https://substackcdn.com/image/fetch/$s_!ZGYn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!ZGYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d1122a-ab0b-4452-8490-3a5f191273e5_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Setting the Stage</h4><p>We will use LangChain to manage our prompts and Pydantic to structure the critical 
&#8220;Evaluation&#8221; step.</p><pre><code><code># Install necessary libraries if you haven't already
# !pip install langchain-openai langchain pydantic

import os
from typing import List, Optional

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field

# Set up your OpenAI API key
# os.environ["OPENAI_API_KEY"] = "your_api_key"

# Initialize our LLMs
# We use a higher temperature for generation (creativity) 
# and a low temperature for evaluation (strictness).
generator_llm = ChatOpenAI(model="gpt-4.1", temperature=0.7)
evaluator_llm = ChatOpenAI(model="gpt-4.1", temperature=0.0)
</code></code></pre><p>This section initializes the tools we need.</p><ul><li><p><strong>Imports:</strong> We import <code>ChatOpenAI</code> (to talk to the models), <code>BaseModel</code>/<code>Field</code> (to define strict data structures), and output parsers (to turn raw model text into usable Python objects: a string for the code, a dict for the review).</p></li><li><p><strong>LLM Initialization (Crucial Detail):</strong></p><ul><li><p><code>generator_llm</code> uses <code>temperature=0.7</code>. This adds &#8220;creativity&#8221; or randomness, helping the model try different approaches if it gets stuck.</p></li><li><p><code>evaluator_llm</code> uses <code>temperature=0.0</code>. This makes the output as repeatable as possible (close to deterministic, though identical results across runs are not guaranteed). We want the &#8220;critic&#8221; to be consistent, not creative.</p></li></ul></li></ul><h4>Step 1: Building the Generator</h4><p>The generator needs to handle two states: creating a fresh draft, and refining a draft based on feedback.</p><pre><code><code>generator_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert Python developer. Your goal is to write clean, efficient, and documented Python code.
    
    If you are provided with FEEDBACK, you must revise your previous code to address the specific issues mentioned.
    Do not explain your thought process in the output, just output the Python code block."""),
    ("user", """Request: {request}
    
    {context_placeholder}""")
])</code></code></pre><pre><code><code>def generator_chain(request, feedback=None, previous_code=None):
    context = ""
    if feedback and previous_code:
        context = f"\nPREVIOUS CODE:\n{previous_code}\n\nFEEDBACK TO ADDRESS:\n{feedback}"
    
    chain = generator_prompt.partial(context_placeholder=context) | generator_llm | StrOutputParser()
    return chain.invoke({"request": request})</code></code></pre><p>This creates the &#8220;Writer&#8221; of our story.</p><ul><li><p><code>generator_prompt</code><strong>:</strong> This tells the LLM it is a Python expert. Crucially, it has a conditional instruction: <em>&#8220;If you are provided with FEEDBACK, you must revise...&#8221;</em> This allows the prompt to work for both the first draft (no feedback) and subsequent revisions.</p></li><li><p><code>generator_chain</code><strong> function:</strong></p><ul><li><p>It accepts <code>feedback</code> and <code>previous_code</code> as optional arguments.</p></li><li><p>If feedback exists, it injects it into the prompt context so the LLM knows <em>what</em> to fix.</p></li><li><p>If no feedback exists (the first run), it just sees the original request.</p></li></ul></li></ul><h4>Step 2: Building the Evaluator</h4><p>The evaluator is the strict &#8220;Senior Engineer.&#8221; It must output a structured decision so our code knows whether to loop or stop.</p><pre><code><code># Define the evaluation structure
class Evaluation(BaseModel):
    decision: str = Field(description="The decision: 'PASS' if the code is perfect, 'NEEDS_IMPROVEMENT' otherwise.")
    feedback: str = Field(description="Specific feedback on what is wrong or how to improve. If PASS, say 'Looks good'.")

evaluator_parser = JsonOutputParser(pydantic_object=Evaluation)

evaluator_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a Senior Software Engineer conducting a code review. 
    Analyze the provided Python code for:
    1. Correctness (Does it solve the problem?)
    2. Efficiency (Is the time/space complexity optimal?)
    3. Style (Does it follow PEP8 and include docstrings?)
    
    Be strict. If the code is inefficient or lacks docstrings, mark it as NEEDS_IMPROVEMENT.
    
    {format_instructions}"""),
    ("user", """Original Request: {request}
    
    Code to Review:
    {code}""")
]).partial(format_instructions=evaluator_parser.get_format_instructions())

evaluator_chain = evaluator_prompt | evaluator_llm | evaluator_parser
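
# Optional sketch, not part of the original workflow: the parser hands back a
# plain dict, and the model may occasionally omit a field or vary the casing of
# the decision. `normalize_evaluation` is a hypothetical helper name; it treats
# anything that is not a clear PASS as a failed review.
def normalize_evaluation(raw):
    """Return (decision, feedback), treating malformed output as a failed review."""
    if not isinstance(raw, dict):
        return "NEEDS_IMPROVEMENT", "Evaluator returned malformed output."
    decision = str(raw.get("decision", "")).strip().upper()
    feedback = str(raw.get("feedback", "")).strip()
    if decision != "PASS":
        decision = "NEEDS_IMPROVEMENT"
    return decision, feedback or "No feedback provided."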
</code></code></pre><p>This creates the &#8220;Editor&#8221; or &#8220;Critic.&#8221;</p><ul><li><p><code>class Evaluation(BaseModel)</code><strong>:</strong> This is the most important part of this step. The parser&#8217;s format instructions ask the LLM to return JSON with two fields, and <code>JsonOutputParser</code> turns that JSON into a Python dict:</p><ul><li><p><code>decision</code>: A &#8220;PASS&#8221; or &#8220;NEEDS_IMPROVEMENT&#8221; flag. We can use this in our Python <code>if</code> statements later.</p></li><li><p><code>feedback</code>: Text explaining <em>what</em> is wrong and how to improve it.</p></li></ul></li><li><p><code>evaluator_prompt</code><strong>:</strong> The system instructions tell the model to be &#8220;strict&#8221; and check for specific criteria (correctness, efficiency, style). This makes it far less likely to just say &#8220;Good job!&#8221; to bad code.</p></li></ul><h4>Step 3: The Workflow Loop</h4><p>Now we combine them into a <code>while</code> loop that runs until the evaluator is happy or we run out of attempts.</p><pre><code><code>def evaluator_optimiser_workflow(request: str, max_attempts: int = 3):
    print(f"&#128640; Starting task: {request}\n")
    
    current_code = ""
    feedback = ""
    attempt = 1
    
    while attempt &lt;= max_attempts:
        print(f"--- Attempt {attempt}/{max_attempts} ---")
        
        # 1. GENERATE
        print("&#9889; Generating code...")
        current_code = generator_chain(request, feedback, current_code)
        # Log the size of the generated code (not the code itself) to keep the output short
        print(f"   (Generated {len(current_code)} chars of code)")
        
        # 2. EVALUATE
        print("&#129488; Evaluating...")
        evaluation = evaluator_chain.invoke({
            "request": request, 
            "code": current_code
        })
        
        decision = evaluation['decision']
        feedback = evaluation['feedback']
        
        print(f"   Decision: {decision}")
        print(f"   Feedback: {feedback}\n")
        
        # 3. CHECK EXIT CONDITION
        if decision == "PASS":
            print("&#9989; Quality Standard Met! Returning final code.")
            return current_code
            
        attempt += 1
        
    print("&#9888;&#65039; Max attempts reached. Returning last version.")
    return current_code
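

# Optional sketch, not part of the original workflow: the generator prompt asks
# for "the Python code block", so the returned string may be wrapped in Markdown
# fences. `strip_code_fences` is a hypothetical helper name; it unwraps the
# first fenced block before the code is saved or executed.
import re

FENCE = "`" * 3  # Markdown fence marker (three backticks)
FENCE_PATTERN = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def strip_code_fences(text):
    """Return the contents of the first fenced block, or the stripped text if none."""
    match = FENCE_PATTERN.search(text)
    if match:
        return match.group(1).strip()
    return text.strip()


# Possible usage: final_code = strip_code_fences(evaluator_optimiser_workflow(request))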
</code></code></pre><p>This is the &#8220;Orchestration&#8221; logic that ties the two agents together.</p><ul><li><p><code>while attempt &lt;= max_attempts</code><strong>:</strong> This creates a safety loop. We don&#8217;t want an infinite loop if the AI never gets it right, so we cap it (e.g., at 3 tries).</p></li><li><p><strong>The Execution Flow:</strong></p><ol><li><p><strong>Generate:</strong> It calls the Generator. On the first loop, <code>feedback</code> is empty.</p></li><li><p><strong>Evaluate:</strong> It takes the code we just made and sends it to the Evaluator.</p></li><li><p><strong>Parse:</strong> It extracts the <code>decision</code> and <code>feedback</code> from the JSON response.</p></li><li><p><strong>Decision Point:</strong></p><ul><li><p><strong>If </strong><code>PASS</code><strong>:</strong> The function breaks the loop immediately and returns the code. The job is done.</p></li><li><p><strong>If </strong><code>NEEDS_IMPROVEMENT</code><strong>:</strong> The loop restarts, but this time, the <code>feedback</code> variable is full. The Generator will see this feedback in the next iteration.</p></li></ul></li></ol></li></ul><h4>Putting It to the Test</h4><p>Let&#8217;s run our &#8220;Anagram Checker&#8221; request. To see the pattern work, I will simulate a difficult request by asking for an <em>optimized</em> solution.</p><pre><code><code>request = "Write a function `is_anagram(s1, s2)` that checks if two strings are anagrams. It must be O(n) time complexity and handle case-insensitivity."

final_code = evaluator_optimiser_workflow(request)

print("=" * 60)
print("FINAL RESULT:")
print(final_code)</code></code></pre><p>This is simply the &#8220;Run&#8221; button.</p><ul><li><p>It defines a complex request (Anagram checker, O(n) complexity).</p></li><li><p>It calls the main workflow function.</p></li><li><p>It prints the final result after the loop has finished (either because it passed or because it ran out of tries).</p></li></ul><p><strong>Possible Output Flow:</strong></p><pre><code><code>&#128640; Starting task: Write a function `is_anagram(s1, s2)`...

--- Attempt 1/3 ---
&#9889; Generating code...
   (Generated 150 chars of code)
&#129488; Evaluating...
   Decision: NEEDS_IMPROVEMENT
   Feedback: The solution uses sorting (sorted(s1) == sorted(s2)), which is O(n log n). The request specifically asked for O(n) time complexity. Please use a hash map or frequency counter array instead. Also, ensure docstrings are added.

--- Attempt 2/3 ---
&#9889; Generating code...
   (Generated 320 chars of code)
&#129488; Evaluating...
   Decision: NEEDS_IMPROVEMENT
   Feedback: Logic is now O(n) using Counter, which is good. However, you forgot to handle case-insensitivity as requested. "Listen" and "silent" should return True.

--- Attempt 3/3 ---
&#9889; Generating code...
   (Generated 340 chars of code)
&#129488; Evaluating...
   Decision: PASS
   Feedback: Looks good. Time complexity is O(n), logic handles case insensitivity correctly, and docstrings are present.

&#9989; Quality Standard Met! Returning final code.

============================================================
FINAL RESULT:
def is_anagram(s1: str, s2: str) -&gt; bool:
    """
    Checks if two strings are anagrams of each other.
    Time Complexity: O(n)
    """
    from collections import Counter
    
    # Normalize strings: remove spaces and convert to lowercase
    s1 = s1.replace(" ", "").lower()
    s2 = s2.replace(" ", "").lower()
    
    if len(s1) != len(s2):
        return False
        
    return Counter(s1) == Counter(s2)
</code></code></pre><p>See what happened? The <strong>Evaluator</strong> caught the complexity issue (Attempt 1) and the functional bug (Attempt 2), forcing the <strong>Generator</strong> to fix them. A standard &#8220;one-shot&#8221; prompt might have settled for the easier <code>sorted()</code> solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PELT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PELT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PELT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!PELT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!PELT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PELT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6205114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/185411449?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PELT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!PELT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!PELT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!PELT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b95566-bb6c-450d-930b-421db05f6c98_2752x1536.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Here you have a <strong>summary</strong>:</p><blockquote><ol><li><p><strong>User Request</strong> &#8594; <strong>Generator</strong> &#8594; <em>Draft Code</em></p></li><li><p><em>Draft Code</em> + <em>Requirements</em> &#8594; <strong>Evaluator</strong> &#8594; <em>Feedback + Decision</em></p></li><li><p><strong>If Fail:</strong> <em>Feedback</em> &#8594; <strong>Generator</strong> &#8594; <em>New Draft Code</em> (Repeat)</p></li><li><p><strong>If Pass:</strong> Return <em>Final Code</em></p></li></ol></blockquote><p>Here you have all the code:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://colab.research.google.com/drive/1q4Rw7iBsO_3Rc9DFbVL1SMfkty7NuuGo?usp=sharing&quot;,&quot;text&quot;:&quot;Play with the code&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" 
data-component-name="ButtonCreateButton"><a class="button primary" href="https://colab.research.google.com/drive/1q4Rw7iBsO_3Rc9DFbVL1SMfkty7NuuGo?usp=sharing"><span>Play with the code</span></a></p><p></p><h3>When to Reach for Evaluator-Optimiser?</h3><p>The Evaluator-Optimiser pattern is your best choice when:</p><ul><li><p><strong>Quality &gt; Speed:</strong> You can afford the latency of multiple LLM calls to get a superior result.</p></li><li><p><strong>Objective Criteria Exist:</strong> It works best when success is measurable (e.g., code syntax, word count, JSON schema, absence of specific words).</p></li><li><p><strong>The &#8220;First Draft&#8221; is usually flabby:</strong> Tasks like coding, creative writing, or summarizing complex documents often benefit from a &#8220;review&#8221; phase.</p></li></ul><h3>&#9888;&#65039; Important Considerations</h3><ol><li><p><strong>Infinite Loops:</strong> Always set a <code>max_attempts</code> counter. If the Generator can&#8217;t satisfy the Evaluator, you don&#8217;t want to burn tokens forever.</p></li><li><p><strong>The &#8220;Strictness&#8221; Balance:</strong> If the Evaluator is too strict, or its feedback is too vague, the Generator might get confused and degrade the output. The feedback prompt is just as important as the generation prompt.</p></li><li><p><strong>Context Window:</strong> Passing the <code>PREVIOUS CODE</code> and <code>FEEDBACK</code> back into the Generator grows the context with every iteration. 
For very long tasks, you might need to manage history carefully.</p></li></ol><div><hr></div><h1>Conclusion</h1><p>The Evaluator-Optimiser pattern mimics the human workflow of &#8220;drafting and revising.&#8221; By accepting that the first output is rarely perfect, we build systems that are resilient, self-correcting, and capable of much higher quality than raw models alone.</p><p>Next time you are building an agent, ask yourself: <em>Does this need to be fast, or does it need to be right?</em> If the answer is &#8220;right,&#8221; hire an Evaluator.</p>]]></content:encoded></item><item><title><![CDATA[DIY #18 - Orchestrator-Worker LLM Agent with LangChain]]></title><description><![CDATA[Imagine you&#8217;re managing a complex research project. You need to analyze a company&#8217;s financial health, but that requires gathering quarterly reports, checking recent news, analyzing competitor data, and synthesizing market trends. No single person can efficiently do all of this at once&#8212;and asking them to do it sequentially would take forever. This is where the Orchestrator-Worker pattern comes in. It&#8217;s a workflow pattern that transforms your AI from a solo performer into a coordinated team, with one &#8220;manager&#8221; LLM breaking down complex tasks and delegating specialized work to multiple &#8220;expert&#8221; workers.]]></description><link>https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sun, 23 Nov 2025 08:01:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/afa1f589-acc5-4f3e-9f02-2b4dcbb60498_2048x2080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>&#128138; Pill of the week</strong></h1><p>Imagine you&#8217;re managing a complex research project. 
You need to analyze a company&#8217;s financial health, but that requires gathering quarterly reports, checking recent news, analyzing competitor data, and synthesizing market trends. No single person can efficiently do all of this at once&#8212;and asking them to do it sequentially would take forever.</p><p>This is where the <strong>Orchestrator-Worker</strong> pattern comes in. It&#8217;s a workflow pattern that transforms your AI from a solo performer into a coordinated team, with one &#8220;manager&#8221; LLM breaking down complex tasks and delegating specialized work to multiple &#8220;expert&#8221; workers.</p><p>Think of it like running a restaurant kitchen. The head chef (orchestrator) doesn&#8217;t cook every dish personally. Instead, they plan the menu, coordinate timing, and delegate: the grill station handles steaks, the pastry chef makes desserts, and the sous chef prepares sauces. Each specialist focuses on what they do best, and the head chef ensures everything comes together into a cohesive dining experience.</p><p>In the previous article we covered <strong><a href="https://mlpills.substack.com/p/diy-17-parallelisation-with-langchain">Parallelization</a></strong>, where independent <strong>predefined</strong> tasks run simultaneously:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e95b219a-b53f-44c5-a580-de29111d6ad0&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DIY #17 - Parallelisation with LangChain&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-01T08:30:44.511Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1bf9c2a-414a-476b-b835-e3395e478ec0_1020x664.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/diy-17-parallelisation-with-langchain&quot;,&quot;section_name&quot;:&quot;DIY&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:177662726,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>The key difference? In Parallelization, <strong>all tasks are predetermined</strong> and run on the same input. 
In Orchestrator-Worker, the orchestrator <strong>dynamically decides</strong> what tasks to create, potentially with different inputs for each worker, and then <strong>synthesizes</strong> their diverse outputs into a final answer (the workers can also run in parallel if desired).</p><p>In this issue we will:</p><ul><li><p>Define the &#8220;Orchestrator-Worker&#8221; pattern and explain its key benefits</p></li><li><p>Explore how to implement this pattern using LangChain</p></li><li><p>Build a step-by-step example that analyzes a product launch comprehensively using specialized workers &#128142; <strong>With all the code + notebook!</strong> &#128142;</p></li></ul><p>This is one of several <a href="https://mlpills.substack.com/p/issue-110-llm-workflow-patterns">workflow patterns</a>, which we covered here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;69957f8c-47fc-4dc1-ae9b-fd6cf58a13ea&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #110 - LLM Workflow Patterns&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-25T07:31:26.656Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-110-llm-workflow-patterns&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:176993635,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><blockquote><p><strong>Quick note before we start</strong>: I&#8217;ll be sharing the <strong>LangGraph version</strong> of this workflow <strong>in parallel</strong> later this week, available only to &#128142;<strong>paid subscribers</strong>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mlpills.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://mlpills.substack.com/subscribe?"><span>Subscribe now</span></a></p></blockquote><p>Let&#8217;s begin with the <strong>Orchestrator-Worker</strong> pattern in LangChain!</p><h2>Why is Orchestrator-Worker so Effective?</h2><ul><li><p><strong>Task 
Decomposition</strong>: The orchestrator intelligently breaks down complex queries into manageable subtasks, each handled by the most appropriate specialist.</p></li><li><p><strong>Specialization &amp; Modularity</strong>: Each worker can use different models, prompts, tools, or knowledge bases optimized for their specific domain. You can easily add, remove, or replace workers without redesigning the entire system.</p></li><li><p><strong>Scalability</strong>: As your requirements grow, you simply add new specialized workers rather than making one massive, unwieldy prompt.</p></li><li><p><strong>Dynamic Adaptation</strong>: Unlike fixed patterns, the orchestrator can adjust its plan based on the query, skipping irrelevant subtasks or adding new ones as needed.</p></li></ul><p>This pattern is perfect for scenarios where a complex question requires multiple types of expertise, and you need intelligent coordination rather than just running everything in a predetermined way in parallel.</p><h2>Under the Hood</h2><p>The Orchestrator-Worker pattern in LangChain involves three key components:</p><ol><li><p><strong>The Orchestrator</strong>: An LLM that analyzes the user&#8217;s request and creates a plan by breaking it into subtasks</p></li><li><p><strong>The Workers</strong>: Specialized chains (potentially different LLMs, tools, or knowledge bases) that each handle one type of subtask</p></li><li><p><strong>The Synthesizer</strong>: An LLM that takes all worker outputs and integrates them into a coherent final answer</p></li></ol><p>The flow looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GSVV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!GSVV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 424w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 848w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 1272w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GSVV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png" width="1456" height="610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!GSVV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 424w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 848w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 1272w, https://substackcdn.com/image/fetch/$s_!GSVV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1244a47-69f4-46c7-b582-f0593dc58551_1576x660.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Let&#8217;s see this in action.</p><h1>&#128736;&#65039; DIY: <strong>Product Launch Analysis</strong></h1><p>Imagine you&#8217;re building an internal analysis tool for your company. When preparing for a product launch review, you need to provide a comprehensive analysis covering multiple dimensions: technical readiness, market positioning, and risk assessment&#8212;all from internal documentation and simulated data.</p><h3><strong>The Challenge</strong></h3><p>We&#8217;ll take a product name and use an orchestrator to:</p><ol><li><p>Determine which aspects of the product need analysis</p></li><li><p>Delegate each aspect to a specialized worker:</p><ul><li><p><strong>Technical Readiness Worker</strong>: Analyzes features and development status</p></li><li><p><strong>Market Position Worker</strong>: Evaluates competitive positioning and target audience</p></li><li><p><strong>Risk Assessment Worker</strong>: Identifies potential challenges and mitigation strategies</p></li></ul></li><li><p>Synthesize all findings into a comprehensive launch readiness report</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MfJJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!MfJJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 424w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 848w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 1272w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MfJJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png" width="508" height="454.421875" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1832,&quot;width&quot;:2048,&quot;resizeWidth&quot;:508,&quot;bytes&quot;:5561946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/179157853?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ffc25a-2c65-46df-86dc-168d54d7466a_2048x2080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MfJJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 424w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 848w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 1272w, https://substackcdn.com/image/fetch/$s_!MfJJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abbe6f7-df16-4278-a073-97559481b8d0_2048x1832.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>As an example, we&#8217;ll analyze: <strong>&#8220;SmartHome Hub Pro&#8221;</strong></p><p>Let&#8217;s begin!</p><h3><strong>Setting the Stage</strong></h3><p>First, we need our LangChain tools and our LLM. We&#8217;ll also import Pydantic to define structured outputs.</p><pre><code><code># Install necessary libraries if you haven&#8217;t already
# !pip install langchain-openai langchain pydantic

import os
from typing import List, Dict

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
# Import from pydantic directly; langchain_core.pydantic_v1 is deprecated since langchain-core 0.3
from pydantic import BaseModel, Field

# Set up your OpenAI API key
# os.environ["OPENAI_API_KEY"] = "your_api_key"

# Initialize our LLMs
orchestrator_llm = ChatOpenAI(model="gpt-4o", temperature=0)
worker_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
synthesizer_llm = ChatOpenAI(model="gpt-4o", temperature=0.2)</code></code></pre><ul><li><p><code>orchestrator_llm</code>: Uses the more capable model for planning (temperature=0 for consistency)</p></li><li><p><code>worker_llm</code>: Uses a faster, cheaper model for specialized tasks</p></li><li><p><code>synthesizer_llm</code>: Uses the capable model again for final integration</p></li></ul><h3><strong>Step 1: Building the Orchestrator</strong></h3><p>The orchestrator&#8217;s job is to analyze the query and create a structured plan of subtasks.</p><pre><code><code># Define the plan structure
class SubTask(BaseModel):
    task_id: str = Field(description="Unique identifier for the subtask")
    worker_type: str = Field(description="Type of worker needed: 'technical', 'market', or 'risk'")
    instructions: str = Field(description="Specific instructions for this subtask")

class Plan(BaseModel):
    product: str = Field(description="The product being analyzed")
    subtasks: List[SubTask] = Field(description="List of subtasks to execute")

# Set up the orchestrator parser
orchestrator_parser = JsonOutputParser(pydantic_object=Plan)

# Create the orchestrator prompt
orchestrator_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert product launch coordinator. Your job is to analyze a product launch request and break it down into specific subtasks for specialized workers.

Available workers:
- 'technical': Analyzes technical readiness, features, and development status
- 'market': Evaluates market positioning, target audience, and competitive landscape
- 'risk': Identifies potential risks and mitigation strategies

For the given product, create a comprehensive plan that covers all relevant aspects for launch readiness. Each subtask should have clear, specific instructions."""),
    ("user", "Product to analyze: {product}\n\n{format_instructions}")
]).partial(format_instructions=orchestrator_parser.get_format_instructions())

# Define the orchestrator chain
orchestrator_chain = orchestrator_prompt | orchestrator_llm | orchestrator_parser
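
# --- Illustrative sketch, not part of the original notebook ---
# JsonOutputParser returns a plain dict (not a Plan instance), so downstream
# code reads the plan by key. The hand-written sample below is a hypothetical
# stand-in for orchestrator_chain.invoke({"product": "SmartHome Hub Pro"});
# real model output will differ.
sample_plan = {
    "product": "SmartHome Hub Pro",
    "subtasks": [
        {"task_id": "t1", "worker_type": "technical", "instructions": "Assess feature completeness."},
        {"task_id": "t2", "worker_type": "market", "instructions": "Review positioning and competitors."},
        {"task_id": "t3", "worker_type": "risk", "instructions": "List launch risks and mitigations."},
    ],
}

# The prompt names only three worker types, so it can help to flag anything
# else before dispatching subtasks (ALLOWED_WORKERS is an assumed helper name).
ALLOWED_WORKERS = {"technical", "market", "risk"}
unknown_tasks = [
    task["task_id"]
    for task in sample_plan["subtasks"]
    if task["worker_type"] not in ALLOWED_WORKERS
]
print(unknown_tasks)  # prints [] for this sample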
</code></code></pre><ul><li><p>We define a structured <code>Plan</code> with multiple <code>SubTask</code> objects</p></li><li><p>The orchestrator receives the product name and creates a JSON plan</p></li><li><p>Each subtask specifies which worker type to use and what that worker should do. In this case we defined three different workers, which <strong>we will implement in the next step</strong>: </p><ul><li><p>the technical specialist</p></li><li><p>the market specialist</p></li><li><p>the risk specialist</p></li></ul></li></ul><div><hr></div><h1>&#128214; Book of the Week</h1><p>If you are building enterprise-grade LLM systems or are responsible for bringing GenAI into complex organisational environments, this is a standout strategic and practical guide.</p><p><strong>&#8220;<a href="https://www.packtpub.com/en-us/product/llms-in-enterprise-9781836203063?srsltid=AfmBOoqsHMLVsCpKySskk0oh2c32FUqgX_LIJ5J5SBQdUB1hREK_ceEb">LLMs in Enterprise: Design strategies, patterns, and best practices for large language model development</a>&#8221; by Ahmed Menshawy and Mahmoud Fahmy</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/llms-in-enterprise-9781836203063?srsltid=AfmBOoqsHMLVsCpKySskk0oh2c32FUqgX_LIJ5J5SBQdUB1hREK_ceEb" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q8ac!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!q8ac!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 848w, 
https://substackcdn.com/image/fetch/$s_!q8ac!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!q8ac!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q8ac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775" width="394" height="486.0054945054945" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:394,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;LLMs in Enterprise&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/llms-in-enterprise-9781836203063?srsltid=AfmBOoqsHMLVsCpKySskk0oh2c32FUqgX_LIJ5J5SBQdUB1hREK_ceEb&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="LLMs in Enterprise" title="LLMs in Enterprise" srcset="https://substackcdn.com/image/fetch/$s_!q8ac!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 424w, 
https://substackcdn.com/image/fetch/$s_!q8ac!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!q8ac!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!q8ac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50cfc551-83b9-46f9-a6dd-156e252fc5fe_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" 
x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>&#128161; A comprehensive, end-to-end playbook for designing, optimising, deploying, and governing large language model applications at enterprise scale. You will move from foundational concepts through to advanced design patterns, fine-tuning approaches, deployment architectures, and real operational concerns such as evaluation, monitoring, compliance, and cost optimisation.</p><p><strong>What sets it apart</strong></p><p>It brings a deeply structured, pattern-driven perspective to enterprise LLM development, going far beyond simple demos or hobby-level guidance. You get clear strategies for real-world challenges: scaling across business units, improving reliability, integrating RAG, ensuring fairness and transparency, and managing production LLM systems responsibly.</p><p>Each chapter bridges theory with pragmatic guidance, giving teams a common language for building robust and future-proof GenAI applications.</p><p><strong>You will learn to:</strong></p><p>&#9989; Apply proven design patterns to integrate LLMs into enterprise systems<br>&#9989; Overcome challenges in scaling, deploying, and optimising LLM applications<br>&#9989; Use fine-tuning techniques, contextual customisation, and RAG to boost performance<br>&#9989; Build data strategies that genuinely improve LLM quality and reliability<br>&#9989; Implement advanced inferencing engines and performance optimisation patterns<br>&#9989; Evaluate LLM applications with enterprise-ready metrics and frameworks<br>&#9989; Monitor production LLMs and ensure security, privacy, and compliance<br>&#9989; Understand responsible AI practices, including transparency and robustness<br>&#9989; Track emerging trends, multimodality, and the next wave of GenAI capabilities</p><p>If you are serious about deploying LLMs across an organisation and want proven patterns that reduce risk and accelerate delivery, this is the resource that turns 
experimentation into scalable enterprise AI.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.packtpub.com/en-us/product/llms-in-enterprise-9781836203063?srsltid=AfmBOoqsHMLVsCpKySskk0oh2c32FUqgX_LIJ5J5SBQdUB1hREK_ceEb&quot;,&quot;text&quot;:&quot;Get it here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.packtpub.com/en-us/product/llms-in-enterprise-9781836203063?srsltid=AfmBOoqsHMLVsCpKySskk0oh2c32FUqgX_LIJ5J5SBQdUB1hREK_ceEb"><span>Get it here</span></a></p><div><hr></div><h3><strong>Step 2: Building the Specialized Workers</strong></h3><p>Each worker is a specialized chain optimized for its domain, with access to relevant internal data (<code>PRODUCT_DATABASE</code>).</p><p><strong>Worker 1: Technical Readiness Analyst</strong></p><pre><code><code>technical_prompt = ChatPromptTemplate.from_template(
    """You are a technical readiness expert. Analyze the following product based on these instructions:

Product: {product}
Instructions: {instructions}

Product Data:
Features: {features}
Development Status: {development_status}

Provide a concise but thorough technical readiness analysis. Assess completeness, quality indicators, and readiness for launch."""
)
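
# --- Illustrative sketch, not part of the original article ---
# The worker below reads PRODUCT_DATABASE, which is defined elsewhere. Judging only
# from the lookups it performs, each entry is assumed to look roughly like this
# hypothetical example (the name EXAMPLE_PRODUCT_DATABASE and all values are made up):
EXAMPLE_PRODUCT_DATABASE = {
    "SmartHome Hub Pro": {
        "features": ["placeholder feature A", "placeholder feature B"],
        "development_status": {"placeholder_component": "placeholder status"},
    }
}

# Note the fallbacks: an unknown product yields {} and therefore empty prompt fields,
# so the model would be asked to analyze a product with no data instead of raising an error.
missing = EXAMPLE_PRODUCT_DATABASE.get("Unknown Product", {})
print(missing.get("features", []), missing.get("development_status", {}))  # prints [] {}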

def technical_worker_invoke(inputs):
    """Run the technical readiness analysis for one subtask."""
    product = inputs['product']
    # Fall back to an empty record if the product is not in the database
    product_data = PRODUCT_DATABASE.get(product, {})

    return (technical_prompt | worker_llm | StrOutputParser()).invoke({
        "product": product,
        "instructions": inputs['instructions'],
        "features": product_data.get('features', []),
        "development_status": product_data.get('development_status', {})
    })</code></code></pre><p><strong>Worker 2: Market Position Analyst</strong></p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-17-orchestrator-worker-llm-agent">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #17 - Parallelisation with LangChain]]></title><description><![CDATA[Imagine you&#8217;re a news editor trying to understand a breaking story. You get a single field report. To really cover it, you need to know the &#8220;who, what, where&#8221; (the key facts), the &#8220;so what&#8221; (the summary), and the &#8220;how do people feel&#8221; (the public sentiment) all at once. Asking an AI to do this sequentially&#8212;first find the facts, then write the summary, then analyze the sentiment&#8212;is slow and inefficient.]]></description><link>https://mlpills.substack.com/p/diy-17-parallelisation-with-langchain</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-17-parallelisation-with-langchain</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 01 Nov 2025 08:30:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e1bf9c2a-414a-476b-b835-e3395e478ec0_1020x664.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the Week</h1><p>Imagine you&#8217;re a news editor trying to understand a breaking story. You get a single field report. To really cover it, you need to know the &#8220;who, what, where&#8221; (the key facts), the &#8220;so what&#8221; (the summary), and the &#8220;how do people feel&#8221; (the public sentiment) <em>all at once</em>. Asking an AI to do this sequentially&#8212;first find the facts, <em>then</em> write the summary, <em>then</em> analyze the sentiment&#8212;is slow and inefficient.</p><p>This is where <strong>Parallelization</strong> comes in. 
It&#8217;s a workflow pattern that transforms your AI from a single-track worker into a multi-talented team, tackling multiple, independent tasks <em>at the same time</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7exQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7exQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 424w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 848w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 1272w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7exQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png" width="1275" height="660" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7exQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 424w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 848w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 1272w, https://substackcdn.com/image/fetch/$s_!7exQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd16c4c63-cbcd-40ee-9834-701a8419f0ac_1275x660.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Think of it like cooking a big meal. You don&#8217;t cook the chicken, then wait for it to finish before starting the potatoes, then wait again before making the salad. You put the chicken in the oven, put the potatoes on to boil, and chop the salad <em>all at the same time</em>. 
They all finish around the same time, and dinner is ready much faster.</p><p>In the <a href="https://mlpills.substack.com/p/issue-110-llm-workflow-patterns">previous article</a> we covered the 5 main workflows when working with LLMs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://mlpills.substack.com/p/issue-110-llm-workflow-patterns" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YAUI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 424w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 848w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 1272w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YAUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png" width="1009" height="643" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:643,&quot;width&quot;:1009,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://mlpills.substack.com/p/issue-110-llm-workflow-patterns&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YAUI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 424w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 848w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 1272w, https://substackcdn.com/image/fetch/$s_!YAUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c2de18fa-b451-4944-9d83-aea3a6c14262&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #110 - LLM Workflow Patterns&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-25T07:31:26.656Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ebdb5cd-3c36-40ba-a631-4029c68d599f_1009x643.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-110-llm-workflow-patterns&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:176993635,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Also, previously we had covered the first, most basic one: &#8220;<a href="https://mlpills.substack.com/p/diy-15-prompt-chaining-with-langchain">prompt chaining</a>&#8221;:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4d48b48b-dc76-4716-a1d8-6dd8987120c7&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DIY #15 - Prompt Chaining with LangChain&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-06-29T16:20:22.068Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9182708e-a357-4276-ae25-8e26a59c13f1_1395x734.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/diy-15-prompt-chaining-with-langchain&quot;,&quot;section_name&quot;:&quot;DIY&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:167104231,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><blockquote><p>In this issue we will:</p><ul><li><p><strong>Define</strong> the &#8220;Parallelization&#8221; workflow pattern and explain its key benefits.</p></li><li><p><strong>Explore</strong> how to implement this pattern using <strong>LangChain</strong>.</p></li><li><p><strong>Build</strong> a step-by-step example that analyzes a news article in three different ways simultaneously. &#128142; <strong>With all the code + notebook!</strong> &#128142;</p></li></ul></blockquote><p><em>Let&#8217;s begin!</em></p><h3>Why is Parallelization so Effective?</h3><ul><li><p><strong>Speed and Efficiency:</strong> This is the biggest win. 
Instead of adding up the time for each task (Task A + Task B + Task C), the total time is roughly that of your <em>slowest</em> task. This can substantially reduce latency, especially for complex queries.</p></li><li><p><strong>Rich, Structured Output:</strong> You can get several different analyses from a single input, all neatly packaged into one structured object (like a dictionary or JSON).</p></li><li><p><strong>Task Isolation:</strong> Each &#8220;branch&#8221; of the parallel workflow runs independently. A poor-quality or empty result in one branch (e.g., failing to find any &#8220;key phrases&#8221;) doesn&#8217;t affect what the other branches (like &#8220;sentiment analysis&#8221;) produce. Note that an unhandled exception in one branch still fails the whole parallel step, so add error handling (for example <code>with_fallbacks</code>) to branches that may raise.</p></li></ul><p>This pattern is a good fit for any scenario where you need <strong>multiple different insights from the same single piece of input</strong>, and none of those insights depend on each other.</p><h3>Under the Hood</h3><p><strong>LangChain Expression Language (LCEL)</strong> makes parallelization simple with a component called <code>RunnableParallel</code>.</p><p>More often, you&#8217;ll use its convenient shorthand: <strong>a Python dictionary</strong>.</p><p>When a dictionary of runnables is piped into a chain with <code>|</code>, LCEL coerces it into a <code>RunnableParallel</code> and runs its entries concurrently. The input to this dictionary step is passed to <em>every</em> runnable inside it, and the output is a dictionary with the same keys, whose values are the <em>results</em> of each runnable.</p><p>It looks like this:</p><pre><code><code>parallel_step = {
    "output_key_1": chain_1,
    "output_key_2": chain_2,
}</code></code></pre><p>A plain dictionary has no <code>invoke</code> method of its own, so either pipe it into a chain or wrap it as <code>RunnableParallel(parallel_step)</code>. If you then invoke it with <code>{"input": "some data"}</code>, LCEL will run <code>chain_1</code> and <code>chain_2</code> concurrently, both receiving that input. The final output will be <code>{"output_key_1": "result from chain 1", "output_key_2": "result from chain 2"}</code>.</p><div class="poll-embed" data-attrs="{&quot;id&quot;:398639}" data-component-name="PollToDOM"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2100fbc0-fe5f-43fe-88e4-e91509f8a37b&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #86 - LangChain vs LangGraph&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-01-11T15:12:44.495Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03302d30-d54b-49b0-b09b-a5008768712f_1378x918.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-86-langchain-vs-langgraph&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:154617986,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1354140,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Let&#8217;s see this in action.</p><h3>&#128736;&#65039; Do It Yourself: Multi-Faceted Article Analysis</h3><p>Imagine you have a stream of short news articles, and for your dashboard, you need to instantly extract three distinct pieces of information from each one.</p><p><strong>The Challenge</strong></p><p>We&#8217;ll take a single news blurb and run it through three <em>parallel</em> chains:</p><ol><li><p><strong>One-Sentence Summary:</strong> Generate a single, concise summary sentence.</p></li><li><p><strong>Key Entity Extraction:</strong> Pull out all people, organizations, and locations into a structured JSON object.</p></li><li><p><strong>Tone Analysis:</strong> Classify the article&#8217;s tone (e.g., Objective, Optimistic, Critical).</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C4lG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C4lG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 424w, 
https://substackcdn.com/image/fetch/$s_!C4lG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 848w, https://substackcdn.com/image/fetch/$s_!C4lG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 1272w, https://substackcdn.com/image/fetch/$s_!C4lG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C4lG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png" width="1275" height="550" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:1275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110252,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/177662726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!C4lG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 424w, https://substackcdn.com/image/fetch/$s_!C4lG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 848w, https://substackcdn.com/image/fetch/$s_!C4lG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 1272w, https://substackcdn.com/image/fetch/$s_!C4lG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681e598d-6656-40d9-aec1-0f7f4139dc41_1275x550.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As an example, we will use the following news blurb:</p><blockquote><p>&#8220;Quantum Leap Innovations (QLI), a tech startup based in Silicon Valley, announced yesterday it has secured $50 million in Series B funding. The round was led by Apex Ventures. CEO Dr. Aris Thorne stated the funds will be used to accelerate the development of their next-gen quantum computing platform, which aims to solve complex logistical problems.&#8221;</p></blockquote><p>Let&#8217;s begin!</p><h4>Setting the Stage</h4><p>First, we need our LangChain tools and our LLM. We&#8217;ll also import Pydantic to define the <em>exact</em> structure we want for our extracted entities.</p><pre><code><code># Install necessary libraries if you haven&#8217;t already
# !pip install langchain-openai langchain pydantic

import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is a deprecated shim

# Set up your OpenAI API key (replace with your actual key or environment variable)
# If your key is not already set as an environment variable, uncomment and run the following:
# os.environ["OPENAI_API_KEY"] = "your_api_key"

# Initialize our AI brain (the LLM) using gpt-4o
llm = ChatOpenAI(model="gpt-4o")</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p><code>ChatOpenAI(model="gpt-4o")</code>: This initializes our LLM.</p></li><li><p><code>StrOutputParser</code>: We&#8217;ll use this to get simple string outputs for our summary and tone.</p></li><li><p><code>BaseModel, Field</code>: These are from Pydantic. We&#8217;ll use them to create a &#8220;schema&#8221; or template for our entities.</p></li><li><p><code>JsonOutputParser</code>: This parser takes the LLM&#8217;s string output and parses it into a Python dictionary. Given our Pydantic model, it also generates format instructions that describe the expected fields; it does not validate the result against the model.</p></li></ul><div><hr></div><h1>&#128214; Book of the Week</h1><p>If you are building, or want to build, real agentic systems in Python, this is a standout pick for your stack.</p><p><strong>&#8220;Learn Model Context Protocol with Python&#8221;</strong> by <strong>Christoffer Noring</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.packtpub.com/en-us/product/learn-model-context-protocol-with-python-9781806103232" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HPgg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!HPgg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!HPgg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 1272w, 
https://substackcdn.com/image/fetch/$s_!HPgg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HPgg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775" width="396" height="488.4725274725275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:396,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Learn Model Context Protocol with Python&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.packtpub.com/en-us/product/learn-model-context-protocol-with-python-9781806103232&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Learn Model Context Protocol with Python" title="Learn Model Context Protocol with Python" srcset="https://substackcdn.com/image/fetch/$s_!HPgg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!HPgg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 848w, 
https://substackcdn.com/image/fetch/$s_!HPgg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!HPgg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c85049f-f770-4afd-85f6-2a246c17747c_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>&#128161; A practical, end-to-end guide to Model Context Protocol (MCP) that shows you how to design, build, test, secure, and deploy 
interoperable AI applications using Python. You will move from first server and client to cloud deployment with tooling that works across LLM and non-LLM clients.</p><p><strong>What sets it apart</strong></p><p>It focuses on standardisation and interoperability with MCP, so your agents, tools, and hosts speak the same language. You get hands-on workflows for server and client development, clear security guidance, and real integrations with Claude Desktop and Visual Studio Code Agents.</p><p><strong>You will learn to:</strong></p><p>&#9989; Understand the MCP spec and core components<br>&#9989; Build MCP servers that expose tools and resources to many clients<br>&#9989; Describe host, client, and server capabilities for smooth interoperability<br>&#9989; Test and debug with interactive inspector tools<br>&#9989; Consume servers using Claude Desktop and VS Code Agents<br>&#9989; Secure MCP apps and mitigate common threats<br>&#9989; Deploy MCP apps using cloud-based strategies</p><p><strong>Who should read this</strong></p><p>&#129504; Web developers, software architects, AI practitioners, and tech leads building scalable AI-integrated apps<br>&#128200; Product managers driving AI initiatives who need a shared language for teams<br>&#128218; Readers with basic web and AI knowledge who want production-ready patterns</p><p><strong>Why it is useful for teams</strong></p><ul><li><p>A single, modern approach for distributed agentic AI apps</p></li><li><p>Professional guidance for both LLM and non-LLM clients</p></li><li><p>Print or Kindle purchase includes a free PDF eBook, plus downloadable code</p></li></ul><p>If you are serious about MCP and agentic architectures, this is the resource that turns experimentation into reliable AI systems.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.packtpub.com/en-us/product/learn-model-context-protocol-with-python-9781806103232&quot;,&quot;text&quot;:&quot;Get it 
here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.packtpub.com/en-us/product/learn-model-context-protocol-with-python-9781806103232"><span>Get it here</span></a></p><div><hr></div><p>Now we are ready to build our parallel branches!</p><h4>Branch 1: The Summary Chain</h4><p>This is a simple chain to generate a one-sentence summary.</p><pre><code><code># Prompt for Summary
summary_prompt = ChatPromptTemplate.from_template(
    "Summarize the following article in one single, concise sentence.\n\nArticle:\n{article}"
)

# Define the summary sub-chain
summary_chain = summary_prompt | llm | StrOutputParser()
</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p>This is a standard sequential chain. It takes an <code>article</code> input, formats it with the prompt, sends it to the LLM, and parses the output as a string.</p></li></ul><h4>Branch 2: The Entity Extraction Chain</h4><p>This branch is more advanced. We&#8217;ll define a Pydantic model for our desired output and use a <code>JsonOutputParser</code> to steer the LLM&#8217;s response into that structure: the parser supplies format instructions derived from the model&#8217;s schema for the prompt, then parses the JSON reply into a Python dictionary.</p>
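<p>A minimal sketch of what this branch could look like follows. The <code>Entities</code> model, its field names, the prompt wording, and the hand-written <code>sample_reply</code> are assumptions made for illustration, not necessarily what the full article uses. Only the final, commented-out line needs the <code>llm</code> object defined earlier, so the rest runs without an API key.</p><pre><code><code>from typing import List

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


# Hypothetical schema: the field names are illustrative assumptions
class Entities(BaseModel):
    people: List[str] = Field(description="People mentioned in the article")
    organizations: List[str] = Field(description="Organizations mentioned in the article")
    locations: List[str] = Field(description="Locations mentioned in the article")


# The parser derives JSON format instructions from the schema
entity_parser = JsonOutputParser(pydantic_object=Entities)

# Pre-fill the format instructions so that only {article} remains as input
entity_prompt = ChatPromptTemplate.from_template(
    "Extract the key entities from the following article.\n"
    "{format_instructions}\n\nArticle:\n{article}"
).partial(format_instructions=entity_parser.get_format_instructions())

# Parsing a hand-written reply (not real model output) shows the
# kind of dictionary this branch would return
sample_reply = (
    '{"people": ["Dr. Aris Thorne"], '
    '"organizations": ["Quantum Leap Innovations", "Apex Ventures"], '
    '"locations": ["Silicon Valley"]}'
)
parsed = entity_parser.parse(sample_reply)
print(parsed["organizations"])

# With the llm defined earlier, the full branch would be:
# entity_chain = entity_prompt | llm | entity_parser
</code></code></pre><p>Because <code>JsonOutputParser</code> only parses JSON, a reply with missing or extra fields would still come back as a dictionary. If strict checking matters for your dashboard, one option is to pass the result through <code>Entities(**parsed)</code> so that Pydantic raises an error on a mismatch.</p>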
      <p>
          <a href="https://mlpills.substack.com/p/diy-17-parallelisation-with-langchain">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #16 - Build a Persistent Conversational Agent with LangGraph]]></title><description><![CDATA[In this article, we'll build exactly that: a Python-based conversational agent using LangGraph that can remember user information across sessions and automatically update that information as conversations progress. This isn't just about storing data&#8212;it's about creating an agent that naturally extracts and organizes information from normal conversation flow.]]></description><link>https://mlpills.substack.com/p/diy-16-build-a-persistent-conversational</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-16-build-a-persistent-conversational</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sun, 07 Sep 2025 07:30:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e62d1c08-879e-4e07-9459-2b5bbfdb68c5_788x526.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>&#128138; Pill of the Week</strong></h1><p>Traditional chatbots start fresh with every conversation&#8212;they have no memory of who you are or what you've discussed before. But what if your chatbot could remember past interactions and learn about users over time?</p><p>In this article, we'll build exactly that: <strong>a Python-based conversational agent using LangGraph that can remember user information across sessions and automatically update that information as conversations progress</strong>. 
This isn't just about storing data&#8212;it's about creating an agent that naturally extracts and organizes information from normal conversation flow.</p><p>We'll cover this in three parts:</p><ol><li><p><strong>The concept</strong>: What the agent does and why it's useful</p></li><li><p><strong>A complete example</strong>: How the agent processes a real conversation</p></li><li><p><strong>The implementation</strong>: How the code actually works (&#128142;<strong>a colab notebook is also shared at the end with all the code&#128142;</strong>)</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NaFz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NaFz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 424w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 848w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 1272w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NaFz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png" width="1227" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1227,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95958,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/172960220?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NaFz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 424w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 848w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 1272w, https://substackcdn.com/image/fetch/$s_!NaFz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4729398a-e615-40f3-8bbf-9bc6ac568b55_1227x626.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Part 1: Understanding the Agent</h2><h3>Core Functionality</h3><p>This agent operates on a dual-track system during every conversation:</p><p><strong>1. Retrieves existing information:</strong> When you ask "What city do you have on file for me?" the agent looks up your stored profile and gives you an accurate answer. It doesn't guess or hallucinate&#8212;it checks the actual data and reports back what it finds.</p><p><strong>2. 
Captures new information:</strong> When you mention "I just moved to London and started painting," the agent automatically:</p><ul><li><p>Recognizes these as new data points worth storing</p></li><li><p>Extracts them into structured format (location: "London", hobbies: ["painting"])</p></li><li><p>Saves them to your profile without interrupting the conversation</p></li></ul><p>The magic is in the seamlessness. Users don't need to fill out forms or use special commands. They just talk naturally, and the agent handles both information retrieval and capture in the background. This creates a feedback loop where better memory leads to better conversations, which encourages users to share more, which improves the memory further.</p><h3>Practical Applications</h3><p>This architecture opens up several powerful use cases that go beyond simple Q&amp;A:</p><ul><li><p><strong>Better customer service</strong>: Imagine calling support and the bot already knows you own the Pro version, had a billing issue last month, and prefer email communications. It can skip the entire triage process and immediately offer relevant solutions. For businesses, this means faster resolution times and happier customers.</p></li><li><p><strong>Personal assistants that actually assist</strong>: A digital helper that remembers your daughter is allergic to peanuts, your anniversary is next week, and you're trying to learn Spanish can provide genuinely useful, proactive suggestions. It moves from being a reactive search tool to a proactive partner.</p></li><li><p><strong>Automated CRM updates</strong>: Sales teams spend hours logging client interactions. 
An agent that can parse "The client mentioned they're expanding to Europe next quarter and need enterprise pricing" and automatically update the CRM saves time while ensuring nothing gets forgotten.</p></li><li><p><strong>Educational companions</strong>: Tutoring bots that remember a student's weak areas, learning pace, and preferred explanation styles can provide truly personalized education that adapts over multiple sessions.</p></li><li><p><strong>Healthcare intake</strong>: Medical assistants that can naturally collect patient history during conversation, remembering symptoms mentioned across multiple visits and flagging important changes to doctors.</p></li></ul><h3>The ReAct Pattern</h3><p>The agent uses a "Reason and Act" (ReAct) pattern&#8212;a cognitive framework that mimics how humans solve problems. Unlike simple if-then chatbots, ReAct agents can plan multi-step solutions and adapt their approach based on intermediate results.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a768f214-8f96-4e8e-85a8-afd4b1d668e3&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DIY #14 - Step-by-step implementation of a ReAct Agent in LangGraph&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python 
enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-20T12:37:39.926Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!du7b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/diy-14-step-by-step-implementation&quot;,&quot;section_name&quot;:&quot;DIY&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:161105148,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YODk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Here's what happens when you send a message like "What are my hobbies? 
By the way, I'm 28 now and I also enjoy coding":</p><ol><li><p><strong>Receive</strong>: The agent gets your raw message exactly as typed</p></li><li><p><strong>Reason</strong>: The LLM (Large Language Model) breaks down the message:</p><ul><li><p>Identifies a question: "What are my hobbies?"</p></li><li><p>Spots new information: age update (28) and new hobby (coding)</p></li><li><p>Plans the sequence: fetch profile first, then extract new details</p></li></ul></li><li><p><strong>Act</strong>: It executes the plan by calling tools:</p><ul><li><p>Calls <code>get_user_profile(user_id=12345)</code> to fetch current data</p></li><li><p>Calls <code>extract_personal_details(age=28, hobbies=["coding"])</code> to structure the new info</p></li></ul></li><li><p><strong>Observe</strong>: The tools return their results:</p><ul><li><p>Current profile shows existing hobbies: ["reading"]</p></li><li><p>Extraction confirms new data ready for storage</p></li></ul></li><li><p><strong>Respond</strong>: The agent:</p><ul><li><p>Synthesizes a natural response: "According to your profile, your hobbies include reading. 
I'll also note that you enjoy coding and update your age to 28."</p></li><li><p>Triggers background profile update to merge new data</p></li></ul></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!du7b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!du7b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 424w, https://substackcdn.com/image/fetch/$s_!du7b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 848w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1272w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png" width="520" height="521.7627118644068" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:590,&quot;resizeWidth&quot;:520,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!du7b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 424w, https://substackcdn.com/image/fetch/$s_!du7b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 848w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1272w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This pattern is powerful because the agent can adjust its approach mid-stream. If the profile lookup fails, it can acknowledge that and still save the new information. 
If the user's message is ambiguous, it can ask for clarification before proceeding.</p><div><hr></div><h1><strong>&#128214; Book of the week</strong></h1><p>I just started reading <em><strong><a href="https://www.amazon.com/Definitive-Guide-OpenSearch-techniques-practices/dp/B0DSJ453GJ/ref=sr_1_1?crid=2Z7D7TFH1F8OW&amp;dib=eyJ2IjoiMSJ9.ueIslRuzYgWrUIa7HaWVQg.T72uSWrKfWlGvUuV7jWHELxyR8fazv6TcPOYVwZuaKY&amp;dib_tag=se&amp;keywords=The+Definitive+Guide+to+OpenSearch&amp;qid=1751733269&amp;sprefix=%2Caps%2C311&amp;sr=8-1">The Definitive Guide to OpenSearch</a></strong></em> by Jon Handler, Soujanya Konka, and Prashant Agarwal &#8212; and it&#8217;s a must-have for anyone working with large datasets, search, or analytics.</p><p>What makes it stand out:</p><ul><li><p>&#9989; Covers fundamentals <em>and</em> advanced optimizations</p></li><li><p>&#9989; Real-world case studies and hands-on demos</p></li><li><p>&#9989; Insights on scaling search and analytics systems</p></li><li><p>&#9989; Even explores Generative AI with OpenSearch</p></li></ul><p>As data scientists and engineers, mastering search infrastructure is key to building scalable and intelligent systems &#8212; and this book brings best practices straight from AWS experts.</p><p>&#128269; <strong>Highly recommended</strong> if you&#8217;re in AI/ML, data engineering, or system design.</p><p>&#128073; Check it out here: <a href="https://www.amazon.com/Definitive-Guide-OpenSearch-techniques-practices/dp/B0DSJ453GJ/ref=sr_1_1?crid=2Z7D7TFH1F8OW&amp;dib=eyJ2IjoiMSJ9.ueIslRuzYgWrUIa7HaWVQg.T72uSWrKfWlGvUuV7jWHELxyR8fazv6TcPOYVwZuaKY&amp;dib_tag=se&amp;keywords=The+Definitive+Guide+to+OpenSearch&amp;qid=1751733269&amp;sprefix=%2Caps%2C311&amp;sr=8-1">The Definitive Guide to OpenSearch</a></p><div><hr></div><h2>Part 2: A Complete Example</h2><p>Let's walk through a real conversation to see what we aim to achieve.</p><h3>Starting Scenario</h3><p>Our user (ID: 12345) has this initial profile:</p><pre><code><code>{
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "hobbies": ["reading"]
}</code></code></pre><h3>The Conversation</h3><p><strong>User says:</strong> "Hi, what are my hobbies? Btw, I just moved to San Francisco, I'm 28 now, and I also enjoy coding."</p><p>Let's trace what happens:</p><h4>Step 1: Message Processing</h4><p>The agent receives the message and begins reasoning. It identifies:</p><ul><li><p>A question that needs the current profile</p></li><li><p>Three pieces of new information to extract</p></li></ul><h4>Step 2: Tool Execution</h4><p>The agent makes two tool calls:</p><p><strong>First tool call - Getting current profile:</strong></p><pre><code><code>--- TOOL: Looking up profile for user_id: 12345 ---
Returns: {"name": "John Doe", "age": 30, "city": "New York", "hobbies": ["reading"]}
</code></code></pre><p><strong>Second tool call - Extracting new details:</strong></p><pre><code><code>--- TOOL: Extracting details: {
    'location': 'San Francisco', 
    'age': 28, 
    'hobbies': ['coding']
} ---
</code></code></pre><h4>Step 3: Response Generation</h4><p>The agent crafts a natural response:</p><blockquote><p>"Based on your profile, your current hobbies include reading. I've noted that you also enjoy coding now. I've also updated your location to San Francisco and your age to 28. Is there anything specific you'd like to know about your updated profile?"</p></blockquote><h4>Step 4: Profile Update</h4><p>Behind the scenes, the smart update function processes the new details:</p><pre><code><code># Before update:
{
    "name": "John Doe",
    "age": 30,
    "city": "New York", 
    "hobbies": ["reading"]
}

# After smart merge:
{
    "name": "John Doe",
    "age": 28,                    # Updated scalar
    "city": "New York",           # Unchanged
    "location": "San Francisco",  # New field added
    "hobbies": ["reading", "coding"]  # List extended
}
</code></code></pre><p>Notice how:</p><ul><li><p>The age was updated from 30 to 28</p></li><li><p>"location" was added as a new field because the agent extracted the move under the key "location" rather than "city", so the old "city": "New York" entry stays in the profile untouched</p></li><li><p>The hobbies list was extended, not replaced</p></li><li><p>The name remained unchanged</p></li></ul><h3>Following Up</h3><p>If the user continues the conversation:</p><p><strong>User:</strong> "Actually, I also enjoy hiking and photography."</p><p>The agent would:</p><ol><li><p>Extract <code>hobbies: ["hiking", "photography"]</code></p></li><li><p>Merge them into the existing list</p></li><li><p>End up with <code>hobbies: ["reading", "coding", "hiking", "photography"]</code></p></li></ol><p>This demonstrates the power of the append logic&#8212;information accumulates naturally over multiple interactions.</p><h2>Part 3: Implementation Details</h2><p>Now that we know what we want to achieve, let's examine how to do it, piece by piece.</p><h3>1. State Management</h3><p>The agent's working memory is deliberately minimal&#8212;just enough to maintain context without unnecessary complexity:</p><pre><code><code>class MyState(AgentState):
    user_id: int</code></code></pre><ul><li><p><code>user_id</code>: The unique identifier that links this conversation to a specific user's data</p></li><li><p><code>messages</code>: Inherited from <code>AgentState</code>, this list tracks the entire conversation history including human inputs, AI responses, tool calls, and tool results</p></li></ul><p>Why so simple? The state is transient&#8212;it only exists for the duration of a conversation. The permanent user data lives elsewhere (in a database or file), and the state just needs enough information to access it. This separation keeps the agent lightweight and allows the storage backend to be swapped out without changing the agent logic.</p><p>The <code>messages</code> list is particularly important because it provides the full audit trail. Every decision the agent makes, every tool it calls, and every result it receives gets logged here. This is invaluable for debugging and understanding the agent's reasoning process.</p>
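<p>As a rough illustration of this separation, here is a minimal sketch in plain Python. It is not the article's implementation: <code>ConversationState</code>, <code>PROFILE_STORE</code>, <code>load_profile</code>, and <code>merge_profile</code> are hypothetical names, a dictionary stands in for the database or file, and the merge rules are an assumption reconstructed from the Part 2 example (scalars overwritten, lists extended without duplicates, unknown keys added as new fields).</p>

```python
from copy import deepcopy
from typing import TypedDict


class ConversationState(TypedDict):
    """Hypothetical stand-in for the transient agent state."""
    user_id: int
    messages: list


# Assumption: an in-memory dict plays the role of the persistent backend.
PROFILE_STORE = {
    12345: {"name": "John Doe", "age": 30, "city": "New York", "hobbies": ["reading"]},
}


def load_profile(state: ConversationState) -> dict:
    """Fetch a copy of the stored profile using only the user_id held in state."""
    return deepcopy(PROFILE_STORE.get(state["user_id"], {}))


def merge_profile(profile: dict, details: dict) -> dict:
    """Return a new profile with the extracted details merged in."""
    merged = deepcopy(profile)
    for key, value in details.items():
        existing = merged.get(key)
        if isinstance(existing, list) and isinstance(value, list):
            # Extend the list, skipping items that are already present.
            merged[key] = existing + [item for item in value if item not in existing]
        else:
            # Scalars are overwritten; unknown keys are added as new fields.
            merged[key] = value
    return merged


state: ConversationState = {"user_id": 12345, "messages": []}
details = {"location": "San Francisco", "age": 28, "hobbies": ["coding"]}

updated = merge_profile(load_profile(state), details)

# Persist the result in the store; the state itself never holds profile data.
user_id = state["user_id"]
PROFILE_STORE[user_id] = updated
print(updated)
```

<p>Because only <code>user_id</code> travels in the state, replacing the dictionary with a real database in this sketch would touch only <code>load_profile</code> and the final write, not the merge logic.</p>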
      <p>
          <a href="https://mlpills.substack.com/p/diy-16-build-a-persistent-conversational">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #15 - Prompt Chaining with LangChain]]></title><description><![CDATA[Imagine you have a complex task, like writing a detailed report or analyzing a lengthy customer feedback document. Trying to get an AI to do it all in one go can be like asking a chef to prepare a gourmet meal with just one instruction: "Make dinner!" The result might be edible, but probably not perfect. This is where Prompt Chaining comes in. It's a powerful, yet elegantly simple, workflow pattern that helps Large Language Models (LLMs) tackle intricate problems by breaking them down into a series of smaller, more manageable steps.]]></description><link>https://mlpills.substack.com/p/diy-15-prompt-chaining-with-langchain</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-15-prompt-chaining-with-langchain</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sun, 29 Jun 2025 16:20:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fcb7fe59-11d3-48de-818c-5581841b169f_2400x1792.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>&#128138; Pill of the Week</strong></h1><p>Imagine you have a complex task, like writing a detailed report or analyzing a lengthy customer feedback document. Trying to get an AI to do it all in one go can be like asking a chef to prepare a gourmet meal with just one instruction: "Make dinner!" The result might be edible, but probably not perfect.</p><p>This is where <strong>Prompt Chaining</strong> comes in. It's a powerful, yet elegantly simple, workflow pattern that helps Large Language Models (LLMs) tackle intricate problems by breaking them down into a series of smaller, more manageable steps.</p><p>Think of it like cleaning your house room by room instead of trying to tackle the whole place at once. When you focus on just the kitchen, you actually get it spotless. 
Try to clean everything simultaneously, and you'll probably just move clutter around.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5DaV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5DaV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 424w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 848w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5DaV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png" width="1456" height="1087" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1087,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4695462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/167104231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5DaV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 424w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 848w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!5DaV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfd95bb-c8fb-404c-85da-474c91bb885a_2400x1792.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Why is Prompt Chaining so Effective?</h3><ul><li><p><strong>Enhanced Accuracy</strong>: By focusing on one sub-task at a time, the LLM can dedicate its full attention and reasoning power to that specific step, leading to more <strong>precise and reliable outputs</strong> for each stage.</p></li><li><p><strong>Clarity and Control</strong>: You gain a <strong>clearer understanding</strong> of how the AI arrives at its final answer, making debugging and refinement much easier. Each step is transparent.</p></li><li><p><strong>Modularity</strong>: Each part of your chain is a <strong>distinct component</strong>. 
This means you can easily swap out or improve individual steps without disrupting the entire workflow.</p></li></ul><p>This pattern is particularly well-suited for tasks that naturally have a clear, sequential flow, where the successful completion of one stage is a prerequisite for the next. For example, generating a document outline, then checking that outline against specific criteria, and finally writing the document based on the refined outline.</p><h2>Under the Hood</h2><p>LangChain Expression Language (LCEL) makes building these chains incredibly intuitive. LCEL uses Python's familiar pipe operator (<code>|</code>) to connect different components, creating a <code>RunnableSequence</code>. This means the output of one component automatically flows as the input to the next, just like water through a pipe!</p><p>Every core component in LangChain &#8211; from your prompts to your LLMs and output parsers &#8211; implements the <code>Runnable</code> protocol, making them perfectly compatible with this chaining mechanism.</p><p>Let's see how this works in practice with a real-world example.</p><h2>&#128736;&#65039;Do It Yourself: Refining Sentiment Analysis</h2><blockquote><p>Imagine you have a <strong>stream of customer feedback</strong>, and you need to not only <strong>understand the sentiment</strong> but also <strong>extract key issues</strong> and present a concise, refined <strong>summary</strong>. 
<em>This is a perfect candidate for prompt chaining.</em></p></blockquote><h3>The Challenge</h3><p>We'll take a customer review and put it through a multi-stage process:</p><ol><li><p><strong>Sentiment Analysis:</strong> Determine the overall sentiment (Positive, Negative, or Neutral) and provide a brief explanation.</p></li><li><p><strong>Key Phrase Extraction:</strong> Identify up to 5 important key phrases that capture the essence of the review.</p></li><li><p><strong>Refined Summary:</strong> Create a concise 2-3 sentence summary incorporating the original review, its sentiment, and the extracted key phrases.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G65T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G65T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 424w, https://substackcdn.com/image/fetch/$s_!G65T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 848w, https://substackcdn.com/image/fetch/$s_!G65T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 1272w, https://substackcdn.com/image/fetch/$s_!G65T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!G65T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png" width="1393" height="417" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:417,&quot;width&quot;:1393,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:143861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/167104231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G65T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 424w, https://substackcdn.com/image/fetch/$s_!G65T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 848w, https://substackcdn.com/image/fetch/$s_!G65T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 1272w, https://substackcdn.com/image/fetch/$s_!G65T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4d2698c-3970-45a8-a163-62d3117e0f35_1393x417.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As an <strong>example</strong> we will use the following customer review:</p><blockquote><p><em>&#8220;The new coffee machine is a disaster. It constantly leaks water, the coffee tastes burnt, and the brewing process takes forever. I'm extremely disappointed with this purchase and would not recommend it to anyone. The previous model was much better.&#8221;</em></p></blockquote><p>Let&#8217;s begin!</p><h3>Setting the Stage</h3><p>First, we need to import the necessary tools from LangChain and initialize our Large Language Model. 
We'll use <code>ChatOpenAI</code> for this example, specifically the <code>gpt-4o</code> model.</p><pre><code><code># Install necessary libraries if you haven't already
# !pip install langchain-openai

import os
from getpass import getpass
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Set up your OpenAI API key (replace with your actual key or environment variable)
# If your key is not already set as an environment variable, uncomment and run the following:
# os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")

# Initialize our AI brain (the LLM) using gpt-4o
llm = ChatOpenAI(model="gpt-4o")
</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p><code>!pip install langchain-openai</code>: This command installs the required LangChain and OpenAI libraries in your environment.</p></li><li><p><code>os.environ["OPENAI_API_KEY"] = getpass(...)</code>: This line is for securely setting your OpenAI API key as an environment variable, which is crucial for authenticating with OpenAI's models.</p></li><li><p><code>ChatOpenAI(model="gpt-4o")</code>: This initializes our LLM instance, specifically using the <code>gpt-4o</code> model.</p></li><li><p><code>ChatPromptTemplate</code>, <code>RunnablePassthrough</code>, <code>RunnableLambda</code>, <code>itemgetter</code>, <code>StrOutputParser</code>: These are the core LangChain components we'll use to build our flexible and powerful chains.</p></li></ul><h3>Understanding Pure Sequential Chaining</h3><p>Before diving into our multi-faceted example, it's helpful to see a simpler form of chaining where <strong>the output of one step directly becomes the </strong><em><strong>sole</strong></em><strong> input of the next</strong>. 
This illustrates the fundamental sequential flow.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ly0f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ly0f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 424w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 848w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 1272w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ly0f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png" width="1210" height="237" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:237,&quot;width&quot;:1210,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:137591,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/167104231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ly0f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 424w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 848w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 1272w, https://substackcdn.com/image/fetch/$s_!Ly0f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe688e586-d1d6-4c4b-92e6-014bc04c6b88_1210x237.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><pre><code><code># A simple, pure sequential chain example
pure_sequential_example = (
    ChatPromptTemplate.from_template("What is the capital of {country}?")
    | llm
    | StrOutputParser()
    | ChatPromptTemplate.from_template("Briefly tell me about {text}.")  # {text} receives the first step's full answer, which names the capital
    | llm
    | StrOutputParser()
)

# Invoke the simple sequential chain
print(pure_sequential_example.invoke({"country": 'Spain'}))
</code></code></pre><p>The output of this code would be:</p><blockquote><p><em>Madrid is the capital and largest city of Spain. It is located in the center of the country and serves as its political, economic, and cultural hub. Known for its vibrant nightlife, historic architecture, world-class museums like the Prado and Reina Sofia, and lively neighborhoods, Madrid is a city that seamlessly blends traditional charm with modern energy. It is also home to the Spanish royal family and government institutions.</em></p></blockquote><p><strong>Interpretation:</strong></p><p>This simple chain first asks the LLM for the capital of a given country, then takes that capital as the input (<code>{text}</code>) for a second prompt, asking for information about it. This perfectly illustrates how the output of one <code>Runnable</code> (the capital city) becomes the input for the next <code>Runnable</code> (<code>ChatPromptTemplate.from_template("Briefly tell me about {text}.")</code>).</p><div><hr></div><h1>&#128214; <strong>Book of the Week</strong></h1><p>If you're building or want to build <em>LLM-powered apps and agents</em> &#8212; whether you're an AI developer, MLOps engineer, or product team lead &#8212; you need to check this out:</p><p><strong>&#8220;Generative AI with LangChain&#8221; (Second Edition)</strong><br>By Ben Auffarth &amp; Leonid Kuligin</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ccNv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ccNv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ccNv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775" width="370" height="456.4010989010989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/feb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1796,&quot;width&quot;:1456,&quot;resizeWidth&quot;:370,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Generative AI with LangChain&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Generative AI with LangChain" title="Generative AI with LangChain" 
srcset="https://substackcdn.com/image/fetch/$s_!ccNv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 424w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 848w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 1272w, https://substackcdn.com/image/fetch/$s_!ccNv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb97a84-a7d6-4e2f-82b2-e0e7ec3a8032_2250x2775 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>&#128161; This book isn&#8217;t just about building cool prototypes &#8212; it&#8217;s a practical guide to designing, scaling, and deploying <strong>production-ready GenAI systems</strong> using LangChain, LangGraph, and Python.</p><p><strong>What sets it apart?</strong></p><p>It tackles one of the biggest challenges in GenAI: <strong>moving from prototype to production</strong> &#8212; with a strong focus on multi-agent coordination, observability, and real-world deployment:</p><ul><li><p>&#9989; Design robust <strong>LangGraph agent workflows</strong> that scale</p></li><li><p>&#9989; Build powerful <strong>RAG pipelines</strong> with re-ranking and hybrid search</p></li><li><p>&#9989; Apply enterprise-ready <strong>testing, monitoring, and error-handling</strong></p></li><li><p>&#9989; Explore <strong>Tree-of-Thoughts</strong>, structured generation, and agent handoffs</p></li><li><p>&#9989; Work with top LLMs like <strong>Gemini, o3-mini, Mistral, DeepSeek, and Claude</strong></p></li></ul><p>This is the guide that turns experimentation into <strong>reliable AI infrastructure</strong>.</p><p><strong>This is a must-read for:</strong></p><ul><li><p>&#129504; AI engineers building multi-agent systems</p></li><li><p>&#128013; Python devs deploying LLM apps in real-world environments</p></li><li><p>&#127970; Enterprise teams moving GenAI projects into production</p></li><li><p>&#128300; Anyone working with LangChain, LangGraph, or advanced RAG</p></li><li><p>&#9989; Devs who care about <strong>security, compliance, and ethical AI</strong></p></li></ul><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.packtpub.com/en-es/product/generative-ai-with-langchain-9781837022007?srsltid=AfmBOoq7DVkWNdWGE5r512X8Dq1u5H-vc2mE7w0zi1pbvA9JKBmX9Pg9&quot;,&quot;text&quot;:&quot;Get it here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.packtpub.com/en-es/product/generative-ai-with-langchain-9781837022007?srsltid=AfmBOoq7DVkWNdWGE5r512X8Dq1u5H-vc2mE7w0zi1pbvA9JKBmX9Pg9"><span>Get it here</span></a></p><div><hr></div><p>Now we are ready for the <strong>more complex scenario</strong>!</p><h3>Sentiment Analysis</h3><p>Our first task is to determine the sentiment of the customer review. We'll define a prompt specifically for this and create a simple chain.</p><pre><code><code># Prompt for Sentiment Analysis
sentiment_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert sentiment analyzer. Analyze the sentiment of the following customer review and categorize it as 'Positive', 'Negative', or 'Neutral'. Provide a brief explanation for your categorization."),
        ("user", "Customer Review:\n{review}"),
    ]
)

# Define the sentiment analysis sub-chain
sentiment_analysis_chain = sentiment_prompt | llm | StrOutputParser()</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p><code>sentiment_prompt</code>: This <code>ChatPromptTemplate</code> is designed to take a <code>review</code> as input. The system message instructs the AI to act as a <strong>sentiment analyzer</strong>, categorize the sentiment, and briefly explain its choice.</p></li><li><p><code>sentiment_analysis_chain = sentiment_prompt | llm | StrOutputParser()</code>: This creates our first simple chain. It pipes the <code>review</code> into the <code>sentiment_prompt</code>, sends the result to our <code>llm</code> (gpt-4o), and then uses <code>StrOutputParser()</code> to convert the LLM's output into a clean string.</p></li></ul><p>We can invoke this chain with the customer review and inspect the output:</p><pre><code>print(sentiment_analysis_chain.invoke({"review": customer_review}))</code></pre><blockquote><p><em><strong>Sentiment</strong>: Negative </em></p><p><em><strong>Explanation</strong>: The review expresses clear dissatisfaction with the product, describing it as a "disaster" and highlighting several specific issues, including water leaks, burnt coffee flavor, and a long brewing process. Additionally, the reviewer explicitly states being "extremely disappointed" and advises against purchasing the machine, further reinforcing the negative sentiment.</em></p></blockquote><h3>Key Phrase Extraction</h3><p>Next, we'll focus on extracting key phrases from the original review. This will be another independent step, so it gets its own prompt and simple chain.</p><pre><code><code># Prompt for Key Phrase Extraction
key_phrases_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert in extracting key phrases. From the following customer review, identify and list up to 5 of the most important key phrases that capture the essence of the review. Present them as a comma-separated list."),
        ("user", "Customer Review:\n{review}"),
    ]
)

# Define the key phrase extraction sub-chain
key_phrase_extraction_chain = key_phrases_prompt | llm | StrOutputParser()</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p><code>key_phrases_prompt</code>: This prompt focuses solely on <strong>extracting important key phrases</strong> from the <code>review</code>.</p></li><li><p><code>key_phrase_extraction_chain = key_phrases_prompt | llm | StrOutputParser()</code>: Similar to the sentiment chain, this forms a self-contained unit that takes a review and returns a comma-separated list of key phrases.</p></li></ul><p>Similarly, we can invoke this chain and see its output:</p><pre><code><code>print(key_phrase_extraction_chain.invoke({"review": customer_review}))</code></code></pre><blockquote><p><em>leaks water, coffee tastes burnt, brewing process takes forever, extremely disappointed, previous model was better</em></p></blockquote><h3>Refined Summary</h3><p>Finally, we define the prompt for generating the refined summary. This prompt is special because it will require inputs from the original review <em>and</em> the outputs of our previous two sub-tasks.</p><pre><code><code># Prompt for Refined Summary using outputs from previous steps
summary_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert summarizer. Based on the following customer review, its sentiment, and key phrases, create a concise, 2-3 sentence summary that incorporates these details."),
        ("user", "Customer Review:\n{review}\n\nSentiment:\n{sentiment_analysis}\n\nKey Phrases:\n{key_phrases}"),
    ]
)</code></code></pre><p><strong>Interpretation:</strong></p><ul><li><p><code>summary_prompt</code>: This prompt expects three distinct inputs: <code>review</code>, <code>sentiment_analysis</code>, and <code>key_phrases</code>. The system message guides the AI to <strong>combine these elements into a concise summary</strong>. Notice that we don't define a full chain for this <em>yet</em>, as it needs to receive inputs from the previous <em>dynamic</em> steps.</p></li></ul><h3>Building the Full Chain</h3>
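<p>As a rough illustration of that data flow, the sketch below uses plain Python functions in place of the two sub-chains, so it runs without an API key. The names <code>fake_sentiment_chain</code>, <code>fake_key_phrase_chain</code>, and <code>gather_summary_inputs</code>, as well as the canned logic inside them, are hypothetical stand-ins and not part of LangChain; only the three placeholder names match <code>summary_prompt</code>. It shows one way the original review and both intermediate outputs could be collected into the dictionary the summary step needs.</p>

```python
# Hypothetical stand-ins for the sub-chains defined above.
# A real sub-chain would call the LLM; these use canned logic so the sketch runs offline.
def fake_sentiment_chain(inputs: dict) -> str:
    return "Negative" if "disappointed" in inputs["review"].lower() else "Neutral"


def fake_key_phrase_chain(inputs: dict) -> str:
    # Pretend extraction: keep up to 5 comma-separated fragments of the review.
    parts = [p.strip() for p in inputs["review"].split(",") if p.strip()]
    return ", ".join(parts[:5])


def gather_summary_inputs(review: str, steps: dict) -> dict:
    """Run each independent step on the same review and collect the
    fields that the summary prompt expects."""
    inputs = {"review": review}
    results = {name: step(inputs) for name, step in steps.items()}
    return {"review": review, **results}


# Same placeholders as the user message of summary_prompt.
SUMMARY_TEMPLATE = (
    "Customer Review:\n{review}\n\n"
    "Sentiment:\n{sentiment_analysis}\n\n"
    "Key Phrases:\n{key_phrases}"
)

summary_inputs = gather_summary_inputs(
    "leaks water, tastes burnt, extremely disappointed",
    {
        "sentiment_analysis": fake_sentiment_chain,
        "key_phrases": fake_key_phrase_chain,
    },
)
filled_prompt = SUMMARY_TEMPLATE.format(**summary_inputs)
print(filled_prompt)
```

<p>In a real chain, the two stand-in functions would be the sub-chains defined earlier, and the filled template would be sent to the LLM instead of printed.</p>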
      <p>
          <a href="https://mlpills.substack.com/p/diy-15-prompt-chaining-with-langchain">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #14 - Step-by-step implementation of a ReAct Agent in LangGraph]]></title><description><![CDATA[Large Language Model (LLM) agents can make decisions about when to use external tools as part of answering a question.&#160;Let&#8217;s now cover the most basic agent: ReAct agent. The ReAct (Reasoning and Acting) style agent operates in a loop of:]]></description><link>https://mlpills.substack.com/p/diy-14-step-by-step-implementation</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-14-step-by-step-implementation</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sun, 20 Apr 2025 12:37:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!du7b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the Week</h1><p>Large Language Model (LLM) <strong>agents</strong> can make decisions about when to use external tools as part of answering a question. 
</p><p>We covered AI Agents in this previous issue:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b387b2f0-a531-4915-845b-afc6791231e2&quot;,&quot;caption&quot;:&quot;Today, we&#8217;ll explore AI Agents&#8212;what they are, their potential applications across industries, and their future impact on the real world.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;RW #2 - AI Agents and Vertical SaaS&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-15T19:36:39.974Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7676cbb-996e-47f5-8079-ab1bdcb73dc0_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/rw-2-ai-agents-and-vertical-saas&quot;,&quot;section_name&quot;:&quot;Real-World&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:153124623,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Let&#8217;s now cover the most basic agent: ReAct agent. 
The <strong>ReAct</strong> (Reasoning and Acting) style agent operates in a loop of:</p><ol><li><p><em>thinking</em> (reasoning with the LLM)</p></li><li><p><em>acting</em> (calling a tool or API)</p></li><li><p><em>observing</em> (incorporating the tool's result)&#8203;</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!du7b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!du7b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 424w, https://substackcdn.com/image/fetch/$s_!du7b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 848w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1272w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png" width="590" height="592" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52700,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/161105148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!du7b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 424w, https://substackcdn.com/image/fetch/$s_!du7b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 848w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1272w, https://substackcdn.com/image/fetch/$s_!du7b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f2cf46-df40-48f9-a798-931222b0f70a_590x592.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This allows the agent to handle queries that the LLM alone might not answer, by dynamically invoking tools for additional information&#8203;. For example, if asked <em>&#8220;What&#8217;s the GDP of Spain in 2024?&#8221;</em>, a ReAct agent could decide to call a Wikipedia search tool to fetch the latest data.</p><p>In this issue, we will build a simple ReAct-style agent from scratch using <strong>LangGraph</strong> (LangChain's graph-based framework) and <strong>LangChain</strong> in Python. </p><p>We will <strong>not</strong> use any pre-built agent utilities; instead, we'll explicitly define the agent's graph nodes and conditional edges. The agent will be able to use a Wikipedia search tool automatically when needed. 
</p><p>In this issue, we will cover a <strong>Stateless Agent (single-turn)</strong>, which is a minimal ReAct agent that answers one question at a time without conversation memory. However, in future articles we will also cover a <strong>Stateful Agent (with memory)</strong> - an extension that keeps track of the conversation history so it can handle follow-up questions.</p><p>Let's get started by setting up our environment and then implementing the agent step by step.</p><h2>Setup</h2><p>First, install the required packages and set up any API keys. We'll use <strong>LangGraph</strong> (part of LangChain for building graph-based LLM workflows), LangChain's OpenAI chat model wrapper, and the <code>wikipedia</code> package for the Wikipedia search tool.</p><pre><code><code>!pip install -U langgraph langchain-openai wikipedia</code></code></pre><p>Next, we import the classes and functions we need:</p><ul><li><p>LangChain&#8217;s chat model (we'll use OpenAI's GPT-4o for demonstration via <code>ChatOpenAI</code>),</p></li><li><p>LangChain&#8217;s tool decorator (<code>@tool</code>) to define our custom tool,</p></li><li><p>LangGraph&#8217;s components for building the state graph,</p></li><li><p>Message classes for constructing the conversation state.</p></li></ul><pre><code>from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage, SystemMessage, ToolMessage</code></pre><blockquote><p><strong>Note:</strong> You will need an OpenAI API key if using <code>ChatOpenAI</code>. Make sure to set <code>OPENAI_API_KEY</code> as an environment variable or via <code>os.environ</code> before running the agent, or pass it directly through the <code>api_key</code> argument. Alternatively, you could use a local or open-source model with a similar interface.</p></blockquote><p>Now, let's implement the agent!</p><h2>Agent</h2><p>In this section, we build a minimal ReAct agent that can answer one question (with tool use if needed) and does <strong>not</strong> retain any memory of previous interactions. The agent will use the LLM to decide on actions and will handle the reasoning-action loop for a single query.</p><p>We will implement the agent step by step:</p><ol><li><p><strong>Define the agent's state</strong> &#8211; the data structure representing the agent's memory or context (for a stateless single-turn agent, this will just include the current conversation messages).</p></li><li><p><strong>Set up the LLM and tool</strong> &#8211; initialize the language model and define a Wikipedia search tool using the <code>wikipedia</code> package. Give the model access to the tools.</p></li><li><p><strong>Define LangGraph nodes and edges</strong> &#8211; create the reasoning node (LLM call), the tool node (executes the tool), and a conditional edge that decides whether to continue the loop or end it, based on the LLM's output.</p></li><li><p><strong>Compile and run the graph</strong> &#8211; combine the nodes into a <code>StateGraph</code>, then test the agent on a sample question to see it in action.</p></li></ol><h3>1. Defining the State Model</h3><p>LangGraph uses a state object to keep track of the conversation and any intermediate data. For a basic ReAct agent, the state can be as simple as a list of messages (chat history). 
We define a <code>TypedDict</code> for the state with a single key <code>"messages"</code> that will hold a sequence of messages. We also attach a <strong>reducer</strong> <code>add_messages</code> to this field &#8211; this ensures that when we return new messages from a node, they get appended to the state&#8217;s message list (instead of overwriting it)&#8203;.</p><pre><code>from typing import TypedDict, Sequence, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    """State of the agent for one turn or conversation."""
    messages: Annotated[Sequence[BaseMessage], add_messages]</code></pre><p>Here, <code>AgentState["messages"]</code> will contain a sequence of chat messages (from system, user, AI, or tool). For a single-turn stateless agent, we'll initialize this with just the latest user question; the system prompt is prepended inside the reasoning node rather than stored in the state.</p><h3>2. Setting up the LLM and Wikipedia Tool</h3><p>Next, initialize the LLM and define our tool. We use <code>ChatOpenAI</code> from LangChain to create a chat model instance. For demonstration, we'll use the <code>"gpt-4o"</code> model (which supports OpenAI's function calling), but you can use any available model that supports tool calling.</p><p>We then define a <strong>Wikipedia search tool</strong> using the <code>@tool</code> decorator. This decorator turns a Python function into a LangChain tool that the agent can call. Our tool function will take a search query string, use the <code>wikipedia</code> library to fetch a summary of the top result, and return that summary text. The docstring of the function serves as the tool&#8217;s description for the LLM.</p><pre><code># Initialize the chat model (LLM) - make sure your API key is set
model = ChatOpenAI(model="gpt-4o", temperature=0)  # picks up OPENAI_API_KEY from the environment

# Define a Wikipedia search tool
import wikipedia

@tool
def wiki_search(query: str) -&gt; str:
    """Search Wikipedia for the query and return a brief summary of the top result."""
    try:
        # Fetch summary of top search result (we set it to 5 sentences)
        summary = wikipedia.summary(query, sentences=5)
        return summary
    except Exception as e:
        return f"Error: {e}"</code></pre><p>We set the temperature to 0 to minimize randomness, since we want the model to reliably produce tool calls for unknown facts. If the API key is not set in the environment, you also need to pass it through the <code>api_key</code> argument.</p><p>The <code>wiki_search</code> tool uses the Wikipedia API (via the <code>wikipedia</code> package) to get information. For example, if asked about a person or event not known to the model, the agent can call <code>wiki_search</code> to get up-to-date info. This is a common pattern &#8211; Wikipedia tools are often used to fetch summaries for factual questions. For this example we limit the summary to 5 sentences, but you can adjust that to suit your application.</p><h3>3. Defining the LangGraph Nodes and Conditional Logic</h3><p>Before defining the graph, let&#8217;s prepare the tools so the model can access them:</p><pre><code><code>import json

# Map tool name to the tool function for easy lookup
tools = [wiki_search]
tools_by_name = {tool.name: tool for tool in tools}

# Give the model access to the tools
model = model.bind_tools(tools) </code></code></pre><p>With the model and tool ready, we create the <strong>nodes</strong> of our agent's computation graph:</p><ul><li><p><strong>Reasoner Node (LLM call)</strong>: This node will call the LLM to either produce an answer or decide on a tool action. We&#8217;ll implement it as a function <code>call_model(state)</code>. It takes the current state (which contains the conversation messages so far) and returns the LLM's response as a new message. We include a system prompt to guide the LLM&#8217;s behavior (e.g., &#8220;You are a helpful assistant&#8230;&#8221;). The user&#8217;s query is in the state&#8217;s messages. We invoke the model with the system prompt plus all existing messages. LangChain&#8217;s <code>ChatOpenAI</code> can return a message that includes a <strong>function call</strong> if the model decides a tool is needed (under the hood, the model may use OpenAI&#8217;s function calling feature to request <code>wiki_search</code>).</p></li></ul><pre><code><code>def call_model(state: AgentState):
    """LLM reasoning node: call the chat model with system prompt + conversation."""
    system_prompt = SystemMessage(content="You are a helpful AI assistant. If needed, you can use the wiki_search tool to build your answer.")
    # Call the chat model with system + existing messages (user question is included in state["messages"])
    response = model.invoke([system_prompt] + list(state["messages"]))
    # Return the response as a list (to be appended to state's messages via reducer)
    return {"messages": [response]}</code></code></pre><ul><li><p><strong>Tool Node (execute tool)</strong>: This node executes any tool that the LLM requested. We implement <code>tool_node(state)</code> to inspect the latest message from the LLM for a tool call. If a tool call is present, we invoke the corresponding tool function and package its result into a special <strong>ToolMessage</strong>. The ToolMessage will be added to the state so the LLM can see the tool&#8217;s output on the next iteration.</p></li></ul><pre><code><code>def tool_node(state: AgentState):
    """Tool execution node: execute any tool calls the LLM asked for."""
    outputs = []
    # Check the last message from the LLM for tool calls
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        # If the model requested one or more tool calls, execute each
        for tool_call in last_message.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            if tool_name in tools_by_name:
                # Invoke the corresponding tool function with provided arguments
                result = tools_by_name[tool_name].invoke(tool_args)
            else:
                result = f"Tool '{tool_name}' not found."
            # Wrap the result in a ToolMessage for the LLM to read
            outputs.append(
                ToolMessage(
                    content=json.dumps(result),   # tool result as JSON string
                    name=tool_name, 
                    tool_call_id=tool_call.get("id")  # use id if provided
                )
            )
    # Return the tool outputs to be added to messages
    return {"messages": outputs}</code></code></pre><ul><li><p><strong>Conditional Edge (</strong><code>should_continue</code><strong>)</strong>: After each LLM reasoning step, we need to decide whether the agent should end with an answer or continue by using a tool. We define a function <code>should_continue(state)</code> that checks the LLM's last message. If the LLM did <strong>not</strong> request any tool (no function call), that means it produced a final answer, so the agent can end. If a tool was requested, we should continue to the tool node next. This function will return a flag (e.g., <code>"continue"</code> or <code>"end"</code>) that LangGraph uses to choose the next node.</p></li></ul><pre><code>def should_continue(state: AgentState) -&gt; str:
    """Decide whether to continue the ReAct loop or end it, based on last LLM message."""
    last_message = state["messages"][-1]
    print(last_message)  # debug output: shows whether the LLM asked for a tool
    # If the LLM's last message did not request a tool, we're done
    if not (hasattr(last_message, "tool_calls") and last_message.tool_calls):
        return "end"
    else:
        # There is a tool request, so continue to the tool node
        return "continue"</code></pre><p>A few notes on this implementation:</p><ul><li><p>In <code>call_model</code>, we prepend a system message that defines the assistant's role and hints that it can use the <code>wiki_search</code> tool if needed. We then pass all messages (including the user's message) to the chat model. The model may return a normal AI message (with a direct answer) or a function/tool call message. LangChain's <code>ChatOpenAI</code> will automatically format the function call request in a structured way if the model decides to use a tool.</p></li><li><p>In <code>tool_node</code>, we look at <code>last_message.tool_calls</code>. LangChain&#8217;s AI message objects have a <code>tool_calls</code> attribute that contains any tool/function call requests the model made. If there's a tool call, it includes the tool <code>name</code> and <code>args</code>. We invoke the appropriate tool from our <code>tools_by_name</code> registry. The result is wrapped in a <code>ToolMessage</code>, which carries the tool&#8217;s name, its output, and the <code>tool_call_id</code> that links it to the request. Because we return <code>{"messages": [ToolMessage(...)]}</code>, LangGraph's reducer appends this tool result message to the state&#8217;s messages list.</p></li><li><p><code>should_continue</code> examines the last message. If <code>tool_calls</code> is empty, the LLM didn't ask for any action &#8211; meaning it likely produced a final answer &#8211; so we return <code>"end"</code>. If there's a tool call, we return <code>"continue"</code>, signaling the graph to proceed to the tool execution step. These return strings will be used to choose the next node via a conditional mapping.</p></li></ul><div><hr></div><h1><strong>&#127891;Further Learning*</strong></h1><p>Let us present: &#8220;<a href="https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f">From Beginner to Advanced LLM Developer</a>&#8221;. 
This comprehensive course takes you <strong>from foundational skills to mastering scalable LLM products</strong> through <em>hands-on projects, fine-tuning, RAG, and agent development</em>. Whether you're building a standout portfolio, launching a startup idea, or enhancing enterprise solutions, this program equips you to lead the LLM revolution and thrive in a fast-growing, in-demand field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6iMW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 424w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 848w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1272w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png" width="612" height="338.2105263157895" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:760,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6iMW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 424w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 848w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1272w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Who Is This Course For?</strong></p><p>This certification is for software developers, machine learning engineers, data scientists or computer science and AI students to rapidly convert to an LLM Developer role and start building</p><p><em>*Sponsored: by purchasing any of their courses you would also be supporting MLPills.</em></p><div><hr></div><h3>4. Constructing and Compiling the Graph</h3><p>Now we assemble the graph using LangGraph&#8217;s <code>StateGraph</code>. We add our two nodes (<code>"agent"</code> for the LLM reasoning and <code>"tool"</code> for the tool execution), set the entry point, and define the transitions. The critical part is adding a <strong>conditional edge</strong> from the LLM node to either the tool node or the end of the graph, based on the output of the <code>should_continue</code> function. 
We will map the <code>"continue"</code> signal to the <code>"tool"</code> node, and the <code>"end"</code> signal to <code>END</code> (a special marker indicating the graph should terminate). We also add a normal edge from the tool node back to the LLM node, creating a cycle: after using the tool, the agent goes back to the LLM to incorporate the new information.</p>
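<p>The control flow described above can be illustrated without LangGraph itself. What follows is a minimal, library-free Python sketch, assuming a dictionary-based state and canned node behaviour; <code>run_graph</code>, <code>NODES</code>, <code>CONDITIONAL_EDGES</code>, <code>NORMAL_EDGES</code> and the state keys are hypothetical stand-ins for illustration, not LangGraph's API. It only mirrors the routing: the agent node runs, <code>should_continue</code> returns a signal, the signal is mapped to the next node, and the tool node always hands control back to the agent.</p>

```python
# Library-free sketch of the agent/tool cycle.
# All names here are illustrative assumptions, not LangGraph's API.
END = "__end__"  # stand-in for the END marker described in the text


def agent_node(state):
    """Stand-in for the LLM step: request the tool once, then answer."""
    if state["tool_result"] is None:
        return {**state, "wants_tool": True}
    answer = "The tool returned " + state["tool_result"]
    return {**state, "wants_tool": False, "answer": answer}


def tool_node(state):
    """Stand-in for tool execution: store a canned result in the state."""
    return {**state, "tool_result": "42"}


def should_continue(state):
    """Return the signal that the conditional edge maps to the next node."""
    return "continue" if state["wants_tool"] else "end"


NODES = {"agent": agent_node, "tool": tool_node}
# Conditional edge: agent goes to tool on "continue", to END on "end".
CONDITIONAL_EDGES = {"agent": (should_continue, {"continue": "tool", "end": END})}
# Normal edge: after the tool runs, control returns to the agent (the cycle).
NORMAL_EDGES = {"tool": "agent"}


def run_graph(state, entry_point="agent", max_steps=10):
    """Walk the graph from the entry point until END; record visited nodes."""
    current, visited = entry_point, []
    for _ in range(max_steps):
        if current == END:
            break
        visited.append(current)
        state = NODES[current](state)
        if current in CONDITIONAL_EDGES:
            router, mapping = CONDITIONAL_EDGES[current]
            current = mapping[router(state)]
        else:
            current = NORMAL_EDGES[current]
    return state, visited


final_state, path = run_graph({"tool_result": None, "wants_tool": False, "answer": None})
print(path)
print(final_state["answer"])
```

<p>In this sketch the agent is visited twice with one tool call in between, which is the cycle the normal edge creates. The <code>max_steps</code> cap is an added safeguard against endless loops in the toy runner and is not taken from the article.</p>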
      <p>
          <a href="https://mlpills.substack.com/p/diy-14-step-by-step-implementation">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY#13 - Sentiment Analysis with Bag-of-Words]]></title><description><![CDATA[Today we&#8217;re diving into sentiment analysis&#8212;a cool way to teach our models to decide if a movie review is cheering us on or giving us the boot. We&#8217;ll first introduce the NLP technique &#8220;Bag-of-Words&#8221; and then we&#8217;ll use an IMDb movie reviews dataset, clean the text, tokenize it, create a bag-of-words representation, and finally train a logistic regression model. Let&#8217;s get our hands dirty and see exactly what&#8217;s going on!]]></description><link>https://mlpills.substack.com/p/diy13-sentiment-analysis-with-bag</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy13-sentiment-analysis-with-bag</guid><dc:creator><![CDATA[Muhammad Anas]]></dc:creator><pubDate>Sat, 08 Mar 2025 12:45:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7ceec447-d955-4868-8a1a-25ffb689a41b_1275x848.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; <strong>Pill of the Week</strong></h1><p>Today we&#8217;re diving into sentiment analysis&#8212;a cool way to teach our models to decide if a movie review is cheering us on or giving us the boot. We&#8217;ll first introduce the NLP technique &#8220;Bag-of-Words&#8221; and then we&#8217;ll use an IMDb movie reviews dataset, clean the text, tokenize it, create a bag-of-words representation, and finally train a logistic regression model. Let&#8217;s get our hands dirty and see exactly what&#8217;s going on! 
</p><blockquote><p><strong>Thanks to <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Muhammad Anas&quot;,&quot;id&quot;:236084597,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68943441-be86-431c-9ca5-876167a3ab9e_3024x4032.jpeg&quot;,&quot;uuid&quot;:&quot;1538c3c8-3cc2-406c-b025-6715e85e04a5&quot;}" data-component-name="MentionToDOM"></span> for all the code and explanations!</strong></p></blockquote><h2><strong>Bag-of-Words</strong></h2><p>Before we jump into the coding fun, let's explore the Bag-of-Words (BoW) method, which serves as a fundamental building block in natural language processing and text analysis.</p><h3>What It Is</h3><p>Bag-of-Words is a simple yet powerful technique that <strong>converts text documents into numerical vectors that machine learning algorithms can understand</strong>. The name "bag" comes from the fact that this <strong>method disregards grammar and word order, treating text as an unordered collection (or bag) of words</strong>. </p><p>The method works by:</p><ol><li><p><strong>Splitting each text</strong> (or review) into individual words (tokens).</p></li><li><p><strong>Building a vocabulary of all unique words</strong> found across all documents in the corpus.</p></li><li><p><strong>Representing each document as a numerical vector</strong> where each element corresponds to the count of a specific word from the vocabulary.</p></li></ol><h3>The Historical Context</h3><p>The Bag-of-Words approach has roots in information retrieval systems from the 1950s and gained popularity in the 1990s with the growth of digital document collections. 
Despite its simplicity compared to modern deep learning approaches, <strong>BoW remains relevant because it's computationally efficient and surprisingly effective for many text classification tasks, including sentiment analysis.</strong></p><h3>Why It Works</h3><p>Natural language contains thousands or even millions of unique words, but <strong>each individual document</strong> (like a movie review) <strong>uses only a small subset of those words</strong>, so most positions in these vectors are zero. This "sparsity" makes the vectors efficient to store and process, even for large datasets.</p><p>Additionally, <strong>certain words strongly correlate with sentiment</strong>. For example, words like "excellent" and "terrible" are powerful indicators of positive and negative sentiment, respectively.</p><p><strong>The BoW model captures these correlations effectively</strong>, making it well suited to sentiment analysis tasks.</p><h3>A Concrete Example</h3><p>Let's bring this to life with a concrete example to really understand how BoW works:</p><p>Imagine we have three movie reviews in our corpus:</p><ol><li><p><em>"The movie was excellent and I enjoyed it."</em></p></li><li><p><em>"The acting was terrible but the plot was good."</em></p></li><li><p><em>"I thought the film was boring and predictable."</em></p></li></ol><p>The BoW approach transforms these text documents into numerical vectors through a systematic process:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!seOy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!seOy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 424w, https://substackcdn.com/image/fetch/$s_!seOy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 848w, https://substackcdn.com/image/fetch/$s_!seOy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 1272w, https://substackcdn.com/image/fetch/$s_!seOy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!seOy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png" width="606" height="271.2429099876695" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:811,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:34692,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!seOy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 424w, https://substackcdn.com/image/fetch/$s_!seOy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 848w, https://substackcdn.com/image/fetch/$s_!seOy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 1272w, https://substackcdn.com/image/fetch/$s_!seOy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac5cecdd-c75d-45dc-a80a-5924880f32f1_811x363.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dHlJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dHlJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 424w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 848w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 1272w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dHlJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png" width="610" height="176.32226322263222" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:235,&quot;width&quot;:813,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:20590,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dHlJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 424w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 848w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 1272w, https://substackcdn.com/image/fetch/$s_!dHlJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e923d2-354b-4ab3-8c15-075ffc426f1e_813x235.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!GZEz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GZEz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 424w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 848w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 1272w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GZEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png" width="610" height="397.46943765281173" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:818,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:35665,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GZEz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 424w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 848w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 1272w, https://substackcdn.com/image/fetch/$s_!GZEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a119b19-d8f5-4126-99b6-48fdeac98932_818x533.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NuFA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NuFA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 424w, 
https://substackcdn.com/image/fetch/$s_!NuFA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 848w, https://substackcdn.com/image/fetch/$s_!NuFA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!NuFA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NuFA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png" width="604" height="755.9355638166047" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1010,&quot;width&quot;:807,&quot;resizeWidth&quot;:604,&quot;bytes&quot;:46439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!NuFA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 424w, https://substackcdn.com/image/fetch/$s_!NuFA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 848w, https://substackcdn.com/image/fetch/$s_!NuFA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!NuFA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faced3514-3aea-4b9c-9248-5f39a6c771ce_807x1010.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pd7H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pd7H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 424w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 848w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pd7H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png" width="604" height="774.7760097919216" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:817,&quot;resizeWidth&quot;:604,&quot;bytes&quot;:54635,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pd7H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 424w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 848w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!pd7H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7faeb809-1708-42e1-b113-c59c3381c9d8_817x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>&#128204; Key Insights:</strong></p><ol><li><p>BoW ignores word order and simply counts occurrences.</p></li><li><p>Documents with similar sentiment often share key words.</p></li><li><p>Sentiment words like "excellent", "terrible", and "boring" stand out in their respective documents.</p></li><li><p>This suits sentiment analysis because each positive or negative word gets its own dimension.</p></li><li><p>Each review becomes a point in a multi-dimensional space where ML algorithms can find patterns.</p></li></ol><h3>Limitations and Considerations</h3><p>While powerful, BoW has some limitations to keep in mind:</p><ul><li><p>It ignores word order, which can sometimes be important ("good, not bad" and "bad, not good" produce identical vectors despite their opposite meanings).</p></li><li><p>It creates high-dimensional, sparse vectors when the vocabulary is large.</p></li><li><p>Common 
words like "the" or "and" may dominate counts without contributing much meaning.</p></li></ul><p>We'll address some of these limitations in our preprocessing steps, and many advanced NLP techniques build upon this foundation by addressing these weaknesses.</p><p>Now that we understand the conceptual framework, let's dive into implementing Bag-of-Words for our sentiment analysis task!</p><div><hr></div><h1><strong>&#8205;&#127891;Further Learning*</strong></h1><p>Let us present: &#8220;<a href="https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f">From Beginner to Advanced LLM Developer</a>&#8221;. This comprehensive course takes you <strong>from foundational skills to mastering scalable LLM products</strong> through <em>hands-on projects, fine-tuning, RAG, and agent development</em>. Whether you're building a standout portfolio, launching a startup idea, or enhancing enterprise solutions, this program equips you to lead the LLM revolution and thrive in a fast-growing, in-demand field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6iMW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 424w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 848w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 
1272w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png" width="612" height="338.2105263157895" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:760,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=3b122f&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6iMW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 424w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 848w, https://substackcdn.com/image/fetch/$s_!6iMW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6iMW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f077642-4adc-4b0f-8afc-2c1ea26f05ab_760x420.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Who Is This Course For?</strong></p><p>This certification is for software developers, machine learning engineers, data scientists or computer science and AI students to rapidly convert to an LLM Developer role and start building</p><p><em>*Sponsored: by purchasing any of their courses you would also be supporting
MLPills.</em></p><div><hr></div><h1>&#128736;&#65039; Do It Yourself</h1><p>Since we missed last week&#8217;s issue, we&#8217;re making up for it with a longer one: theory + practice! Apologies for that, and I hope you like it!</p><blockquote><p><strong>At the end, as usual, you&#8217;ll get a notebook with all the code!!</strong></p></blockquote><p>These are the steps:</p><ol><li><p><strong>Load and Prepare Data</strong>: Import the necessary tools and load the movie review data, setting the stage for analysis.</p></li><li><p><strong>Clean the Text</strong>: Remove unwanted elements like HTML and punctuation, standardizing the text so it is ready for analysis.</p></li><li><p><strong>Convert Text to Numbers</strong>: Transform the cleaned text into numerical data using a "bag-of-words" model, so the computer can understand it.</p></li><li><p><strong>Train a Predictive Model</strong>: Split the data into training and testing sets, and train a logistic regression model to predict sentiment.</p></li><li><p><strong>Analyze the Results</strong>: Interpret the model's performance and understand which words contribute to positive or negative reviews.</p></li></ol><p>Let&#8217;s finally begin!</p><h2>Step 1: Importing Libraries &amp; Loading the Data</h2><p>First things first: we import the necessary libraries and load the data from GitHub.<br>We&#8217;re using pandas to load a CSV file directly from a GitHub repository that contains thousands of IMDb reviews. This dataset has two columns: one for the review text and one for the sentiment (labeled "positive" or "negative").</p><pre><code><code>import numpy as np
import pandas as pd

# Set a fixed seed to ensure our results are reproducible.
np.random.seed(42)

# Load the IMDb movie reviews dataset from GitHub.
# Note: This dataset contains text reviews and sentiment labels.
url = (
    "https://raw.githubusercontent.com/Ankit152/IMDB-sentiment-analysis/refs/heads/master/IMDB-Dataset.csv"
)
df = pd.read_csv(url)

# Check the shape of the data (rows, columns) and preview the first few records.
print("Data shape:", df.shape)
print(df.head())
</code></code></pre><p><em>What&#8217;s happening?</em></p><ul><li><p>We set the seed for reproducibility.</p></li><li><p>We load the CSV from GitHub.</p></li><li><p>We print out the shape and a preview so we know the data is what we expect.</p></li></ul><p></p><h2>Step 2: Preprocessing the Text</h2><p>Before we train a model, our text needs some cleaning. This step removes unwanted HTML tags and punctuation, and normalizes everything to lowercase. We also extract emoticons first, because they can carry a lot of sentiment (think of :-) versus :(), and append them back after cleaning.</p><pre><code><code>import re

def preprocessor(text):
    """
    Clean the input text:
    - Remove HTML markup.
    - Extract emoticons and preserve them.
    - Remove non-word characters (like punctuation) and convert to lowercase.
    - Append cleaned emoticons (without the hyphen) back to the text.
    """
    # Remove HTML tags using regex
    text = re.sub(r"&lt;[^&gt;]*&gt;", "", text)
    
    # Find emoticons (patterns like :), :-), :D, etc.)
    emoticons = re.findall(r"(?::|;|=)(?:-)?(?:\)|\(|D|P)", text)
    
    # Remove non-word characters, change text to lowercase, and append emoticons at the end.
    text = re.sub(r"[\W]+", " ", text.lower()) + " " + " ".join(emoticons).replace("-", "")
    return text

# Apply the preprocessor to our reviews
df["review_clean"] = df["review"].apply(preprocessor)

# Print a sample cleaned review (displaying the last 100 characters for brevity)
print("\nSample cleaned review:", df.loc[0, "review_clean"][-100:])
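
# --- Optional sanity check (illustrative; the sample string below is made up) ---
# A minimal sketch of what the emoticon pattern used in preprocessor() picks up,
# and what the appended emoticons look like once the hyphen is dropped.
# The demo_* names are hypothetical and are not used anywhere else in this guide.
import re

demo_text = "Loved it :-) but the ending was weak :("
demo_emoticons = re.findall(r"(?::|;|=)(?:-)?(?:\)|\(|D|P)", demo_text)
demo_suffix = " ".join(demo_emoticons).replace("-", "")
print("Emoticons found:", demo_emoticons)
print("Appended form:", demo_suffix)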
</code></code></pre><p><em>More details:</em></p><ul><li><p><strong>Removing HTML Tags (</strong><code>&lt;[^&gt;]*&gt;</code><strong>)</strong><br>&#8226; Looks for the <code>&lt;</code> character and captures everything until the next <code>&gt;</code>, removing HTML elements such as <code>&lt;div&gt;</code> or <code>&lt;br&gt;</code>.</p></li><li><p><strong>Extracting Emoticons</strong><br>&#8226; The pattern <code>(?::|;|=)(?:-)?(?:\)|\(|D|P)</code> finds common emoticons (like <code>:-)</code>, <code>:D</code>), ensuring emotional cues are not lost.</p></li><li><p><strong>Cleaning Text with </strong><code>[\W]+</code><br>&#8226; Replaces sequences of non-word characters (anything besides letters, digits, or underscores) with a space, standardizing the text while keeping the emoticons to preserve sentiment.</p></li></ul><p></p><h2>Step 3: Tokenizing &amp; Building the Bag-of-Words Model</h2><p>Tokenization is simply breaking text into individual words (tokens). The bag-of-words approach will create a vocabulary of unique words and count the number of times each appears in a review. Here, we use scikit-learn&#8217;s <code>CountVectorizer</code> to do the heavy lifting.</p><pre><code><code>from sklearn.feature_extraction.text import CountVectorizer

# Instantiate the CountVectorizer.
# This will first split our text into tokens and then count occurrences.
vectorizer = CountVectorizer()

# Fit the vectorizer on our cleaned review texts and transform them into numerical feature vectors.
X = vectorizer.fit_transform(df["review_clean"])

# Let&#8217;s inspect a small portion of the resulting vocabulary.
print("\nSample vocabulary mapping (word -&gt; index):")
sample_vocab = dict(list(vectorizer.vocabulary_.items())[:10])
print(sample_vocab)

# Also print the bag-of-words array for the first 3 reviews.
print("\nBag-of-words representation for the first 3 reviews:")
print(X[:3].toarray())
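
# --- Optional toy illustration (hypothetical mini-corpus, not part of the IMDb data) ---
# A hand-rolled sketch of the counting idea using only the standard library.
# It is NOT how CountVectorizer is implemented; it only mirrors the concept:
# build a sorted vocabulary, then count each vocabulary word per document.
# The toy_* names are assumptions made for this example.
from collections import Counter

toy_reviews = ["good movie good cast", "boring movie"]
toy_vocab = sorted({word for review in toy_reviews for word in review.split()})
toy_counts = [[Counter(review.split())[word] for word in toy_vocab] for review in toy_reviews]
print("Toy vocabulary:", toy_vocab)
print("Toy count vectors:", toy_counts)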
</code></code></pre><p><em>Explanation:</em></p><ul><li><p><strong>Tokenization &amp; Vocabulary Building:</strong><br>&#8226; <code>CountVectorizer</code> splits each review into words (tokens) and collects unique words into a large vocabulary.</p></li><li><p><strong>Vector Representation:</strong><br>&#8226; Each review is then transformed into a vector where each number shows how many times a word from the vocabulary appears.</p></li><li><p><strong>Sparse Matrix:</strong><br>&#8226; Since most reviews only use a small fraction of the full vocabulary, many entries are zero, resulting in a sparse matrix.</p></li><li><p><strong>Example:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fOuU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fOuU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 424w, https://substackcdn.com/image/fetch/$s_!fOuU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 848w, https://substackcdn.com/image/fetch/$s_!fOuU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 1272w, https://substackcdn.com/image/fetch/$s_!fOuU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fOuU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png" width="607" height="504.9051987767584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1088,&quot;width&quot;:1308,&quot;resizeWidth&quot;:607,&quot;bytes&quot;:84594,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mlpills.substack.com/i/158641950?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fOuU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 424w, https://substackcdn.com/image/fetch/$s_!fOuU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 848w, https://substackcdn.com/image/fetch/$s_!fOuU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fOuU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11c998ea-dedb-4512-9723-acd1f5ba71a4_1308x1088.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li></ul><h2>Step 4: Splitting Data &amp; Training a Logistic Regression Model</h2><p>Next, we transform the sentiment labels into a binary format (e.g., positive = 1, negative = 0), split our data into training and test sets, and then train a logistic regression classifier.
Logistic regression is a popular choice due to its simplicity and interpretability&#8212;its coefficients tell us which words influence predictions.</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy13-sentiment-analysis-with-bag">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #12 - SHAP in Action: Making ML Explainable]]></title><description><![CDATA[Have you ever trained a machine learning model and thought, &#8220;Okay, it works&#8230; but why does it work?&#8221; Or maybe you&#8217;ve been asked, &#8220;Why did the model make this prediction?&#8221; and found yourself fumbling for an answer.Well, you&#8217;re not alone. Machine learning models can feel like mysterious black boxes sometimes. But today, we&#8217;re going to crack that box open with SHAP (SHapley Additive exPlanations). SHAP is like a detective for your model&#8212;it tells you exactly how each feature contributes to a prediction.]]></description><link>https://mlpills.substack.com/p/diy-12-shap-in-action-making-ml-explainable</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-12-shap-in-action-making-ml-explainable</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 15 Feb 2025 14:50:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fcc23edb-d62b-4b1d-b9d3-068c86893092_1449x915.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the Week</h1><p>Have you ever trained a machine learning model and thought, <em>&#8220;Okay, it works&#8230; but why does it work?&#8221;</em> Or maybe you&#8217;ve been asked, <em>&#8220;Why did the model make this prediction?&#8221;</em> and found yourself fumbling for an answer.</p><p>Well, you&#8217;re not alone. Machine learning models can feel like mysterious black boxes sometimes. But today, we&#8217;re going to crack that box open with <strong>SHAP (SHapley Additive exPlanations)</strong>. SHAP is like a detective for your model&#8212;it tells you exactly how each feature contributes to a prediction.</p><p>Do you want more info? 
You can check our previous article:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;aef8dde6-cbf5-4926-9ae0-e1e0c9ff5627&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #88 - Introduction to SHAP values&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:236084597,&quot;name&quot;:&quot;Muhammad Anas&quot;,&quot;bio&quot;:&quot;random drops multiple times a week to keep you up to date with the field&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68943441-be86-431c-9ca5-876167a3ab9e_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-01-29T21:40:39.246Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e77d19b8-164e-4166-8f77-3ccb72999a61_1449x915.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-88-introduction-to-shap-values&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:154824629,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:20,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><h1>&#128736;&#65039; DIY</h1><p>In this post, we&#8217;ll:</p><ol><li><p>Load the California Housing dataset.</p></li><li><p>Train a simple linear regression model.</p></li><li><p>Use SHAP to explain the model&#8217;s predictions.</p></li><li><p>Create <strong>four awesome visualizations</strong> to understand feature importance and individual predictions.</p></li></ol><p>By the end, you&#8217;ll not only understand your model better but also have some cool visualizations to show off. </p><blockquote><p><em>&#9999;&#65039; Article and code by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Muhammad Anas&quot;,&quot;id&quot;:236084597,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68943441-be86-431c-9ca5-876167a3ab9e_3024x4032.jpeg&quot;,&quot;uuid&quot;:&quot;d38223d8-e69f-4bbe-8155-b7600f8c0b21&quot;}" data-component-name="MentionToDOM"></span>. </em></p></blockquote><p>Let&#8217;s dive in! </p><h2>Step 1: Importing Libraries</h2><p>Before we start, let&#8217;s grab the tools we need. Think of this as packing your backpack before a hike.</p><pre><code><code># --- NUMERICAL COMPUTATION ---
import numpy as np  # Provides fast array operations and random number functions

# --- DATA MANIPULATION ---
import pandas as pd  # Allows us to work with tabular (DataFrame) data

# --- PLOTTING LIBRARIES ---
import matplotlib.pyplot as plt  # Essential for creating plots
import seaborn as sns  # Enhances the aesthetics of our matplotlib plots

# --- MODELING LIBRARY ---
from sklearn.linear_model import LinearRegression  # Implements simple linear models

# --- MODEL EXPLANATION ---
import shap  # Enables us to explain machine learning predictions using SHAP values

# Set a fixed seed to ensure our results are reproducible every time we run the script.
np.random.seed(42)
</code></code></pre><p><strong>Why these libraries?</strong></p><ul><li><p><strong>NumPy &amp; Pandas:</strong> To handle and manipulate data.</p></li><li><p><strong>Matplotlib &amp; Seaborn:</strong> To create beautiful visualizations.</p></li><li><p><strong>scikit-learn:</strong> To train a simple, interpretable linear regression model.</p></li><li><p><strong>SHAP:</strong> The star of the show&#8212;used to explain model predictions.</p></li></ul><p></p><h2>Step 2: Loading the Data</h2><p>Let&#8217;s load the California Housing dataset. This dataset contains information about housing blocks in California, such as median income, house age, and population.</p><pre><code><code># Load the California Housing dataset
X, y = shap.datasets.california(n_points=1000)

# Create a background dataset for SHAP analysis
# This background dataset is used to estimate the expected values when features are "missing" in SHAP value calculation
X_background = shap.utils.sample(X, 100)
</code></code></pre><p><strong>What&#8217;s happening here?</strong></p><ul><li><p><code>shap.datasets.california</code><strong>:</strong> Loads the dataset.</p></li><li><p><code>X</code><strong>:</strong> The features (e.g., median income, house age).</p></li><li><p><code>y</code><strong>:</strong> The target variable (median house value).</p></li><li><p><code>X_background</code><strong>:</strong> A smaller sample of the data used by SHAP to calculate expected predictions.</p></li></ul><p></p><h2>Step 3: Training the Model</h2><p>Now that we have the data, let&#8217;s train a simple linear regression model.</p><pre><code><code># Initialize the Linear Regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)
</code></code></pre><p><strong>Why linear regression?</strong></p><ul><li><p>It&#8217;s simple and interpretable.</p></li><li><p>Each feature gets a coefficient, which tells us how much it contributes to the prediction.</p></li></ul><p>At this step (image taken from the <a href="https://mlpills.substack.com/p/issue-88-introduction-to-shap-values">previous article</a>), we have a trained model that, for now, we treat as a black box: it works well, but we don&#8217;t know exactly why:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mdil!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mdil!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 424w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 848w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 1272w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!Mdil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png" width="391" height="309.9037037037037" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:535,&quot;width&quot;:675,&quot;resizeWidth&quot;:391,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mdil!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 424w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 848w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 1272w, https://substackcdn.com/image/fetch/$s_!Mdil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cd2bdd-b97e-485d-ac2a-0a49319885ee_675x535.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Step 4: Analyzing Model Coefficients</h2><p>Here&#8217;s a question: <em>Which features matter the most in predicting house prices?</em></p><p>Let&#8217;s look at the model&#8217;s coefficients to find out:</p><pre><code><code>print("Model Coefficients:")
for idx, feature in enumerate(X.columns):
    coef = model.coef_[idx]
    direction = "increases" if coef &gt; 0 else "decreases"
    print(f"{feature}: {coef:.5f} ({direction} house value)")
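
# --- Worked example (illustrative numbers) ---
# With every other feature held fixed, a linear model changes its prediction by
# coefficient * change_in_feature. The 2-unit change is a made-up value; the
# coefficient is copied by hand from the MedInc line of the output shown below.
# The example_* names are hypothetical and only used here.
example_coef = 0.42563
example_change = 2.0
example_effect = example_coef * example_change  # roughly 0.85 units of predicted house value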
</code></code></pre><pre><code>Model Coefficients:
MedInc: 0.42563 (increases house value)
HouseAge: 0.01033 (increases house value)
AveRooms: -0.11610 (decreases house value)
AveBedrms: 0.66385 (increases house value)
Population: 0.00003 (increases house value)
AveOccup: -0.26096 (decreases house value)
Latitude: -0.46734 (decreases house value)
Longitude: -0.46272 (decreases house value)</code></pre><p><strong>What does this tell us?</strong></p><ul><li><p>A <strong>positive coefficient</strong> means the feature increases house value.</p></li><li><p>A <strong>negative coefficient</strong> means the feature decreases house value.</p></li></ul><p>For example, if the coefficient for <code>MedInc</code> (median income) is 0.5, it means that for every 1-unit increase in <code>MedInc</code>, the predicted house value increases by 0.5 units.</p><p></p><h2>Step 5: Explaining Predictions with SHAP</h2><p>Here&#8217;s where the magic happens. Let&#8217;s use SHAP to explain our model&#8217;s predictions. Something like this (from our <a href="https://mlpills.substack.com/p/issue-88-introduction-to-shap-values">previous article</a>):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yMa0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yMa0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 424w, https://substackcdn.com/image/fetch/$s_!yMa0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 848w, https://substackcdn.com/image/fetch/$s_!yMa0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yMa0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yMa0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png" width="415" height="231.54719235364396" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:467,&quot;width&quot;:837,&quot;resizeWidth&quot;:415,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yMa0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 424w, https://substackcdn.com/image/fetch/$s_!yMa0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 848w, https://substackcdn.com/image/fetch/$s_!yMa0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yMa0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd13732cc-bded-4d42-bb80-8886fffb2fb0_837x467.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Initializing the SHAP Explainer</h3><pre><code><code># Create a SHAP KernelExplainer using our model&#8217;s predict function, with the background dataset.
explainer = shap.KernelExplainer(model.predict, X_background)
</code></code></pre><p><strong>What&#8217;s happening here?</strong></p><ul><li><p><strong>KernelExplainer:</strong> Works with any model (even non-linear ones).</p></li><li><p><strong>Background dataset:</strong> Helps SHAP calculate the expected prediction when features are &#8220;missing.&#8221;</p></li></ul><h3>Computing SHAP Values</h3><pre><code><code># Compute SHAP values for the first 100 samples
X_display = X.iloc[:100]
shap_values = explainer.shap_values(X_display)
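# Illustrative aside (not part of the original notebook): a minimal, self-contained
# sketch of what SHAP values mean for a linear model. Assumption: features are treated
# as independent, in which case the SHAP value of feature i has the closed form
# coef_i * (x_i - mean of feature i in the background data).
# All toy_* names and numbers are hypothetical and chosen only for illustration.
toy_coefs = [0.5, -0.25]
toy_intercept = 1.0
toy_background = [[2.0, 4.0], [4.0, 2.0], [6.0, 6.0]]

def toy_predict(row):
    # Linear model: intercept plus the weighted sum of the feature values
    return toy_intercept + sum(c * v for c, v in zip(toy_coefs, row))

# Base value: the average prediction over the background rows
toy_base_value = sum(toy_predict(r) for r in toy_background) / len(toy_background)
# Per-feature means of the background rows
toy_means = [sum(col) / len(col) for col in zip(*toy_background)]

def toy_shap_values(row):
    # Closed-form contributions for a linear model with independent features
    return [c * (v - m) for c, v, m in zip(toy_coefs, row, toy_means)]

toy_row = [5.0, 2.0]
toy_phi = toy_shap_values(toy_row)
# Additivity: base value + sum of contributions reproduces the model prediction
print(toy_base_value + sum(toy_phi), toy_predict(toy_row))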
</code></code></pre><p><strong>Why only 100 samples?</strong><br> SHAP calculations can be computationally expensive. Using a subset makes it faster while still providing meaningful insights.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://everydaynews.substack.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7m8L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 424w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 848w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 1272w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7m8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png" width="232" height="49.594202898550726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2352debf-0501-4a26-adca-06356ad3118c_276x59.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:59,&quot;width&quot;:276,&quot;resizeWidth&quot;:232,&quot;bytes&quot;:9243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://everydaynews.substack.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7m8L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 424w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 848w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 1272w, https://substackcdn.com/image/fetch/$s_!7m8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2352debf-0501-4a26-adca-06356ad3118c_276x59.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Before you continue&#8230; <strong>Have you ever wondered how to build an AI agent that thinks and acts intelligently?</strong> Creating one requires several key steps. 
Check out this article from our colleagues at <a href="https://everydaynews.substack.com/">Everyday | AI for all</a> to learn more about this:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:156450168,&quot;url&quot;:&quot;https://everydaynews.substack.com/p/40-the-guide-for-developing-ai-agents&quot;,&quot;publication_id&quot;:3396369,&quot;publication_name&quot;:&quot;Everyday | AI for all&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e30e3b4-4ca0-4c46-ad83-7418ec82fc26_500x500.png&quot;,&quot;title&quot;:&quot;#40 The guide for developing AI Agents succesfully&quot;,&quot;truncated_body_text&quot;:&quot;Have you ever wondered how to create an AI agent that thinks and acts intelligently?&quot;,&quot;date&quot;:&quot;2025-02-04T12:24:02.317Z&quot;,&quot;like_count&quot;:3,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:193950065,&quot;name&quot;:&quot;Everyday by Pol&quot;,&quot;handle&quot;:&quot;everydaynews&quot;,&quot;previous_name&quot;:&quot;Everyday&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35d2197c-fe60-4449-a001-38d2c48e7829_432x432.jpeg&quot;,&quot;bio&quot;:&quot;Welcome to Everyday, a newsletter designed to demystify Artificial Intelligence and make it accessible to everyone. &quot;,&quot;profile_set_up_at&quot;:&quot;2024-11-22T10:56:08.977Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:3460959,&quot;user_id&quot;:193950065,&quot;publication_id&quot;:3396369,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:3396369,&quot;name&quot;:&quot;Everyday | AI for all&quot;,&quot;subdomain&quot;:&quot;everydaynews&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;AI for Everyone. 
No prior AI knowledge needed! Get concise, easy-to-understand insights into the essential concepts driving today&#8217;s AI revolution.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e30e3b4-4ca0-4c46-ad83-7418ec82fc26_500x500.png&quot;,&quot;author_id&quot;:193950065,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2024-11-22T10:56:39.259Z&quot;,&quot;email_from_name&quot;:&quot;Pol from Everyday&quot;,&quot;copyright&quot;:&quot;Everyday | AI for all&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:&quot;en&quot;,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;handle&quot;:&quot;mlpills&quot;,&quot;previous_name&quot;:&quot;David&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;profile_set_up_at&quot;:&quot;2023-01-18T10:52:00.083Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1314891,&quot;user_id&quot;:38707812,&quot;publication_id&quot;:1354140,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:1354140,&quot;name&quot;:&quot;Machine Learning Pills&quot;,&quot;subdomain&quot;:&quot;mlpills&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Weekly pill-sized articles on key Machine Learning and AI 
concepts.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;author_id&quot;:38707812,&quot;theme_var_background_pop&quot;:&quot;#45D800&quot;,&quot;created_at&quot;:&quot;2023-01-29T11:26:31.488Z&quot;,&quot;email_from_name&quot;:&quot;Machine Learning Pills&quot;,&quot;copyright&quot;:&quot;MLPills&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}},{&quot;id&quot;:1281747,&quot;user_id&quot;:38707812,&quot;publication_id&quot;:1322272,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:1322272,&quot;name&quot;:&quot;DSBoost&quot;,&quot;subdomain&quot;:&quot;dsboost&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Boost your Data Science knowledge&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/816ead9b-04fa-4c58-8b84-17d4846348b3_410x410.png&quot;,&quot;author_id&quot;:75251854,&quot;theme_var_background_pop&quot;:&quot;#2096FF&quot;,&quot;created_at&quot;:&quot;2023-01-17T17:33:39.470Z&quot;,&quot;email_from_name&quot;:&quot;DSBoost by David &amp; Levi&quot;,&quot;copyright&quot;:&quot;David &amp; 
Levi&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;daansan_ml&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://everydaynews.substack.com/p/40-the-guide-for-developing-ai-agents?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Jjyu!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e30e3b4-4ca0-4c46-ad83-7418ec82fc26_500x500.png" loading="lazy"><span class="embedded-post-publication-name">Everyday | AI for all</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">#40 The guide for developing AI Agents succesfully</div></div><div class="embedded-post-body">Have you ever wondered how to create an AI agent that thinks and acts intelligently&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 3 likes &#183; Everyday by Pol and David Andr&#233;s</div></a></div><div><hr></div><h2>Step 6: Creating SHAP Visualizations</h2><p>Let&#8217;s create some awesome visualizations to understand our model better.</p><h3>SHAP Summary Plot</h3><p><strong>Question:</strong> <em>Which features have the biggest impact on predictions?</em></p><pre><code><code>sns.set_style("whitegrid")
plt.figure(figsize=(14, 10))
shap.summary_plot(shap_values, X_display, show=False)
plt.title("SHAP Summary Plot: Distribution of Feature Impacts", fontsize=14)
plt.xlabel("SHAP Value (Effect on Prediction)", fontsize=12)
plt.ylabel("Features", fontsize=12)
plt.tight_layout()
plt.show()
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2oMg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2oMg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 424w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 848w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2oMg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png" width="1456" height="821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:258815,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2oMg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 424w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 848w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!2oMg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F366ecc41-b8ab-4bd2-9dcf-f8a309968344_2390x1348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What does this plot show?</strong><br> This plot is like a feature leaderboard. The most important features are at the top, and the least important ones are at the bottom. Each dot represents a single data point, and the position of the dot shows how much that feature influenced the prediction.</p><ul><li><p><strong>Red dots:</strong> High feature values.</p></li><li><p><strong>Blue dots:</strong> Low feature values.</p></li></ul><p>For example, if you look at <code>MedInc</code> (median income), you&#8217;ll see that higher values (red dots) tend to push predictions up, while lower values (blue dots) pull them down.<br></p><h3>Partial Dependence Plot for Median Income</h3><p><strong>Question:</strong> <em>How does median income affect house prices?</em></p><pre><code><code>plt.figure(figsize=(14, 10))
shap.partial_dependence_plot("MedInc", model.predict, X_background,
                             ice=False, model_expected_value=True,
                             feature_expected_value=True, show=False)
plt.title("Partial Dependence: Impact of Median Income", fontsize=14)
plt.xlabel("Median Income (MedInc)", fontsize=12)
plt.ylabel("Effect on Prediction", fontsize=12)
plt.tight_layout()
plt.show()
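# Illustrative aside (not part of the original notebook): a minimal sketch of how a
# partial dependence curve is computed. For each grid value, overwrite one feature in
# every background row, predict, and average the predictions.
# All toy_pd_* names and numbers are hypothetical and chosen only for illustration.
toy_pd_coefs = [0.5, -0.25]
toy_pd_intercept = 1.0
toy_pd_background = [[2.0, 4.0], [4.0, 2.0], [6.0, 6.0]]

def toy_pd_predict(row):
    # Linear model: intercept plus the weighted sum of the feature values
    return toy_pd_intercept + sum(c * v for c, v in zip(toy_pd_coefs, row))

def toy_partial_dependence(feature_index, grid):
    curve = []
    for value in grid:
        # Copy each background row with the chosen feature set to the grid value
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row in toy_pd_background
        ]
        curve.append(sum(toy_pd_predict(r) for r in modified) / len(modified))
    return curve

# For a linear model the curve is a straight line whose slope is the coefficient
toy_pd_curve = toy_partial_dependence(0, [2.0, 4.0, 6.0])
print(toy_pd_curve)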
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8QZx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8QZx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 424w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 848w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 1272w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8QZx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png" width="1456" height="1045" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1045,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:188232,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8QZx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 424w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 848w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 1272w, https://substackcdn.com/image/fetch/$s_!8QZx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39587a76-3e79-4e74-9336-d1ee16954d5e_1917x1376.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What does this plot tell us?</strong><br> The blue line shows how the predicted house value changes as median income increases. It is a straight line because our model is a linear regression: each 1-unit increase in <code>MedInc</code> raises the prediction by the same amount.</p><p>The gray bars along the x-axis are a histogram showing how the <code>MedInc</code> values in the background data are distributed, and the dashed gray lines mark the model&#8217;s expected prediction and the average value of <code>MedInc</code>.<br> </p><h3>Feature Importance Bar Plot</h3><p><strong>Question:</strong> <em>Which features are the most important overall?</em></p><pre><code><code>plt.figure(figsize=(14, 10))
shap.summary_plot(shap_values, X_display, plot_type="bar", show=False)
plt.title("Feature Importance: Average Impact on Predictions", fontsize=14)
plt.xlabel("Mean |SHAP Value|", fontsize=12)
plt.tight_layout()
plt.show()
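# Illustrative aside (not part of the original notebook): the bar plot ranks features
# by mean |SHAP value|. A minimal sketch of that calculation on a small hypothetical
# SHAP matrix (rows = samples, columns = features). All toy_* names are made up here.
toy_feature_names = ["feat_a", "feat_b", "feat_c"]
toy_shap_matrix = [
    [0.5, -1.0, 0.25],
    [-0.5, 2.0, 0.25],
    [1.5, -3.0, -0.25],
    [-1.5, 2.0, 0.25],
]
# Average the absolute contributions per feature so that positive and negative
# effects do not cancel each other out
toy_importance = {
    name: sum(abs(v) for v in column) / len(column)
    for name, column in zip(toy_feature_names, zip(*toy_shap_matrix))
}
# Sort from most to least important, matching the top-to-bottom order of the bars
toy_ranking = sorted(toy_importance, key=toy_importance.get, reverse=True)
print(toy_ranking)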
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Jb7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Jb7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 424w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 848w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Jb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png" width="1456" height="821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173380,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1Jb7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 424w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 848w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!1Jb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a23797-eefa-471b-9745-c14b2f59f390_2390x1348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What&#8217;s happening here?</strong><br> This plot ranks features by their average impact on predictions. Longer bars mean more important features. For example, <code>Latitude</code> and <code>Longitude</code> are the most important features in our model.</p><p></p><h3>Force Plot for an Individual Prediction</h3><p><strong>Question:</strong> <em>Why did the model make this specific prediction?</em></p><pre><code><code>plt.figure(figsize=(16, 6))
shap.force_plot(explainer.expected_value, shap_values[0], X_display.iloc[0],
                matplotlib=True, show=False, text_rotation=0)
plt.title("Force Plot: Detailed Breakdown of an Individual Prediction", fontsize=14)
plt.tight_layout()
plt.show()</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D7Hh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D7Hh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 424w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 848w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 1272w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D7Hh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png" width="1305" height="262" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:262,&quot;width&quot;:1305,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37812,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D7Hh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 424w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 848w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 1272w, https://substackcdn.com/image/fetch/$s_!D7Hh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae409c7-4794-4cf9-8ee8-e0454b9e850f_1305x262.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>What&#8217;s going on here?</strong><br>This plot breaks down the prediction into its individual components. 
Think of it like a tug-of-war:</p><ul><li><p><strong>Red bars:</strong> Features that push the prediction higher.</p></li><li><p><strong>Blue bars:</strong> Features that pull the prediction lower.</p></li></ul><p>The base value (the model&#8217;s average prediction over the background data) is the starting point, and the final prediction is the base value plus the sum of all the feature contributions.</p><div class="pullquote"><p><em><strong>&#9888;&#65039;Notebook with all the code at the end of the issue &#9888;&#65039;</strong></em></p></div><h2>Wrapping Up</h2><p>In this post, we:</p><ol><li><p>Trained a simple linear regression model.</p></li><li><p>Used SHAP to explain the model&#8217;s predictions.</p></li><li><p>Created four visualizations to understand feature importance and individual predictions.</p></li></ol><p>SHAP helps you build trust in your models by making them explainable. It&#8217;s like having a flashlight in a dark room: you can finally see what&#8217;s going on.</p><p>What&#8217;s next? Try using SHAP with a more complex model, like a random forest or neural network. You&#8217;ll be amazed at the insights you can uncover!</p><div class="pullquote"><p>As a <strong>premium subscriber</strong>, you&#8217;ll get an <strong>exclusive deep dive into handling SHAP in complex models, including tree-based and ensemble methods</strong>. 
We&#8217;ll cover best practices for <strong>interpreting non-linear relationships, common pitfalls in SHAP analysis, and how to optimize performance for large datasets.</strong> Plus, you&#8217;ll receive the <strong>full Jupyter Notebook</strong> with all the code from this tutorial&#8212;so you can experiment with everything hands-on.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mlpills.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://mlpills.substack.com/subscribe?"><span>Subscribe now</span></a></p></div><h1><strong>&#8205;&#127891;Further Learning*</strong></h1><p>Are you ready to go from zero to building real-world machine learning projects?</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://gumroad.com/a/374949139/atrfr" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BJ86!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BJ86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png" width="380" height="213.75" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://gumroad.com/a/374949139/atrfr&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!BJ86!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!BJ86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39bb89f-a602-45cc-8f22-5dbbd3cd346e_1280x720.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Join the<strong><a 
href="https://gumroad.com/a/374949139/atrfr"> AI Learning Hub</a></strong>, a program that will take you through every step of AI mastery&#8212;from Python basics to deploying and scaling advanced AI systems.</p><p><strong>Why Join?</strong><br>&#10004; 10+ hours of content, from fundamentals to cutting-edge AI.<br>&#10004; Real-world projects to build your portfolio.<br>&#10004; Lifetime access to all current and future materials.<br>&#10004; A private community of learners and professionals.<br>&#10004; Direct feedback and mentorship.</p><p><strong>What You&#8217;ll Learn:</strong></p><ul><li><p>Python, Pandas, and Data Visualization</p></li><li><p>Machine Learning &amp; Deep Learning fundamentals</p></li><li><p>Model deployment with MLOps tools like Docker, Kubernetes, and MLflow</p></li><li><p>End-to-end projects to solve real-world problems</p></li></ul><blockquote><p>&#128279; <a href="https://gumroad.com/a/374949139/atrfr">Join the AI Learning Hub </a><em><a href="https://gumroad.com/a/374949139/atrfr">(Lifetime Learning Access)</a></em></p><p>&#128279; <a href="https://gumroad.com/a/374949139/gawze">Join the AI Learning Hub </a><em><a href="https://gumroad.com/a/374949139/gawze">(Monthly Membership)</a></em></p></blockquote><p>Take the leap into AI with the roadmap designed for continuous growth, hands-on learning, and a vibrant support system.</p><p><em>*Sponsored: by purchasing any of their courses you would also be <strong>supporting MLPills</strong>.</em></p><div><hr></div><h1>&#9889;Power-Up Corner</h1><p>You now have a solid understanding of SHAP and how to interpret its outputs&#8212;but what if we go a step further? Let&#8217;s explore <strong>advanced SHAP techniques</strong> and some <strong>real-world pitfalls</strong> to watch out for when applying it in production.</p><h3><strong>Global vs. 
Local Interpretability: When SHAP Can Mislead You</strong></h3><p>SHAP values provide both <strong>global</strong> and <strong>local</strong> interpretability, but there&#8217;s a catch:</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-12-shap-in-action-making-ml-explainable">
              Read more
          </a>
      </p>
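<p>To make the tug-of-war picture above concrete, here is a minimal sketch for a linear model, written in plain Python rather than with the SHAP library. It assumes features are treated as independent, in which case each feature's contribution is its coefficient times the distance of the feature value from the feature's mean. The coefficients, background rows, and sample below are made-up values for illustration only.</p>

```python
# Hypothetical toy linear model: prediction = intercept + sum(coef_i * x_i)
coefs = [2.0, -1.0, 0.5]
intercept = 3.0

# Hypothetical background data, used only to compute feature means
background = [
    [1.0, 2.0, 4.0],
    [3.0, 0.0, 2.0],
    [2.0, 4.0, 0.0],
]


def predict(row):
    """Linear model prediction for a single row."""
    return intercept + sum(c * x for c, x in zip(coefs, row))


n_rows = len(background)
feature_means = [
    sum(row[j] for row in background) / n_rows for j in range(len(coefs))
]

# Base value: the average prediction over the background data
base_value = sum(predict(row) for row in background) / n_rows


def linear_contributions(row):
    """Per-feature contribution, assuming independent features:
    coefficient * (feature value - feature mean)."""
    return [c * (x - m) for c, x, m in zip(coefs, row, feature_means)]


sample = [4.0, 1.0, 2.0]
contributions = linear_contributions(sample)

# Base value plus all contributions should reproduce the prediction
reconstructed = base_value + sum(contributions)

print("base value:", base_value)
print("contributions:", contributions)
print("prediction:", predict(sample), "reconstructed:", reconstructed)
```

<p>Positive contributions play the role of the red bars and negative ones the blue bars; the base value plus their sum reproduces the model's prediction for that row.</p>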
   ]]></content:encoded></item><item><title><![CDATA[DIY #12 - Sentiment Analysis with Naive Bayes]]></title><description><![CDATA[In this DIY issue, we&#8217;ll roll up our sleeves and craft a simple yet effective text sentiment analysis project using the Naive Bayes algorithm. Our mission? Classifying SMS messages as either 'spam' or 'ham' (not spam). This will cover data loading, preprocessing, building the model, and evaluating its performance.]]></description><link>https://mlpills.substack.com/p/diy-12-sentiment-analysis-with-naive</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-12-sentiment-analysis-with-naive</guid><dc:creator><![CDATA[Muhammad Anas]]></dc:creator><pubDate>Sun, 10 Nov 2024 19:14:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1e22c6eb-1c17-47fb-9a5d-a7bb5e7c6c85_1920x1280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the Week</h1><p>In this DIY issue, we&#8217;ll roll up our sleeves and craft a simple yet effective text sentiment analysis project using the <strong>Naive Bayes algorithm</strong>. Our mission? Classifying SMS messages as either 'spam' or 'ham' (not spam). This will cover data loading, preprocessing, building the model, and evaluating its performance. 
</p><p>Before starting you can <strong>review the theory</strong> in this previous issue:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;fd0a1123-23e3-4fb6-be7a-ed3ef0a75266&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #78 - Naive Bayes Classifier&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:236084597,&quot;name&quot;:&quot;Muhammad Anas&quot;,&quot;bio&quot;:&quot;random drops multiple times a week to keep you up to date with the 
field&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68943441-be86-431c-9ca5-876167a3ab9e_3024x4032.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-10-27T06:51:17.122Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/038425ef-95bd-4855-b052-4298d859923a_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-78-naive-bayes-classifier&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:150723894,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Ready? This is what we will be covering:</p><ul><li><p>Loading and Exploring the Dataset</p></li><li><p>Data Preprocessing</p></li><li><p>Splitting Data for Training and Testing</p></li><li><p>Text Vectorization with TF-IDF</p></li><li><p>Training the Naive Bayes Model</p></li><li><p>Evaluating Model Performance</p></li></ul><p>Let's dive in!</p><p></p><h3>1. Loading and Exploring the Dataset</h3><p>First, we need data! For this project, we&#8217;ll be using a dataset containing labeled SMS messages as either spam or ham. Here&#8217;s how we get started:</p><pre><code>import pandas as pd 

# Load the dataset directly from a URL
# `url` must hold the location of the SMS spam CSV (columns 'v1' and 'v2')
df = pd.read_csv(url, encoding='latin-1')

# Select relevant columns and rename them for clarity
df = df[['v1', 'v2']]
df.columns = ['label', 'message']</code></pre><p>What just happened?</p><ul><li><p>We loaded data using pandas straight from a URL.</p></li><li><p>Kept the two relevant columns: 'v1' and 'v2'.</p></li><li><p>Renamed them to 'label' and 'message' for better readability. Quick, simple, and clean!</p></li></ul><p></p><h3>2. Data Preprocessing</h3><p>Now, let's prep our data for modeling:</p><ul><li><p>Convert labels to numerical values: <code>0</code> for ham, <code>1</code> for spam. Numeric labels are convenient for training and for computing metrics later.</p></li><li><p>Check and remove duplicates, ensuring cleaner data for accurate training.</p></li></ul><pre><code># Transform labels: 0 for 'ham', 1 for 'spam'
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Remove duplicates and check for missing values
df = df.drop_duplicates()
print(df.isnull().sum())  # Expect all zeros (no missing values)
</code></pre><p>How balanced is our data? Quick peek:</p><pre><code>print(df['label'].value_counts())</code></pre><p>This gives us an idea of how evenly distributed our spam and ham messages are.</p><p></p><h3>3. Splitting Data for Training and Testing</h3><p>To evaluate our model fairly, we split our data into training and testing sets.</p><pre><code>from sklearn.model_selection import train_test_split

# Features and target labels
X = df['message']
y = df['label']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)</code></pre><p>What just happened?</p><ul><li><p>We defined our X and y (features and labels/targets).</p></li><li><p>Then we split them into 75% training and 25% testing. Fixing <code>random_state</code> makes the split reproducible.</p></li></ul><p></p><h3>4. Text Vectorization with TF-IDF</h3><p>Text data needs numerical representation for modeling. Enter <strong>TF-IDF (Term Frequency-Inverse Document Frequency)</strong>, which gives a word a higher weight the more often it appears in a message and a lower weight the more messages it appears in.</p><pre><code>from sklearn.feature_extraction.text import TfidfVectorizer

# Convert text messages to numerical features using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)</code></pre><p>Explanation:</p><ul><li><p><strong>TF-IDF</strong> transforms text into a matrix of numerical values, emphasizing unique and relevant words.</p></li><li><p>Common stop words (like 'the', 'and') are removed for better focus.</p></li><li><p>The vectorizer is fitted on the training set only and then applied to the test set, so no vocabulary or document-frequency information leaks from the test data.</p></li></ul><p></p><h3>5. Training the Naive Bayes Model</h3><p>Now for the magic! We&#8217;ll use the <strong>Multinomial Naive Bayes</strong> algorithm, a common and well-suited choice for text classification.</p><pre><code>from sklearn.naive_bayes import MultinomialNB

# Initialize and train the model
nb_model = MultinomialNB()
nb_model.fit(X_train_tfidf, y_train)

# Make predictions on the test set
y_pred = nb_model.predict(X_test_tfidf)</code></pre><p><strong>Why Multinomial Naive Bayes (MultinomialNB)?</strong><br>MultinomialNB is tailored for discrete features like word counts or term frequencies, making it ideal for text classification tasks such as spam detection and sentiment analysis. It works by applying Bayes' Theorem under the assumption that features (words) are conditionally independent given the class label. Although the model is defined for counts, fractional weights such as the TF-IDF values used here also tend to work well in practice.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(C_i \\mid d) \\propto P(C_i) \\prod_{j=1}^{n} P(w_j \\mid C_i)^{f(w_j, d)}&quot;,&quot;id&quot;:&quot;PUIIRUCZCS&quot;}" data-component-name="LatexBlockToDOM"></div><p><strong>Breaking it Down Simply:</strong></p><ul><li><p><strong>P(C&#7522;)</strong>: The prior probability of class C&#7522;, indicating how common a class is (e.g., how likely it is that a message is "spam" versus "not spam").</p></li><li><p><strong>P(w&#11388;|C&#7522;)</strong>: The probability that word w&#11388; appears in class C&#7522;. For instance, if "win" frequently appears in spam messages, P("win"|spam) would be high.</p></li><li><p><strong>f(w&#11388;, d)</strong>: The frequency of word w&#11388; in document d.</p></li></ul><p><strong>What Does the Formula Do?</strong></p><ul><li><p>It multiplies the prior probability of the class P(C&#7522;) by the probabilities of each word in the document appearing in that class, raised to the power of how often the word appears.</p></li><li><p>The higher this product, the more likely the document belongs to that class.</p></li></ul><p>In simple terms, <strong>Multinomial Naive Bayes</strong> predicts the class of a document by considering how common each word is in that class, adjusted for how often the class itself appears overall. 
It&#8217;s like scoring how "spammy" or "hammy" the words in a message are!</p><p><strong>Key Advantages:</strong></p><ul><li><p><strong>Count-Based Data:</strong> Handles features represented by counts or frequencies efficiently (e.g., word occurrence in text).</p></li><li><p><strong>High-Dimensional Data:</strong> Performs well with large vocabularies and sparse matrices, which are common in text data.</p></li></ul><p><strong>Comparison to GaussianNB:</strong></p><ul><li><p><strong>Discrete vs. Continuous Data:</strong> MultinomialNB works with discrete data, like word counts. In contrast, GaussianNB assumes continuous features that follow a normal distribution, making it more suitable for continuous measurements.</p></li><li><p><strong>Efficiency on Sparse Data:</strong> MultinomialNB works directly on the sparse word-count matrix and only needs per-class word totals, whereas GaussianNB estimates a mean and variance for every feature and, in scikit-learn, expects a dense array.</p></li></ul><p><strong>Other Naive Bayes Variants:</strong></p><ul><li><p><strong>BernoulliNB:</strong> Suitable for binary/boolean features (presence/absence of words). Works well when the focus is on word presence rather than frequency.</p></li><li><p><strong>GaussianNB:</strong> Assumes continuous data with a Gaussian distribution (e.g., features like height, weight).</p></li><li><p><strong>ComplementNB:</strong> Designed to handle imbalanced classes better, making it effective for datasets where one class is significantly smaller than the other.</p></li></ul><p></p><h3>6. Evaluating Model Performance</h3><p>Did it work well? Let&#8217;s find out!</p><pre><code>from sklearn.metrics import accuracy_score, classification_report

# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Model Accuracy: {accuracy:.2f}')

# Display a detailed classification report
print(classification_report(y_test, y_pred))</code></pre><p>The <strong>accuracy score</strong> tells us the overall success rate of our model, while the <strong>classification report</strong> provides deeper insights into precision, recall, and F1-score for spam and ham messages. If the classes turn out to be imbalanced, accuracy alone can look deceptively high, so the per-class numbers deserve a close look.</p><p></p><h2>&#127881; Recap</h2><ul><li><p><strong>Data Loading &amp; Preprocessing</strong>: Cleaned and prepared data.</p></li><li><p><strong>Text Vectorization</strong>: Transformed text to numerical features using TF-IDF.</p></li><li><p><strong>Model Training &amp; Evaluation</strong>: Trained a Naive Bayes classifier and checked its performance.</p></li></ul><p>Quick, effective, and surprisingly powerful for such a lightweight approach! Stay tuned for more optimizations and advanced techniques in future issues!</p><p>If you want the <strong><a href="https://mlpills.substack.com/i/151445198/time-to-play">full notebook</a></strong> you can find it at the <strong><a href="https://mlpills.substack.com/i/151445198/time-to-play">end of the issue</a></strong>!</p><p>It covers the six sections above plus <strong>3 additional sections</strong> (marked as new):</p><ol><li><p>Data Loading and Exploration</p></li><li><p>Data Preprocessing</p></li><li><p>Data Splitting</p></li><li><p>Text Vectorization</p></li><li><p>Model Training</p></li><li><p>Model Evaluation</p></li><li><p><strong>Hyperparameter Tuning &#127381;</strong></p></li><li><p><strong>Model and Vectorizer Saving &#127381;</strong></p></li><li><p><strong>Custom Predictions &#127381;</strong></p></li></ol><div><hr></div><h1>&#9889;Power-Up Corner</h1><p>To make your text classification model more robust, here are some advanced tips and common pitfalls to watch for:</p><ol><li><p><strong>Feature Engineering Beyond TF-IDF:</strong></p></li></ol>
      <p>
          <a href="https://mlpills.substack.com/p/diy-12-sentiment-analysis-with-naive">
              Read more
          </a>
      </p>
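<p>As a companion to the scoring formula in section 5, here is a minimal from-scratch sketch of the Multinomial Naive Bayes rule in plain Python. It is not how scikit-learn implements the model internally; the tiny training set is made up, raw word counts stand in for TF-IDF weights, and the add-one (Laplace) smoothing is an assumption added here to avoid zero probabilities. Scores are computed in log space, so the product in the formula becomes a sum.</p>

```python
import math

# Hypothetical toy training data: (tokens, label)
train = [
    (["win", "cash", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["see", "you", "now"], "ham"),
    (["call", "you", "later"], "ham"),
    (["see", "you", "later"], "ham"),
]

vocab = sorted({word for tokens, _ in train for word in tokens})
classes = sorted({label for _, label in train})

# Log prior: log P(C_i), the share of training messages in each class
log_prior = {
    c: math.log(sum(1 for _, label in train if label == c) / len(train))
    for c in classes
}

# Log likelihood: log P(w_j | C_i) with add-one smoothing (assumed alpha = 1)
alpha = 1.0
log_likelihood = {}
for c in classes:
    counts = {word: 0 for word in vocab}
    for tokens, label in train:
        if label == c:
            for word in tokens:
                counts[word] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    log_likelihood[c] = {
        word: math.log((counts[word] + alpha) / total) for word in vocab
    }


def score(tokens):
    """Return log P(C_i) + sum over words of log P(w_j | C_i) per class.

    Adding the term once per occurrence is the same as multiplying it
    by f(w_j, d). Words outside the vocabulary are skipped.
    """
    scores = {}
    for c in classes:
        total_score = log_prior[c]
        for word in tokens:
            if word in log_likelihood[c]:
                total_score += log_likelihood[c][word]
        scores[c] = total_score
    return scores


def predict(tokens):
    """Pick the class with the highest log score."""
    scores = score(tokens)
    return max(scores, key=scores.get)


print(predict(["win", "cash", "prize"]))
print(predict(["see", "you", "later"]))
```

<p>Working in log space is a common design choice: multiplying many small probabilities quickly underflows, while adding their logarithms keeps the comparison between classes stable.</p>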
   ]]></content:encoded></item><item><title><![CDATA[DIY #11 - Build RAG System with LangChain]]></title><description><![CDATA[In the rapidly evolving world of artificial intelligence, the need for accurate, contextually relevant information has become essential. Traditional language models, while powerful, sometimes generate responses that lack factual accuracy or up-to-date information. This is where Retrieval-Augmented Generation (RAG) comes into play.]]></description><link>https://mlpills.substack.com/p/diy-11-build-rag-system-with-langchain</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-11-build-rag-system-with-langchain</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 31 Aug 2024 17:45:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/23dcf484-a4ae-4f60-81e8-66a7cedf7072_1920x1280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>&#128138; Pill of the week</h1><p>In the rapidly evolving world of artificial intelligence, the need for accurate, contextually relevant information has become essential. Traditional language models, while powerful, sometimes generate responses that lack factual accuracy or up-to-date information. This is where <strong>Retrieval-Augmented Generation</strong> (RAG) comes into play. </p><div class="pullquote"><p>RAG systems enhance the performance of generative models by grounding their outputs in real-world, retrievable documents</p></div><p>This approach combines the best of both worlds: the creativity and fluency of generative models with the factual accuracy of retrieval-based systems.</p><p>In this issue of MLPills, we will explore how to <strong>build a RAG system using LangChain</strong>, a versatile framework that simplifies the integration of language models with external data sources. 
We'll cover everything from the basics of RAG and LangChain to slightly more advanced optimization techniques and real-world applications.</p><blockquote><p>&#128104;&#8205;&#128187; We include the <strong>notebook</strong> and <strong>data used</strong> at the <em><strong>end of the issue</strong></em>!</p></blockquote><h2>What is LangChain?</h2><p><strong>LangChain is an open-source framework designed to assist developers in creating applications that leverage the capabilities of large language models (LLMs).</strong> It provides a set of tools, components, and abstractions that make it easier to integrate language models with various data sources and services, whether for retrieval, processing, or generation tasks.</p><p>Key features of LangChain include:</p><ol><li><p><strong>Chainability:</strong> Allows the creation of complex workflows by chaining together different components.</p></li><li><p><strong>Prompts Management:</strong> Offers tools for creating, managing, and optimizing prompts for language models.</p></li><li><p><strong>Memory Integration:</strong> Provides mechanisms to give language models short-term and long-term memory capabilities.</p></li><li><p><strong>Agent Framework:</strong> Enables the creation of AI agents that can make decisions and take actions.</p></li><li><p><strong>Data Connection:</strong> Facilitates easy integration with various data sources and vector stores.</p></li></ol><p>LangChain is particularly useful for building complex AI applications, such as RAG systems, where multiple components need to work together seamlessly. 
It supports various language models, including those from OpenAI, Hugging Face, and others, making it a flexible choice for developers.</p><h2>Why Use RAG?</h2><p>Before proceeding, you may want to review what RAG is:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;201ae136-bc9d-420a-86ff-c8ecdf160f1c&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #56 - Retrieval-Augmented Generation&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-04-27T16:04:33.052Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ac9c1ea-0a3f-40a9-86f4-bdba47a61df5_1920x1221.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-56-retrieval-augmented-generation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144067322,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/daansan_ml/status/1784250030884569387" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WYdD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 424w, 
https://substackcdn.com/image/fetch/$s_!WYdD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 848w, https://substackcdn.com/image/fetch/$s_!WYdD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 1272w, https://substackcdn.com/image/fetch/$s_!WYdD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WYdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png" width="514" height="532.4771241830065" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:634,&quot;width&quot;:612,&quot;resizeWidth&quot;:514,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://x.com/daansan_ml/status/1784250030884569387&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!WYdD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 424w, 
https://substackcdn.com/image/fetch/$s_!WYdD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 848w, https://substackcdn.com/image/fetch/$s_!WYdD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 1272w, https://substackcdn.com/image/fetch/$s_!WYdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F498d8160-a8d1-4554-8ff9-6263165f5ad2_612x634.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Retrieval-Augmented Generation (RAG) offers several <strong>key advantages</strong> over traditional language generation techniques:</p><ol><li><p><strong>Improved Accuracy:</strong> By grounding generative outputs in real-world data, RAG systems produce responses that are more accurate and less prone to hallucinations. This is crucial for applications where factual correctness is paramount.</p></li><li><p><strong>Real-Time Information Retrieval:</strong> RAG systems can retrieve up-to-date information from their knowledge base, making them ideal for scenarios where current data is crucial. This is particularly useful in fields like news reporting, market analysis, or providing the latest scientific information.</p></li><li><p><strong>Context-Aware Responses:</strong> By using retrieved documents as a basis for generation, RAG systems can provide more contextually relevant answers. This is particularly useful in specialized domains like legal advice, medical information, or technical support, where domain-specific knowledge is critical.</p></li><li><p><strong>Transparency and Explainability:</strong> RAG systems can provide the sources of their information, allowing users to verify the generated content. 
This transparency builds trust and allows for fact-checking.</p></li><li><p><strong>Customizability:</strong> The knowledge base used by RAG systems can be easily updated or customized for specific use cases, allowing for greater flexibility compared to traditional language models with fixed training data.</p></li><li><p><strong>Reduced Training Costs:</strong> Instead of fine-tuning large language models on domain-specific data (which can be computationally expensive), RAG systems allow for the integration of domain knowledge through the retrieval component.</p></li><li><p><strong>Handling of Long-Context Tasks:</strong> RAG systems can effectively handle tasks that require processing or generating long pieces of text by breaking them down into manageable chunks and leveraging relevant information from the knowledge base.</p></li></ol><h2>Setting Up Your Environment</h2><p>Before diving into the code, let's set up the environment necessary for building a RAG system with LangChain.</p><h3>1. Install Dependencies</h3><p>To get started, you'll need to install a few Python packages. 
Open your terminal and run the following command:</p><pre><code>pip install langchain openai faiss-cpu python-dotenv langchain-community tiktoken</code></pre><ul><li><p><strong>LangChain:</strong> The core framework we'll be using to build our RAG system.</p></li><li><p><strong>OpenAI:</strong> To access OpenAI's language models (you can replace this with other supported LLMs if preferred).</p></li><li><p><strong>FAISS:</strong> A library for efficient similarity search and clustering, which we'll use for document indexing.</p></li><li><p><strong>python-dotenv:</strong> For managing environment variables, including API keys.</p></li><li><p><strong>langchain-community</strong>: This library contains a collection of community-contributed utilities, integrations, and modules that extend LangChain's capabilities, enabling easier interaction with a variety of tools and services.</p></li><li><p><strong>tiktoken</strong>: A library for tokenizing text in a way that's compatible with OpenAI's models, which helps manage token limits and optimize performance when working with large language models.</p></li></ul><h3>2. Get API Keys</h3><p>You'll need API keys for the services you plan to use. For this guide, we'll primarily use OpenAI's API. Follow these steps to obtain and set up your API key:</p><ol><li><p>Go to the <a href="https://openai.com">OpenAI website</a> and sign up for an account if you haven't already.</p></li><li><p>Navigate to the API section and create a new API key.</p></li><li><p>Create a <code>.env</code> file in your project directory and add your API key:</p></li></ol><pre><code>OPENAI_API_KEY=your_api_key_here</code></pre><ol start="4"><li><p>In your Python script, use the following code to load the environment variables:</p></li></ol><pre><code>from dotenv import load_dotenv 
import os 

load_dotenv() 
openai_api_key = os.getenv("OPENAI_API_KEY")</code></pre><h2>Step-by-Step Guide to Building a RAG System</h2><p>Let's walk through the process of building a RAG system using LangChain, step by step.</p><h3>1. Document Preparation</h3><p>The first step in creating a RAG system is to <strong>prepare the documents that your system will retrieve information from</strong>. These documents will serve as the knowledge base for your system.</p><p>We will use the text from <a href="https://www.nytimes.com/article/climate-change-global-warming-faq.html">this NYTimes article about climate change</a>, which we <em>manually cleaned and split into subdocuments (TXT files)</em> for this walkthrough.</p><p>LangChain supports various document loaders for different file types and sources. Here's an example of how you can load text documents:</p><pre><code>from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load text files from a directory
loader = DirectoryLoader('./data', glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split the documents into overlapping chunks for indexing
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)</code></pre><p>In this example, we're using the <code>DirectoryLoader</code> to load all <code>.txt</code> files from a <code>data</code> directory. We then use a <code>RecursiveCharacterTextSplitter</code> to split the documents into smaller chunks, which is important for efficient indexing and retrieval.</p><p>You can learn more about text chunking in the following issue:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bacadc71-76e4-4625-b6a8-111cbab13f6d&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the Week&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #64 - Text chunking for RAG systems&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-06-29T17:34:54.776Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4daf8dd2-ef96-4984-8b0f-68ed0ff903c6_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-64-text-chunking-for-rag-systems&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:146103471,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>2. Indexing Documents</h3><p>Once your documents are loaded and processed, the next step is to index them so that they can be efficiently searched when generating responses. We'll use FAISS (Facebook AI Similarity Search) for this purpose.</p><pre><code>from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize the embeddings
embeddings = OpenAIEmbeddings()

# Create the vector store
vectorstore = FAISS.from_documents(texts, embeddings)

# Save the vector store
vectorstore.save_local("faiss_index")</code></pre><p>In this code snippet, we use OpenAI's embeddings to convert the document chunks into vectors, which are then indexed using FAISS. We also save the index locally for future use.</p><p>To learn more about vector stores, see the following issue:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f8defdb7-72bf-45d2-b341-3ea4c31a3c11&quot;,&quot;caption&quot;:&quot;Today we are introducing a new type of issue: &#8220;Podcast notes&#8221; &#127881;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #55 - Vector Databases and their importance&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-04-20T11:31:02.082Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6649b34a-d70f-49d5-8347-f225fef8910a_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-55-vector-databases-and-their&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:143749725,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>3. Setting Up the Retrieval Mechanism</h3><p>Now that we have our documents indexed, we can set up the retrieval mechanism. This will allow our system to find and return the most relevant documents based on a user's query.</p>
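<p>To build intuition for what this retrieval step does, here is a minimal sketch in plain Python. It is not LangChain's or FAISS's implementation: the <code>embed</code>, <code>cosine</code> and <code>retrieve</code> helpers and the sample chunks are hypothetical, and simple word counts stand in for the OpenAI embeddings used above. The idea is the same, though: turn the query and every chunk into vectors, score each chunk by its similarity to the query, and return the top <code>k</code>.</p><pre><code>import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, made up for illustration only
chunks = [
    "greenhouse gases trap heat in the atmosphere",
    "sea levels are rising as ice sheets melt",
    "renewable energy reduces carbon emissions",
]

top = retrieve("why are sea levels rising", chunks, k=1)
print(top)</code></pre><p>In the real system, the vector store built in the previous step plays the role of <code>retrieve</code>, searching over learned embeddings instead of word counts.</p>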
      <p>
          <a href="https://mlpills.substack.com/p/diy-11-build-rag-system-with-langchain">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #10 - ARIMA model from zero to hero]]></title><description><![CDATA[Time series data, characterized by observations over sequential time intervals, often exhibits patterns and trends that can be analyzed and forecasted using sophisticated statistical models. One such powerful model is ARIMA (AutoRegressive Integrated Moving Average), which combines autoregressive (AR), differencing (I), and moving average (MA) components to capture the temporal dependencies within the data.]]></description><link>https://mlpills.substack.com/p/issue-62-arima-model-from-zero-to</link><guid isPermaLink="false">https://mlpills.substack.com/p/issue-62-arima-model-from-zero-to</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 15 Jun 2024 11:40:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0fb7ce4d-345d-43ff-b4dc-df71adc64433_1920x1280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128138; Pill of the week</h2><p>Time series data, characterized by observations over sequential time intervals, often exhibits patterns and trends that can be analyzed and forecasted using sophisticated statistical models. One such powerful model is ARIMA (AutoRegressive Integrated Moving Average), which combines autoregressive (AR), differencing (I), and moving average (MA) components to capture the temporal dependencies within the data.</p><p></p><h2>&#128736;&#65039; DIY</h2><p>In this article we will show how to train an ARIMA model. We will focus on the <strong>results and steps</strong>, without showing the code, since it was already introduced in previous issues of MLPills. However, we will share a <strong><a href="https://mlpills.substack.com/i/145665252/time-to-practise">notebook with all the code</a></strong> you need to train your ARIMA model <strong>at the end of the issue</strong>! 
</p><div class="pullquote"><p><strong>&#9888;&#65039;Notebook at the end of the issue &#9888;&#65039;</strong></p></div><h4>Step 1: Loading the data and Understanding the process</h4><p>To begin our analysis, we start by importing essential libraries like <code>numpy</code> and <code>pandas</code>, which facilitate numerical operations and data manipulation, respectively. Loading our time series data using <code>pd.read_csv()</code> allows us to set appropriate timestamps as the index, ensuring accurate time-based analysis.</p><p>You can also revise this issue in which we introduced the ARIMA model and its components:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;449bc870-fed1-4b14-b3e1-0c929e5119da&quot;,&quot;caption&quot;:&quot;This week: &#128138; Discover the components of ARIMA models and the parameters that characterise them. &#8205;&#127891; Find out how to deploy and maintain your models &#129302; Tech RoundUp - the top 5 AI news of the week Enjoy! &#128138; Pill of the week This week, let&#8217;s talk about ARIMA, a popular method for analysing and forecasting Time Series data. ARIMA, an acronym for Auto-Regressive Integrated Moving Average, stands out as a cornerstone in traditional statistical methods for time series forecasting. 
This approach comprises three components:&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #47 - ARIMA models: components and parameters&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-02-16T08:48:01.134Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a04a3f18-00e8-4f30-b197-14e8e2fd8588_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-47-arima-models-components&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:141659004,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>We will follow the so-called Box-Jenkins method:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8f6e4c17-8b37-4d64-9936-60c3c50ba79c&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week This week is the time to give an overall view of the ARIMA model methodology, also called the Box-Jenkins method. We will link each step to previous issues of MLPills, so you can revise each step and become an ARIMA master! The Box-Jenkins method&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #51 - ARIMA models: Box-Jenkins method&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-03-23T12:34:07.525Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a64dd257-f3de-4b56-8640-bd33e5f97880_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-51-arima-models-box-jenkins&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:142882075,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><h4>Step 2: Assessing Stationarity</h4><p>Before applying ARIMA, it's crucial to ensure that our time series data is stationary. This will allow us to determine the value of &#8220;d&#8221;. Stationarity implies that statistical properties such as the mean and variance remain constant over time, which is essential for modeling.</p><p>We employ two primary tests for stationarity:</p><ul><li><p><strong>Augmented Dickey-Fuller (ADF) Test</strong>: This test examines whether a unit root is present in the series. 
A lower p-value (&lt; 0.05) from this test suggests that the series is stationary.</p></li><li><p><strong>Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test</strong>: In contrast, this test checks for stationarity around a deterministic trend. A higher p-value (&gt; 0.05) indicates stationarity.</p></li></ul><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bf0bcb63-5484-45f1-92bd-a45ac0af428c&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week Previously we introduced ARIMA and its components: AR, MA and I. This week we will learn what conditions our data must meet and how to transform it to meet them. Here is a hint: stationarity. We will also select the first parameter of our model: &#8220;d&#8221;.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #48 - ARIMA models: stationarity and differencing&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-03-02T07:00:29.451Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f32deb0e-628b-48aa-b8df-1ea0b72b681e_1920x1272.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-48-arima-models-stationarity&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:141955924,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>In our example, the initial data is not stationary; we can already see this just by plotting the series:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1nIP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!1nIP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 424w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 848w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 1272w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1nIP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png" width="833" height="428" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:428,&quot;width&quot;:833,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1nIP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 424w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 848w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 1272w, https://substackcdn.com/image/fetch/$s_!1nIP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a0e48c1-968a-496a-b4a2-3127b1c088e8_833x428.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This means that we will need to difference the data, so &#8220;d&#8221; will not be 0.</p><p></p><h4>Step 3: Preparing the Data - Differencing</h4><p>Because the initial tests show that the series is not stationary, we apply differencing. Differencing subtracts the previous observation from the current one; the parameter &#8220;<code>d</code>&#8221; in ARIMA is the number of times this operation is applied. This transformation removes the trend and helps stabilize the mean of the series.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d6px!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d6px!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 424w, https://substackcdn.com/image/fetch/$s_!d6px!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 848w, https://substackcdn.com/image/fetch/$s_!d6px!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 1272w, 
https://substackcdn.com/image/fetch/$s_!d6px!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d6px!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png" width="825" height="428" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:428,&quot;width&quot;:825,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d6px!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 424w, https://substackcdn.com/image/fetch/$s_!d6px!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 848w, https://substackcdn.com/image/fetch/$s_!d6px!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 1272w, 
https://substackcdn.com/image/fetch/$s_!d6px!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc274bfca-b610-4875-bd1f-3bd0e7ab11b6_825x428.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This looks much better!</p><p>Re-running the stationarity tests confirms that the differenced series is now stationary:</p><p><strong>Augmented Dickey-Fuller (ADF) Test</strong>:</p><pre><code>ADF Statistic: -8.770582534427128
p-value: 2.5274599738010693e-14
Critical Values:
&#9;1%: -3.437
&#9;5%: -2.864
&#9;10%: -2.568
ADF test: Series is stationary</code></pre><p><strong>Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test</strong>:</p><pre><code>KPSS Statistic: 0.19664101759123429
p-value: 0.1
Lags Used: 11
Critical Values:
&#9;10%: 0.347
&#9;5%: 0.463
&#9;2.5%: 0.574
&#9;1%: 0.739
KPSS test: Series is stationary</code></pre><p>Since we had to difference the data once to make it stationary, the parameter <strong>d</strong> of the ARIMA model will be equal to 1.</p><p>We can now proceed with the other two parameters: <code>p</code> and <code>q</code>.</p><p></p><h4>Step 4: Visualizing Correlations - ACF and PACF</h4><p>To determine the appropriate parameters <code>p</code> (AR) and <code>q</code> (MA) for our ARIMA model, we plot:</p><ul><li><p><strong>Autocorrelation Function (ACF)</strong>: This plot shows the correlation between the series and its lagged values, which helps identify the MA order (<code>q</code>).</p></li><li><p><strong>Partial Autocorrelation Function (PACF)</strong>: This plot shows the correlation between the series and a given lag after removing the effect of the shorter lags, which helps determine the AR order (<code>p</code>).</p></li></ul><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;74c1c5fa-b6b2-41ee-aa1b-b07780b90972&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week This week we will introduce the ACF and PACF, which are two really useful tools to select the order p of the AR model, and the order q of the MA model, two of the basic components of the ARIMA model. 
ACF and PACF plots If you're working with time series data and need to build an ARIMA model, understanding the concepts of ACF (Autocorre&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #50 - ARIMA models: selection of p and q&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-03-16T14:53:09.869Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9bcef0a-53fa-495a-bb2d-a01b4118faf3_1920x1438.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-50-arima-models-selection-of&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:142662541,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>First we can check the ACF plot of the undifferenced data:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WYzy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WYzy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WYzy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png" width="587" height="455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:587,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WYzy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!WYzy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57cbc2b-34ce-46c5-93bc-75ee79577975_587x455.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You can see that the ACF plot shows a slow decay, which indicates that differencing was indeed necessary.</p><p>Now let's plot the PACF and ACF of the differenced data (<strong>d=1</strong>) to see the difference:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MsJq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!MsJq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MsJq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png" width="587" height="455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:587,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!MsJq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!MsJq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c955455-8ebe-4137-9766-ef03e06bf970_587x455.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yUEb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yUEb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yUEb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png" width="587" height="455" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:587,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yUEb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 424w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 848w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 1272w, https://substackcdn.com/image/fetch/$s_!yUEb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1a2131-8bfc-4865-a9fe-67aba854ac7a_587x455.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A sharp decline in autocorrelation is observed in both plots, which indicates stationarity, confirming that the selected <code>d</code> value (1) was appropriate.</p><p>This will also help us estimate the values for <strong>p</strong> and <strong>q</strong>:</p><ul><li><p>The order of the AR term (p) is typically chosen based on the Partial Autocorrelation Function (PACF) plot, in this case: <strong>p</strong> = 6</p></li><li><p>The order of the MA term (q) is typically chosen based on the Autocorrelation Function (ACF) plot, in our case: <strong>q</strong> = 3</p></li></ul><p></p><h4>Step 5: Building and Refining the ARIMA Model</h4><p>Armed with insights from the ACF and PACF plots, we construct our ARIMA model. 
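</p><p>As a rough sketch of this step, the snippet below fits two candidate models with <code>statsmodels</code> and prints their AIC and BIC. It assumes <code>numpy</code> and <code>statsmodels</code> are installed, and it uses a synthetic series (a hypothetical stand-in, not the data from this issue), so the numbers it prints will not match the ones reported here. The helper name <code>fit_arima</code> is ours, chosen only for illustration.</p><pre><code>import warnings

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical stand-in data, NOT the series used in this issue:
# the first differences follow an AR(1) process, so one round of
# differencing (d=1) makes the series stationary.
rng = np.random.default_rng(42)
noise = rng.normal(size=300)
diffs = np.zeros(300)
for t in range(1, 300):
    diffs[t] = 0.6 * diffs[t - 1] + noise[t]
series = np.cumsum(diffs)


def fit_arima(data, order):
    """Fit an ARIMA model with the given (p, d, q) order and return the results."""
    with warnings.catch_warnings():
        # Higher-order fits on synthetic data may emit convergence warnings.
        warnings.simplefilter("ignore")
        return ARIMA(data, order=order).fit()


# Candidate orders from the text: the initial estimate and a reduced model.
candidates = [(6, 1, 3), (4, 1, 3)]
fits = {order: fit_arima(series, order) for order in candidates}

for order, res in fits.items():
    print(order, "AIC:", round(res.aic, 3), "BIC:", round(res.bic, 3))

# Lower AIC is better. The summary table lists each coefficient with its p-value.
best_order = min(fits, key=lambda order: fits[order].aic)
print(fits[best_order].summary())</code></pre><p>With your own data, pass the undifferenced series: because <code>d=1</code> is part of the order, the model applies the differencing itself.</p><p>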
Iteratively adjusting parameters based on model diagnostics such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), we aim to find the best-fit model that minimizes these metrics.</p><p>We start by checking the model summary:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1cNR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1cNR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 424w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 848w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 1272w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1cNR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png" width="499" height="574.0632478632479" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:585,&quot;resizeWidth&quot;:499,&quot;bytes&quot;:148648,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1cNR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 424w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 848w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 1272w, https://substackcdn.com/image/fetch/$s_!1cNR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c687ac9-952c-47ee-8252-69ef31de45cb_585x673.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We observe that the AR terms 4 to 6 are not significant (p-value [P&gt;|z|] &gt; 0.05). 
This may indicate that we have overestimated the value of <strong>p</strong>. Let's reduce it to 4 and fit the model again:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gc8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gc8r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 424w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 848w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 1272w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gc8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png" width="437" height="479.7592954990215" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:511,&quot;resizeWidth&quot;:437,&quot;bytes&quot;:110698,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gc8r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 424w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 848w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 1272w, https://substackcdn.com/image/fetch/$s_!gc8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd176d-9f6b-4685-ada9-a648a79d02da_511x561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now all terms are significant! Great! Let's compare the AIC and BIC to see if they are better (lower) too:</p><ul><li><p><strong>ARIMA(6,1,3)</strong> &#8594; AIC = 2805.486, BIC = 2854.553</p></li><li><p><strong>ARIMA(4,1,3)</strong> &#8594; AIC = 2803.404, BIC = 2842.658</p></li></ul><p>It is not a big change, but the new model is better on both criteria! Ideally, we should <strong>repeat this with several combinations of parameters</strong> (around our initial estimate, verifying each time that the AR and MA terms are significant) until we find the combination with the lowest AIC and BIC.</p><p>Here is an introduction to these concepts:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d826abe9-a46c-4051-9cb3-3d5718f201ac&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week Last week we defined the concept of stationarity, introduced some tests to check if our data met those requirements and finally estimated the parameter &#8220;d&#8221; of our ARIMA model. 
This week we will see the criteria we will follow to select the rest of the parameters of our model.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #49 - ARIMA models: Criteria for selection&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-03-09T15:04:32.346Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85f6af69-1ed2-4495-a4ae-3dc2877f1f18_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-49-arima-models-criteria-for&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:142206737,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><h4>Step 6: Evaluating Model Performance</h4><p>Once the ARIMA model is trained, we proceed to evaluate its performance:</p><ul><li><p><strong>Residual Analysis</strong>: We inspect the residuals to ensure they exhibit no discernible patterns, resembling white noise. Diagnostic checks like Q-Q plots and histogram of residuals verify this assumption.</p></li><li><p><strong>Forecasting</strong>: Using the fitted model, we generate forecasts for future time points and compare them against actual values. Metrics such as MAE, RMSE, and others quantify the accuracy of our predictions.</p></li></ul><p>In this issue we will focus solely on the first one, residual analysis, leaving the second one for a future issue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nxQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nxQA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 424w, https://substackcdn.com/image/fetch/$s_!nxQA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 848w, 
https://substackcdn.com/image/fetch/$s_!nxQA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 1272w, https://substackcdn.com/image/fetch/$s_!nxQA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nxQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png" width="999" height="701" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:999,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nxQA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 424w, https://substackcdn.com/image/fetch/$s_!nxQA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 848w, 
https://substackcdn.com/image/fetch/$s_!nxQA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 1272w, https://substackcdn.com/image/fetch/$s_!nxQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c27125-1a4d-4832-8a0a-d074927f522e_999x701.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We observe:</p><ul><li><p>No obvious patterns in the residuals &#8594; random noise</p></li><li><p>Histogram looks like a normal 
distribution</p></li><li><p>Most data points lie on the line in the normal Q-Q plot</p></li><li><p>The autocorrelations at all lags (apart from lag 0) fall within the confidence band (blue-shaded area).</p></li></ul><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e8647591-c1d3-4f88-b923-29d07fea1dbe&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week An essential step in any time series modelling project is to verify the assumptions and accuracy of the fitted model. After fitting a time series model such as ARIMA, it is crucial to assess the residuals to ensure that the model accurately captures all patterns in the data. Residuals are the differences between the observed values an&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #52 - ARIMA models: Residual Diagnostics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in 
rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-03-30T16:37:32.968Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b82304ec-d8de-4241-8ea9-5d8a43884619_1920x2400.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-52-arima-models-residual-diagnostics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:143097387,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Great news, our model is capturing the data!</p><p></p><h4>In a future issue&#8230;</h4><p>The next steps would be to assess the accuracy of the model:</p><ol><li><p><strong>Split your data</strong>: Set aside a portion of your data as a test set. Typically, this is done by holding out the last portion of your time series data. For example, you could use the last 20% of your data for testing and the first 80% for training.</p></li><li><p><strong>Refit the model on the training data</strong>: Fit your ARIMA model using only the training data. 
This ensures that your model does not have knowledge of the future values that it is supposed to predict.</p></li><li><p><strong>Forecast</strong>: Use your trained model to forecast the values for the period covered by the test set. Compare these forecasted values with the actual values in the test set.</p></li><li><p><strong>Evaluate the forecast accuracy</strong>: Calculate accuracy metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). These metrics will give you an indication of how well your model is performing.</p></li><li><p><strong>Visualize the results</strong>: Plot your forecasted values against the actual values to visually inspect how well your model is capturing the trends and patterns in the data.</p></li></ol><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a0826099-4a3a-4ef8-86d7-8348f70aa789&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week Time to go back to Time Series with ARIMA. Let&#8217;s continue where we left it last time. 
If you want a reminder here you have: Today is time to learn how to evaluate the performance of your ARIMA model: Evaluating ARIMA Performance When working with time series data and forecasting models like ARIMA, it is essential to evaluate the performa&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #57 - Evaluating ARIMA Performance&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-04T19:59:16.213Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cc138e1-faf5-499f-9361-1d265de490f0_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-57-evaluating-arima-performance&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144299087,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p>In conclusion, ARIMA stands as a robust methodology for analyzing and forecasting time series data, offering insights into trends and patterns that are invaluable for decision-making across various domains. By following these systematic steps, analysts and data scientists can leverage ARIMA to derive actionable insights and make informed predictions from time series data.</p><div><hr></div><h1><strong>&#8205;&#127891;Learn Real-World Machine Learning!*</strong></h1><p>Do you want to learn <strong><a href="https://t.co/WUM8SpmBaK">Real-World Machine Learning</a></strong>?</p><p>Data Science doesn&#8217;t finish with the model training&#8230; There is much more!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://t.co/WUM8SpmBaK" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fjSP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fjSP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fjSP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!fjSP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fjSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg" width="376" height="211.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:376,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://t.co/WUM8SpmBaK&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!fjSP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fjSP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fjSP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!fjSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe176c191-4164-4780-9cbe-8c9c5496aa4e_1280x720.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here you will learn how to deploy and maintain your models, so they can be used in a Real-World environment:</p><ul><li><p><strong>Elevate your ML skills</strong> with "Real-World ML Tutorial &amp; Community"! &#128640;</p></li><li><p><strong>Business to ML</strong>: Turn real business challenges into ML solutions.</p></li><li><p><strong>Data Mastery</strong>: Craft perfect ML-ready data with Python.</p></li><li><p><strong>Train Like a Pro</strong>: Boost your models for peak performance.</p></li><li><p><strong>Deploy with Confidence</strong>: Master MLOps for real-world impact.</p></li></ul><p>&#127873; <strong>Special Offer</strong>: Use "MASSIVE50" for <strong>50% off</strong>. <strong>Only this time!</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://t.co/WUM8SpmBaK&quot;,&quot;text&quot;:&quot;Learn Real-World Machine Learning&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://t.co/WUM8SpmBaK"><span>Learn Real-World Machine Learning</span></a></p><p>*<em>Sponsored</em></p><div><hr></div><h1>&#129302; Tech Round-Up</h1><p>No time to check the news this week?</p><p>This week's TechRoundUp comes full of <strong>AI news</strong>. 
From Apple's new steps towards AI to Spotify's new in-house Creative Lab.</p><p>Let's dive into the latest Tech highlights you probably shouldn&#8217;t miss this week &#128165;</p><p>1&#65039;&#8419; <a href="https://techcrunch.com/2024/06/11/the-top-ai-features-apple-announced-at-wwdc-2024/">Apple Unveils New AI Features at WWDC 2024!</a> &#127823;&#129302; </p><blockquote><p>Dive into Apple's latest AI innovations that promise to revolutionize user experience. </p><p>From smarter Siri to enhanced privacy controls and real-time language translation, here&#8217;s all you need to know. </p></blockquote><p>2&#65039;&#8419; <a href="https://techcrunch.com/2024/06/13/linkedin-leans-on-ai-to-do-the-work-of-job-hunting/">Job Hunting Made Easy with LinkedIn's New AI Tools</a> &#129302;&#128269; </p><blockquote><p>LinkedIn is leveraging AI to streamline job searches, suggesting roles that fit your skills and experiences perfectly. </p><p>Say goodbye to endless scrolling and hello to your dream job! </p></blockquote><p>3&#65039;&#8419; <a href="https://techcrunch.com/2024/06/14/meta-pauses-plans-to-train-ai-using-european-users-data-bowing-to-regulatory-pressure/">Meta Pauses AI Training Using European User Data</a> &#127757;&#128202; </p><blockquote><p>Amid regulatory pressures, Meta halts plans to train its AI with European data. </p><p>This move underscores the growing importance of data privacy and compliance.</p></blockquote><p>4&#65039;&#8419; <a href="https://techcrunch.com/2024/06/13/spotify-creative-labs-ad-agency-for-advertisers/">Spotify Launches Creative Labs</a> &#127912;&#127911; </p><blockquote><p>Spotify&#8217;s new in-house ad agency, Creative Labs, aims to revolutionize audio advertising. </p><p>Get ready for more personalized and engaging ads during your music sessions. 
</p></blockquote><p>5&#65039;&#8419; <a href="https://techcrunch.com/2024/06/11/apples-ai-apple-intelligence-is-boring-and-practical-thats-why-it-works/">Why Apple's AI Approach is Genius in its Simplicity</a> &#128241;&#10024; </p><blockquote><p>Apple focuses on practical, user-centric AI features rather than flashy gimmicks. </p><p>This approach ensures seamless integration into daily use, making tech genuinely useful.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://x.com/rfeers&quot;,&quot;text&quot;:&quot;Follow Josep here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://x.com/rfeers"><span>Follow Josep here</span></a></p><div><hr></div><h1>&#128221; Time to practise!</h1><p>Here is the <strong>notebook</strong> with all the <strong>code</strong> omitted in this article:</p>
      <p>
          <a href="https://mlpills.substack.com/p/issue-62-arima-model-from-zero-to">
              Read more
          </a>
      </p>
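<p>The evaluation steps outlined earlier (hold out the last 20% of the series, forecast the held-out period, then compute MAE, MSE, RMSE and MAPE) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the series values are made up, and <code>naive_forecast</code> is a hypothetical stand-in that repeats the last training value in place of a fitted ARIMA model's forecast. Only the chronological split and the metric calculations carry over to a real project.</p>
<pre><code class="language-python">import math


def train_test_split_series(series, test_fraction=0.2):
    """Chronological split: the last `test_fraction` of the points become the test set."""
    n_test = max(1, int(round(len(series) * test_fraction)))
    return series[:-n_test], series[-n_test:]


def naive_forecast(train, horizon):
    """Hypothetical stand-in for a fitted model: repeat the last training value."""
    return [train[-1]] * horizon


def forecast_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is 0; this toy series has none.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up series, for illustration only.
series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
train, test = train_test_split_series(series, test_fraction=0.2)
predicted = naive_forecast(train, horizon=len(test))
metrics = forecast_metrics(test, predicted)
print(metrics)
</code></pre>
<p>With this toy series the split keeps 8 points for training and 2 for testing, and the naive forecast gives an MAE of 20.5. Swapping <code>naive_forecast</code> for the forecast of a model fitted on <code>train</code> only keeps the comparison free of future information.</p>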
   ]]></content:encoded></item><item><title><![CDATA[DIY #9 - Feature Importance in ML models]]></title><description><![CDATA[Feature importance is the quantification of the predictive power of individual input variables or features in a model. It provides insights into which features contribute most significantly to the model's performance in making accurate predictions or classifications. Understanding feature importance aids feature selection, model interpretation, and identifying critical factors driving the outcomes.]]></description><link>https://mlpills.substack.com/p/diy-9-feature-importance-in-ml-models</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-9-feature-importance-in-ml-models</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Sat, 24 Feb 2024 12:30:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/44e5b9e0-fbc0-45c5-91db-494aa230a716_1920x1438.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128138; Pill of the week</h2><p>Feature importance is the quantification of the predictive power of individual input variables or features in a model. It provides insights into which features contribute most significantly to the model's performance in making accurate predictions or classifications. Understanding feature importance aids feature selection, model interpretation, and identifying critical factors driving the outcomes.</p><p>We <strong>introduced</strong> this concept a month ago:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c95c4441-6f95-48c0-9b00-522dfe36ca33&quot;,&quot;caption&quot;:&quot;&#128138; Pill of the week Last week we generated multiple features for our Time Series data, however, some of them may not be useful at all. How to determine which ones are the best? That&#8217;s what we are talking about this week in the following article: feature importance. 
You can read it&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Issue #44 - How important is each feature in your model?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:38707812,&quot;name&quot;:&quot;David Andr&#233;s&quot;,&quot;bio&quot;:&quot;&#128188; Data Scientist &#8226; &#128013; Python enthusiast&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6423b2-36bc-440c-be7d-b54be5bad1b0_1447x1448.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:132707413,&quot;name&quot;:&quot;Josep Ferrer&quot;,&quot;bio&quot;:&quot;Outstand using data -- Data Science, Design and Tech \n\nTech Writer @KDnuggets @DataCamp\n\n&#128073;&#127995;Inquiries in rfeers@gmail.com&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d196b5a6-59f2-46dd-99b3-e10ab1bbd27d_604x604.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-01-26T07:43:02.769Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4664bf04-de0f-4f55-ae65-485a5b35261a_1920x1280.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://mlpills.substack.com/p/issue-44-how-important-are-each-feature&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:141061195,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Machine Learning 
Pills&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dba4244-97d2-48f0-a2bb-b01c7ea74212_118x118.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>But today we are going to put it into practice!</p><h3>The data</h3><p>We will analyse the most important factors (diagnostic measurements) that allow for an accurate prediction of whether a patient has diabetes. We will use a <a href="https://www.kaggle.com/datasets/akshaydattatraykhare/diabetes-dataset">dataset</a> from the National Institute of Diabetes and Digestive and Kidney Diseases. It contains several independent variables (medical predictor variables) and one target or dependent variable (whether the person has diabetes or not). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8dJe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8dJe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 424w, https://substackcdn.com/image/fetch/$s_!8dJe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 848w, 
https://substackcdn.com/image/fetch/$s_!8dJe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8dJe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png" width="752" height="383" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3129c8a-c438-464c-b947-8baa7c563636_752x383.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:383,&quot;width&quot;:752,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8dJe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 424w, https://substackcdn.com/image/fetch/$s_!8dJe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 848w, 
https://substackcdn.com/image/fetch/$s_!8dJe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3129c8a-c438-464c-b947-8baa7c563636_752x383.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>Importance</h3><p>But before starting, why is Feature Importance important? 
It is crucial for several reasons:</p><ol><li><p><strong>Model Interpretability</strong>: Understanding which features are most important in making predictions helps to interpret the model's behavior. </p></li><li><p><strong>Feature Selection</strong>: Feature importance helps in identifying the most relevant features for making predictions. In many cases, datasets contain a large number of features, some of which may be irrelevant or redundant. </p></li><li><p><strong>Dimensionality Reduction</strong>: By identifying and focusing on the most important features, feature importance analysis aids in reducing the dimensionality of the dataset. This simplifies the model, making it more computationally efficient and less prone to overfitting.</p></li><li><p><strong>Insights for Feature Engineering</strong>: Feature importance analysis provides insights into which features contribute most significantly to the target variable. This information can guide feature engineering efforts by highlighting areas where additional domain knowledge or feature transformations might be beneficial.</p></li><li><p><strong>Model Debugging and Diagnosis</strong>: Understanding feature importance can help diagnose issues with the model. For instance, if a highly important feature is not available or has incorrect values, it could signal data quality issues or feature engineering mistakes.</p></li><li><p><strong>Communication</strong>: Feature importance analysis facilitates communication between data scientists and domain experts. It provides a common ground for discussing which features are driving the predictions and helps in explaining model behavior to stakeholders who may not have a deep understanding of machine learning algorithms.</p></li></ol><h3>Random Forest model</h3><p>We will start by training a Random Forest classifier model. 
If you remember, decision-tree-based models allow for the estimation of the feature importance based on the assessment of the contribution of each feature to reducing impurity or error during the decision-making process. </p><p>This first approach yielded the following results:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hmUo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hmUo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 848w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hmUo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png" width="985" height="547" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:547,&quot;width&quot;:985,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hmUo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 848w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!hmUo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e789041-30d9-4fb9-8cf7-4895837d9426_985x547.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It seems that glucose is the main factor, followed by BMI, age and diabetes pedigree function. </p><p>Getting this straight from the model is not always possible. That is why there exist techniques that address this issue. 
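</p><p>To make the impurity idea from this section concrete, here is a simplified sketch in plain Python. It is not the Random Forest computation itself: a forest accumulates impurity decreases over many splits and many trees, whereas this toy version only scores the single best threshold per feature. The feature values are made up for illustration, and <code>gini</code> and <code>best_split_gain</code> are hypothetical helper names.</p>
<pre><code class="language-python">def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from a single threshold on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


# Made-up toy records, for illustration only (not taken from the dataset).
glucose = [85, 90, 99, 110, 140, 150, 165, 180]
blood_pressure = [70, 82, 64, 90, 72, 88, 66, 80]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

gains = {
    "Glucose": best_split_gain(glucose, outcome),
    "BloodPressure": best_split_gain(blood_pressure, outcome),
}
# Normalise so the scores sum to 1, as impurity-based importances usually do.
total = sum(gains.values())
importance = {name: gain / total for name, gain in gains.items()}
print(importance)
</code></pre>
<p>With these toy numbers, one glucose threshold separates the two classes perfectly, so glucose takes about 0.875 of the normalised importance and blood pressure about 0.125.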
</p><h3>Correlation matrix</h3><p>One basic approach would be to compute the correlation matrix and see how each feature correlates with the outcome feature:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xlI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xlI5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 424w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 848w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 1272w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xlI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png" width="530" height="465.08838383838383" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:695,&quot;width&quot;:792,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xlI5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 424w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 848w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 1272w, https://substackcdn.com/image/fetch/$s_!xlI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f9bc3d-d88c-4a57-a9e8-fb8676d359d3_792x695.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let&#8217;s simplify this information in a bar chart:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f_O6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f_O6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!f_O6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 848w, 
https://substackcdn.com/image/fetch/$s_!f_O6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!f_O6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f_O6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png" width="985" height="547" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:547,&quot;width&quot;:985,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f_O6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!f_O6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 848w, 
https://substackcdn.com/image/fetch/$s_!f_O6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!f_O6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3087fa7-366e-4bfe-a806-cf750a19411e_985x547.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Again we see that glucose is the main predictor, whereas in this case it&#8217;s followed by the BMI, age and pregnancies.</p><p>The 
correlation matrix approach is not the best way to assess feature importance: it only measures pairwise linear association with the target, so it misses non-linear relationships, becomes hard to interpret under multicollinearity, and cannot capture interactions or the combined effect of several features on the target variable. More advanced techniques are preferred because they can handle these complexities and give more reliable assessments of feature importance.</p><h3>Permutation importance</h3><p>Permutation importance is another model-agnostic method. It computes feature importance for any fitted model by shuffling the values of one feature at a time and measuring the resulting change in model performance. Shuffling breaks the relationship between that feature and the target variable, so the change in the model&#8217;s performance shows how much the model relies on the feature. A larger drop in performance after shuffling indicates greater feature importance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jg73!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jg73!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!jg73!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 848w, 
https://substackcdn.com/image/fetch/$s_!jg73!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!jg73!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jg73!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png" width="985" height="547" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:547,&quot;width&quot;:985,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jg73!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 424w, https://substackcdn.com/image/fetch/$s_!jg73!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 848w, 
https://substackcdn.com/image/fetch/$s_!jg73!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 1272w, https://substackcdn.com/image/fetch/$s_!jg73!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fc132bd-c482-4342-b0fa-a595d7cdd110_985x547.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It matches the previous results, but it seems that glucose is way more important than the other predictors. 
</p><p>This technique is better because it measures each feature&#8217;s effect on the trained model&#8217;s predictive performance, so it reflects the non-linear relationships and interactions the model has learned, and it can be applied to any fitted model, however complex. Be careful with strongly correlated features, though: when one of them is shuffled the model can still rely on the others, so their importance may be underestimated.</p><p>However, it only produces one global score per feature. It lacks the per-prediction explanations, the separate attribution of interaction effects, and the direction of each feature&#8217;s effect that SHAP provides.</p><h3>SHAP</h3><p>SHAP (SHapley Additive exPlanations) values offer a unified measure of feature importance. For each prediction, a feature&#8217;s SHAP value is its average marginal contribution over all possible combinations of the other features. This gives a fair distribution of contributions, as it respects both efficiency (the contributions add up to the difference between the prediction and the average prediction) and symmetry (features that contribute equally receive equal values). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lSAB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lSAB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 424w, https://substackcdn.com/image/fetch/$s_!lSAB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 848w, https://substackcdn.com/image/fetch/$s_!lSAB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lSAB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lSAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png" width="790" height="459" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:459,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lSAB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 424w, https://substackcdn.com/image/fetch/$s_!lSAB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 848w, https://substackcdn.com/image/fetch/$s_!lSAB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lSAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8380ca-cb7e-4db4-b8ee-568ad84fc417_790x459.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Again, glucose is the clear winner, followed by BMI, age and diabetes pedigree function.</p><p>However, this method can be computationally expensive and other simpler methods can be just fine depending on your problem.<br></p><h3>Recursive Feature Elimination</h3><p>This is not a feature importance technique, but I think it is interesting for you to also know 
it. It is an iterative method used for feature selection. The goal of feature selection is to identify and remove unnecessary features that do not contribute to, or may even decrease, the predictive performance of the model. RFE achieves this by repeatedly fitting the model, ranking the features by importance, and removing the least important feature at each step. This process continues until all features have been evaluated and ranked.</p><p>However, it&#8217;s worth noting that RFE can be computationally expensive for models with a large number of features, as it involves repeatedly refitting the model. Also, it doesn&#8217;t quantify importance; it only ranks features in order of importance. </p><pre><code>  1: Glucose
  2: BMI
  3: Age
  4: DiabetesPedigreeFunction
  5: BloodPressure
  6: Insulin
  7: Pregnancies
  8: SkinThickness</code></pre><p>Again, glucose is the clear winner, followed by BMI and age.</p><h3>The code</h3><p>You can find a <strong>notebook</strong> with all the <strong>code</strong> at the <strong>end of this newsletter issue</strong>!</p><p></p><h2><strong>&#8205;&#127891;Learn Advanced Machine Learning Concepts!*</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png" width="598" height="312.9642857142857" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:306085,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Do you want to learn more about <a href="https://www.trainindata.com/p/machine-learning-interpretability?affcode=1218302_nkuq2dk8">Model Interpretability</a> and <a href="https://www.trainindata.com/p/feature-engineering-for-machine-learning?affcode=1218302_nkuq2dk8">Feature Selection</a>? 
Ready for a deeper dive?</strong></p><ul><li><p>Explore <a href="https://www.trainindata.com/p/feature-engineering-for-machine-learning?affcode=1218302_nkuq2dk8">feature engineering</a> and <a href="https://www.trainindata.com/p/feature-selection-for-machine-learning?affcode=1218302_nkuq2dk8">feature selection</a> methods</p></li><li><p><a href="https://www.trainindata.com/p/machine-learning-interpretability?affcode=1218302_nkuq2dk8">Explain interpretable and black box models</a> with LIME, SHAP, partial dependence plots and more.</p><p></p></li></ul><p>Enroll today and take the next step in mastering the world of data science!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;text&quot;:&quot;Unlock Knowledge!&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8"><span>Unlock Knowledge!</span></a></p><p><em>*Sponsored: by purchasing any of their courses you would also be supporting MLPills.</em></p><div><hr></div><h2>&#129302; Tech Round-Up</h2><p>This week's TechRoundUp comes packed with <strong>AI news</strong>. From Google&#8217;s brand new model Gemma to an AI giants&#8217; pact, the future is zooming towards us! &#128640;</p><p>Let's dive into the latest Tech highlights you probably shouldn&#8217;t miss this week &#128165;</p><p>1&#65039;&#8419; <a href="https://blog.google/technology/developers/gemma-open-models/">Google launches Gemma. </a></p><blockquote><p>A family of next-gen open AI models for responsible development.</p><p>Offering tools &amp; models for safer AI applications, it's set to revolutionize AI use responsibly. 
</p></blockquote><p>2&#65039;&#8419; <a href="https://techcrunch.com/2024/02/21/samsung-is-bringing-galaxy-ai-features-to-more-devices/">Samsung expands Galaxy AI </a></p><blockquote><p>Samsung is planning to expand AI features to more devices, enhancing user experiences with the One UI 6.1 update. </p><p>Get ready for smarter, more intuitive device interactions. </p></blockquote><p>3&#65039;&#8419; <a href="https://techcrunch.com/2024/02/21/google-deepmind-forms-a-new-org-focused-on-ai-safety/">Google DeepMind commits to AI safety </a></p><blockquote><p>It is founding a new organization, focusing on ethical AI development. </p><p>A bold step towards responsible AI innovation. </p></blockquote><p>4&#65039;&#8419; <a href="https://techxplore.com/news/2024-02-amazon-unveils-largest-text-speech.html">Amazon unveils its largest text-to-speech model</a></p><blockquote><p>Pushing the boundaries of natural-sounding digital voices. </p><p>A leap forward in making technology more accessible.</p></blockquote><p>5&#65039;&#8419; <a href="https://techcrunch.com/2024/02/20/eu-merger-control-ai/">The EU tightens controls </a></p><blockquote><p>With AI scrutiny, aiming to maintain fair competition and innovation in the AI landscape. </p><p>A critical move for the future of AI regulation. </p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/rfeers&quot;,&quot;text&quot;:&quot;Follow Josep on &#120143;&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://twitter.com/rfeers"><span>Follow Josep on &#120143;</span></a></p><div><hr></div><h2>&#128736;&#65039; Do It Yourself!</h2><p>You can practise all these concepts thanks to this notebook. You will need to do some research but don&#8217;t worry, next week I&#8217;ll unveil the answers! </p><p>Practise here:</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-9-feature-importance-in-ml-models">
              Read more
          </a>
      </p>
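<p>As a warm-up for the exercises, the permutation importance and Recursive Feature Elimination steps described earlier can be sketched as follows. This is only a minimal sketch under stated assumptions: it uses scikit-learn, a synthetic dataset from <code>make_classification</code> instead of the diabetes data behind the charts, a random forest as the model, and placeholder feature names, none of which are taken from the notebook.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data with 8 features (3 informative),
# not the diabetes dataset used for the charts in this issue.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumption: a random forest; any fitted scikit-learn estimator would do.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: shuffle one column at a time on held-out data
# and record how much the score drops, averaged over n_repeats shuffles.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_order = np.argsort(perm.importances_mean)[::-1]
print("Permutation importance (mean drop in accuracy):")
for i in perm_order:
    print(f"  {feature_names[i]}: {perm.importances_mean[i]:.3f}")

# RFE: refit repeatedly, dropping the least important feature each round.
# With n_features_to_select=1 every feature gets a unique rank (1 = best).
rfe = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=1,
    step=1,
)
rfe.fit(X_train, y_train)
print("RFE ranking:")
for rank, i in enumerate(np.argsort(rfe.ranking_), start=1):
    print(f"  {rank}: {feature_names[i]}")
```

<p>To try it on your own data, replace <code>X</code>, <code>y</code> and <code>feature_names</code> with your feature matrix, target and column names. Scoring on a held-out split is a deliberate choice here, so that the importances reflect what helps the model generalise instead of what it memorised during training.</p>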
   ]]></content:encoded></item><item><title><![CDATA[DIY #8 - Feature Engineering for Time Series]]></title><description><![CDATA[Feature engineering is crucial for making time series models better at predicting outcomes. It is all about extracting important information from time-based data. Time series data naturally have patterns and connections, so creating effective features is important for accurate predictions. There are multiple ways of generating these features and each of them is used to capture specific aspects of data. Today we will share the most common ones.]]></description><link>https://mlpills.substack.com/p/diy-8-feature-engineering-for-time</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-8-feature-engineering-for-time</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Fri, 19 Jan 2024 08:05:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c01a6ead-3055-4570-b4d8-af74bf54d342_1920x1280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128138; Pill of the week</h2><p>Feature engineering is crucial for making time series models better at predicting outcomes. It is all about extracting important information from time-based data. Time series data naturally have patterns and connections, so creating effective features is important for accurate predictions. There are multiple ways of generating these features and each of them is used to capture specific aspects of data. Today we will share the most common ones.</p><blockquote><p>Are you already familiar with this topic and you just want to <strong>practice</strong>? Check the <strong>notebook at the end</strong> of this newsletter &#128071;&#128071;</p></blockquote><ol><li><p><strong>Date/Time-Related Features</strong>: These features are derived from the date-time value of each observation. They can include the year, month, day,  hour, minute, second, day of the week, and whether the day is a weekend or a holiday. 
</p><ol><li><p>These features can be useful when there is a trend or seasonality in your data that corresponds to these periods. </p></li><li><p>For example, retail sales might increase during weekends or holidays, and website traffic might peak during certain hours.</p></li></ol></li><li><p><strong>Lag Features</strong>: Lag features are the values of the series at prior time steps. They help capture the temporal dependencies in the data. </p><ol><li><p>These features can be useful when past values of a series carry information about its future values. </p></li><li><p>For example, in stock price prediction, the price of a stock over the past few days can help predict its price tomorrow.</p></li></ol></li><li><p><strong>Rolling Window Features</strong>: These features are statistical measures, such as the mean, median, or standard deviation, computed over a sliding (rolling) window of time periods. </p><ol><li><p>These features can be useful when you want to capture local trends and patterns in your data. </p></li><li><p>For example, a rolling mean of temperature readings can smooth out daily fluctuations and highlight longer-term trends. Similarly, a rolling mean of sales can show the sales inertia over the period of interest.</p></li></ol></li><li><p><strong>Expanding Window Features</strong>: These are similar to rolling window features, but the window always starts at the first observation and grows with time. </p><ol><li><p>These features can be useful when you want to capture all the past information up to the current point. </p></li><li><p>For example, the expanding mean of a stock&#8217;s price gives you its average price since the beginning of the time series, i.e. the historical average price.</p></li></ol></li><li><p><strong>Domain-Specific Features</strong>: These are features that are specific to the problem at hand. </p><ol><li><p>These features can be useful when you have domain knowledge that can help you create informative features. 
</p></li><li><p>For example, in a stock price prediction problem, features like the company&#8217;s earnings, the sector&#8217;s performance, etc., can be used. </p></li></ol></li><li><p><strong>Time Since an Event</strong>: This feature measures the time that has passed since a particular event occurred. </p><ol><li><p>This can be useful in scenarios where the occurrence of an event significantly impacts the time series data. </p></li><li><p>For example, in predicting website traffic, an event could be a marketing campaign, and the feature could be the time since the campaign started.</p></li></ol></li><li><p><strong>Autoregressive Features</strong>: These are based on the idea that past values have an influence on current values. An autoregressive feature of order&nbsp;<code>p</code>&nbsp;would use the last&nbsp;<code>p</code>&nbsp;values. This is similar to lag features but instead of using the raw values, we use the values predicted by an autoregressive model.</p><ol><li><p>These features can be useful when modeling time series data with dependencies on its own past, such as predicting stock prices where historical trends impact future values.</p></li><li><p>For example, when predicting stock prices, an autoregressive feature of order 3 would consider the last three days' predicted values to capture short-term trends.</p></li></ol></li><li><p><strong>Difference Features</strong>: These features represent the difference between consecutive values in the time series. 
Differences can highlight trends or abrupt changes in the data.</p><ol><li><p>These features can be useful when identifying and analyzing patterns that emerge as changes or trends between consecutive observations.</p></li><li><p>For example, in weather forecasting, difference features might highlight sudden temperature changes, aiding in the prediction of weather patterns.</p></li></ol></li><li><p><strong>Exponential Moving Averages</strong>: Similar to moving averages, exponential moving averages give more weight to recent observations. </p><ol><li><p>These features can be useful when modelling data with evolving patterns or trends that change more rapidly over time.</p></li><li><p>For example, in predicting user engagement on a website, exponential moving averages can give more weight to recent user activity, providing insights into current trends.</p></li></ol></li><li><p><strong>Seasonal Features</strong>: Seasonal features capture recurring patterns in the data related to specific seasons or periods. You can create binary features indicating whether the observation falls within a particular season or period.</p><ol><li><p>These features can be useful when predicting phenomena influenced by recurring cycles, like sales patterns associated with holidays or special events.</p></li><li><p>For example, in retail sales forecasting, seasonal features could help account for increased sales during holiday seasons, improving the accuracy of sales predictions.</p></li></ol></li><li><p><strong>Cyclical Features</strong>: Some time series data exhibits cyclical patterns that may not align with standard date-related features. 
Creating cyclical features can help capture such patterns.</p><ol><li><p>These features can be useful when modelling time series data with recurring but non-linear patterns, enabling the model to better capture and understand cyclic variations.</p></li><li><p>For example, in predicting electricity consumption throughout the day, cyclical features involving sine and cosine transformations of the hour could account for the daily cycle, helping the model adapt to the fluctuating demand patterns.</p></li></ol></li></ol><blockquote><p>Do you want to <strong>put all this into practice</strong>? Check the <strong>notebook at the end</strong> of the newsletter &#128071;&#128071;</p></blockquote><h2>&#129302; Tech Round-Up</h2><p>No time to check the news this week?</p><p>This week's TechRoundUp comes full of <strong>AI news</strong>. From AI in literature to robotics in manufacturing, the future is zooming towards us! &#128640;</p><p>Let's dive into the latest Tech highlights you shouldn&#8217;t miss this week &#128165;</p><p>1&#65039;&#8419; <a href="https://techxplore.com/news/2024-01-japan-literary-laureate-unashamed-chatgpt.html">&#120278;&#120309;&#120302;&#120321;&#120282;&#120291;&#120295;'&#120320; &#120287;&#120310;&#120321;&#120306;&#120319;&#120302;&#120319;&#120326; &#120287;&#120306;&#120302;&#120317;</a> &#128218;</p><blockquote><p>Japan's literary world embraces AI!</p><p>A renowned author openly uses ChatGPT for writing assistance. </p><p>She says it's not just a tool; it's a creative partner! 
&#129302;&#9997;&#65039; </p></blockquote><p>2&#65039;&#8419; <a href="https://techcrunch.com/2024/01/17/samsungs-galaxy-s24-will-feature-google-gemini-powered-ai-features/">&#120294;&#120302;&#120314;&#120320;&#120322;&#120315;&#120308; &#120294;24: &#120276;&#120284;-&#120280;&#120315;&#120309;&#120302;&#120315;&#120304;&#120306;&#120305; &#120280;&#120313;&#120306;&#120308;&#120302;&#120315;&#120304;&#120306; &#128241;</a></p><blockquote><p>The upcoming Galaxy S24 is all set to dazzle with Google Gemini's AI power.</p><p>Expect smarter features and seamless experiences!</p><p>Will it be the first phone to have native LLM? &#129300;</p></blockquote><p>3&#65039;&#8419; <a href="https://techcrunch.com/2024/01/16/pinecones-vector-database-gets-a-new-serverless-architecture/">&#120291;&#120310;&#120315;&#120306;&#120304;&#120316;&#120315;&#120306;'&#120320; &#120291;&#120310;&#120316;&#120315;&#120306;&#120306;&#120319;&#120310;&#120315;&#120308; &#120291;&#120302;&#120321;&#120309; &#127794;</a></p><blockquote><p>Pinecone revamps its vector database with a serverless architecture, promising more efficient data management and scalability.</p><p>A game-changer in data processing! </p></blockquote><p>4&#65039;&#8419; <a href="https://techcrunch.com/2024/01/18/bmw-will-deploy-figures-humanoid-robot-at-south-carolina-plant/">&#120277;&#120288;&#120298;'&#120320; &#120293;&#120316;&#120303;&#120316;&#120321;&#120310;&#120304; &#120293;&#120306;&#120323;&#120316;&#120313;&#120322;&#120321;&#120310;&#120316;&#120315; &#129302;&#128663;</a></p><blockquote><p>BMW's South Carolina plant gears up for a futuristic makeover with Figure's humanoid robots. 
</p><p>Say hello to a new era of automated efficiency!</p></blockquote><p>5&#65039;&#8419; <a href="https://techcrunch.com/2024/01/17/deepminds-latest-ai-can-solve-geometry-problems/">&#120279;&#120306;&#120306;&#120317;&#120288;&#120310;&#120315;&#120305;'&#120320; &#120282;&#120306;&#120316;&#120314;&#120306;&#120321;&#120319;&#120310;&#120304; &#120282;&#120306;&#120315;&#120310;&#120322;&#120320; &#128290;</a></p><blockquote><p>DeepMind's AlphaGeometry AI cracks complex geometry problems, rivaling Olympiad champs. </p><p>A breakthrough in AI problem-solving and mathematical reasoning! </p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/rfeers&quot;,&quot;text&quot;:&quot;Follow Josep on &#120143;&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://twitter.com/rfeers"><span>Follow Josep on &#120143;</span></a></p><div><hr></div><h3><strong>Learn Advanced Machine Learning Concepts!*</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, 
https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png" width="598" height="312.9642857142857" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:306085,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, 
https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture></div></a></figure></div><p><strong>Have you outgrown introductory courses? 
Ready for a deeper dive?</strong></p><ul><li><p>Explore <a href="https://www.trainindata.com/p/feature-engineering-for-machine-learning?affcode=1218302_nkuq2dk8">feature engineering</a> and <a href="https://www.trainindata.com/p/feature-selection-for-machine-learning?affcode=1218302_nkuq2dk8">feature selection</a> methods</p></li><li><p>Discover tactics for <a href="https://www.trainindata.com/p/hyperparameter-optimization-for-machine-learning?affcode=1218302_nkuq2dk8">optimizing hyperparameters</a> and addressing <a href="https://www.trainindata.com/p/machine-learning-with-imbalanced-data?affcode=1218302_nkuq2dk8">imbalanced data</a></p></li><li><p>Master fundamental <a href="https://www.trainindata.com/p/all-our-courses?affcode=1218302_nkuq2dk8">machine learning methods</a> and their application in Python</p></li></ul><p>Enroll today and take the next step in mastering the world of data science!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;text&quot;:&quot;Unlock Knowledge!&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8"><span>Unlock Knowledge!</span></a></p><p><em>*Sponsored: by purchasing any of their courses you also support MLPills.</em></p><div><hr></div><h2>&#128736;&#65039; Do It Yourself!</h2><p>Time for you to <strong>play with the code</strong>!</p><p>I am sharing the <strong>notebook</strong>, which contains almost everything you need. Your task is to <strong>make it work</strong> and reproduce the results I shared in this newsletter. I have included <strong>some hints</strong>. The best way to learn is to check the documentation.</p><p>Contact me at <strong>david@mlpills.dev</strong> if you need any help!</p>
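<p>To make the feature types described in this newsletter more concrete, here is a minimal pandas sketch of several of them (date/time, lag, rolling window, expanding window, difference, exponential moving average, and cyclical features). It is not the code from the notebook: the helper name <code>add_time_series_features</code>, the column name <code>value</code>, the window of 3, and the hourly toy data are all assumptions made for illustration.</p>

```python
import numpy as np
import pandas as pd


def add_time_series_features(df, col="value", window=3):
    """Return a copy of df (DatetimeIndex required) with illustrative features.

    Hypothetical helper: the name, column, and default window are assumptions.
    """
    out = df.copy()

    # Date/time-related features taken from the index
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    out["is_weekend"] = out.index.dayofweek.isin([5, 6]).astype(int)

    # Lag feature: the value one step earlier
    out["lag_1"] = out[col].shift(1)

    # Window statistics are computed on the shifted series so that each row
    # only uses values observed before it (the current value is not leaked)
    past = out[col].shift(1)
    out["rolling_mean"] = past.rolling(window).mean()
    out["expanding_mean"] = past.expanding().mean()

    # Difference between consecutive observations
    out["diff_1"] = out[col].diff(1)

    # Exponential moving average: recent observations get more weight
    out["ema"] = past.ewm(span=window, adjust=False).mean()

    # Cyclical encoding of the hour, so that 23:00 and 00:00 end up close
    angle = 2 * np.pi * out["hour"] / 24
    out["hour_sin"] = np.sin(angle)
    out["hour_cos"] = np.cos(angle)
    return out


# Toy hourly series, made up for illustration
idx = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(48, dtype=float)}, index=idx)
features = add_time_series_features(df)
print(features.head())
```

<p>Shifting the series by one step before computing the rolling, expanding, and exponential averages is a design choice in this sketch: it keeps each row limited to information available before that row. The first rows contain missing values, which you would typically drop or impute before training a model.</p>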
      <p>
          <a href="https://mlpills.substack.com/p/diy-8-feature-engineering-for-time">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #7 - Data distribution]]></title><description><![CDATA[In the last DIY, we introduced the concept of Exploratory Data Analysis (EDA). This is a fundamental step in Data Science. One of the key components of the EDA is finding out about the data distribution. Let&#8217;s inspect this in more depth!]]></description><link>https://mlpills.substack.com/p/diy-7-data-distribution</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-7-data-distribution</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Thu, 07 Dec 2023 13:07:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/dc90ce14-90ce-498e-854c-e987a6034103_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128138; Pill of the week</h2><p>In the last <a href="https://mlpills.substack.com/p/diy-6-how-to-train-a-regression-model">DIY</a>, we introduced the concept of Exploratory Data Analysis (EDA). This is a fundamental step in Data Science. 
One of the key components of EDA is understanding the data distribution.</p><p><a href="https://twitter.com/daansan_ml/status/1728337758236881289">Here</a> is a reminder of what EDA is:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://twitter.com/daansan_ml/status/1728337758236881289" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kRFY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kRFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg" width="428" height="510.19867549668874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:755,&quot;resizeWidth&quot;:428,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://twitter.com/daansan_ml/status/1728337758236881289&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!kRFY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kRFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9ca370-fede-4f84-baf0-69b299f39e9f_755x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Let&#8217;s inspect the data distribution in more depth!</p><h3>What is data distribution and why do we need to be aware of it?</h3><p>Data distribution refers to <strong>how data points are spread across various values</strong>, showing the frequency of each value. Being aware of data distributions is crucial because it <strong>informs the choice of statistical methods and algorithms for analysis</strong>, which in turn supports <strong>accurate and meaningful insights</strong>. 
Different distributions, such as normal, skewed, or uniform, can significantly impact data interpretation and the effectiveness of predictive models.</p><p>You can find some of the <a href="https://twitter.com/daansan_ml/status/1729054971071738134">most common distributions in data science</a> below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://twitter.com/daansan_ml/status/1729054971071738134" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0j-R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0j-R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg" width="414" height="533.4230769230769" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:414,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://twitter.com/daansan_ml/status/1729054971071738134&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!0j-R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0j-R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a375b4-4338-4c03-9bdd-e6e431bdf783_1529x1970.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><h3>Check the distribution of your data</h3><p>This section shows how to check how the data in each of your features is distributed.</p><p>Let&#8217;s start by importing the essential libraries for data analysis. These are:</p><ul><li><p>Pandas, a powerful data manipulation library that provides data structures for efficient data handling. </p></li><li><p>Matplotlib and Seaborn, visualization libraries used for creating various types of plots.</p></li><li><p>SciPy, used here for its statistical functions, including normality tests.</p></li></ul><h4>Histograms and Density Plots</h4><p>Histograms provide a visual representation of the distribution of each variable. By specifying the number of bins, we can control the granularity of the representation. 
Density Plots, overlaid on histograms, provide a smoothed representation of the data distribution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j02_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j02_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 424w, https://substackcdn.com/image/fetch/$s_!j02_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 848w, https://substackcdn.com/image/fetch/$s_!j02_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 1272w, https://substackcdn.com/image/fetch/$s_!j02_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j02_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png" width="598" height="325.47860696517415" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:547,&quot;width&quot;:1005,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j02_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 424w, https://substackcdn.com/image/fetch/$s_!j02_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 848w, https://substackcdn.com/image/fetch/$s_!j02_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 1272w, https://substackcdn.com/image/fetch/$s_!j02_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadb9480-0545-4aec-af47-2d7b64a4e304_1005x547.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Box Plots</h4><p>Box Plots visually summarize the distribution of each variable by displaying the median, quartiles, and potential outliers. 
They are handy for identifying the spread of data and understanding central tendencies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hwXH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hwXH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 424w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 848w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 1272w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hwXH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png" width="586" height="388.38872691933915" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:682,&quot;width&quot;:1029,&quot;resizeWidth&quot;:586,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hwXH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 424w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 848w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 1272w, https://substackcdn.com/image/fetch/$s_!hwXH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f96fc3d-f985-47df-ade5-e17476b06525_1029x682.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Summary Statistics</h4><p>Summary Statistics provide a snapshot of key metrics such as mean, standard deviation, and quartiles. 
This step helps gain a quick understanding of the central tendencies and variability within the dataset.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q1ql!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q1ql!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 424w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 848w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 1272w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q1ql!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png" width="409" height="180" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/806e717b-4228-4486-844f-c16b6f1abed7_409x180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:180,&quot;width&quot;:409,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12017,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q1ql!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 424w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 848w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 1272w, https://substackcdn.com/image/fetch/$s_!Q1ql!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806e717b-4228-4486-844f-c16b6f1abed7_409x180.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Skewness and Kurtosis</h4><p>Skewness measures the asymmetry of the distribution, indicating whether it is skewed to the left or right. 
Kurtosis describes how heavy the tails of the distribution are compared with a normal distribution, which indicates how prone the variable is to extreme values.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fgqj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fgqj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 424w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 848w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 1272w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fgqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png" width="275" height="89" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:89,&quot;width&quot;:275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4907,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fgqj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 424w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 848w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 1272w, https://substackcdn.com/image/fetch/$s_!Fgqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66c8b60f-a058-460a-adcc-e9425750cb12_275x89.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>Normality Tests with Q-Q Plots</h4><p>Normality Tests, such as the Shapiro-Wilk test, assess whether a variable follows a normal distribution. 
Q-Q Plots let you visually check how closely the quantiles of the observed data match the theoretical quantiles of a normal distribution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mDR0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mDR0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 424w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 848w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 1272w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mDR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png" width="597" height="455" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:597,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mDR0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 424w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 848w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 1272w, https://substackcdn.com/image/fetch/$s_!mDR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7808224-87c4-47a8-9dd5-80b50a3c1449_597x455.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Scatter Plots</h4><p>Scatter Plots provide a visual representation of relationships between pairs of variables. 
The pairplot from Seaborn generates scatter plots for all combinations of variables, offering insights into potential correlations and patterns.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Utk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Utk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 424w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 848w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 1272w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Utk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png" width="654" height="679.5951417004048" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:741,&quot;resizeWidth&quot;:654,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Utk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 424w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 848w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 1272w, https://substackcdn.com/image/fetch/$s_!6Utk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c9add61-21b0-479a-afb8-e489a19aa1aa_741x770.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Together, these steps make up the Exploratory Data Analysis: each one looks at the data distribution from a different angle, either visually or numerically.</p><p>You can check how to build each of them in the <strong>&#129680; Jupyter notebook</strong> you&#8217;ll find at the <strong>end of the newsletter</strong>!</p><h3>&#129302; Tech Round-Up</h3><p>This second week's Tech Round-Up is packed with <strong>AI news</strong>.</p><p>Let's discover the top AI &amp; tech highlights of the week! 
</p><p>From Amazon to Google search, the future of tech is unfolding before our eyes.</p><p>1&#65039;&#8419; <a href="https://www.theverge.com/2023/12/6/23990466/google-gemini-llm-ai-model"><strong>Google's Gemini AI: A Game Changer or Just Hype?</strong></a></p><blockquote><p>Google unveils Gemini, its latest LLM &#127381;</p><p>With advanced capabilities, Gemini is set to beat GPT-4 and revolutionize the way we interact with AI.</p></blockquote><p>2&#65039;&#8419; <a href="https://techcrunch.com/2023/12/05/microsoft-bings-deep-search-offers-more-comprehensive-answers-complex-search-queries-gpt-4/"><strong>Microsoft Bing's GPT-4 Integration</strong></a></p><blockquote><p>Microsoft Bing amps up search with GPT-4 &#129302;<br><br>Their new 'Deep Search' digs deeper, delivering comprehensive answers to complex queries.</p></blockquote><p>3&#65039;&#8419; <a href="https://techcrunch.com/2023/12/05/meta-and-ibm-form-an-ai-alliance-but-to-what-end/"><strong>Meta and IBM's AI Alliance</strong></a></p><blockquote><p>Meta and IBM's AI Alliance is here! &#129309;</p><p>This collaboration fosters open innovation in AI, uniting a wide range of sectors for responsible, inclusive AI advancement.</p></blockquote><p>4&#65039;&#8419; <a href="https://techcrunch.com/2023/11/29/with-neptune-analytics-aws-combines-the-power-of-vector-search-and-graph-data/"><strong>AWS Neptune Analytics</strong></a></p><blockquote><p>A new era for data! &#128165;<br><br>Neptune Analytics combines vector search and graph data, offering unique insights and tackling high-dimensional data and relationships with ease.</p></blockquote><p>5&#65039;&#8419; <a href="https://techxplore.com/news/2023-12-scientists-personal-virtual-reality-based-safety.html"><strong>VR-Based Safety Training in Korea</strong></a></p><blockquote><p>VR meets construction safety training in Korea! 
&#128679;<br><br>Researchers propose a machine learning model using VR and biometrics for personalized training.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/rfeers&quot;,&quot;text&quot;:&quot;Follow Josep on &#120143;&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://twitter.com/rfeers"><span>Follow Josep on &#120143;</span></a></p><div><hr></div><h2>&#128736;&#65039; Do It Yourself!</h2><p>Time for you to <strong>play with the code</strong>!</p><p>I will share with you the <strong>notebook</strong> with almost everything you need. Your task is to <strong>make it work </strong>and get the results I shared in this newsletter. I provide you with <strong>some hints</strong>. The best way of learning is by checking the documentation. </p><p>Enjoy and good luck!</p>
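<p>As one more hint for the skewness and kurtosis step described earlier, here is a minimal sketch in plain Python. It is not the code from the notebook: the function name <code>moment_skew_kurtosis</code> and the two toy samples are made up for illustration. It uses the population (biased) moment formulas, so library routines that apply a bias correction may return slightly different numbers on small samples.</p>

```python
def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments.

    Hypothetical helper for illustration; assumes the sample is not constant.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (dividing by n, no bias correction).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5           # 0 for a perfectly symmetric sample
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Toy samples, invented for this example.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

for name, sample in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    skew, kurt = moment_skew_kurtosis(sample)
    print(f"{name}: skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

<p>A positive skewness points to a longer right tail, and a positive excess kurtosis to heavier tails than a normal distribution would have.</p>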
   ]]></content:encoded></item><item><title><![CDATA[DIY #6 - How to train a regression model and make predictions?]]></title><description><![CDATA[Welcome to the sixth DIY issue!]]></description><link>https://mlpills.substack.com/p/diy-6-how-to-train-a-regression-model</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-6-how-to-train-a-regression-model</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Fri, 24 Nov 2023 10:00:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1392147c-1175-41e7-91af-4c4320e9bf37_1920x830.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the sixth DIY issue! Today we are going back to basics. This will serve as a base to get into more advanced concepts in future issues. I understand that this may be too simple for some of you, but I considered it necessary to bring everyone to a minimum level. </p><h2>&#128138; Pill of the week: Build a regression ML model</h2><p>We are going to build a very simple model that will allow us to predict the salary of a person based on their years of experience and age.</p><p>You can get the notebook at the end of the issue &#128071;</p><pre><code><strong>import</strong> pandas <strong>as</strong> pd
<strong>from</strong> sklearn.model_selection <strong>import</strong> train_test_split
<strong>from</strong> sklearn.linear_model <strong>import</strong> LinearRegression</code></pre><p>Next, read the dataset, which you can download <a href="https://www.kaggle.com/datasets/codebreaker619/salary-data-with-age-and-experience">here</a>. It is a very simple dataset with three columns:</p><ul><li><p>Years of experience</p></li><li><p>Age</p></li><li><p>Salary: this is the target, the value we want to predict. During training, it serves as the label that teaches the model how to predict salary from the other features.</p></li></ul><p>We can read this dataset (a CSV file) easily using pandas <code>read_csv</code>:</p><pre><code>df <strong>=</strong> pd.read_csv('/kaggle/input/salary-data-with-age-and-experience/Salary_Data.csv')</code></pre><p>It is always good to have a look at your data to verify that it was imported correctly. You can check the first 5 rows (<code>df.head(5)</code>), the last 5 rows (<code>df.tail(5)</code>), or simply a random sample (<code>df.sample(5)</code>). 
Here we checked 5 random rows of the dataset:</p><pre><code>df.sample(5)</code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SPxP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SPxP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 424w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 848w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 1272w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SPxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png" width="279" height="220" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f02f3a8-8956-4402-a657-1200167269da_279x220.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:220,&quot;width&quot;:279,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8123,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SPxP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 424w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 848w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 1272w, https://substackcdn.com/image/fetch/$s_!SPxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f02f3a8-8956-4402-a657-1200167269da_279x220.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Exploratory Data Analysis</h3><p>Exploratory Data Analysis (<strong>EDA</strong>) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods. 
This approach is essential for <strong>understanding the data's underlying structure and characteristics</strong> before applying more formal statistical or Machine Learning methods. Some key points that are normally checked are:</p><ol><li><p><strong>Distribution of Data:</strong> Assessing the distribution of data (e.g., normal, skewed) using histograms, box plots, and summary statistics helps understand the central tendency and variability.</p></li><li><p><strong>Missing Values:</strong> Identifying and addressing missing data is crucial, as it can significantly affect analyses. Techniques include imputation, deletion, or understanding the reasons for missingness.</p></li><li><p><strong>Outliers:</strong> Detecting and examining outliers to understand their impact on the dataset and deciding how to handle them (e.g., removal, transformation).</p></li><li><p><strong>Correlations:</strong> Analyzing correlations between variables using correlation coefficients and scatter plots to identify relationships and potential dependencies.</p></li><li><p><strong>Patterns and Trends:</strong> Looking for patterns, trends, or anomalies in the data, which can be visualized using line graphs, bar charts, or time-series analysis.</p></li><li><p><strong>Group Comparisons:</strong> Comparing metrics across different groups (e.g., categories, time periods) to identify significant differences or similarities.</p></li><li><p><strong>Data Type Assessment:</strong> Understanding the types of data (numerical, categorical, ordinal) and their appropriate treatment in analysis.</p></li><li><p><strong>Data Quality Assessment:</strong> Evaluating data quality to identify errors or inconsistencies that may need correction.</p></li><li><p><strong>Visual Exploration:</strong> Employing various visualization tools (like heatmaps, pair plots) to intuitively understand complex relationships in the data.</p></li></ol><p>The purpose of this issue is to introduce the general Data Science process, so 
we will not deep dive into the EDA. However, we can do a very basic one. For example, we can check the datatypes, missing values and number of records using the "info" method:</p><pre><code>df.info()</code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O5Bj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O5Bj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 424w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 848w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 1272w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O5Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png" width="410" height="195" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:195,&quot;width&quot;:410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12061,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O5Bj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 424w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 848w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 1272w, https://substackcdn.com/image/fetch/$s_!O5Bj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb1a15f0-e898-42ab-ad8f-be9f44d460f6_410x195.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We can see that there are no null or missing values, all columns contain numbers (floats or integers) and there is a total of 30 rows. I know, this is very simple and not a real-world example! 
Normally this won&#8217;t be that easy&#8230; </p><p>Before continuing&#8230; <strong>why are missing values problematic?</strong> Check <a href="https://twitter.com/daansan_ml/status/1725487502746866028">this</a> if you want to find out more:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://twitter.com/daansan_ml/status/1725487502746866028" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6lQ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6lQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg" width="464" height="508.728813559322" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:944,&quot;resizeWidth&quot;:464,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://twitter.com/daansan_ml/status/1725487502746866028&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!6lQ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6lQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5510cb-01e4-49bb-a700-e3a0f531fa6a_944x1035.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can also use the method "describe" to check some descriptive statistics. 
For example, the average value, the maximum and minimum, etc.</p><pre><code>df.describe()</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NocS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NocS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 424w, https://substackcdn.com/image/fetch/$s_!NocS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 848w, https://substackcdn.com/image/fetch/$s_!NocS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 1272w, https://substackcdn.com/image/fetch/$s_!NocS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NocS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png" width="378" height="317" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:317,&quot;width&quot;:378,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16026,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NocS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 424w, https://substackcdn.com/image/fetch/$s_!NocS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 848w, https://substackcdn.com/image/fetch/$s_!NocS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 1272w, https://substackcdn.com/image/fetch/$s_!NocS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dcd6401-8a2c-407c-bfe2-55637148012c_378x317.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a good way of quickly checking the data distribution, but it would be good to also do some plots to understand it better. 
But it is fine for today&#8230; We will do this in future issues.</p><h3><strong>Learn Advanced Machine Learning Concepts!*</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png" width="598" height="312.9642857142857" 
data-attrs="{&quot;src&quot;:&quot;https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:306085,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GuA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!GuA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79b4d2db-d535-4566-8ab0-af0662d5ebf0_2400x1256.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Have you outgrown introductory courses? 
Ready for a deeper dive?</strong></p><ul><li><p>Explore <a href="https://www.trainindata.com/p/feature-engineering-for-machine-learning?affcode=1218302_nkuq2dk8">feature engineering</a> and <a href="https://www.trainindata.com/p/feature-selection-for-machine-learning?affcode=1218302_nkuq2dk8">feature selection</a> methods</p></li><li><p>Discover tactics for <a href="https://www.trainindata.com/p/hyperparameter-optimization-for-machine-learning?affcode=1218302_nkuq2dk8">optimizing hyperparameters</a> and addressing <a href="https://www.trainindata.com/p/machine-learning-with-imbalanced-data?affcode=1218302_nkuq2dk8">imbalanced data</a></p></li><li><p>Master fundamental <a href="https://www.trainindata.com/p/all-our-courses?affcode=1218302_nkuq2dk8">machine learning methods</a> and their Python application</p></li></ul><p>Enroll today and take the next step in mastering the world of data science!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;text&quot;:&quot;Unlock Knowledge!&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8"><span>Unlock Knowledge!</span></a></p><p><em>*Sponsored: by purchasing any of their courses you would also be supporting MLPills.</em></p><div><hr></div><h3>Split the data into train and test sets</h3><p>You need to <strong>train your model using only part of the data (the training set)</strong>, and then <strong>check whether it behaves properly on the part you set aside (the testing set)</strong>. 
This tells you whether your model is likely to behave correctly in the real world, on data it has not seen before.</p><p>First, we divide the dataset into the label <em>y</em> (salary, the last column) and the features <em>X</em> (years of experience and age, all the other columns):</p><pre><code>X <strong>=</strong> df.iloc[:,:<strong>-</strong>1]
y <strong>=</strong> df.iloc[:,<strong>-</strong>1]</code></pre><p>Now we need to create the training set and the testing set. Most of the data should go into the training set because that is what the model learns from. For example, we can use 80% of the data for training and 20% for testing; with our 30 rows, that is 24 for training and 6 for testing. The right proportion depends on the data: if you have a lot of data, you can shrink the testing fraction, because even a small percentage will contain many samples. Setting <code>random_state</code> fixes the shuffling, so the split is the same every time you run the code.</p><pre><code>X_train, X_test, y_train, y_test <strong>=</strong> train_test_split(X, y, test_size<strong>=</strong>0.20, random_state<strong>=</strong>1)</code></pre><h3>Train the model</h3><p>We will use <strong><a href="https://mlpills.dev/machine-learning/linear-regression/">Linear Regression</a></strong>, one of the simplest regression models. I introduced it <a href="https://twitter.com/daansan_ml/status/1725164377865626023">here</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://twitter.com/daansan_ml/status/1725164377865626023" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PNUI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 424w, https://substackcdn.com/image/fetch/$s_!PNUI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 848w, https://substackcdn.com/image/fetch/$s_!PNUI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PNUI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PNUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png" width="540" height="540.6775407779172" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:797,&quot;resizeWidth&quot;:540,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://twitter.com/daansan_ml/status/1725164377865626023&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!PNUI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 424w, https://substackcdn.com/image/fetch/$s_!PNUI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 848w, https://substackcdn.com/image/fetch/$s_!PNUI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PNUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bc253c8-09fc-42c8-90c2-d76ae60497f6_797x798.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can find more details <a href="https://mlpills.dev/machine-learning/linear-regression/">here</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://mlpills.dev/machine-learning/linear-regression/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2URg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 424w, https://substackcdn.com/image/fetch/$s_!2URg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 848w, https://substackcdn.com/image/fetch/$s_!2URg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 1272w, https://substackcdn.com/image/fetch/$s_!2URg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2URg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png" width="1456" height="396" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c85f9119-6252-47c0-a235-13b60e131bee_1557x423.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:396,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:558348,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://mlpills.dev/machine-learning/linear-regression/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2URg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 424w, https://substackcdn.com/image/fetch/$s_!2URg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 848w, https://substackcdn.com/image/fetch/$s_!2URg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 1272w, https://substackcdn.com/image/fetch/$s_!2URg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc85f9119-6252-47c0-a235-13b60e131bee_1557x423.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>
<p>First, let's instantiate the model. Here we keep the default parameters, but we could also pass our own. You can get more details <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html">here</a>.</p><pre><code>from sklearn.linear_model import LinearRegression

lr = LinearRegression()</code></pre><p>Next, fit the model to the training data. Training tunes the model's parameters (here, the regression coefficients and the intercept) on the training set so that the model can generalize to new, unseen data.</p><pre><code>lr = lr.fit(X_train, y_train)</code></pre><p>After fitting, we should assess the model on the testing set to check that it generalizes to data it has not seen during training.</p><pre><code>lr.score(X_test, y_test)</code></pre><p>For our dataset, the model achieved an R-squared of 0.77. R-squared, also called the coefficient of determination, is a statistical measure commonly used to assess regression models: it is the proportion of the variance in the target that the model explains. For regressors such as LinearRegression, score returns this value.</p>
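<p>To make the R-squared value more concrete, here is a minimal sketch that computes it by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper name r_squared and the toy numbers are assumptions made for illustration only; they are not taken from the dataset used in this issue.</p>

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of the targets
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions (illustrative values only)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

score = r_squared(y_true, y_pred)
print(round(score, 4))  # 0.9625
```

<p>A value of 1 means the predictions match the targets exactly, while a value of 0 means the model does no better than always predicting the mean of the targets.</p>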
      <p>
          <a href="https://mlpills.substack.com/p/diy-6-how-to-train-a-regression-model">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DIY #5 - Set a benchmark for your model]]></title><description><![CDATA[Do It Yourself is part of Machine Learning Pills: mlpills.dev]]></description><link>https://mlpills.substack.com/p/diy-5-set-a-benchmark-for-your-model</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-5-set-a-benchmark-for-your-model</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Thu, 28 Sep 2023 12:01:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c950fbfa-ce48-4501-b50e-b882a25bd44a_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128138; Pill of the week</h2><p>In this issue, we will talk about <strong>base models for Time Series forecasting</strong>.</p><h3>What are base models?</h3><p>In Time Series Analysis and Forecasting, a base model is often a <strong>simple model used as a benchmark to compare the performance of more complex models</strong>. Here are several basic or naive methods that are commonly used as base models in time series forecasting:</p><h4><strong>1. Naive Forecast (NF)</strong></h4><ul><li><p>This method simply uses the last observed value as the forecast for all future time points. 
It is naive because it does not consider any other information from the past.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wtna!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wtna!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 424w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 848w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 1272w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wtna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png" width="149" height="64.72398190045249" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8099955-915d-479e-90de-efe6db4b4300_221x96.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:96,&quot;width&quot;:221,&quot;resizeWidth&quot;:149,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wtna!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 424w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 848w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 1272w, https://substackcdn.com/image/fetch/$s_!Wtna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8099955-915d-479e-90de-efe6db4b4300_221x96.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div></li></ul><p></p><h4><strong>2. Simple Average (SA)</strong></h4><ul><li><p>This method calculates the average of all past observations and uses this value as the forecast. 
It assumes that future values will revolve around the average of past values.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cZEe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cZEe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 424w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 848w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 1272w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cZEe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png" width="194" height="80.35502958579882" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f11dae-46f6-4981-a03d-370002b70237_338x140.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:140,&quot;width&quot;:338,&quot;resizeWidth&quot;:194,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cZEe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 424w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 848w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 1272w, https://substackcdn.com/image/fetch/$s_!cZEe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f11dae-46f6-4981-a03d-370002b70237_338x140.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>where &#119879; is the number of observations.<br></p></li></ul><h4><strong>3. Moving Average (MA)</strong></h4><ul><li><p>This method averages the last <em>n</em> observations to forecast the next value. 
It smoothens short-term fluctuations and highlights longer-term trends or cycles.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JK8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JK8r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 424w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 848w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 1272w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JK8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png" width="241" height="89.86440677966101" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:154,&quot;width&quot;:413,&quot;resizeWidth&quot;:241,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JK8r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 424w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 848w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 1272w, https://substackcdn.com/image/fetch/$s_!JK8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492e8fa3-7f88-43d5-83d9-12ac04a2242a_413x154.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul><p></p><h4><strong>4. 
Exponential Smoothing (ES)</strong></h4><ul><li><p>This method gives more weight to the most recent observations and less weight to the older ones, with weights decreasing exponentially.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_SZA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_SZA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 424w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 848w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 1272w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_SZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png" width="273" height="52.28953229398664" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:86,&quot;width&quot;:449,&quot;resizeWidth&quot;:273,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_SZA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 424w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 848w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 1272w, https://substackcdn.com/image/fetch/$s_!_SZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c77b093-5fa5-4631-8798-68beb6ad924c_449x86.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>where <em>&#945;</em> is the smoothing parameter.</p></li></ul><div><hr></div><p><strong>Do you want to take your ML skills to the next level?</strong> Discover <strong><a href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8">Train In Data</a></strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ipDC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 424w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 848w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 1272w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ipDC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png" width="300" height="86" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:86,&quot;width&quot;:300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ipDC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 424w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 848w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 1272w, https://substackcdn.com/image/fetch/$s_!ipDC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb866f26-43e4-453e-85d9-645abb8c4b1c_300x86.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Self-paced online learning courses, featuring in-depth modules on feature engineering and selection, working with imbalanced data, optimizing hyperparameters, time series forecasting and more. 
A great complement to MLPills to take your skills to the next level!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8&quot;,&quot;text&quot;:&quot;Check Train In Data's courses&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.trainindata.com/courses?affcode=1218302_nkuq2dk8"><span>Check Train In Data's courses</span></a></p><p><em>*Sponsored: by purchasing any of their courses you would also be supporting MLPills.</em></p><div><hr></div><h4><strong>5. Seasonal Naive Forecast (SNF)</strong></h4><ul><li><p>This method assumes that the future value will be equal to the last observed value from the same season.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6bEI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6bEI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 424w, https://substackcdn.com/image/fetch/$s_!6bEI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 848w, https://substackcdn.com/image/fetch/$s_!6bEI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6bEI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6bEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png" width="236" height="57.94011976047904" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:82,&quot;width&quot;:334,&quot;resizeWidth&quot;:236,&quot;bytes&quot;:5833,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6bEI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 424w, https://substackcdn.com/image/fetch/$s_!6bEI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 848w, https://substackcdn.com/image/fetch/$s_!6bEI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6bEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2525bd39-bd3e-4289-9228-8c3fec3a6bb7_334x82.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p> where <em>s</em> is the seasonal period and <em>m</em> is the forecast horizon.<br></p></li></ul><h4><strong>6. Drift Method (DM)</strong></h4><ul><li><p>This method extrapolates the line connecting the first and the last observation to forecast future values.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cfoS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cfoS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 424w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 848w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 1272w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cfoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png" width="322" height="82.01886792452831" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:135,&quot;width&quot;:530,&quot;resizeWidth&quot;:322,&quot;bytes&quot;:9734,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!cfoS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 424w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 848w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 1272w, https://substackcdn.com/image/fetch/$s_!cfoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35cfdaa4-57ae-44dc-ae71-5eabb66def99_530x135.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>where <em>h</em> is the forecast horizon and <em>T</em> is the number of 
observations.<br></p></li></ul><h4><strong>7. Random Walk (RW)</strong></h4><ul><li><p>This method assumes that changes in the time series are random, and future values are unpredictable, being equal to the last observed value plus a random error.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cK-4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cK-4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 424w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 848w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 1272w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cK-4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png" width="200" height="50.6578947368421" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:77,&quot;width&quot;:304,&quot;resizeWidth&quot;:200,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cK-4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 424w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 848w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 1272w, https://substackcdn.com/image/fetch/$s_!cK-4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4091b-2eb5-4b25-b3b2-7f68c0d31f40_304x77.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>where <em>&#1013;&#8348;</em>&#8203; is a white noise error term.</p><p></p></li></ul><h4><strong>8. 
Mean Reversion (MR)</strong></h4><ul><li><p>This method assumes that the series will revert to its mean over time, forecasting future values based on the mean and the last observed value.</p></li><li><p><strong>Formula:</strong> </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tVBw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tVBw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 424w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 848w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 1272w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tVBw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png" width="294" height="52.60294117647059" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:73,&quot;width&quot;:408,&quot;resizeWidth&quot;:294,&quot;bytes&quot;:7908,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tVBw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 424w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 848w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 1272w, https://substackcdn.com/image/fetch/$s_!tVBw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7724c84b-b45d-42c3-a4de-b5ad1ea4f956_408x73.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>where <em>&#956;</em> is the mean and <em>&#946;</em> is the reversion coefficient.</p></li></ul><p></p><h3><strong>How to Choose a Base Model?</strong></h3><ul><li><p><strong>Simplicity:</strong> Base models are typically simple and easy to understand.</p></li><li><p><strong>Data Characteristics:</strong> Consider the characteristics of the data, such as seasonality and 
trend, when choosing a base model.</p></li><li><p><strong>Performance Metric:</strong> Evaluate the base model using appropriate performance metrics like MAE, RMSE, or MAPE to set a benchmark for more sophisticated models.</p></li></ul><p>Here is a simple diagram to help you choose the best benchmark model. You could also select multiple ones and keep the one that shows a better performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dKuH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dKuH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 424w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 848w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 1272w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dKuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png" width="422" 
height="391.05333333333334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1050,&quot;resizeWidth&quot;:422,&quot;bytes&quot;:98825,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dKuH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 424w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 848w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 1272w, https://substackcdn.com/image/fetch/$s_!dKuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c4acf3-bbb9-40ae-8b12-1c1bc2a32895_1050x973.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3><strong>Conclusion</strong></h3><p>These naive methods serve as a starting point and help in establishing a baseline performance. Any sophisticated model should ideally perform significantly better than these naive models to be considered useful.</p><p></p><p><strong>&#128073; Check the next section to put what you've learned into practice!</strong></p><blockquote><p>This could perfectly be a Data Science interview question. 
You can check additional questions on the <a href="https://mlpills.dev/interview-questions/">website</a>!</p></blockquote><p>If you like it, subscribe for free to support us:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mlpills.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://mlpills.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p><h2>&#128736;&#65039; Do It Yourself!</h2><p>Now that you know the theory about <strong>base models</strong> and <strong>when you should use</strong> each of them, it&#8217;s time to apply it!</p><blockquote><p><strong>How does it work?</strong></p><ul><li><p>&#128220;I will share a notebook with some guided initial steps.</p></li><li><p>&#128204;I will give you some tasks to complete.</p></li><li><p>&#127919;I will share the outcome so you can check if you did well or not!</p></li></ul></blockquote><p>Now it&#8217;s your turn. <strong>Let&#8217;s play!</strong></p><p>I want you to apply each of the eight techniques I previously introduced:</p><ol><li><p><strong>&#128424;Naive Forecast</strong> - Difficulty &#11088;</p></li><li><p><strong>&#9878;&#65039;Simple Average</strong> - Difficulty &#11088;</p></li><li><p><strong>&#129695;Moving Average</strong> - Difficulty &#11088;</p></li><li><p><strong>&#129361;Exponential Smoothing</strong> - Difficulty &#11088;&#11088;</p></li><li><p><strong>&#127958;Seasonal Naive Forecast</strong> - Difficulty &#11088;&#11088;</p></li><li><p><strong>&#128733;Drift Method</strong> - Difficulty &#11088;&#11088;</p></li><li><p><strong>&#127922;Random Walk</strong> - Difficulty &#11088;</p></li><li><p><strong>&#127906;Mean Reversion</strong> - Difficulty &#11088;</p><p></p></li></ol><p>I provide you with the average monthly temperature in London for the last 40 years (1982 - 2022). 
<strong>Create eight base models to see which one behaves better when forecasting the last 2 years</strong>. You will then be able to use this model as the benchmark when you train a more complex model like SARIMA, Holt-Winters, LSTM&#8230;</p><p>You should get something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cSbm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cSbm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 424w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 848w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 1272w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cSbm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png" width="885" height="491" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:491,&quot;width&quot;:885,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cSbm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 424w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 848w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 1272w, https://substackcdn.com/image/fetch/$s_!cSbm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b642ac6-97b9-44d1-8060-2aab614d0f31_885x491.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here you can find a <strong>Kaggle notebook</strong> with everything you need below:</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-5-set-a-benchmark-for-your-model">
              Read more
          </a>
      </p>
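<p>To make the idea of a benchmark concrete, here is a minimal sketch of three of the baselines above (naive, seasonal naive and drift) compared by MAE. It is not the notebook's code: the toy series, the function names and the season length of 4 are assumptions chosen for illustration, the drift forecast uses the standard formula (last value plus <em>h</em> times the average historical change), and only the Python standard library is used.</p>

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the last full season, cycling through it as needed."""
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def drift_forecast(history, horizon):
    """Extend the straight line joining the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + (h + 1) * slope for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical toy series: three "years" of four seasonal points each.
series = [5.0, 12.0, 18.0, 9.0, 6.0, 13.0, 19.0, 10.0, 7.0, 14.0, 20.0, 11.0]
train, test = series[:8], series[8:]  # hold out the last "year"

scores = {
    "naive": mae(test, naive_forecast(train, len(test))),
    "seasonal_naive": mae(test, seasonal_naive_forecast(train, len(test), 4)),
    "drift": mae(test, drift_forecast(train, len(test))),
}
best = min(scores, key=scores.get)  # baseline with the lowest MAE
print(scores, best)
```

<p>On this toy series the seasonal naive forecast gets the lowest error, which is what you would expect for strongly seasonal data. For monthly data such as the London temperatures, the season length would be 12 and the hold-out would be the last 24 points.</p>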
   ]]></content:encoded></item><item><title><![CDATA[DIY #4 - Encode your data]]></title><description><![CDATA[Do It Yourself is part of Machine Learning Pills: mlpills.dev]]></description><link>https://mlpills.substack.com/p/diy-4-encode-your-data</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-4-encode-your-data</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Thu, 14 Sep 2023 11:00:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3d1a44ce-3409-4bdc-a215-291f9fca17d9_1920x1244.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the fourth issue of DIY (Do It Yourself). In this section, every week, a key concept in Data Science will be introduced to you. After that, you will be able to practice what you learned! We continue this section with data or feature encoding. I hope you enjoy it!</p><h2>&#128138; Pill of the week</h2><p>In this issue, we will deal with <strong>encoding</strong>. </p><h3>What are categorical variables?</h3><p>Categorical variables are those that take on a limited, fixed number of distinct categories or classes. They are often human-understandable labels such as "red," "blue," "apple," "banana," etc., and don't have a natural numerical representation. To convert these variables into a format that can be provided to machine learning algorithms, we use encoding. </p><h3>What is encoding?</h3><p>Encoding is the process of converting data from one form to another. In the context of machine learning and data science, encoding often refers to the transformation of categorical variables into a numerical format that can be easily used by algorithms. 
Most machine learning algorithms require numerical input and output variables, hence the need for encoding.</p><p>The <strong>main goal of encoding</strong> in machine learning is to <strong>translate the data into a form that is useful and understandable by the algorithm</strong>, without distorting the characteristic properties and relationships in the data.</p><p>It's worth noting that improper encoding can introduce biases or inaccuracies in the model, so choosing the right encoding method is crucial.</p><h3>Main techniques</h3><p></p><h4><strong>1. Label Encoding</strong></h4><p>In this technique, each unique category is mapped to an integer starting from 0. The integers are assigned arbitrarily (for example, in alphabetical order), so they do not reflect any real order or magnitude among the categories.</p><p><strong>When to Use</strong>: It works for ordinal data only if the assigned integers happen to match the real order. It can also be used for nominal data when the algorithm can handle arbitrary integer codes (e.g., decision trees).</p><p><strong>Pros</strong>:</p><ul><li><p>Simple to implement.</p></li><li><p>Does not increase dimensionality.</p></li></ul><p><strong>Cons</strong>:</p><ul><li><p>The integer codes imply an order and a magnitude that may not exist, which can mislead algorithms that treat the values as numbers (e.g., linear models).</p></li></ul><p></p><h4><strong>2. One-Hot Encoding</strong></h4><p>It converts each unique category into a new binary column of 1 or 0.</p><p><strong>When to Use</strong>: For nominal categories where no ordinal relationship exists.</p><p><strong>Pros</strong>:</p><ul><li><p>Easy to use and interpret.</p></li><li><p>No ordinal relationships are introduced.</p></li></ul><p><strong>Cons</strong>:</p><ul><li><p>Dimensionality: Introduces as many new columns as there are unique values in the original column, which can explode dimensionality.</p></li><li><p>Multicollinearity: The new columns always sum to 1, so the encoding can introduce multicollinearity, which can be problematic for certain algorithms (like linear regression). Dropping one of the columns avoids this.</p></li></ul><p></p><h4><strong>3. 
Ordinal Encoding</strong></h4><p>It maps each unique category to an integer based on the inherent ordinal nature of the category.</p><p><strong>When to Use</strong>: For ordinal data where the order of categories is important.</p><p><strong>Pros</strong>:</p><ul><li><p>Keeps the ordinal relationship.</p></li><li><p>Does not increase dimensionality.</p></li></ul><p><strong>Cons</strong>:</p><ul><li><p>Difficult to set up correctly.</p></li><li><p>Subjectivity: Determining the correct order of categories can be subjective and/or data-dependent.</p></li><li><p>Misinterpretation: Incorrect ordering may lead the model to learn inaccurate patterns from the data.</p></li></ul><p></p><h4><strong>4. Frequency or Count Encoding</strong></h4><p>Categories are replaced with their frequency or count in the data set.</p><p><strong>When to Use</strong>: When there are too many categories and one-hot encoding increases dimensionality too much.</p><p><strong>Pros</strong>:</p><ul><li><p>Effective for high cardinality features.</p></li><li><p>Does not increase dimensionality.</p></li></ul><p><strong>Cons</strong>:</p><ul><li><p>Loses information about the categories.</p></li><li><p>Different categories might end up having the same frequency, causing a collision.</p></li></ul><p></p><h4><strong>5. Target Encoding</strong></h4><p>Categories are replaced with the mean of the target variable for that category.</p><p><strong>When to Use</strong>: When the category has some correlation with the target variable. 
Be cautious of data leakage.</p><p><strong>Pros</strong>:</p><ul><li><p>Can capture information within the category that can aid in prediction.</p></li><li><p>Useful for high cardinality features.</p></li></ul><p><strong>Cons</strong>:</p><ul><li><p>Prone to data leakage: If the category means are computed using rows that are later used for validation or testing, the target leaks into the features and inflates the performance metrics.</p></li><li><p>Overfitting: Category means estimated from only a few rows are noisy and sensitive to outliers, so rare categories can lead to overfitting. Smoothing the means towards the global mean helps.</p></li></ul><p></p><p>&#9888;&#65039; Each encoding technique has its own advantages and disadvantages, and the choice of which to use often depends on the specific problem and the type of data you're working with. Always remember to <strong>test out different approaches and validate</strong> their effectiveness using cross-validation or a separate validation set.</p><p></p><p><strong>&#128073; Check the next section to put what you've learned into practice!</strong></p><blockquote><p>This could perfectly be a Data Science interview question. 
You can check additional questions on the <a href="https://mlpills.dev/interview-questions/">website</a>!</p></blockquote><p><em>If you like it, subscribe for free to support us:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mlpills.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://mlpills.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2>&#128736;&#65039; Do It Yourself!</h2><p>Now that you know the theory about <strong>feature encoding</strong> and <strong>when you should use</strong> each of the techniques, it&#8217;s time to apply it!</p><blockquote><p><strong>How does it work?</strong></p><ul><li><p>&#128220;I will share a notebook with some guided initial steps.</p></li><li><p>&#128204;I will give you some tasks to complete.</p></li><li><p>&#127919;I will share the outcome so you can check if you did well or not!</p></li></ul></blockquote><p>Now it&#8217;s your turn. <strong>Let&#8217;s play!</strong></p><p>I want you to apply each of the five techniques I previously introduced:</p><ol><li><p><strong>&#127922;Label Encoding</strong> - Difficulty &#11088;</p></li><li><p><strong>&#129518;One-Hot Encoding</strong> - Difficulty &#11088;&#11088;</p></li><li><p><strong>&#128290;Ordinal Encoding</strong> - Difficulty &#11088;</p></li><li><p><strong>&#9201;Frequency Encoding</strong> - Difficulty &#11088;&#11088;</p></li><li><p><strong>&#9878;&#65039;Target Encoding</strong> - Difficulty &#11088;&#11088;</p></li></ol><p>Here you can find a Kaggle notebook with everything you need:</p>
      <p>
          <a href="https://mlpills.substack.com/p/diy-4-encode-your-data">
              Read more
          </a>
      </p>
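<p>As a rough illustration of the five encodings described above, here is a minimal sketch in plain Python. It is not the notebook's code: the toy <code>sizes</code> feature, the <code>target</code> values and all variable names are assumptions made up for this example. In a real project you would more likely use a library such as scikit-learn or pandas.</p>

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a "size" feature and a binary target.
sizes = ["small", "large", "medium", "small", "large", "small"]
target = [0, 1, 1, 0, 1, 1]
categories = sorted(set(sizes))  # ['large', 'medium', 'small']

# 1. Label encoding: arbitrary integers (here, alphabetical order).
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[s] for s in sizes]

# 2. One-hot encoding: one binary column per category.
one_hot = [[int(s == cat) for cat in categories] for s in sizes]

# 3. Ordinal encoding: integers that follow a meaningful, user-supplied order.
ordinal_map = {"small": 0, "medium": 1, "large": 2}
ordinal_encoded = [ordinal_map[s] for s in sizes]

# 4. Frequency encoding: each category is replaced by its count.
counts = Counter(sizes)
frequency_encoded = [counts[s] for s in sizes]

# 5. Target encoding: each category is replaced by its mean target value.
# To avoid leakage, compute these means on the training split only.
sums = defaultdict(float)
for s, y in zip(sizes, target):
    sums[s] += y
target_means = {cat: sums[cat] / counts[cat] for cat in categories}
target_encoded = [target_means[s] for s in sizes]

print(label_encoded, ordinal_encoded, frequency_encoded, target_encoded)
```

<p>Comparing <code>label_encoded</code> with <code>ordinal_encoded</code> shows the difference between the two: the alphabetical label codes put "large" before "small", while the ordinal mapping keeps the real order.</p>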
   ]]></content:encoded></item><item><title><![CDATA[DIY #3 - Adjust the range and distribution of your data]]></title><description><![CDATA[Welcome to the third issue of DIY (Do It Yourself). In this section, every week, a key concept in Data Science will be introduced to you. After that, you will be able to practice what you learned! We continue this section with data or feature scaling. I hope you enjoy it!]]></description><link>https://mlpills.substack.com/p/diy-3-adjust-the-range-and-distribution</link><guid isPermaLink="false">https://mlpills.substack.com/p/diy-3-adjust-the-range-and-distribution</guid><dc:creator><![CDATA[David Andrés]]></dc:creator><pubDate>Thu, 31 Aug 2023 11:00:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/83a26057-c934-407a-b3dd-eea24feacfda_1920x2590.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the third issue of DIY (Do It Yourself). In this section, every week, a key concept in Data Science will be introduced to you. After that, you will be able to practice what you learned! We continue this section with data or feature scaling. I hope you enjoy it!</p><h2>&#128138; Pill of the week</h2><p>In this issue, we will deal with <strong>feature scaling</strong>. Are you an advanced user and you don&#8217;t care about the theory? 
Go straight to the <a href="https://www.kaggle.com/davidandressanchez/diy003-feature-scaling">Kaggle notebook</a> that I have prepared:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.kaggle.com/davidandressanchez/diy003-feature-scaling&quot;,&quot;text&quot;:&quot;Go to the Kaggle notebook!&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.kaggle.com/davidandressanchez/diy003-feature-scaling"><span>Go to the Kaggle notebook!</span></a></p><p>Otherwise, stay with me, I will introduce you to the concept.</p><h4>What is data or feature scaling?</h4><p>Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training. One essential aspect of data preprocessing is <strong>feature scaling, which involves adjusting the range and distribution of the data</strong>. Feature scaling techniques, such as normalization and standardization, ensure that <strong>each attribute contributes equally to the algorithm's performance</strong>, thereby enhancing the predictive model's accuracy and efficiency. Both normalization and standardization aim to re-scale features, but they do so in different ways and are suitable for different types of data and machine learning algorithms.</p><h4>Normalization</h4><p>Normalization is a technique used to <strong>scale the features (or variables) of a dataset to a similar range</strong>. The goal is to transform each feature to lie in a certain interval, usually [0,1], making it easier for algorithms to interpret these features. </p><p>Normalization ensures that each feature contributes equally to the computation of distances or gradients. This is why it is especially important for distance-based algorithms like k-nearest neighbours (KNN), or when your features have different units or vastly different scales. 
</p><p></p><h4><strong>Standardization</strong></h4><p>Standardization, unlike normalization, <strong>transforms the features in a way that the resulting distribution has a mean of 0 and a standard deviation of 1</strong>. It doesn't bound values to a specific range. </p><p>This is especially useful for algorithms that assume the input variables to have a Gaussian distribution.</p><p></p><h4>Introduction to the Different Techniques</h4><ul><li><p><strong>Min-Max Scaling</strong>: </p><ul><li><p>This is the simplest form of normalization. Each feature is scaled linearly between the minimum and maximum value to a range [0,1].</p></li><li><p>Use it <em>when the distribution of the feature is not Gaussian and you need values in a bounded interval</em>. However, this method is sensitive to outliers.</p></li><li><p>Mathematically, the simplest form of normalization for a feature <em>x</em> is calculated as:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s4j8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s4j8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 424w, https://substackcdn.com/image/fetch/$s_!s4j8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 848w, https://substackcdn.com/image/fetch/$s_!s4j8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 
1272w, https://substackcdn.com/image/fetch/$s_!s4j8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s4j8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png" width="327" height="67.62773722627738" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:85,&quot;width&quot;:411,&quot;resizeWidth&quot;:327,&quot;bytes&quot;:9391,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!s4j8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 424w, https://substackcdn.com/image/fetch/$s_!s4j8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 848w, https://substackcdn.com/image/fetch/$s_!s4j8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 1272w, 
https://substackcdn.com/image/fetch/$s_!s4j8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4498ca7a-c39e-4dab-9ecf-c095803bb71d_411x85.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul></li><li><p><strong>Z-score Normalization (Standardization)</strong>: </p><ul><li><p>Here, features are rescaled to have a mean of &#956;=0 and a standard deviation of &#963;=1. The shape of the distribution is unchanged, so the result is standard normal only if the original feature was Gaussian.</p></li><li><p>Use it <em>when the algorithm assumes that the distribution of your features is Gaussian</em>. This method is also useful as a general technique when you <em>don't know the distribution</em> of your feature and you're not particularly concerned about robustness to outliers.</p></li><li><p>Mathematically, standardization for a feature <em>x</em> is calculated as follows, where &#956; is the mean and &#963; is the standard deviation of the feature:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Wke!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Wke!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 424w, https://substackcdn.com/image/fetch/$s_!5Wke!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 848w, 
https://substackcdn.com/image/fetch/$s_!5Wke!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 1272w, https://substackcdn.com/image/fetch/$s_!5Wke!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Wke!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png" width="252" height="60" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:70,&quot;width&quot;:294,&quot;resizeWidth&quot;:252,&quot;bytes&quot;:6074,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!5Wke!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 424w, https://substackcdn.com/image/fetch/$s_!5Wke!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 848w, 
https://substackcdn.com/image/fetch/$s_!5Wke!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 1272w, https://substackcdn.com/image/fetch/$s_!5Wke!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7ff572-8228-4c9b-9bf4-d326e594d625_294x70.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul></li><li><p><strong>Decimal Scaling</strong>: </p><ul><li><p>In this method, the data is scaled by moving the decimal point of values of each feature.</p></li><li><p>Useful <em>when the range of the dataset is unknown or could vary widely</em>, but this method is rarely used in practice.</p></li><li><p>To scale the feature <em>x</em>, you find <em>d</em>, which is the smallest integer such that max(|<em>x</em>|) / 10&#7496; &lt; 1, and then scale <em>x</em> by 10&#8315;&#7496;.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aSUj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aSUj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 424w, https://substackcdn.com/image/fetch/$s_!aSUj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 848w, 
https://substackcdn.com/image/fetch/$s_!aSUj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 1272w, https://substackcdn.com/image/fetch/$s_!aSUj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aSUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png" width="318" height="53.26903553299493" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f1905ad-9212-4286-b231-fee873368137_394x66.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:66,&quot;width&quot;:394,&quot;resizeWidth&quot;:318,&quot;bytes&quot;:6473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aSUj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 424w, https://substackcdn.com/image/fetch/$s_!aSUj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 848w, 
https://substackcdn.com/image/fetch/$s_!aSUj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 1272w, https://substackcdn.com/image/fetch/$s_!aSUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f1905ad-9212-4286-b231-fee873368137_394x66.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul></li><li><p><strong>Robust Scaling</strong>: </p><ul><li><p>In this method, the median and the interquartile range are used for scaling, making it robust to outliers.</p></li><li><p>Use it<em> when the data contains many outliers</em> and you want to be robust against them.</p></li><li><p>It is calculated by subtracting the median and dividing by the interquartile range of the feature (IQR):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-a99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-a99!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 424w, https://substackcdn.com/image/fetch/$s_!-a99!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 848w, 
https://substackcdn.com/image/fetch/$s_!-a99!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 1272w, https://substackcdn.com/image/fetch/$s_!-a99!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-a99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png" width="334" height="64.16748768472907" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:78,&quot;width&quot;:406,&quot;resizeWidth&quot;:334,&quot;bytes&quot;:10166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-a99!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 424w, https://substackcdn.com/image/fetch/$s_!-a99!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 848w, 
https://substackcdn.com/image/fetch/$s_!-a99!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 1272w, https://substackcdn.com/image/fetch/$s_!-a99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a44bd0f-0cf9-4e1a-9a37-883a6b1ff192_406x78.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul></li><li><p><strong>Log Transformation</strong>: </p><ul><li><p>In this method, the logarithm is applied to each of the values of the feature.</p></li><li><p>Use it <em>when the feature is highly skewed</em> and transforming it could make it easier to model.</p></li><li><p>The feature <em>x</em> is transformed using the natural logarithm:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tmAu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tmAu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 424w, https://substackcdn.com/image/fetch/$s_!tmAu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 848w, https://substackcdn.com/image/fetch/$s_!tmAu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tmAu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tmAu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png" width="305" height="37.381615598885794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:44,&quot;width&quot;:359,&quot;resizeWidth&quot;:305,&quot;bytes&quot;:6369,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tmAu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 424w, https://substackcdn.com/image/fetch/$s_!tmAu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 848w, https://substackcdn.com/image/fetch/$s_!tmAu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tmAu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d4672ab-e161-4747-a491-679d1d2fa20e_359x44.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul></li></ul><p><strong>&#128073; Check the next section to put what you've learned into practice!</strong></p><blockquote><p>This could easily be a Data Science interview question. You can check additional questions on the <a href="https://mlpills.dev/interview-questions/">website</a>!</p></blockquote><p><em>If you like it, subscribe for free to support us:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mlpills.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://mlpills.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2>&#128736;&#65039; Do It Yourself!</h2><p>Now that you know the theory behind <strong>feature scaling</strong> and <strong>when to use</strong> each technique, it&#8217;s time to apply it!</p><blockquote><p><strong>How does it work?</strong></p><ul><li><p>&#128220;I will share a notebook with some guided initial steps.</p></li><li><p>&#128204;I will give you some tasks to complete.</p></li><li><p>&#127919;I will share the expected outcome so you can check your results!</p></li></ul></blockquote><p>Now it&#8217;s your turn. 
<strong>Let&#8217;s play!</strong></p><p>I want you to apply each of the five techniques I previously introduced:</p><ol><li><p><strong>&#127777;Min-Max Scaling</strong> - Difficulty &#11088;</p></li><li><p><strong>&#128276;Z-score Normalization (Standardization)</strong> - Difficulty &#11088;</p></li><li><p><strong>&#9971;&#65039;Decimal Scaling</strong> - Difficulty &#11088;</p></li><li><p><strong>&#9968;&#65039;Robust Scaling</strong> - Difficulty &#11088;</p></li><li><p><strong>&#127906;Log Transformation</strong> - Difficulty &#11088;&#11088;</p></li></ol><p></p><p>Here you can find a <a href="https://www.kaggle.com/davidandressanchez/diy003-feature-scaling">Kaggle notebook</a> with everything you need!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.kaggle.com/davidandressanchez/diy003-feature-scaling&quot;,&quot;text&quot;:&quot;Go to the Kaggle notebook!&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.kaggle.com/davidandressanchez/diy003-feature-scaling"><span>Go to the Kaggle notebook!</span></a></p><p>Starting from this data:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9syO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9syO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 424w, 
https://substackcdn.com/image/fetch/$s_!9syO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 848w, https://substackcdn.com/image/fetch/$s_!9syO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 1272w, https://substackcdn.com/image/fetch/$s_!9syO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9syO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png" width="357" height="266.62025316455697" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:553,&quot;resizeWidth&quot;:357,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9syO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 424w, 
https://substackcdn.com/image/fetch/$s_!9syO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 848w, https://substackcdn.com/image/fetch/$s_!9syO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 1272w, https://substackcdn.com/image/fetch/$s_!9syO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff62510e0-e2b9-4e77-a0f2-d32cc1a5bdeb_553x413.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You should be able to get the following outcomes:</p><ul><li><p><strong>&#127777;Min-Max Scaling</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JGvY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JGvY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 848w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JGvY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png" width="361" height="274.57274401473296" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1abf648-d262-46a9-b30f-7170da446aed_543x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:543,&quot;resizeWidth&quot;:361,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JGvY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 848w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!JGvY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1abf648-d262-46a9-b30f-7170da446aed_543x413.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>&#128276;Z-score Normalization (Standardization)</strong> </p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cEKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cEKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!cEKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!cEKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!cEKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cEKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png" width="347" height="263.9244935543278" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:543,&quot;resizeWidth&quot;:347,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cEKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!cEKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!cEKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!cEKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709f47ef-fa6e-4eb2-8537-3deed873d7d2_543x413.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>&#9971;&#65039;Decimal Scaling</strong> </p></li></ul><div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uGof!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uGof!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 424w, https://substackcdn.com/image/fetch/$s_!uGof!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 848w, https://substackcdn.com/image/fetch/$s_!uGof!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 1272w, https://substackcdn.com/image/fetch/$s_!uGof!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uGof!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png" width="341" height="252.84201077199282" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01be777f-74de-40be-8cda-27f3e87c2348_557x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:557,&quot;resizeWidth&quot;:341,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uGof!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 424w, https://substackcdn.com/image/fetch/$s_!uGof!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 848w, https://substackcdn.com/image/fetch/$s_!uGof!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 1272w, https://substackcdn.com/image/fetch/$s_!uGof!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01be777f-74de-40be-8cda-27f3e87c2348_557x413.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><ul><li><p><strong>&#9968;&#65039;Robust Scaling</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VFnN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VFnN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!VFnN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!VFnN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!VFnN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VFnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png" width="321" height="244.14917127071823" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:543,&quot;resizeWidth&quot;:321,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VFnN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 424w, https://substackcdn.com/image/fetch/$s_!VFnN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!VFnN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 1272w, https://substackcdn.com/image/fetch/$s_!VFnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b4e9dbc-e590-4cbf-8cba-e7c610668d77_543x413.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And starting from this other data source:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!pD-i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pD-i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 424w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 848w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 1272w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pD-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png" width="443" height="340.34402852049914" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:431,&quot;width&quot;:561,&quot;resizeWidth&quot;:443,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pD-i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 424w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 848w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 1272w, https://substackcdn.com/image/fetch/$s_!pD-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f603440-6adb-4ac3-8829-60b5f9ec3a36_561x431.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>You should be able to get the following outcome:</p><ul><li><p><strong>&#127906;Log Transformation</strong> </p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XOup!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XOup!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 424w, https://substackcdn.com/image/fetch/$s_!XOup!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!XOup!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 1272w, https://substackcdn.com/image/fetch/$s_!XOup!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XOup!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png" width="448" height="335.18840579710144" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:413,&quot;width&quot;:552,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XOup!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 424w, https://substackcdn.com/image/fetch/$s_!XOup!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 848w, 
https://substackcdn.com/image/fetch/$s_!XOup!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 1272w, https://substackcdn.com/image/fetch/$s_!XOup!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d9a4f47-a07d-46cb-bde6-63514dcc1f11_552x413.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That would mean that your data has been successfully scaled! 
</p><p>You will find the <strong>answer</strong> <strong>next week</strong>!</p><div class="pullquote"><p><em>Submit your response to <strong>david@mlpills.dev</strong> to enter the <strong>weekly ranking</strong>. The participant with the highest score at the end of each month will win an <strong>exciting prize as that month's top performer</strong>!</em></p><p><em>I will reply to every email you send me, even if you are just asking for help!</em></p></div>]]></content:encoded></item></channel></rss>