<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DSAIEngineering]]></title><description><![CDATA[Technical posts aimed at data and AI practitioners. Currently relevant to junior- and mid-level professionals.]]></description><link>https://newsletter.dsaiengineering.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Jybu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ead0d22-4d0f-4e14-86be-535e3a477b57_500x500.png</url><title>DSAIEngineering</title><link>https://newsletter.dsaiengineering.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 29 Apr 2026 07:31:06 GMT</lastBuildDate><atom:link href="https://newsletter.dsaiengineering.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mohit Saharan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dsaiengineering@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dsaiengineering@substack.com]]></itunes:email><itunes:name><![CDATA[Mohit Saharan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mohit Saharan]]></itunes:author><googleplay:owner><![CDATA[dsaiengineering@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dsaiengineering@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mohit Saharan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[[P12] Understanding tabular foundation models: time series forecasting with TabPFN]]></title><description><![CDATA[This post continues my series on tabular foundation 
models.]]></description><link>https://newsletter.dsaiengineering.com/p/p12-understanding-tabular-foundation</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p12-understanding-tabular-foundation</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Tue, 28 Apr 2026 13:20:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BiQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Introduction</h2><p>This post continues my series on tabular foundation models. So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a 
href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, and TabPFN&#8217;s predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>.</p><p>For a new reader, the minimum background is this: TabPFN is a pretrained tabular foundation model. Unlike XGBoost or Random Forest, its ordinary <code>.fit()</code> call does not update model weights to learn a new model from scratch for the current dataset. Instead, <code>.fit()</code> prepares the labelled rows as context, and TabPFN uses that context to predict new rows. This is why I have repeatedly described TabPFN as a context-conditioned predictor rather than just another sklearn-like estimator.</p><p>Today I cover the time series forecasting section of the official <a href="https://colab.research.google.com/github/PriorLabs/TabPFN/blob/main/examples/notebooks/TabPFN_Demo_Local.ipynb">TabPFN hands-on demo notebook</a>. 
You can find my local version of the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/b1aa38c94d824bef493a6dbaa2eaeb38c04ed4ef/blog/20260428-understanding-tfm-time-series-forecasting-tabpfn.assets/tabpfn-hands-on-demo-msaharan-20260428.ipynb">here</a>.</p><p>In P3, I left time series forecasting as &#8220;Later.&#8221; This post fills that gap. The interesting question is: if TabPFN is a tabular foundation model, how can it forecast a sequence? At first, this is not obvious because ordinary tabular models usually treat rows as examples in a table, while time series forecasting depends on the order of observations, seasonality, and future horizons. The answer is not that TabPFN suddenly becomes an ARIMA model, a recurrent neural network, or a native time-series transformer. The main idea is to convert forecasting into a tabular regression problem.</p><p>This is where TabPFN-TS comes in. The notebook cites the work of Hoo et al., whose arXiv paper is <a href="https://arxiv.org/abs/2501.02945">From Tables to Time: Extending TabPFN-v2 to Time Series Forecasting</a>. That work is the reference behind the TabPFN-TS workflow used in the notebook. 
The <a href="https://github.com/PriorLabs/tabpfn-time-series">TabPFN-TS repository</a> summarizes the workflow as:</p><ol><li><p>Transform a time series into a table.</p></li><li><p>Extract temporal features and add them to the table.</p></li><li><p>Perform regression on the table using TabPFNv2.</p></li><li><p>Use the regression output as the time series forecast.</p></li></ol><p>The post is organized as follows:</p><ol><li><p>Introduction: why time series forecasting belongs in this TabPFN series.</p></li><li><p>Conceptual background: the forecasting vocabulary and the tabular-regression formulation.</p></li><li><p>Hands-on demo: loading the Chronos data, adding features, predicting, and reading the forecast plot.</p></li><li><p>Summary and conclusion: what this example shows and how to evaluate forecasts in practice.</p></li></ol><h2>2. Conceptual Background</h2><p>Before going to the hands-on demo, I want to set up the concepts that make the example meaningful. 
This section does five things:</p><ol><li><p>It defines the basic vocabulary of time series forecasting.</p></li><li><p>It explains how a sequence can be represented as a supervised tabular regression problem.</p></li><li><p>It separates what is standard supervised ML from what is specific to TabPFN-TS.</p></li><li><p>It explains why temporal features matter.</p></li><li><p>It connects point forecasts and quantile forecasts back to the predictive-distribution language from earlier posts.</p></li></ol><h3>2.1 Working Vocabulary</h3><p>The key terms for this post are:</p><ul><li><p>Time series: observations indexed by time, such as monthly tourism demand, hourly electricity load, daily sales, or sensor readings.</p></li><li><p>Forecast horizon: the future window we want to predict. In the notebook, <code>prediction_length = 24</code>, so the model predicts 24 future monthly values.</p></li><li><p>History/context window: the observed part of the time series that is available before the forecast starts.</p></li><li><p>Point forecast: a single predicted value for each future timestamp.</p></li><li><p>Probabilistic forecast: a forecast that describes uncertainty, often through quantiles.</p></li><li><p>Quantile forecast: a prediction for a chosen quantile level, such as the 0.1 or 0.9 quantile.</p></li><li><p>Covariates/features: extra columns known at prediction time, such as calendar features, holidays, weather, promotions, or a running time index.</p></li><li><p>Zero-shot forecasting: applying a pretrained model to a new forecasting problem without training a task-specific forecasting model from scratch.</p></li></ul><p>In ordinary supervised regression, we usually have a table: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;X =\n\\begin{bmatrix}\nx_1^\\top \\\\\nx_2^\\top \\\\\n\\vdots \\\\\nx_n^\\top\n\\end{bmatrix},\n\\quad\ny =\n\\begin{bmatrix}\ny_1 \\\\\ny_2 \\\\\n\\vdots 
\\\\\ny_n\n\\end{bmatrix}.&quot;,&quot;id&quot;:&quot;SRDRTWDSUD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each row \(x_i\) contains features, and \(y_i\) is the target value. A regressor learns or uses a mapping from rows to targets.</p><p>Here, \(X\) is the feature matrix, \(y\) is the target vector, \(n\) is the number of rows, and \(x_i^\top\) means that the feature vector for row \(i\) is written as a row vector. The superscript \(\top\) denotes transpose.</p><p>In time series forecasting, the data initially looks different. For one item, we observe: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_1, y_2, \\ldots, y_T,&quot;,&quot;id&quot;:&quot;ZVAOKUFAFZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and want to predict: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{T+1}, y_{T+2}, \\ldots, y_{T+H}&quot;,&quot;id&quot;:&quot;NYMWRGRVBJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(T\) is the last observed time index, and \(H\) is the forecast horizon.</p><p>The key move in TabPFN-TS is to make the second problem look like the first problem.</p><h3>2.2 Forecasting as Tabular Regression</h3><p>Suppose we have multiple time series indexed by item \(i\). For item \(i\), let \(y_{i,t}\) be the observed value at time \(t\), and let \(T_i\) be the last observed time index available before forecasting starts. The forecasting task is to estimate future values: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{i,T_i+h},\n\\quad h = 1, 2, \\ldots, H.&quot;,&quot;id&quot;:&quot;THKRNZZEUU&quot;}" data-component-name="LatexBlockToDOM"></div><p>In the single-series notation above, the last observed index was \(T\). With multiple series, I write this as \(T_i\) because each item \(i\) may have its own last observed timestamp. 
Here, \(H\) is the forecast horizon, and \(h\) is the number of steps ahead from the end of the observed history for item \(i\). To use a tabular model, we build a feature vector for each item-time pair: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x_{i,t} = g(i, t, \\text{calendar}(t), \\text{seasonal}(t), \\text{known covariates}_{i,t}).&quot;,&quot;id&quot;:&quot;CMDSEKCVSK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(g(\cdot)\) is the feature-construction function. It turns time information and any known covariates into ordinary tabular columns. The training table contains rows where the target is known: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}_\\text{train}\n\n= \\{(x_{i,t}, y_{i,t}) : i \\in \\mathcal{I},\\ t \\leq T_i\\}.&quot;,&quot;id&quot;:&quot;PTAXDPGMVD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{I}\) is the set of item IDs included in the forecasting task. The future table contains rows where the target is unknown: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}_\\text{future}\n\n= \\{x_{i,T_i+h} : i \\in \\mathcal{I},\\ h = 1, 2, \\ldots, H\\}.&quot;,&quot;id&quot;:&quot;MRXPFASEKJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Now the forecasting problem has become a tabular regression problem: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{y}_{i,T_i+h}\n\n= f(x_{i,T_i+h}; \\mathcal{D}_\\text{train}).\n\n&quot;,&quot;id&quot;:&quot;SGOXKUMMPH&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(f\) is a prediction function, and \(\hat{y}_{i,T_i+h}\) is the predicted value for item \(i\) at forecast step \(h\).</p><p>For a classical supervised model, \(f\) would usually be a model fitted specifically to the current training table. 
For example, a supervised regressor might choose: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n\n= \\arg\\min_{f \\in \\mathcal{F}}\n\n\\sum_{(x,y)\\in \\mathcal{D}_\\text{train}}\n\n\\ell(y, f(x)).\n\n&quot;,&quot;id&quot;:&quot;QJVEJFQQJL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{F}\) is the model class, \(\ell\) is the regression loss, and \(\hat{f}\) is the fitted task-specific model. Random Forest, XGBoost, LightGBM, and CatBoost all differ in how they define and optimize \(\mathcal{F}\), but in this workflow they are still learning a fresh model from the current transformed table.</p><p>For TabPFN, the meaning is different. TabPFN is already pretrained, and the current training rows become context. Conceptually, the prediction is closer to: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y_{i,T_i+h} | x_{i,T_i+h}, \\mathcal{D}_\\text{train}).&quot;,&quot;id&quot;:&quot;YUCJMEZYWM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(p(\cdot)\) denotes a predictive distribution over the future target value.</p><p>This is the same posterior-predictive-distribution viewpoint I discussed in P4 and reused in P11. The difference is that the row \(x_{i,T_i+h}\) now represents a future timestamp, not a generic tabular row.</p><p>This framing also explains why the time-series package can support point forecasts and probabilistic forecasts. If TabPFN produces a predictive distribution for the target at a future row, then the output can be summarized as a mean, median, or quantiles. This is a useful conceptual lens, not a guarantee that the output is perfectly calibrated for every dataset.</p><h3>2.3 Standard Supervised ML vs What Is New Here</h3><p>The conversion from a time series to a tabular regression problem is not unique to TabPFN. 
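</p><p>Before the TabPFN-specific part, it helps to see that "standard supervised ML" move in isolation. Below is a minimal sketch on a synthetic monthly series; the helper <code>make_time_features</code> is my own illustration (not part of any package), and the regressor is a generic scikit-learn model:</p>

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly series: linear trend plus annual seasonality (illustration only).
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t_all = np.arange(96)
y = 100 + 0.5 * t_all + 10 * np.sin(2 * np.pi * t_all / 12)

def make_time_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    # Hypothetical helper: turn timestamps into ordinary tabular columns.
    t = np.arange(len(index))
    return pd.DataFrame(
        {
            "running_index": t,
            "month_sin": np.sin(2 * np.pi * index.month / 12),
            "month_cos": np.cos(2 * np.pi * index.month / 12),
        },
        index=index,
    )

H = 24  # forecast horizon
features = make_time_features(idx)
X_train, y_train = features.iloc[:-H], y[:-H]  # rows with known targets
X_future = features.iloc[-H:]                  # future rows: features known, target unknown

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
forecast = model.predict(X_future)  # one point forecast per future timestamp
```

<p>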
A practitioner could build the same kind of table and fit XGBoost, LightGBM, Random Forest, CatBoost, or a linear model on the generated rows. In that sense, the feature-engineering idea is a standard supervised-ML move.</p><p>What is new in the TabPFN-TS workflow is the model used after the transformation. Instead of training and tuning a new forecasting model from scratch, TabPFN-TS uses a pretrained tabular foundation model as the regression engine. The training rows act as context, the future rows act as queries, and the model returns point and quantile predictions through the time-series wrapper.</p><p>So the split is:</p><ul><li><p>Standard supervised ML part: turn timestamps into rows, create temporal features, define known-target training rows and unknown-target future rows.</p></li><li><p>TabPFN-specific part: use a pretrained, context-conditioned tabular model instead of fitting a task-specific model from scratch.</p></li><li><p>TabPFN-TS convenience: return both point forecasts and quantile forecasts through one forecasting interface.</p></li></ul><p>There is an important caveat. TabPFN-TS relies on temporal featurization; TabPFN is not modeling sequence order natively in the same way as a dedicated sequence model. The sequence structure becomes available to the model through columns such as running index, calendar features, seasonal features, and known covariates.</p><h3>2.4 Why Temporal Features Matter</h3><p>If we only create rows without useful time-derived features, a tabular model has no direct way to know that January 1980 and January 1981 are related, or that December and January are adjacent months. 
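</p><p>A quick numeric check makes this concrete (my own sketch, not from the notebook). With raw month numbers, December (12) and January (1) look maximally far apart; with a sine/cosine encoding they are exactly as close as any neighboring pair:</p>

```python
import numpy as np

# Month-of-year on a circle: adjacent months get nearby (sin, cos) features.
months = np.arange(1, 13)
sin_m = np.sin(2 * np.pi * months / 12)
cos_m = np.cos(2 * np.pi * months / 12)

def circle_dist(a: int, b: int) -> float:
    # Euclidean distance between months a and b in (sin, cos) feature space.
    return float(np.hypot(sin_m[a - 1] - sin_m[b - 1], cos_m[a - 1] - cos_m[b - 1]))

print(abs(12 - 1))                   # raw encoding: 11, the largest possible gap
print(round(circle_dist(12, 1), 4))  # 0.5176, same as any neighboring pair
print(round(circle_dist(1, 2), 4))   # 0.5176
print(round(circle_dist(1, 7), 4))   # 2.0, opposite sides of the circle
```

<p>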
This is why temporal feature engineering is central to the TabPFN-TS workflow.</p><p>The notebook uses three feature groups: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">selected_features = [
    RunningIndexFeature(),
    CalendarFeature(),
    AutoSeasonalFeature(),
]</code></pre></div><p>The running index gives each timestamp an ordered numeric position within each item. If item \(i\) has \(n_i\) observed rows, the running index over the observed history is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;0, 1, 2, \\ldots, n_i - 1.&quot;,&quot;id&quot;:&quot;UIBMAQRVFQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This \(n_i\) counts rows in the observed history; it is separate from \(T_i\), which denotes the last observed time index used in the forecasting equations. The running index helps the model see trend-like behavior. Calendar features encode timestamp information such as year, month, day of week, and similar components. Seasonal features encode repeated patterns. A standard way to encode cyclic seasonality is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sin \\left(\\frac{2\\pi t}{P}\\right),\n\\quad\n\\cos \\left(\\frac{2\\pi t}{P}\\right),&quot;,&quot;id&quot;:&quot;UVPKZZEWEC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(P\) is the period. For monthly data with annual seasonality, \(P=12\). Using both sine and cosine is useful because it represents the cycle on a circle. This avoids treating the end of a period and the beginning of the next period as far apart.</p><p>In the notebook output, the transformed table contains columns such as: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">target, running_index, year, second_of_minute_sin, second_of_minute_cos, ...,
sin_#0, cos_#0, sin_#1, cos_#1, sin_#2, cos_#2</code></pre></div><p>The target column is known in the training rows and missing in the future rows. The time-derived features are known for both training and future rows. That is exactly what forecasting needs: at prediction time, we do not know the future target, but we do know the future timestamps.</p><h3>2.5 Point Forecasts, Quantiles, and Coverage</h3><p>For item \(i\) and forecast step \(h\), the future row is \(x_{i,T_i+h}\), and the random future target is \(Y_{i,T_i+h}\). A point forecast gives one value: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{y}_{i,T_i+h}.&quot;,&quot;id&quot;:&quot;XLSSBDOEIK&quot;}" data-component-name="LatexBlockToDOM"></div><p>A probabilistic forecast gives more information. The conditional cumulative distribution function is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;F_{i,h}(y)\n\n= \\mathbb{P}(Y_{i,T_i+h} \\leq y | x_{i,T_i+h}, \\mathcal{D}_\\text{train}).&quot;,&quot;id&quot;:&quot;MYMDPMTCWE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathbb{P}\) denotes probability, and \(F_{i,h}(y)\) is the probability that the future target is less than or equal to the candidate value \(y\), given the future row and the training context.</p><p>The \(\alpha\)-quantile is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q_\\alpha(i,h)\n\n= \\inf\\{y : F_{i,h}(y) \\geq \\alpha\\}.&quot;,&quot;id&quot;:&quot;FIKRAFQOQQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\alpha\) is a quantile level between 0 and 1, and \(\inf\) means the infimum: the smallest value, or limiting lower bound, where the cumulative probability reaches at least \(\alpha\).</p><p>For example, the interval: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;[Q_{0.1}(i,h), Q_{0.9}(i,h)]&quot;,&quot;id&quot;:&quot;QPERBLFFMH&quot;}" 
data-component-name="LatexBlockToDOM"></div><p>is an 80% central prediction interval. In the demo, TabPFN-TS returns the point forecast and quantile columns from <code>0.1</code> to <code>0.9</code>. </p><p>As in yesterday&#8217;s post, quantile intervals should not be treated as automatic guarantees. They need to be checked on held-out data. Let the held-out future points be indexed by \((i_j,h_j)\) for \(j=1,\ldots,m\), where \(m\) is the number of held-out item-horizon pairs being evaluated. The empirical 80% coverage is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{m}\\sum_{j=1}^{m}\n\n\\mathbf{1}\\{y_{i_j,T_{i_j}+h_j} \\in [Q_{0.1}(i_j,h_j), Q_{0.9}(i_j,h_j)]\\}.&quot;,&quot;id&quot;:&quot;YFBXICLHCK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathbf{1}\{\cdot\}\) is the indicator function: it equals 1 when the condition is true and 0 otherwise.</p><p>For context, when quantile models are trained directly, a common loss is the pinball loss. For quantile level \(\alpha\), true value \(y\), and quantile prediction \(q\), it is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_\\alpha(y,q)\n\n= (\\alpha - \\mathbf{1}\\{y < q\\})(y-q).&quot;,&quot;id&quot;:&quot;OYVHMXJXNY&quot;}" data-component-name="LatexBlockToDOM"></div><p>This loss penalizes under-prediction and over-prediction asymmetrically, which is exactly what is needed for quantile estimation.</p><p>The coverage calculation answers a different question from the pinball loss. If the empirical coverage value is close to 0.8, the interval is roughly calibrated on that held-out sample. If it is much lower, the forecast intervals are overconfident. If it is much higher, the intervals may be too wide to be useful.</p><h2>3. 
Hands-on Demo</h2><p>The conceptual background gave us the main objects: a time-indexed sequence, a transformed tabular representation, a future table with unknown targets, and point/quantile forecasts. Now I use the notebook to walk through the time-series example.</p><p>The mental model for the demo is:</p><ul><li><p>Training rows: past timestamps with known target values.</p></li><li><p>Future rows: future timestamps with <code>target = NaN</code>.</p></li><li><p>Features: running index, calendar features, seasonal features, and any known covariate columns.</p></li><li><p>Output: point forecast plus quantile columns for each future row.</p></li></ul><p>The full notebook contains the setup code and imports. Below, I show the parts that matter for understanding the workflow.</p><h3>3.1 Loading the Time Series Data</h3><p>The demo uses a dataset from the <a href="https://huggingface.co/datasets/autogluon/chronos_datasets">Chronos datasets collection</a> on Hugging Face. To keep the example small, it uses only two time series from <code>monash_tourism_monthly</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">dataset_metadata = {
    "monash_tourism_monthly": {"prediction_length": 24},  # forecast 24 monthly steps
    "m4_hourly": {"prediction_length": 48},  # forecast 48 hourly steps
}

dataset_choice = "monash_tourism_monthly"
num_time_series_subset = 2</code></pre></div><p>The notebook then loads the dataset, converts it into a <code>TimeSeriesDataFrame</code>, keeps only two item IDs, and creates a train/test split. The last 24 months are held out as the future window.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from datasets import load_dataset
from tabpfn_time_series import TimeSeriesDataFrame
from tabpfn_time_series.data_preparation import generate_test_X, to_gluonts_univariate

prediction_length = dataset_metadata[dataset_choice]["prediction_length"]
dataset = load_dataset("autogluon/chronos_datasets", dataset_choice)

tsdf = TimeSeriesDataFrame(to_gluonts_univariate(dataset["train"]))
# Keep only the first `num_time_series_subset` item IDs.
tsdf = tsdf[
    tsdf.index.get_level_values("item_id").isin(tsdf.item_ids[:num_time_series_subset])
]

# Hold out the last `prediction_length` steps of each series as ground truth.
train_tsdf, test_tsdf_ground_truth = tsdf.train_test_split(
    prediction_length=prediction_length
)
# Future rows for the forecast horizon: timestamps known, targets unknown.
test_tsdf = generate_test_X(train_tsdf, prediction_length)</code></pre></div><p>The first important object is <code>train_tsdf</code>: the observed history. The second is <code>test_tsdf_ground_truth</code>: the future values that we hide from the model but keep for evaluation. The third is <code>test_tsdf</code>: the future table that contains the timestamps where predictions are needed. The function <code>generate_test_X</code> creates those future timestamp rows for the forecast horizon, with unknown targets.</p><p>The following plot shows the two tourism series and the train/test split.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-LSs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-LSs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png" width="990" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110516,&quot;alt&quot;:&quot;Time series train/test split.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195744234?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time series train/test split." title="Time series train/test split." 
srcset="https://substackcdn.com/image/fetch/$s_!-LSs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Time series train/test split.</figcaption></figure></div><p>Both series show strong yearly seasonality. The vertical dashed red line marks the point where the training history ends and the held-out future window starts. Since the forecast horizon is 24 months, the model is asked to forecast two full seasonal cycles.</p><h3>3.2 Adding Time Features</h3><p>The next step is the most important conceptual step in the demo. The raw time series is transformed into a tabular regression problem by adding features.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series import FeatureTransformer
from tabpfn_time_series.features import (
    AutoSeasonalFeature,
    CalendarFeature,
    RunningIndexFeature,
)

selected_features = [
    RunningIndexFeature(),
    CalendarFeature(),
    AutoSeasonalFeature(),
]

feature_transformer = FeatureTransformer(selected_features)

train_tsdf, test_tsdf = feature_transformer.transform(train_tsdf, test_tsdf)</code></pre></div><p>After this transformation, the training table has a known <code>target</code> column and many feature columns. The future table has the same feature columns, but the <code>target</code> column is missing:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">item_id  timestamp    target     running_index    year    ...    sin_#0    cos_#0
0        1979-01-31   1149.8700  0                1979    ...    0.0000    1.0000
0        1979-02-28   1053.8002  1                1979    ...    0.5000    0.8660
...
0        1992-08-31   NaN        163              1992    ...   -0.5000   -0.8660</code></pre></div><p>This is the point where forecasting becomes tabular. The rows with known targets form the context. The rows with unknown targets form the query set.</p><h3>3.3 Predicting with TabPFN-TS</h3><p>In my run, I used <code>local</code> mode, which runs TabPFN on my local GPU, instead of <code>client</code> mode, which uses GPUs hosted in Prior Labs&#8217; cloud:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series import TabPFNMode, TabPFNTimeSeriesPredictor

predictor = TabPFNTimeSeriesPredictor(
    tabpfn_mode=TabPFNMode.LOCAL,
)

pred = predictor.predict(train_tsdf, test_tsdf)</code></pre></div><p>The output <code>pred</code> is again indexed by <code>item_id</code> and <code>timestamp</code>. It contains a point forecast in the <code>target</code> column and quantile forecasts in columns such as <code>0.1</code>, <code>0.2</code>, ..., <code>0.9</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">                         target          0.1          0.2  ...          0.8          0.9
item_id timestamp
0       1992-08-31  6632.519531  6147.241211  6321.268066  ...  6938.606445  7118.754395
        1992-09-30  4159.460938  3881.989502  3977.088379  ...  4355.097656  4479.076172
        1992-10-31  3012.987549  2780.682861  2859.992432  ...  3172.838623  3264.242920</code></pre></div><p>This output format is useful because it gives both a central forecast and uncertainty bands without first setting up a separate conformal wrapper or separately trained quantile model.</p><h3>3.4 Visualizing the Forecast</h3><p>The notebook visualizes the history, the held-out future values, the TabPFN-TS point forecast, and the 0.1 to 0.9 quantile band.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series.plot import plot_pred_and_actual_ts

plot_pred_and_actual_ts(
    train=train_tsdf,
    test=test_tsdf_ground_truth,
    pred=pred,
)</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BiQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BiQs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" width="990" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142924,&quot;alt&quot;:&quot;Time series 
forecast.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195744234?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time series forecast." title="Time series forecast." srcset="https://substackcdn.com/image/fetch/$s_!BiQs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Time series forecast.</figcaption></figure></div><p>The blue curve is the observed history. The purple curve is the held-out future, which is available only because this is a demo. The red curve is the TabPFN-TS forecast. The shaded red region is the 0.1 to 0.9 quantile interval.</p><p>The forecast captures the most obvious structure in both series: strong annual seasonality, sharp yearly peaks, and a recurring drop after the peak. This is exactly where the feature transformation matters. The model is not seeing a raw sequence alone; it is seeing a tabular representation that exposes time position and seasonal phase.</p><p>The forecast is not perfect. For example, the sharpness and height of some future peaks are difficult to match exactly. That is expected because monthly tourism demand is not deterministic. The useful question is not whether every point lands exactly on the future curve. 
The useful question is whether the model captures the seasonal structure, gives sensible point forecasts, and expresses uncertainty that is reasonable for the held-out window.</p><h3>3.5 Evaluating Forecasts in Practice</h3><p>The demo is useful as a first look, but a real forecasting workflow would need numerical evaluation. At minimum, a practitioner should compute point forecast errors and quantile coverage on the held-out window.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np

eval_df = test_tsdf_ground_truth[["target"]].rename(
    columns={"target": "actual"}
).join(
    pred.rename(columns={"target": "forecast"})
)

q10_col = 0.1 if 0.1 in eval_df.columns else "0.1"
q90_col = 0.9 if 0.9 in eval_df.columns else "0.9"

mae = (eval_df["actual"] - eval_df["forecast"]).abs().mean()
rmse = np.sqrt(((eval_df["actual"] - eval_df["forecast"]) ** 2).mean())
coverage_80 = (
    (eval_df["actual"] &gt;= eval_df[q10_col])
    &amp; (eval_df["actual"] &lt;= eval_df[q90_col])
).mean()
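
# Hedged addition, not in the original notebook: the pinball (quantile) loss,
# a standard score for quantile forecasts (lower is better); alpha is the
# quantile level being evaluated.
def pinball_loss(actual, quantile_pred, alpha):
    diff = actual - quantile_pred
    return (alpha * diff.clip(lower=0.0) + (alpha - 1.0) * diff.clip(upper=0.0)).mean()

# For example: pinball_loss(eval_df["actual"], eval_df[q10_col], 0.1)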

print(f"MAE: {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"80% interval coverage: {coverage_80:.3f}")</code></pre></div><p>For a more complete evaluation, it is also useful to break the errors down by item ID and forecast horizon. In forecasting, average error can hide important behavior. A model may be good for short horizons but weak for longer horizons, or good for one item but poor for another.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">eval_df = eval_df.reset_index()
eval_df["horizon"] = eval_df.groupby("item_id").cumcount() + 1
eval_df["absolute_error"] = (eval_df["actual"] - eval_df["forecast"]).abs()

display(
    eval_df.groupby("horizon")["absolute_error"]
    .mean()
    .rename("MAE by horizon")
)</code></pre></div><h2>4. Summary and Conclusion</h2><p>In this post, I used the time series forecasting section of the official TabPFN hands-on demo to understand how TabPFN can be applied outside ordinary static tabular prediction.</p><p>The conceptual section made the key step explicit: TabPFN-TS frames univariate time series forecasting as tabular regression. This transformation is not unique to TabPFN; supervised ML models such as XGBoost, LightGBM, Random Forest, and CatBoost can also use time-derived tabular features. What changes with TabPFN-TS is the regression engine: a pretrained, context-conditioned tabular foundation model is used instead of fitting a new task-specific model from scratch.</p><p>Operationally, the observed history becomes rows with known targets. Future timestamps become rows with missing targets. Running-index, calendar, and seasonal features give the tabular model information about trend, time position, and cyclic structure.</p><p>The hands-on demo then showed this idea in code. 
We loaded two monthly tourism time series from the Chronos datasets collection, held out the last 24 months, added temporal features, predicted with <code>TabPFNTimeSeriesPredictor</code>, and visualized both point forecasts and 0.1 to 0.9 quantile intervals.</p><p>The main takeaway is that TabPFN is not being used as a native sequence model here. The bridge is feature engineering. Once time is represented as tabular features, TabPFNv2 can be used as a zero-shot tabular regressor for future timestamps.</p><p>This makes the workflow conceptually simple and practically interesting. It also creates clear evaluation questions: how accurate are the point forecasts, how well calibrated are the quantile intervals, and how does the model behave across different horizons, seasonalities, and item IDs?</p><p>With this post, I have covered another remaining section of the official TabPFN hands-on demo. In the upcoming posts, I will continue exploring the parts of the TabPFN ecosystem that can translate into useful workflows for real tabular and time-dependent data problems. 
As I continue this series, I welcome feedback and requests from readers: what did you find most useful in this post, and which aspects of tabular foundation models should I explore next?</p>]]></content:encoded></item><item><title><![CDATA[[P11] Understanding tabular foundation models: predictive behavior of TabPFN]]></title><description><![CDATA[This post continues my series on tabular foundation models.]]></description><link>https://newsletter.dsaiengineering.com/p/p11-understanding-tabular-foundation</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p11-understanding-tabular-foundation</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 27 Apr 2026 20:14:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_64m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post continues my series on tabular foundation models. 
So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, and TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>.</p><p>For a new reader, the minimum background is this: TabPFN is a pretrained tabular foundation model. Unlike XGBoost or Random Forest, its ordinary <code>.fit()</code> call does not update model weights to learn a fresh model from scratch. Instead, <code>.fit()</code> prepares the labelled rows as context for the current task, and TabPFN uses that context to predict new rows. For more theory, P3, P4, P5, and P6 are the better starting points; this post gives only the background needed for today&#8217;s topic.</p><p>Today I cover the predictive behavior section of the official TabPFN hands-on demo notebook. You can find my version of the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/a9d811ea7fba03d81870933cb1d229a1d7d29221/blog/20260427-understanding-tfm-predictive-behavior-tabpfn.assets/tabpfn-hands-on-demo-msaharan-20260427.ipynb">here</a> in my GitHub repository.</p><p>Model scores such as ROC AUC, RMSE, and \(R^2\) tell us how models perform on average. In this post, I use the TabPFN demo to ask a more diagnostic question: how do TabPFN, Random Forest, and XGBoost behave across the input space? 
We will inspect probability surfaces, regression curves, and quantile intervals to see which behaviors are standard supervised-ML diagnostics and which reflect TabPFN's pretrained, context-conditioned workflow.</p><p>The post is organized as follows:</p><ol><li><p>Conceptual background<strong>:</strong> the terms and equations needed to understand the examples.</p><ul><li><p>Working vocabulary.</p></li><li><p>Supervised learning vs TabPFN, mathematically.</p></li><li><p>Mathematical objects we will inspect.</p></li><li><p>Classical diagnostics vs TabPFN&#8217;s workflow.</p></li></ul></li><li><p>Hands-on demo<strong>:</strong> three examples from the notebook.</p><ul><li><p>Classification decision boundaries.</p></li><li><p>Regression curve fitting.</p></li><li><p>Regression uncertainty with quantiles.</p></li></ul></li><li><p>Summary and conclusion<strong>:</strong> the main takeaways and what comes next.</p><ul><li><p>What the conceptual background prepared us to inspect.</p></li><li><p>What the examples demonstrated.</p></li><li><p>What changes when the same diagnostics are applied to TabPFN.</p></li></ul></li></ol><div><hr></div><h2>1. 
Conceptual Background</h2><p>Before going to the hands-on demo, I want to set up the concepts that make the examples meaningful. This section does three things:</p><ol><li><p>It defines the vocabulary used in the post.</p></li><li><p>It connects TabPFN&#8217;s prediction step to the posterior predictive distribution from P4.</p></li><li><p>It defines the mathematical objects we will inspect in the demo: classification probability surfaces, regression mean functions, quantiles, and interval coverage.</p></li></ol><h3>1.1 Working Vocabulary</h3><p>The key terms for this post are:</p><ul><li><p>Tabular foundation model<strong>:</strong> a pretrained model designed to work across many tabular prediction tasks.</p></li><li><p>In-context learning<strong>:</strong> the model uses labelled rows as context for the current task instead of updating pretrained weights in the usual task-specific training loop.</p></li><li><p>Predictive behavior<strong>:</strong> the shape and reliability of the model&#8217;s predictions, not just the final score.</p></li><li><p>Decision boundary<strong>:</strong> in classification, the region where the model switches from one class to another.</p></li><li><p>Quantiles<strong>:</strong> values below which specified fractions of a predictive distribution fall, useful for uncertainty.</p></li></ul><p>The familiar supervised ML workflow is:</p><pre><code> model = XGBClassifier()
 model.fit(X_train, y_train)
 preds = model.predict(X_test)</code></pre><p>Here, <code>.fit()</code> learns task-specific trees from the dataset.</p><p>TabPFN uses a similar interface:</p><pre><code> model = TabPFNClassifier()
 model.fit(X_train, y_train)
 preds = model.predict(X_test)</code></pre><p>But the meaning is different. TabPFN is already pretrained. In the standard prediction workflow, <code>.fit()</code> validates the data and prepares preprocessing, caching, and task context. Conceptually, the training rows and labels become context for the current task: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(X_\\text{train}, y_\\text{train}).&quot;,&quot;id&quot;:&quot;PHTTPVEWTA&quot;}" data-component-name="LatexBlockToDOM"></div><p>The test rows are queries: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;X_\\text{test}.&quot;,&quot;id&quot;:&quot;ONWGQIMFOG&quot;}" data-component-name="LatexBlockToDOM"></div><p>For one query row \(x_\text{new}\), TabPFN conceptually predicts the posterior predictive distribution I discussed in P4: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y|x_\\text{new}, X_\\text{train}, y_\\text{train}).&quot;,&quot;id&quot;:&quot;TMQQFWRYAN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(p(\cdot)\) means a predictive probability distribution: class probabilities for classification, and a distribution over possible target values for regression.</p><p>For this post, the important point is practical. Classical supervised models can also provide more than hard class labels: classifiers such as Random Forest, XGBoost, and CatBoost can return class probabilities, and specialized methods can return uncertainty estimates or quantiles. So the diagnostic workflow itself is not unique to TabPFN. 
What is different here is that TabPFN produces its predictions by conditioning a pretrained tabular model on the current dataset, and TabPFN regression can expose distributional summaries such as quantiles directly through the prediction API.</p><h3>1.2 Supervised learning vs TabPFN, mathematically</h3><p>The code later in the post uses familiar sklearn-style calls such as <code>.fit()</code> and <code>.predict()</code>. To avoid treating TabPFN as just another tree model, this subsection makes the difference explicit. First, I describe the usual supervised-learning abstraction. Then I connect TabPFN back to the posterior predictive distribution from P4.</p><p>Let the labelled dataset be:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;D = \\{(x_i, y_i)\\}_{i=1}^n.&quot;,&quot;id&quot;:&quot;RHPGOIWBQU&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(D\) is the current training dataset, \(x_i\) is the feature vector for row \(i\), \(y_i\) is its target value or class label, and \(n\) is the number of labelled rows. I use lowercase \(x\) and \(y\) for concrete feature and target values. Later, uppercase \(X\) and \(Y\) refer to random variables.</p><p>As a useful abstraction, ordinary supervised learning usually chooses a function class \(\mathcal{F}\) and fits a task-specific model by minimizing an empirical loss: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n= \\arg\\min_{f \\in \\mathcal{F}}\n\\frac{1}{n}\\sum_{i=1}^{n}\\ell(y_i, f(x_i)) + \\lambda \\Omega(f).&quot;,&quot;id&quot;:&quot;NMLHWEOLAM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(f\) is a candidate prediction function, \(\ell\) is a loss function, \(\Omega(f)\) is a regularization term, \(\lambda \geq 0\) controls the strength of regularization, and \(\hat{f}\) is the model learned specifically for this dataset. 
The notation \(\arg\min\) means &#8220;choose the function that makes this objective as small as possible.&#8221; Tree-based models such as Random Forest, XGBoost, and CatBoost differ in how they define \(\mathcal{F}\), how they optimize the model, and how they regularize it, but the basic idea is still dataset-specific fitting.</p><p>TabPFN is conceptually different. Following the terminology I used in P4, we can insert a latent task variable \(\phi\) into the posterior predictive distribution. Here, \(\phi\) represents the underlying supervised machine learning task: the feature-target relationship, the noise pattern, and other task-level assumptions that determine how data is generated.</p><p>For a new row \(x_\text{new}\), the posterior predictive distribution can be written as: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y_\\text{new}|x_\\text{new}, D) = \\int p(y_\\text{new}|x_\\text{new}, \\phi)\\,p(\\phi|D)\\,d\\phi.&quot;,&quot;id&quot;:&quot;WEIIJJFCET&quot;}" data-component-name="LatexBlockToDOM"></div><p>The first term does not include \(D\) because, after conditioning on the latent task \(\phi\), the task itself is assumed to contain the information needed to describe how \(x_\text{new}\) maps to \(y_\text{new}\). The integral means that the prediction averages over possible tasks \(\phi\), weighted by how plausible each task is after seeing \(D\). If the set of possible tasks were discrete, this would look like a weighted sum instead of an integral.</p><p>Using Bayes&#8217; rule: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(\\phi|D) = \\frac{p(D|\\phi)p(\\phi)}{p(D)}.&quot;,&quot;id&quot;:&quot;PHZXKRJPWU&quot;}" data-component-name="LatexBlockToDOM"></div><p>The prior \(p(\phi)\) represents assumptions about what kinds of tabular tasks are likely. The likelihood \(p(D|\phi)\) says how likely the current dataset is under task \(\phi\). 
The posterior \(p(\phi|D)\) says which tasks remain plausible after seeing the current dataset. The denominator \(p(D)\) is a normalizing constant.</p><p>This gives the same intuition as P4: we are averaging predictions across possible latent tasks, weighted by how plausible those tasks are after seeing the context dataset.</p><p>TabPFN does not explicitly compute this integral at prediction time. Instead, it is pretrained on many synthetic tasks so that the neural network amortizes this inference. In this context, amortization means that much of the work of learning how to infer tasks has already happened during pretraining; at prediction time, the model uses \(D\) as context and \(x_\text{new}\) as a query to directly output predictions that approximate this posterior predictive behavior.</p><p>This is a useful theoretical lens, but it should not be read as a guarantee that TabPFN is perfectly Bayesian for every real dataset. It is an amortized approximation learned from the task distribution used during pretraining. The closer a real task is to the kinds of tasks represented by that prior, the more useful we should expect this behavior to be.</p><p>This is the mathematical reason why the same sklearn-looking code can mean different things:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">model.fit(X_train, y_train)</code></pre></div><p>For XGBoost, this learns task-specific trees. For TabPFN, this prepares preprocessing, cache state, and task context for a pretrained model.</p><h3>1.3 Mathematical objects we will inspect</h3><p>The introduction named three diagnostic views: probability surfaces, regression curves, and quantile intervals. 
This subsection defines the mathematical objects behind those views so that the hands-on demo is easier to interpret.</p><p>I will use three handles to make predictive behavior visible:</p><ol><li><p>A classification probability surface.</p></li><li><p>A regression mean function.</p></li><li><p>Regression quantiles and interval coverage.</p></li></ol><p>The next few equations define these objects before we use them in the notebook examples.</p><p>For binary classification, let \(X\) be the random feature vector and \(Y\) be the random class label. The notation \(\mathbb{P}\) means probability. For a specific input value \(x\), define:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\eta(x) = \\mathbb{P}(Y=1|X=x, D).&quot;,&quot;id&quot;:&quot;KLNYNVWBCL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\eta(x)\) is the model&#8217;s estimated probability of the positive class given the training dataset \(D\). The decision boundary at threshold \(\tau\) is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{B}_\\tau = \\{x : \\eta(x) = \\tau\\}.&quot;,&quot;id&quot;:&quot;EGSRNSVYNK&quot;}" data-component-name="LatexBlockToDOM"></div><p>For the usual threshold \(\tau=0.5\), the boundary is where a 0.5-threshold decision rule switches between classes. Looking at \(\eta(x)\), not only the predicted class, tells us how the model&#8217;s positive-class probability changes across the feature space.</p><p>For regression, we shift from class probabilities to a distribution over possible numeric target values. 
Its conditional cumulative distribution function is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;F_x(y) = \\mathbb{P}(Y \\leq y | X=x, D).\n&quot;,&quot;id&quot;:&quot;EXGCXHHMBP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(F_x(y)\) is the probability that the target \(Y\) is less than or equal to the candidate value \(y\), given input \(x\) and context \(D\). From this distribution, we can define the predictive mean, where \(\mathbb{E}\) means expectation: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mu(x) = \\mathbb{E}[Y|X=x,D],&quot;,&quot;id&quot;:&quot;JJZQZFCIBG&quot;}" data-component-name="LatexBlockToDOM"></div><p>and the \(\alpha\)-quantile, where \(\alpha\) is a probability level between 0 and 1: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q_\\alpha(x) = \\inf \\{y : F_x(y) \\geq \\alpha\\}.&quot;,&quot;id&quot;:&quot;YTNHRDEGVJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notation \(\inf\) means the infimum: the leftmost value, or limiting lower bound, where the cumulative probability reaches at least \(\alpha\).</p><p>This is useful because the notebook asks TabPFN for quantiles. Once we have quantiles, we can form prediction intervals. For example, an 80% central prediction interval can be written as: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;[Q_{0.1}(x), Q_{0.9}(x)].&quot;,&quot;id&quot;:&quot;SMJAYCGNYF&quot;}" data-component-name="LatexBlockToDOM"></div><p>This interval is useful only if it is calibrated. 
For a well-calibrated central 80% interval, on a held-out dataset \(\{(x_j, y_j)\}_{j=1}^m\), where \(m\) is the number of held-out rows, we would expect: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{m}\\sum_{j=1}^{m}\n\n\\mathbf{1}\\{y_j \\in [Q_{0.1}(x_j), Q_{0.9}(x_j)]\\}\n\n\\approx 0.8.\n\n&quot;,&quot;id&quot;:&quot;OPGVAPURNO&quot;}" data-component-name="LatexBlockToDOM"></div><p>This equation is the mathematical version of the interval coverage check used later in the post.</p><p>The indicator \(\mathbf{1}\{\cdot\}\) equals 1 when the condition inside the braces is true and 0 otherwise. So the average counts the fraction of held-out targets that fall inside the predicted interval.</p><h3>1.4 Classical diagnostics vs TabPFN&#8217;s workflow</h3><p>The diagnostics in this post are familiar supervised ML tools. What changes is not the diagnostic itself, but the source of the predictions being diagnosed: TabPFN is pretrained and conditions on the current dataset as context.</p><ul><li><p><strong>Probability surfaces and decision boundaries:</strong> standard supervised ML can visualize any probabilistic classifier with <code>predict_proba</code> in 2D. TabPFN adds a probability surface produced by conditioning a pretrained model on the current context dataset.</p></li><li><p><strong>Smooth regression curves:</strong> splines, Gaussian processes, neural networks, and tuned boosting workflows can produce smooth predictions. TabPFN may produce a smooth-looking mean function from a small context dataset without building a custom smooth model.</p></li><li><p><strong>Quantile predictions and intervals:</strong> quantile regression, conformal prediction, Bayesian models, and ensembles can provide intervals. TabPFN regression can expose distributional summaries such as quantiles directly through the prediction API.</p></li><li><p><strong>Meaning of <code>.fit()</code>:</strong> classical models usually fit task-specific parameters from the current dataset. 
Ordinary TabPFN <code>.fit()</code> prepares preprocessing, cache state, and context for a pretrained model whose weights are not updated.</p></li></ul><p>So the point is not that TabPFN invented these diagnostics. The point is to apply familiar diagnostics to TabPFN and ask whether its pretrained, context-conditioned workflow gives useful behavior with less task-specific tuning.</p><div><hr></div><h2>2. Hands-on Demo: Inspecting Predictive Behavior</h2><p>The conceptual background gave us the objects to inspect: a probability surface, a mean function, and quantile intervals. Now I use the notebook to inspect those objects directly for TabPFN, Random Forest, and XGBoost.</p><p>The notebook section has three examples:</p><ol><li><p>Classification decision boundaries.</p></li><li><p>Regression curve fitting.</p></li><li><p>Regression uncertainty with quantiles.</p></li></ol><p>The full notebook contains the helper functions and plotting code. Below, I show the parts that matter for understanding the workflow.</p><h3>2.1 Classification decision boundaries</h3><p>The first example creates a binary classification dataset made of concentric circles. This is useful for studying predictive behavior because the correct class transition is nonlinear and easy to inspect visually.</p><p>The important plotting choice is <code>response_method="predict_proba"</code>. I want to see not only which class the model predicts but also the probability surface behind that prediction.</p><p>Mathematically, the plot visualizes an estimate of: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\eta(x) = \\mathbb{P}(Y=1|X=x, D)&quot;,&quot;id&quot;:&quot;URIFVJITSR&quot;}" data-component-name="LatexBlockToDOM"></div><p>over a grid of \(x\)-values. 
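</p><p>To make the grid evaluation concrete, here is a small hand-rolled sketch (my own stand-in model, not the notebook&#8217;s): a hypothetical <code>predict_proba</code>-style function is evaluated over a 2D grid, which is essentially what <code>DecisionBoundaryDisplay</code> automates for the fitted classifiers.</p>

```python
import numpy as np

def predict_proba_positive(points):
    # Hypothetical stand-in for clf.predict_proba(points)[:, 1]:
    # a radial score pushed through a sigmoid, so eta(x) = 0.5
    # exactly on the unit circle.
    score = 1.0 - (points[:, 0] ** 2 + points[:, 1] ** 2)
    return 1.0 / (1.0 + np.exp(-score))

# Evaluate eta(x) over a grid of x-values.
xs = np.linspace(-2.0, 2.0, 41)
grid = np.array([(a, b) for a in xs for b in xs])
eta = predict_proba_positive(grid)

# Thresholding at tau = 0.5 recovers the predicted classes; the
# boundary B_0.5 is where eta crosses 0.5 (here, the unit circle).
labels = (eta >= 0.5).astype(int)
```

<p>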
The color transition region corresponds to the decision boundary \(\mathcal{B}_{0.5}\).</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">X_train, y_train = generate_circle_data(
    num_points_per_circle=[50, 100, 200],
    radii=[1, 2, 4],
    noise_factor=0.1,
)

rf = RandomForestClassifier().fit(X_train[:, :2], y_train)
xgb = XGBClassifier().fit(X_train[:, :2], y_train)
tabpfn = TabPFNClassifier().fit(X_train[:, :2], y_train)

DecisionBoundaryDisplay.from_estimator(
    tabpfn,
    X_train[:, :2],
    response_method="predict_proba",
    grid_resolution=50,
)</code></pre></div><p>The notebook repeats this plotting workflow for Random Forest, XGBoost, and TabPFN. I show the TabPFN call here because the important detail is the use of <code>predict_proba</code> to visualize the probability surface.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_64m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_64m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 424w, https://substackcdn.com/image/fetch/$s_!_64m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 848w, https://substackcdn.com/image/fetch/$s_!_64m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 1272w, https://substackcdn.com/image/fetch/$s_!_64m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_64m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png" width="690" height="690" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:690,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:241452,&quot;alt&quot;:&quot;Decision boundary comparison.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195664298?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Decision boundary comparison." title="Decision boundary comparison." srcset="https://substackcdn.com/image/fetch/$s_!_64m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 424w, https://substackcdn.com/image/fetch/$s_!_64m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 848w, https://substackcdn.com/image/fetch/$s_!_64m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 1272w, https://substackcdn.com/image/fetch/$s_!_64m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe31e5010-cf3b-46a1-9e2d-beeb4e534b11_690x690.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Decision boundary comparison.</figcaption></figure></div><p>Random Forest and XGBoost learn the circular pattern, but their probability surfaces contain more block-like regions. This is consistent with how tree models partition the feature space. A tree often produces a piecewise-constant function of the form: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}(x) = \\sum_m c_m \\mathbf{1}\\{x \\in R_m\\}.&quot;,&quot;id&quot;:&quot;GDBCHDOTJK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(R_m\) is one region of the input space, such as a leaf region in a tree, and \(c_m\) is the prediction assigned to that region. 
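</p><p>A minimal sketch of that piecewise-constant form (hypothetical regions and leaf values, not taken from any fitted tree):</p>

```python
import numpy as np

# f_hat(x) = sum_m c_m * 1{x in R_m} in 1D: three regions R_m defined
# by boundary edges, each carrying a constant leaf value c_m.
edges = np.array([-np.inf, 0.0, 1.0, np.inf])  # region boundaries
leaf_values = np.array([0.2, 0.7, 0.4])        # c_1, c_2, c_3

def tree_predict(x):
    # searchsorted maps each input to the index of the region it
    # falls in, producing a step function over x.
    region = np.searchsorted(edges, x, side="right") - 1
    return leaf_values[region]

yhat = tree_predict(np.linspace(-1.0, 2.0, 7))  # steps, not a smooth curve
```

<p>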
Averaging many trees can smooth this behavior, but the partitioning can still be visible in simple 2D plots.</p><p>TabPFN&#8217;s smoother radial probability surface reflects a different inductive bias: the boundary comes from a pretrained model conditioning on the current rows as context rather than from task-specific tree partitions learned only from this dataset.</p><p>This 2D decision-boundary plot is useful because the toy dataset has two features. In real high-dimensional tabular projects, the same diagnostic idea usually shows up through calibration curves, residual plots by feature bins, partial dependence or ICE plots, SHAP dependence plots, and segment-level error analysis.</p><p>The lesson is not &#8220;TabPFN is always better.&#8221; The lesson is that the same probability-surface diagnostic can reveal different inductive biases across models.</p><h3>2.2 Regression curve fitting</h3><p>The classification example inspected the probability surface \(\eta(x)\). The second example shifts to regression and inspects the learned mean function \(\mu(x)\). It uses a simple one-dimensional regression problem. The noiseless data-generating function is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;f^\\star(x) = \\sin(x) + \\frac{x}{10}.&quot;,&quot;id&quot;:&quot;DEZWRZAOOG&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notebook samples 40 training points, fits Random Forest, XGBoost, and TabPFN, and predicts on a dense grid.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">X_train, y_train = generate_sinx_plus_x(N=40)
X_test = np.linspace(0, 20, 200).reshape(-1, 1)

rf = RandomForestRegressor(random_state=42).fit(X_train, y_train)
xgb = XGBRegressor(random_state=42).fit(X_train, y_train)

tabpfn = TabPFNRegressor()
tabpfn.fit(X_train, y_train)

y_pred_tabpfn = tabpfn.predict(X_test)</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZXrj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZXrj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 424w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 848w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZXrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png" width="1389" height="1189" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1189,&quot;width&quot;:1389,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100609,&quot;alt&quot;:&quot;Sin curve fitting comparison&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195664298?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sin curve fitting comparison" title="Sin curve fitting comparison" srcset="https://substackcdn.com/image/fetch/$s_!ZXrj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 424w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 848w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!ZXrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808187cc-3e61-4652-9f68-96fd8b876dfc_1389x1189.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sin curve fitting comparison</figcaption></figure></div><p>The tree-based models follow the data, but their predictions are more step-like. This is expected because trees partition the feature space into regions. TabPFN produces a smoother curve that follows the sinusoidal trend.</p><p>For real regression projects, this kind of inspection maps directly to predicted-vs-actual plots, residual plots, and residuals by feature bins.</p><p>Smooth regression is not unique to TabPFN. A spline model, Gaussian process, neural network, or tuned boosting workflow may also produce smooth predictions. 
The TabPFN-specific point is that the estimated predictive mean \(\mu(x)\) is shaped by the pretrained model&#8217;s learned prior and the small context dataset, without building a custom smooth model for this toy function.</p><h3>2.3 Regression uncertainty with quantiles</h3><p>The regression curve example focused on the mean function \(\mu(x)\). The third example moves from mean prediction to uncertainty. The notebook creates a line with heteroscedastic noise, meaning the noise grows with \(x\). In mathematical form, the toy data is approximately: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y = 0.8x + \\sigma(x)\\epsilon,\n\n\\quad \\sigma(x) = 0.1x,\n\n\\quad \\epsilon \\sim \\mathcal{N}(0,1).&quot;,&quot;id&quot;:&quot;CVLDGAAJVJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(x\) is the input, \(0.8x\) is the noiseless line, \(\sigma(x)\) is the input-dependent noise scale, and \(\epsilon\) is standard normal random noise.</p><p>The notebook also leaves a gap in the training data. This lets us inspect whether TabPFN expresses higher uncertainty where the data is noisier or sparse.</p><p>The key call is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">reg = TabPFNRegressor()
reg.fit(x, y_noisy)
preds = reg.predict(x_test, output_type="full")</code></pre></div><p>With <code>output_type="full"</code>, TabPFN returns several summaries of the predictive distribution, including mean, median, mode, and quantiles. If quantiles are not specified, the default quantiles are:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ycKQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ycKQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 424w, https://substackcdn.com/image/fetch/$s_!ycKQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 848w, https://substackcdn.com/image/fetch/$s_!ycKQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 1272w, https://substackcdn.com/image/fetch/$s_!ycKQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ycKQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png" width="1005" height="547" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:547,&quot;width&quot;:1005,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66291,&quot;alt&quot;:&quot;Regression uncertainty with quantiles&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195664298?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Regression uncertainty with quantiles" title="Regression uncertainty with quantiles" srcset="https://substackcdn.com/image/fetch/$s_!ycKQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 424w, https://substackcdn.com/image/fetch/$s_!ycKQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 848w, https://substackcdn.com/image/fetch/$s_!ycKQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ycKQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cb76e51-9ab5-4be7-a683-d2eac19b5907_1005x547.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Regression uncertainty with quantiles</figcaption></figure></div><p>The left panel shows the generated data. The right panel shows TabPFN&#8217;s predictive quantile bands. The bands are narrow where the data is dense and low-noise. 
They become wider in noisier regions and around the gap where the model has less direct context.</p><p>This behavior is consistent with useful distributional predictions, but the intervals still need to be validated with held-out coverage checks.</p><p>The TabPFN-specific point here is convenience and integration: TabPFN regression can expose distributional summaries directly through the model output interface. Traditional supervised ML can also provide uncertainty, but usually through a separate method such as quantile regression, conformal prediction, Bayesian modeling, or ensembling.</p><p>Quantiles are model outputs, not guarantees. For a held-out dataset, I would request the two interval endpoints explicitly and compute empirical coverage:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">q10_pred, q90_pred = reg.predict(
    X_holdout,
    output_type="quantiles",
    quantiles=[0.1, 0.9],
)

coverage_80 = np.mean((y_holdout &gt;= q10_pred) &amp; (y_holdout &lt;= q90_pred))
print(f"80% interval coverage: {coverage_80:.3f}")</code></pre></div><p>Here, <code>X_holdout</code> and <code>y_holdout</code> are rows and targets that were not used in <code>reg.fit(...)</code>. If the value is close to <code>0.8</code>, allowing for sampling noise, the interval is roughly calibrated on that held-out sample. If it is much lower, the model is overconfident. If it is much higher, the intervals may be too wide to be operationally useful. This is a diagnostic, not a proof of calibration for every future segment.</p><div><hr></div><h2>3. Summary and Conclusion</h2><p>In this post, I used predictive behavior as a diagnostic lens for comparing TabPFN with familiar supervised ML baselines.</p><p>The conceptual section connected the notebook examples to three mathematical objects: the classification probability surface \(\eta(x)\), the regression mean function \(\mu(x)\), and the regression predictive distribution with quantiles \(Q_\alpha(x)\). This made the later plots easier to interpret because each visual had a mathematical object behind it.</p><p>The hands-on examples then showed those objects in code. The classification example compared probability surfaces, the regression curve-fitting example compared learned mean functions, and the uncertainty example used TabPFN regression quantiles to inspect how predictive intervals behave in noisy or sparse regions.</p><p>The main takeaway is that the diagnostics themselves are not new to tabular foundation models. What is different in TabPFN is the workflow: a pretrained tabular model conditions on the current dataset as context, and in regression it can expose distributional summaries such as quantiles directly through its prediction API. That makes it worth asking, example by example, whether TabPFN gives useful behavior with less task-specific tuning.</p><p>With this post, I have moved one step further through the official TabPFN hands-on demo. 
In the upcoming posts, I will continue exploring the remaining parts of the TabPFN ecosystem and focus on ideas that can be applied in real tabular data workflows. Stay tuned.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.dsaiengineering.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading DSAIEngineering! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Search & Ranking Systems: A Practical Guide for Data Scientists]]></title><description><![CDATA[Query understanding, semantic search, learning-to-rank, personalization, transformers, two-tower architectures, LLM ontologies, evaluation &#8212; plus a runnable demo.]]></description><link>https://newsletter.dsaiengineering.com/p/search-and-ranking-systems-a-practical</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/search-and-ranking-systems-a-practical</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Sat, 06 Dec 2025 16:28:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!O-fN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!O-fN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O-fN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O-fN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1772533,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O-fN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!O-fN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2c89eec-de70-4bc6-8d03-5c0e7aa37379_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Search and ranking power discovery in almost every digital product we use: ecommerce, content, jobs, maps, support, and more.</p><p>This post is a technical, hands&#8209;on introduction to the main building blocks of search and ranking systems. 
We&#8217;ll look at:</p><ul><li><p><strong>Query understanding</strong></p></li><li><p><strong>Semantic search</strong></p></li><li><p><strong>Spell correction</strong></p></li><li><p><strong>Ranking / Learning-to-Rank</strong></p></li><li><p><strong>Personalization</strong></p></li><li><p><strong>Transformer architectures</strong></p></li><li><p><strong>Two-tower architectures</strong></p></li><li><p><strong>LLM-based ontologies</strong></p></li><li><p><strong>Evaluating search results at scale</strong></p></li></ul><p>The goal is that by the end, you&#8217;ll know what each concept means and how it fits into a production stack.</p><div><hr></div><h2><strong>1. Big Picture: How Modern Search Systems Work</strong></h2><p>Let&#8217;s start with the mental model.</p><p>When a user types a query (<code>&#8220;sushi&#8221;</code>) into a search box, a typical production system looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8goa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8goa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 424w, https://substackcdn.com/image/fetch/$s_!8goa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 848w, 
https://substackcdn.com/image/fetch/$s_!8goa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!8goa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8goa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png" width="280" height="714.0520446096655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1372,&quot;width&quot;:538,&quot;resizeWidth&quot;:280,&quot;bytes&quot;:64053,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8goa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 424w, 
https://substackcdn.com/image/fetch/$s_!8goa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 848w, https://substackcdn.com/image/fetch/$s_!8goa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!8goa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7764b0c2-bf57-404a-9fa0-9761c3cc8e5a_538x1372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Each block maps to the buzzwords:</p><ul><li><p><strong>Query understanding</strong>: clean and interpret the text (&#8220;susshi&#8221; &#8594; &#8220;sushi&#8221;, detect intent, extract entities).</p></li><li><p><strong>Semantic search &amp; Two&#8209;Tower architectures</strong>: retrieve candidates by <em>meaning</em>, not just exact keywords.</p></li><li><p><strong>Learning&#8209;to&#8209;Rank</strong>: order candidates using ML based on relevance signals.</p></li><li><p><strong>Personalization</strong>: adjust ranking per user.</p></li><li><p><strong>Transformers &amp; NLP</strong>: power semantic understanding, embeddings, and generative components.</p></li><li><p><strong>LLM&#8209;based ontologies</strong>: structure the catalog &amp; concepts to make search smarter.</p></li><li><p><strong>Evaluation at scale</strong>: measure all of this with offline metrics and online A/B tests.</p></li></ul><p>We&#8217;ll go through these components with code and simple architecture diagrams.</p><div><hr></div><h2><strong>2. Learning&#8209;to&#8209;Rank (LTR): The Core of Ranking</strong></h2><h3><strong>2.1 Problem setup</strong></h3><p>In many supervised ML setups you predict one label per row.</p><p>In <strong>Learning&#8209;to&#8209;Rank</strong>, you care about <em>ordering</em> documents for a query:</p><ul><li><p>Input: query <code>q</code> and documents <code>d&#8321;, &#8230;, d&#8345;</code></p></li><li><p>Features: <code>x_i = f(q, d_i, user, context)</code></p></li><li><p>Output: scores <code>s_i</code> that induce a ranking</p></li></ul><p>Data is grouped by query:</p><pre><code> Query q1:  d1, d2, d3  &#8594; label: [2, 0, 1] (relevance levels)
 Query q2:  d4, d5      &#8594; label: [0, 1]
 ...</code></pre><h3><strong>2.2 Three main LTR paradigms</strong></h3><ol><li><p><strong>Pointwise</strong><br>Treat each (q, d) as an independent sample.<br>Example: regression on a relevance score (0&#8211;3) or classification (clicked / not).</p></li><li><p><strong>Pairwise</strong><br>Learn from <em>pairs</em> of documents for the same query.<br>For each pair (d+, d&#8722;): model learns <code>score(d+) &gt; score(d&#8722;)</code>.</p></li><li><p><strong>Listwise</strong><br>Optimize over the whole ranked list (e.g. approximating NDCG).</p></li></ol><p>In practice, <strong>gradient-boosted trees</strong> (LambdaMART, XGBoost ranker, LightGBM ranker) are very common in industry because they perform well, are relatively fast, and handle heterogeneous features nicely.</p><h3><strong>2.3 Minimal LTR example with XGBoost</strong></h3><p>Below is a toy example using <code>XGBRanker</code> to rank results for different queries.</p><pre><code> import numpy as np
 from xgboost import XGBRanker
 &#8203;
 # Toy data: 3 queries, each with some candidate items
 # Features could be: [BM25_score, semantic_similarity, popularity]
 X = np.array([
     [2.1, 0.4, 10],  # q1-d1
     [1.2, 0.7, 30],  # q1-d2
     [0.5, 0.2,  5],  # q1-d3
     [0.1, 0.9, 50],  # q2-d4
     [0.3, 0.3, 20],  # q2-d5
     [1.0, 0.1,  1],  # q3-d6
     [0.9, 0.6, 15],  # q3-d7
 ])
 &#8203;
 # Relevance labels (higher = more relevant)
 y = np.array([
     2,  # q1-d1
     3,  # q1-d2
     0,  # q1-d3
     3,  # q2-d4
     1,  # q2-d5
     0,  # q3-d6
     2,  # q3-d7
 ])
 &#8203;
 # Group size: number of documents per query
 group = [3, 2, 2]  # q1 has 3 docs, q2 has 2, q3 has 2
 &#8203;
 model = XGBRanker(
      objective="rank:pairwise",
     n_estimators=100,
     learning_rate=0.1,
     max_depth=4,
     subsample=0.8,
     colsample_bytree=0.8
 )
 &#8203;
 model.fit(X, y, group=group)
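# Ranking quality is usually summarized with NDCG. The helper below is a toy
# sketch for sanity-checking a ranked list of relevance labels; it is an
# illustration added here, not part of XGBoost's API.
def ndcg_at_k(ranked_rels, k):
    # ranked_rels: relevance labels in the order the model ranked them.
    rel = np.asarray(ranked_rels, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = ((2 ** rel - 1) * discounts).sum()
    # Ideal ordering: same labels sorted from most to least relevant.
    ideal = np.sort(np.asarray(ranked_rels, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) * discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0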
 &#8203;
 # Predict and see the ranking for query 1 documents
 scores_q1 = model.predict(X[0:3])
 ranking_q1 = np.argsort(-scores_q1)  # descending
  print("Scores q1:", scores_q1)
  print("Ranking q1 (indices):", ranking_q1)</code></pre><p>In a real system:</p><ul><li><p><code>X</code> is built from many features: lexical, semantic, user, item, context.</p></li><li><p><code>y</code> often comes from logs (clicks, purchases) with some preprocessing.</p></li><li><p><code>group</code> is derived from query IDs.</p></li></ul><p>This is the backbone ranking block in the earlier pipeline diagram.</p><div><hr></div><h2><strong>3. Query Understanding</strong></h2><p><strong>Query understanding</strong> is about turning raw user text into something the system can reason about.</p><p>Typical sub-tasks:</p><ol><li><p><strong>Normalization</strong></p><ul><li><p>Lowercasing, Unicode normalization, removing punctuation.</p></li><li><p>Handling accents (<code>&#8220;pap&#225;&#8221;</code> &#8594; <code>&#8220;papa&#8221;</code>), transliteration.</p></li></ul></li><li><p><strong>Tokenization</strong></p><ul><li><p>Split into tokens, handle multi&#8209;word entities (<code>&#8220;ice cream&#8221;</code>).</p></li></ul></li><li><p><strong>Spell correction / typo handling</strong></p><ul><li><p><code>&#8220;suhshi&#8221;</code> &#8594; <code>&#8220;sushi&#8221;</code>.</p></li><li><p><code>&#8220;iphon 15&#8221;</code> &#8594; <code>&#8220;iphone 15&#8221;</code>.</p></li></ul></li><li><p><strong>Intent classification</strong></p><ul><li><p>Is the user searching for a product, a category, a help article?</p></li><li><p>Example labels: <code>{&#8221;product_search&#8221;, &#8220;faq_search&#8221;, &#8220;navigation&#8221;, ...}</code></p></li></ul></li><li><p><strong>Entity extraction</strong></p><ul><li><p>Extract &#8220;sushi&#8221;, &#8220;Amsterdam&#8221;, &#8220;vegan&#8221;, etc.</p></li><li><p>Map to catalog entities: category IDs, locations, cuisines&#8230;</p></li></ul></li><li><p><strong>Query rewriting / expansion</strong></p><ul><li><p>Add synonyms, canonicalize terms (<code>&#8220;veggie&#8221;</code> &#8594; 
<code>&#8220;vegetarian&#8221;</code>).</p></li></ul></li></ol><h3><strong>3.1 Minimal intent classification example</strong></h3><p>You can treat query intent classification as a standard text classification problem:</p><pre><code> import numpy as np
 from sklearn.pipeline import Pipeline
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.linear_model import LogisticRegression
 &#8203;
 queries = [
      "iphone 15 pro max",
      "refund policy",
      "restaurants near me",
      "track my order",
      "vegan sushi",
 ]
 &#8203;
 labels = [
      "product_search",
      "faq_search",
      "local_search",
      "faq_search",
      "product_search",
 ]
 &#8203;
 pipe = Pipeline([
      ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
      ("clf", LogisticRegression(max_iter=1000)),
 ])
 &#8203;
 pipe.fit(queries, labels)
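# In production you would normalize queries before vectorizing them. A minimal
# sketch using only the standard library (lowercase, strip accents, collapse
# whitespace); normalize_query is a hypothetical helper name for illustration.
import unicodedata

def normalize_query(q):
    # NFKD splits accented characters into base letter + combining mark,
    # so dropping combining marks turns "Papá" into "Papa".
    q = unicodedata.normalize("NFKD", q.lower())
    q = "".join(ch for ch in q if not unicodedata.combining(ch))
    return " ".join(q.split())

# normalize_query("  Papá  ") -> "papa"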
 &#8203;
  print(pipe.predict(["cancellation policy"]))
  print(pipe.predict(["best burgers in berlin"]))</code></pre><p>In production, you&#8217;d likely use:</p><ul><li><p>Better tokenization (e.g. spaCy, HuggingFace tokenizers)</p></li><li><p>Possibly a transformer encoder (see next sections)</p></li><li><p>More sophisticated labels and training data</p></li></ul><div><hr></div><h2><strong>4. Semantic Search &amp; Two&#8209;Tower Architectures</strong></h2><h3><strong>4.1 Lexical vs semantic search</strong></h3><p>Traditional search (BM25, TF&#8209;IDF) is <strong>lexical</strong>:</p><ul><li><p>Relevance is based on <em>overlap</em> of terms between query and document.</p></li><li><p><code>&#8220;cheap phone&#8221;</code> won&#8217;t match <code>&#8220;affordable smartphone&#8221;</code> very well.</p></li></ul><p><strong>Semantic search</strong> uses <em>vector representations</em> (embeddings) of text:</p><ul><li><p>Encode queries and documents into vectors in &#8477;&#7496;.</p></li><li><p>Similar meanings &#8594; close in vector space (via cosine / dot product).</p></li><li><p>Retrieval: find top&#8209;K documents with highest similarity.</p></li></ul><h3><strong>4.2 Two&#8209;Tower (Dual-Encoder) architecture</strong></h3><p>For scalable semantic retrieval, a common pattern is the <strong>Two&#8209;Tower architecture</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W1G6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W1G6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 424w, 
https://substackcdn.com/image/fetch/$s_!W1G6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 848w, https://substackcdn.com/image/fetch/$s_!W1G6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 1272w, https://substackcdn.com/image/fetch/$s_!W1G6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W1G6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png" width="1456" height="403" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54143,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!W1G6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 424w, https://substackcdn.com/image/fetch/$s_!W1G6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 848w, https://substackcdn.com/image/fetch/$s_!W1G6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 1272w, https://substackcdn.com/image/fetch/$s_!W1G6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3eddba7-6c3b-4f2b-ac2e-42d10a6158a6_1568x434.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>The <strong>query encoder</strong> and <strong>item encoder</strong> share architecture but may (or may not) share weights.</p></li><li><p>You pre-compute and index <strong>item embeddings</strong> in an ANN index (FAISS, ScaNN, etc).</p></li><li><p>At query time, compute <code>q_vec</code>, then retrieve top&#8209;K items by similarity.</p></li></ul><h3><strong>4.3 Training objective (contrastive learning)</strong></h3><p>Given training triples <code>(q, d&#8314;, d&#8315;)</code> where <code>d&#8314;</code> is relevant and <code>d&#8315;</code> is not, you can train with a contrastive loss.</p><p>Pseudo&#8209;code:</p><pre><code> import torch
 import torch.nn as nn
 import torch.nn.functional as F
 &#8203;
 class DualEncoder(nn.Module):
     def __init__(self, text_encoder):
         super().__init__()
         self.query_encoder = text_encoder()
         self.doc_encoder = text_encoder()
 &#8203;
     def encode_query(self, queries):
         return self.query_encoder(queries)  # [batch, dim]
 &#8203;
     def encode_doc(self, docs):
         return self.doc_encoder(docs)      # [batch, dim]
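# A common variant of the loss below is in-batch negatives: each query's
# positive document serves as a negative for every other query in the batch,
# so no explicit negative tensor is needed. A sketch (an assumption added
# here, not from the original post):
def in_batch_contrastive_loss(q_vecs, d_vecs, temperature=0.05):
    # q_vecs, d_vecs: [batch, dim]; row i of d_vecs is query i's positive
    logits = q_vecs @ d_vecs.t() / temperature  # [batch, batch]
    labels = torch.arange(q_vecs.size(0), device=q_vecs.device)  # diagonal is positive
    return F.cross_entropy(logits, labels)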
 &#8203;
 def contrastive_loss(q_vecs, d_pos_vecs, d_neg_vecs, temperature=0.05):
      # q_vecs, d_pos_vecs: [batch, dim]; d_neg_vecs: [batch, num_neg, dim]
     # Construct logits: [batch, 1 + num_neg]
     pos_scores = (q_vecs * d_pos_vecs).sum(dim=-1, keepdim=True)  # [B,1]
     neg_scores = (q_vecs.unsqueeze(1) * d_neg_vecs).sum(dim=-1)   # [B, num_neg]
 &#8203;
     logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature
      labels = torch.zeros(q_vecs.size(0), dtype=torch.long, device=q_vecs.device)  # index 0 is positive
 &#8203;
     return F.cross_entropy(logits, labels)</code></pre><p>In practice you&#8217;d use a real transformer encoder (e.g., BERT-like model), proper batching, and possibly in-batch negatives (each positive is a negative for others).</p><h3><strong>4.4 Simple semantic search example with pre-trained model</strong></h3><p>If you don&#8217;t want to train from scratch, you can use existing sentence embedding models:</p><pre><code> from sentence_transformers import SentenceTransformer, util
 &#8203;
  model = SentenceTransformer("all-MiniLM-L6-v2")
 &#8203;
 documents = [
      "Cheap smartphone with good battery life",
      "Italian restaurant with vegan options",
      "Used car marketplace",
 ]
 doc_emb = model.encode(documents, convert_to_tensor=True)
 &#8203;
  query = "affordable phone with long battery"
 q_emb = model.encode(query, convert_to_tensor=True)
 &#8203;
 cos_scores = util.cos_sim(q_emb, doc_emb)[0]
 top_k = cos_scores.topk(k=3)
 for score, idx in zip(top_k.values, top_k.indices):
     print(float(score), documents[int(idx)])</code></pre><p>This is a semantic retrieval layer that can feed candidates into your LTR model.</p><div><hr></div><h2><strong>5. Transformer Architecture: Why It Shows Up Everywhere</strong></h2><p>Transformers underpin much of modern NLP, including semantic search, query understanding, and LLMs.</p><h3><strong>5.1 Core ideas</strong></h3><ul><li><p>Input sequence is tokenized into tokens <code>t&#8321;, &#8230;, t&#8345;</code>.</p></li><li><p>Each token mapped to an embedding, plus positional encodings.</p></li><li><p>Multiple layers of <strong>self&#8209;attention + feed&#8209;forward networks</strong>.</p></li><li><p>Self&#8209;attention lets each token attend to all others in the sequence.</p></li></ul><p>Simplified encoder block:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-NSf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-NSf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 424w, https://substackcdn.com/image/fetch/$s_!-NSf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 848w, https://substackcdn.com/image/fetch/$s_!-NSf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 1272w, 
https://substackcdn.com/image/fetch/$s_!-NSf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-NSf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png" width="336" height="488.03041825095056" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:526,&quot;resizeWidth&quot;:336,&quot;bytes&quot;:28196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-NSf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 424w, https://substackcdn.com/image/fetch/$s_!-NSf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 848w, 
https://substackcdn.com/image/fetch/$s_!-NSf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 1272w, https://substackcdn.com/image/fetch/$s_!-NSf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68094b6f-ba11-4e33-a4ba-44beeb8a8760_526x764.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Encoder&#8209;only models</strong> (BERT, RoBERTa):</p><ul><li><p>Great for classification, retrieval, sentence 
embeddings.</p></li><li><p>Often used as the backbone in query/item encoders.</p></li></ul><p><strong>Decoder&#8209;only / LLMs</strong> (GPT&#8209;like):</p><ul><li><p>Great for generative tasks: query rewriting, summarization, plan generation, ontology induction (later section).</p></li></ul><p>You don&#8217;t need to derive the attention equations from scratch to work effectively with them in search; you need to know:</p><ul><li><p>They produce contextual embeddings (token/sequence representations).</p></li><li><p>You can fine-tune them for:</p><ul><li><p>Query intent classification</p></li><li><p>Semantic retrieval (dual encoder)</p></li><li><p>Text-to-structure tasks (ontology building, entity extraction).</p></li></ul></li></ul><div><hr></div><h2><strong>6. Spell Correction</strong></h2><p>Users type fast and on mobile; typos are guaranteed.</p><h3><strong>6.1 Classic view: edit distance + language model</strong></h3><p><strong>Step 1: Candidate generation</strong></p><ul><li><p>Generate strings within edit distance &#8804; 1 or 2 from the input.</p></li><li><p>Filter to those seen in your corpus/catalog (e.g., product names).</p></li></ul><p><strong>Step 2: Candidate scoring</strong></p><ul><li><p>Use frequency and language models:</p><ul><li><p><code>score(candidate) = P(noisy_query | candidate) * P(candidate)</code></p></li><li><p>Choose the candidate with highest score.</p></li></ul></li></ul><p>A simple implementation uses Levenshtein distance as a heuristic:</p><pre><code> def levenshtein(a, b):
     dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
     for i in range(len(a) + 1):
         dp[i][0] = i
     for j in range(len(b) + 1):
         dp[0][j] = j
     for i in range(1, len(a) + 1):
         for j in range(1, len(b) + 1):
             cost = 0 if a[i-1] == b[j-1] else 1
             dp[i][j] = min(
                 dp[i-1][j] + 1,      # deletion
                 dp[i][j-1] + 1,      # insertion
                 dp[i-1][j-1] + cost  # substitution
             )
     return dp[-1][-1]
 &#8203;
 def best_correction(query, vocab):
     best = query
      best_dist = float("inf")
     for v in vocab:
         d = levenshtein(query, v)
         if d &lt; best_dist:
             best_dist = d
             best = v
     return best
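# An alternative shortlist step: the standard library's difflib can fetch
# near-matches via a similarity ratio (a heuristic, not exact edit distance),
# which is handy when the vocabulary is large; shortlist is a hypothetical
# helper name for illustration.
import difflib

def shortlist(word, vocab, n=3):
    # Returns up to n vocabulary entries whose SequenceMatcher ratio
    # against the input exceeds the cutoff, best match first.
    return difflib.get_close_matches(word, vocab, n=n, cutoff=0.6)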
 &#8203;
  vocab = ["sushi", "suspect", "sandwich"]
  print(best_correction("suhshi", vocab))  # &#8594; "sushi"</code></pre><p>In production you&#8217;ll:</p><ul><li><p>Use more efficient algorithms (trigram indexes, BK&#8209;trees).</p></li><li><p>Include language model / semantic signals (e.g. user typed <code>&#8220;suhshi restaurant&#8221;</code> &#8594; strongly prefer <code>&#8220;sushi&#8221;</code>).</p></li><li><p>Often treat it as a ranking problem (again): generate candidates, then rank with ML.</p></li></ul><h3><strong>6.2 Neural spell correction</strong></h3><p>Modern systems often use <strong>sequence&#8209;to&#8209;sequence transformers</strong> trained on (noisy, clean) pairs:</p><ul><li><p>Input: <code>&#8220;suhshi near me&#8221;</code></p></li><li><p>Output: <code>&#8220;sushi near me&#8221;</code></p></li></ul><p>These are more robust to complex typos and spacing issues, and can correct multi-token sequences.</p><div><hr></div><h2><strong>7. Personalization</strong></h2><p>Search relevance is not one&#8209;size&#8209;fits&#8209;all:</p><ul><li><p>Some users care about price.</p></li><li><p>Others care about delivery time, ratings, brand, etc.</p></li></ul><h3><strong>7.1 Feature-based personalization</strong></h3><p>The simplest approach: add <strong>user and context features</strong> into your LTR model:</p><ul><li><p>User features:</p><ul><li><p>Long&#8209;term engagement by category</p></li><li><p>Average price of past purchases</p></li><li><p>&#8220;Healthiness&#8221;, &#8220;vegan&#8221;, &#8220;premium&#8221; preferences</p></li></ul></li><li><p>Context features:</p><ul><li><p>Time of day, day of week</p></li><li><p>Device, location</p></li><li><p>Session-level features</p></li></ul></li></ul><p>The LTR model learns how these features interact with item features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!onC6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!onC6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 424w, https://substackcdn.com/image/fetch/$s_!onC6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 848w, https://substackcdn.com/image/fetch/$s_!onC6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 1272w, https://substackcdn.com/image/fetch/$s_!onC6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!onC6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png" width="508" height="276.42522522522523" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:604,&quot;width&quot;:1110,&quot;resizeWidth&quot;:508,&quot;bytes&quot;:40675,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!onC6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 424w, https://substackcdn.com/image/fetch/$s_!onC6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 848w, https://substackcdn.com/image/fetch/$s_!onC6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 1272w, https://substackcdn.com/image/fetch/$s_!onC6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c9f32f0-6edc-4a3b-85a4-537592a7c77f_1110x604.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>7.2 Two&#8209;Tower for personalized retrieval</strong></h3><p>You can extend the Two&#8209;Tower idea:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nN-z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nN-z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 424w, 
https://substackcdn.com/image/fetch/$s_!nN-z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 848w, https://substackcdn.com/image/fetch/$s_!nN-z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 1272w, https://substackcdn.com/image/fetch/$s_!nN-z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nN-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png" width="576" height="178.33451957295372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20c7855e-c224-44be-8eae-c2c477200301_1124x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:1124,&quot;resizeWidth&quot;:576,&quot;bytes&quot;:24370,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!nN-z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 424w, https://substackcdn.com/image/fetch/$s_!nN-z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 848w, https://substackcdn.com/image/fetch/$s_!nN-z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 1272w, https://substackcdn.com/image/fetch/$s_!nN-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c7855e-c224-44be-8eae-c2c477200301_1124x348.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><ul><li><p>User embedding is learned from user ID, interaction history, etc.</p></li><li><p>Item embedding from item features.</p></li><li><p>Train with contrastive loss: user should be close to interacted items, far from non&#8209;interacted items.</p></li><li><p>Retrieval at serving time: recommend items with closest vectors to user embedding.</p></li></ul><p>This is often paired with:</p><ul><li><p><strong>Retrieval tower</strong> (user&#8211;item dual encoder) &#8594; candidate set.</p></li><li><p><strong>Ranking model</strong> (feature-rich LTR) &#8594; final personalized ordering.</p></li></ul><div><hr></div><h2><strong>8. 
LLM&#8209;based Ontologies</strong></h2><h3><strong>8.1 What is an ontology?</strong></h3><p>An <strong>ontology</strong> is a structured representation of concepts and their relationships:</p><ul><li><p>Entities: categories, items, attributes.</p></li><li><p>Relationships: <code>is_a</code>, <code>part_of</code>, <code>compatible_with</code>, etc.</p></li></ul><p>Example snippet:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vEBm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vEBm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 424w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 848w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 1272w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vEBm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png" width="516" 
height="289.2096774193548" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:992,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:31205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vEBm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 424w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 848w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 1272w, https://substackcdn.com/image/fetch/$s_!vEBm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10dce120-ac53-4216-a27c-fb897fe8490e_992x556.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Attributes for &#8220;Sushi&#8221; might include: cuisine=Japanese, serves_raw_fish=True, typically_contains_rice=True.</p><p>Ontologies power:</p><ul><li><p>Query understanding (&#8220;sushi&#8221; &#8712; Japanese food)</p></li><li><p>Faceted search (filter by cuisine, price range, dietary restrictions)</p></li><li><p>Recommendation diversity (ensure we cover multiple categories)</p></li></ul><h3><strong>8.2 How LLMs help</strong></h3><p>Traditionally, ontologies were built manually or with rule-based NLP. 
That doesn&#8217;t scale.</p><p>LLMs can:</p><ol><li><p><strong>Generate category trees</strong></p><ul><li><p>Given a list of item names/descriptions, propose a hierarchical category structure.</p></li></ul></li><li><p><strong>Map items to categories</strong></p><ul><li><p>&#8220;Assign this product to one of these categories [A, B, C].&#8221;</p></li></ul></li><li><p><strong>Extract structured attributes</strong></p><ul><li><p>From textual descriptions, emit JSON with fields like <code>cuisine</code>, <code>price_range</code>, <code>is_vegan_friendly</code>.</p></li></ul></li><li><p><strong>Discover synonyms and related concepts</strong></p><ul><li><p>For semantic expansion: <code>&#8220;veggie&#8221;</code>, <code>&#8220;plant-based&#8221;</code>, <code>&#8220;vegetarian&#8221;</code>.</p></li></ul></li></ol><p>Pseudo-code for attribute extraction with an LLM (conceptual):</p><pre><code> def extract_attributes(description, llm_client):
     import json  # needed below to parse the model's JSON output
     prompt = f"""
     You are an information extraction system.
     Read the following restaurant description and output JSON with fields:
     cuisine (string), price_level (one of: cheap, medium, expensive),
     vegetarian_friendly (true/false).
 &#8203;
     Description: {description}
     JSON:
     """
     response = llm_client.generate(prompt)
     return json.loads(response)</code></pre><p>Once you have an ontology:</p><ul><li><p>Store it in a graph or relational DB.</p></li><li><p>Use it in:</p><ul><li><p>Query rewriting (&#8220;veggie sushi&#8221; &#8594; add constraint vegetarian_friendly=True).</p></li><li><p>Ranking features (match between query intent and item attributes).</p></li><li><p>Diversification (ensure results cover multiple relevant categories).</p></li></ul></li></ul><div><hr></div><h2><strong>9. Evaluating Search Results at Scale</strong></h2><p>You can&#8217;t improve what you don&#8217;t measure. Search evaluation happens on two axes:</p><ol><li><p><strong>Offline metrics</strong>: using logged or labeled data.</p></li><li><p><strong>Online metrics</strong>: A/B tests on real traffic.</p></li></ol><h3><strong>9.1 Offline metrics: NDCG, MRR, Recall@K</strong></h3><p>Given query <code>q</code>, documents <code>d&#8321;..d&#8345;</code> with relevance labels <code>rel_i</code> and system ranking:</p><ul><li><p><strong>DCG@K (Discounted Cumulative Gain)</strong>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{DCG@K} = \\sum_{i=1}^{K} \\frac{2^{rel_i} - 1}{\\log_2(i+1)}&quot;,&quot;id&quot;:&quot;ONSJEYNZIT&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p><strong>IDCG@K</strong> = DCG of ideal ranking (sort by true relevance).</p></li><li><p><strong>NDCG@K</strong>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{NDCG@K} = \\frac{\\text{DCG@K}}{\\text{IDCG@K}}&quot;,&quot;id&quot;:&quot;UWPATLSUDV&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p><strong>MRR@K</strong> (Mean Reciprocal Rank):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{RR} = \\frac{1}{\\text{rank of first relevant item}}&quot;,&quot;id&quot;:&quot;MMBZRIFOKM&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p><strong>Recall@K</strong>: fraction of 
relevant items retrieved in top&#8209;K.</p></li></ul><p>Simple NDCG@K implementation:</p><pre><code> import numpy as np
 &#8203;
 def dcg_at_k(rels, k):
     rels = np.asarray(rels)[:k]
     gains = 2 ** rels - 1
     discounts = np.log2(np.arange(2, len(rels) + 2))
     return np.sum(gains / discounts)
 &#8203;
 def ndcg_at_k(rels, k):
     ideal = sorted(rels, reverse=True)
     idcg = dcg_at_k(ideal, k)
     if idcg == 0:
         return 0.0
     return dcg_at_k(rels, k) / idcg
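
# MRR@K (cf. the RR formula in Section 9.1): reciprocal rank of the first
# relevant item in the top-K, or 0.0 if none appears. A minimal sketch;
# here "relevant" is taken to mean rel > 0.
def mrr_at_k(rels, k):
    for rank, rel in enumerate(rels[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0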
 &#8203;
 # Example: model ranking with relevance labels
 rels = [3, 0, 2, 1]  # relevance of items at positions 1..4
 print(ndcg_at_k(rels, k=3))</code></pre><p>In a real pipeline:</p><ul><li><p>Log per&#8209;query predictions and labels.</p></li><li><p>Aggregate NDCG@K, MRR@K, Recall@K across queries.</p></li><li><p>Compare new models offline before going to A/B testing.</p></li></ul><h3><strong>9.2 Online evaluation: A/B tests</strong></h3><p>Offline metrics have limitations (logging bias, missing labels, etc.). Ultimately, you care about <strong>business and user metrics</strong>:</p><ul><li><p>Click-through rate (CTR)</p></li><li><p>Conversion rate (purchases, bookings)</p></li><li><p>Revenue per session</p></li><li><p>Time to first relevant result</p></li><li><p>User satisfaction proxies (bounce rate, long dwell time, etc.)</p></li></ul><p>Standard approach:</p><ol><li><p>Randomly bucket users into control vs treatment.</p></li><li><p>Control uses baseline search; treatment uses new ranking or retrieval.</p></li><li><p>Run for enough time / users to get statistical power.</p></li><li><p>Check:</p><ul><li><p>Primary success metrics (e.g., +X% CTR).</p></li><li><p>Guardrail metrics (latency, errors, crash rate, etc.).</p></li></ul></li><li><p>Decide whether to ship, iterate, or roll back.</p></li></ol><h3><strong>9.3 Evaluation at scale</strong></h3><p>At scale, you need:</p><ul><li><p><strong>Aggregation pipelines</strong>: compute metrics daily/weekly on billions of log events.</p></li><li><p><strong>Monitoring dashboards</strong>: track relevance and business KPIs over time.</p></li><li><p><strong>Alerting</strong>: detect regressions (e.g., NDCG drop due to bug in feature pipeline).</p></li></ul><p>The modeling work (LTR, semantic search, personalization) is only useful if you can <strong>reliably measure and monitor</strong> impact.</p><div><hr></div><h2><strong>10. 
How NLP Ties Everything Together</strong></h2><p>NLP is the glue between the components:</p><ul><li><p><strong>Text embeddings</strong> (transformers) &#8594; semantic search &amp; LTR features.</p></li><li><p><strong>Classification</strong> &#8594; intent detection, spam detection, query routing.</p></li><li><p><strong>Sequence labeling</strong> &#8594; entity extraction (locations, product types, attributes).</p></li><li><p><strong>Generation</strong> &#8594; query rewriting, summarizing item descriptions, ontology induction.</p></li></ul><p>You don&#8217;t need to reinvent NLP; you can:</p><ul><li><p>Start from pre&#8209;trained models (e.g., BERT variants, sentence transformers).</p></li><li><p>Fine&#8209;tune for:</p><ul><li><p>Query intent classification.</p></li><li><p>Dual&#8209;encoder retrieval.</p></li><li><p>Sequence labeling for entities.</p></li><li><p>Text&#8209;to&#8209;JSON extraction for ontologies.</p></li></ul></li></ul><div><hr></div><h2><strong>11. Putting It All Together: A Minimal Search &amp; Ranking Stack</strong></h2><p>Here&#8217;s a summarized architecture combining everything:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s0tO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s0tO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 424w, https://substackcdn.com/image/fetch/$s_!s0tO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 
848w, https://substackcdn.com/image/fetch/$s_!s0tO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 1272w, https://substackcdn.com/image/fetch/$s_!s0tO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s0tO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png" width="302" height="1808.7173913043478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3306,&quot;width&quot;:552,&quot;resizeWidth&quot;:302,&quot;bytes&quot;:204283,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/180884257?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s0tO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 424w, 
https://substackcdn.com/image/fetch/$s_!s0tO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 848w, https://substackcdn.com/image/fetch/$s_!s0tO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 1272w, https://substackcdn.com/image/fetch/$s_!s0tO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a11a8a-7f9a-49d2-af2d-bc8de09e849e_552x3306.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is the ecosystem where:</p><ul><li><p><strong>Learning&#8209;to&#8209;Rank</strong> is the central model.</p></li><li><p><strong>Transformers &amp; semantic search</strong> power understanding and retrieval.</p></li><li><p><strong>Two&#8209;Tower architectures</strong> make ANN retrieval scalable.</p></li><li><p><strong>NLP</strong> underpins query understanding and text features.</p></li><li><p><strong>Personalization</strong> injects user/context into the ranking.</p></li><li><p><strong>LLM&#8209;based ontologies</strong> structure the catalog and enrich features.</p></li><li><p><strong>Evaluation at scale</strong> ensures you&#8217;re improving, not just changing.</p></li></ul><div><hr></div><h2><strong>12. Where to Go Next</strong></h2><p>If you want to see all of this wired together in actual code, I have created a small end&#8209;to&#8209;end project you can run locally: <a href="https://github.com/msaharan/DSAIEngineering/tree/c7902c5a69cf16f73e3b577226c601e873f8a755/search_and_ranking/search_and_ranking_demo">github.com/msaharan/DSAIEngineering/search_and_ranking/search_and_ranking_demo</a> (It links to a specific commit so the code matches the version of the post. 
Later versions of the code will be available on the main branch.)</p><p>It&#8217;s a self&#8209;contained, CPU&#8209;friendly mini stack that implements most of the ideas from this post:</p><ul><li><p><strong>Flow:</strong> normalize/understand query &#8594; retrieve lexical + semantic candidates &#8594; personalize + featurize &#8594; train/eval LTR &#8594; apply business rules &#8594; display results.</p></li><li><p><strong>Retrievers:</strong> TF&#8209;IDF lexical; optional SentenceTransformer semantic; optional dual&#8209;encoder + ANN stub.</p></li><li><p><strong>Ranking:</strong> XGBRanker when available, else RandomForest; offline metrics (NDCG/MRR).</p></li><li><p><strong>Personalization:</strong> simple cuisine/price affinities and user&#8211;item bias.</p></li><li><p><strong>Rules:</strong> vegan boost + cuisine diversity; lightweight ontology&#8209;style enrichment for dietary/category/price hints.</p></li><li><p><strong>Data:</strong> tiny CSVs in <code>data/</code> so you can inspect and change everything.</p></li></ul><h3><strong>12.1 Run the demo</strong></h3><p>From the repo root:</p><pre><code> cd search_and_ranking/search_and_ranking_demo
 docker build -t search-ranking-demo .
 docker run --rm -it search-ranking-demo        # semantic + LTR pipeline
 # Lexical-only:
 # docker run --rm -it search-ranking-demo python run_demo.py
 # Dual-encoder + ANN:
 # docker run --rm -it search-ranking-demo python run_demo.py --semantic --dual</code></pre><p>The script will, end&#8209;to&#8209;end:</p><ol><li><p><strong>Query understanding &amp; intent classification</strong><br>Train a TF&#8209;IDF + logistic regression intent classifier on <code>data/query_intents.csv</code>, and add simple query normalization + synonym expansion.</p></li><li><p><strong>Lexical + semantic retrieval</strong><br>Build a TF&#8209;IDF lexical retriever, optionally add a SentenceTransformer semantic retriever, and (optionally) a tiny dual&#8209;encoder + ANN stub for semantic candidate generation.</p></li><li><p><strong>Personalization</strong><br>Construct simple user profiles (cuisine and price affinities + user&#8211;item bias) from <code>data/query_doc_labels.csv</code>, then inject those features into ranking.</p></li><li><p><strong>Learning&#8209;to&#8209;Rank</strong><br>Train a ranking model (XGBRanker if available, else RandomForest) on grouped query&#8211;item relevance labels and report offline metrics (NDCG, MRR) on held&#8209;out queries.</p></li><li><p><strong>Business rules &amp; ontology&#8209;style enrichment</strong><br>Apply lightweight rules such as vegan boosting and cuisine diversity, plus simple ontology&#8209;style enrichment (dietary hints, categories, price range) derived from catalog metadata.</p></li></ol><h3><strong>12.2 Suggested experiments</strong></h3><p>Once you have it running, here are concrete exercises that map back to sections of this post:</p><ol><li><p><strong>Lexical vs semantic search (Sections 3&#8211;4)</strong></p><ul><li><p>Run lexical&#8209;only, then <code>--semantic</code>.</p></li><li><p>Compare which documents surface for ambiguous or &#8220;fuzzy&#8221; queries.</p></li><li><p>Tweak TF&#8209;IDF and embedding models and see how NDCG/MRR change.</p></li></ul></li><li><p><strong>Play with query understanding (Section 3)</strong></p><ul><li><p>Add new intents and examples to 
<code>query_intents.csv</code>.</p></li><li><p>Extend the synonym/expansion logic for your own domain.</p></li><li><p>Observe how better query understanding changes candidate sets and ranking.</p></li></ul></li><li><p><strong>Modify personalization (Section 7)</strong></p><ul><li><p>Change how user cuisine/price affinities are computed.</p></li><li><p>Add new user&#8209;level features (e.g., &#8220;likes cheap &amp; fast&#8221; vs &#8220;likes premium &amp; slow&#8221;).</p></li><li><p>Watch how different user profiles get different rankings for the same query.</p></li></ul></li><li><p><strong>Extend the LTR feature set (Section 2)</strong></p><ul><li><p>Edit the ranking feature construction to add new signals (e.g., distance, freshness, popularity buckets).</p></li><li><p>Re&#8209;train the model and inspect which features matter most.</p></li></ul></li><li><p><strong>Experiment with ontology&#8209;style features (Section 8)</strong></p><ul><li><p>Enrich <code>catalog.csv</code> with more structured attributes (dietary tags, categories, price bands).</p></li><li><p>Use them in query understanding (e.g., detect &#8220;vegan&#8221;, &#8220;cheap&#8221;) and as ranking features.</p></li><li><p>If you have access to an LLM, try auto&#8209;generating these attributes for new items.</p></li></ul></li><li><p><strong>Change evaluation settings (Section 9)</strong></p><ul><li><p>Adjust how train/validation splits are done.</p></li><li><p>Compute NDCG/MRR at different K values and inspect failure cases manually.</p></li></ul></li></ol><p>Run the demo end-to-end once and you&#8217;ll have a working template you can adapt to your own search and recommendation projects.</p>]]></content:encoded></item><item><title><![CDATA[KServe Explained: A Practical Guide to Serving ML & GenAI on Kubernetes]]></title><description><![CDATA[From notebooks to production&#8212;how data scientists and ML engineers can use KServe to deploy, scale, and manage predictive and generative models in 
real&#8209;world systems.]]></description><link>https://newsletter.dsaiengineering.com/p/kserve-explained-a-practical-guide</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/kserve-explained-a-practical-guide</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 24 Nov 2025 16:48:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9Bq6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Bq6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Bq6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9Bq6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101325,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dsaiengineering.com/i/179834107?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Bq6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!9Bq6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae5521d8-af78-4fa9-8226-0eeac3141411_960x540.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a><figcaption class="image-caption">Source: https://kserve.github.io</figcaption></figure></div><p>KServe is an open&#8209;source system for serving machine&#8209;learning and generative AI models on Kubernetes.</p><p>If you&#8217;re a junior&#8211;mid&#8209;level data scientist or ML engineer, you&#8217;ve probably already hit the &#8220;OK, my model works in a notebook&#8230; now how do other people use it?&#8221; wall. 
KServe is one of the main answers to that question in Kubernetes&#8209;based environments.</p><p>This article walks through what KServe is, how the architecture in the image fits together, and what it looks like to use it in real life.</p><div><hr></div><h2>1. The problem KServe solves</h2><p>Once a model is trained, teams usually need to:</p><ul><li><p>Turn it into an <strong>API</strong> for apps, dashboards, or other services</p></li><li><p><strong>Scale</strong> it up and down as traffic changes</p></li><li><p>Run it on different <strong>hardware</strong> (CPU, GPU, etc.)</p></li><li><p>Handle <strong>versioning</strong>, <strong>canary rollouts</strong>, and <strong>rollbacks</strong></p></li><li><p>Collect <strong>metrics</strong> and <strong>logs</strong> for debugging and monitoring</p></li></ul><p>You <em>can</em> hand&#8209;roll all this: build a Flask/FastAPI service, write Dockerfiles, create Kubernetes Deployments/Services/Ingresses, wire up autoscaling, etc. 
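</p><p>To make that concrete, here is roughly the kind of predict service you end up hand&#8209;rolling per model. This is a minimal standard&#8209;library&#8209;only sketch (the scoring function is a placeholder, not a real model):</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(instances):
    # Placeholder "model": score each instance by its string length.
    # A real service would load a trained artifact and run inference here.
    return [len(str(x)) for x in instances]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"instances": [...]}.
        length = int(self.headers.get("Content-Length", 0))
        instances = json.loads(self.rfile.read(length))["instances"]
        body = json.dumps({"predictions": predict(instances)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


# To serve: HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

<p>And that is only the API piece: the Dockerfile, Deployment, Service, Ingress, and autoscaler still have to be written and kept in sync by hand for every model.</p><p>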
But it&#8217;s repetitive and easy to get wrong, especially when you have many models.</p><p>KServe&#8217;s goal is to give you a <strong>standard, Kubernetes&#8209;native way</strong> to deploy and manage models so you focus on model logic, not platform plumbing.</p><div><hr></div><h2>2. Reading the diagram: the stack from bottom to top</h2><p>The image shows a layered architecture. Let&#8217;s walk through it bottom&#8209;up.</p><h3>2.1 Hardware: GPU / CPU / APU</h3><p>At the very bottom you see:</p><ul><li><p><strong>NVIDIA &#8211; GPU</strong></p></li><li><p><strong>Intel &#8211; CPU</strong></p></li><li><p><strong>AMD &#8211; APU</strong></p></li></ul><p>These are your <strong>compute resources</strong>. KServe does not replace them; it just helps you use them efficiently:</p><ul><li><p>You can ask for <strong>CPUs only</strong> for lightweight models.</p></li><li><p>You can request <strong>GPUs</strong> when you serve LLMs or heavy deep&#8209;learning models.</p></li><li><p>Kubernetes schedules pods onto nodes that have the requested resources.</p></li></ul><p>As an ML engineer, this means you express what your model needs; you don&#8217;t hard&#8209;code where it runs.</p><div><hr></div><h3>2.2 Kubernetes layer</h3><p>Next is the <strong>Kubernetes</strong> block (EKS / AKS / GKE / on&#8209;prem, etc.).</p><p>Kubernetes provides the basics:</p><ul><li><p>Containers and pods</p></li><li><p>Networking between services</p></li><li><p>Autoscaling at the pod level</p></li><li><p>ConfigMaps, Secrets, and other configuration</p></li></ul><p>KServe is just another controller running <em>inside</em> your cluster. 
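</p><p>&#8220;Controller&#8221; here just means a reconcile loop: watch the declared desired state, compare it with what is actually running, and act on the difference. A toy sketch of that pattern (purely illustrative, not KServe&#8217;s internals):</p>

```python
def reconcile(desired, actual):
    """Compute the actions needed to drive `actual` toward `desired`.

    Both arguments are sets of endpoint names; a real controller diffs
    full Kubernetes objects, not just names.
    """
    creates = [("create", name) for name in sorted(desired - actual)]
    deletes = [("delete", name) for name in sorted(actual - desired)]
    return creates + deletes


# Two model endpoints declared, one stale endpoint still running:
actions = reconcile({"sentiment-v1", "fraud-v2"}, {"sentiment-v0"})
# actions == [("create", "fraud-v2"), ("create", "sentiment-v1"),
#             ("delete", "sentiment-v0")]
```

<p>KServe&#8217;s controllers run loops like this for <code>InferenceService</code> objects, creating and updating the underlying Deployments, routes, and autoscalers on your behalf.</p><p>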
If you&#8217;re comfortable with <code>kubectl</code> and basic Kubernetes concepts (pods, deployments, services), you&#8217;re in good shape.</p><div><hr></div><h3>2.3 Knative + Istio layer</h3><p>Above Kubernetes, the diagram shows <strong>Knative + Istio</strong>.</p><p>KServe relies on these (or similar components) for:</p><ul><li><p><strong>Serverless behavior</strong> &#8211; scale to zero when idle, scale up when requests arrive</p></li><li><p><strong>Traffic management</strong> &#8211; split traffic between two model versions (e.g., 90% old, 10% new)</p></li><li><p><strong>Ingress and routing</strong> &#8211; getting HTTP requests into your cluster and to the right model</p></li><li><p><strong>mTLS and observability</strong> (with Istio or another service mesh)</p></li></ul><p>You don&#8217;t have to deeply understand Knative or Istio to use KServe day&#8209;to&#8209;day, but it helps to know they are the networking + serverless &#8220;engine&#8221; underneath.</p><div><hr></div><h3>2.4 KServe: Predictive &amp; Generative Model Inference</h3><p>This is the big blue layer in the image: <strong>&#8220;Predictive &amp; Generative Model Inference.&#8221;</strong><br>This is what we usually mean when we say &#8220;KServe.&#8221;</p><p>Here KServe provides:</p><ul><li><p>A <strong>standard way to define model endpoints</strong> using Kubernetes CRDs</p></li><li><p><strong>Built&#8209;in support for common ML frameworks</strong>, e.g.:</p><ul><li><p>Hugging Face (transformers and LLMs)</p></li><li><p>PyTorch</p></li><li><p>TensorFlow</p></li><li><p>scikit&#8209;learn</p></li><li><p>XGBoost / LightGBM</p></li></ul></li><li><p>Features for the full inference lifecycle (as labeled above the layer):</p><ul><li><p><strong>pre/post process</strong> &#8211; custom data transformations</p></li><li><p><strong>predict</strong> &#8211; standard inference</p></li><li><p><strong>generate</strong> &#8211; generative tasks (LLMs, image generation, 
etc.)</p></li><li><p><strong>inference graph</strong> &#8211; chaining multiple models/pipelines</p></li><li><p><strong>explain</strong> &#8211; model explanations</p></li><li><p><strong>monitor</strong> &#8211; metrics and logging hooks</p></li></ul></li></ul><p>Under the hood, KServe uses different model servers and runtimes such as:</p><ul><li><p>Triton Inference Server</p></li><li><p>TensorFlow Serving</p></li><li><p>TorchServe</p></li><li><p>Various runtimes that align with Open Inference or OpenAI&#8209;style APIs</p></li></ul><p>You configure which runtime to use in your model spec; KServe wires it into the platform.</p><div><hr></div><h3>2.5 Model storage</h3><p>On the right, the image shows <strong>&#8220;Model Storage&#8221;</strong> &#8211; a &#8220;bucket&#8221; icon.</p><p>Your models typically live in:</p><ul><li><p>S3, GCS, Azure Blob, or MinIO</p></li><li><p>A shared filesystem or volume</p></li></ul><p>Your KServe configuration points at a URI like <code>s3://my-bucket/models/resnet/</code>.</p><p>KServe then downloads and loads the model into the right inference server. This keeps model artifacts <strong>decoupled</strong> from compute so you can update or redeploy without baking models into images every time.</p><div><hr></div><h2>3. KServe&#8217;s key concepts (what you actually touch)</h2><p>From a data science / ML engineering perspective, these are the building blocks you&#8217;ll work with.</p><h3>3.1 <code>InferenceService</code></h3><p>The main concept is the <code>InferenceService</code>, a custom Kubernetes resource that represents one logical model endpoint.</p><p>Instead of defining a Deployment + Service + Ingress + Autoscaler, you define one <code>InferenceService</code> YAML, and KServe generates everything else.</p><p>A simplified example for a PyTorch model:</p><pre><code><code>apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sentiment-analyzer
spec:
  predictor:
    pytorch:
      storageUri: s3://ml-models/sentiment-analyzer/
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          nvidia.com/gpu: 1</code></code></pre><p>What this tells KServe:</p><ul><li><p>Use the <strong>PyTorch</strong> built&#8209;in runtime</p></li><li><p>Load the model from the given <strong>S3 path</strong></p></li><li><p>Request <strong>1 CPU</strong>, <strong>2Gi memory</strong>, and <strong>1 GPU</strong> per pod</p></li><li><p>Create an HTTP/gRPC endpoint for inference</p></li></ul><p>KServe then handles:</p><ul><li><p>Creating a Knative service</p></li><li><p>Deploying pods</p></li><li><p>Autoscaling</p></li><li><p>Exposing a stable endpoint</p></li></ul><div><hr></div><h3>3.2 Predictors, Transformers, Explainers</h3><p>Inside an <code>InferenceService</code> you can define three main components:</p><ol><li><p><strong>Predictor</strong> &#8211; required</p><ul><li><p>The actual model server (TensorFlow Serving, TorchServe, Triton, or a custom container).</p></li><li><p>Handles the core <code>predict</code> or <code>generate</code> logic.</p></li></ul></li><li><p><strong>Transformer</strong> &#8211; optional</p><ul><li><p>A separate container that runs <strong>before and/or after</strong> the predictor.</p></li><li><p>Use it for things like:</p><ul><li><p>Tokenization and embedding lookups</p></li><li><p>Image decoding and normalization</p></li><li><p>Business&#8209;specific response formatting</p></li></ul></li><li><p>It takes in the HTTP request, massages it, calls the predictor, then post&#8209;processes the response.</p></li></ul></li><li><p><strong>Explainer</strong> &#8211; optional</p><ul><li><p>Tied to explainability frameworks (e.g. 
Alibi Explain).</p></li><li><p>Exposes a separate <code>/explain</code> style endpoint for feature attributions, counterfactuals, etc.</p></li></ul></li></ol><p>As a DS/ML engineer, this separation lets you keep <strong>model code</strong> and <strong>data&#8209;wrangling logic</strong> nicely separated and reusable.</p><div><hr></div><h3>3.3 Inference graphs</h3><p>Sometimes you need more than &#8220;one request in, one model out&#8221;:</p><ul><li><p>Route requests to different model variants based on input</p></li><li><p>Run a pre&#8209;model (e.g., routing or classification) that decides which expert model to call</p></li><li><p>Chain an embedding model &#8594; vector search &#8594; ranking model</p></li></ul><p>KServe&#8217;s <strong>inference graph</strong> feature lets you define a small DAG (directed acyclic graph) of steps: each node can be a model, transformer, or external call. The graph itself is declared as config, which keeps your orchestration logic separate from model code.</p><p>For junior/mid&#8209;level folks, you can think of it as &#8220;Kubeflow Pipelines, but for online inference instead of batch workflows.&#8221;</p><div><hr></div><h3>3.4 Monitoring &amp; scaling</h3><p>KServe surfaces metrics such as:</p><ul><li><p>Request counts</p></li><li><p>Latencies</p></li><li><p>Error rates</p></li></ul><p>These hook into Prometheus/Grafana or whatever observability stack you use. On the scaling side, because KServe is built on Knative, it supports:</p><ul><li><p><strong>Scale to zero</strong> (no pods when idle)</p></li><li><p>Autoscaling based on <strong>requests per second</strong>, <strong>concurrency</strong>, etc.</p></li><li><p>Smooth <strong>rollouts</strong> and <strong>rollbacks</strong> using revisions and traffic splitting</p></li></ul><p>You can, for example, send 5% of traffic to a new model version and watch metrics before promoting it.</p><div><hr></div><h2>4. 
A typical workflow with KServe</h2><p>Here&#8217;s what life usually looks like when you bring a model from training to production using KServe.</p><ol><li><p><strong>Train and export the model</strong></p><ul><li><p>Example: train a binary text classifier in PyTorch and export a <code>model.pt</code> plus a small TorchServe handler.</p></li></ul></li><li><p><strong>Store the model artifact</strong></p><ul><li><p>Upload to <code>s3://ml-bucket/sentiment/v1/</code>.</p></li></ul></li><li><p><strong>Prepare an </strong><code>InferenceService</code><strong> spec</strong></p></li></ol><pre><code><code>apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sentiment-v1
spec:
  predictor:
    pytorch:
      storageUri: s3://ml-bucket/sentiment/v1/
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"</code></code></pre><ol start="4"><li><p><strong>Apply it to the cluster</strong></p></li></ol><pre><code><code>kubectl apply -f sentiment-v1.yaml</code></code></pre><ol start="5"><li><p><strong>Wait until it&#8217;s ready</strong></p></li></ol><pre><code><code>kubectl get inferenceservices</code></code></pre><ol start="6"><li><p>Once the STATUS is <code>Ready</code>, KServe has set up the underlying service.</p></li><li><p><strong>Call the endpoint</strong></p><p>From a client (curl, Python, your app), send HTTP POST requests with the expected JSON shape. KServe routes them through the gateway &#8594; Knative &#8594; your predictor container.</p></li><li><p><strong>Update &amp; canary</strong></p><p>When you train <code>sentiment-v2</code>, point a new <code>InferenceService</code> (or new revision) at <code>s3://ml-bucket/sentiment/v2/</code>, then gradually shift traffic from v1 to v2.</p></li></ol><div><hr></div><h2>5. Why KServe is attractive for DS/ML engineers</h2><p>For junior to mid&#8209;level professionals, KServe hits a nice balance:</p><p><strong>Pros</strong></p><ul><li><p>You don&#8217;t need to build REST servers for each model; you reuse battle&#8209;tested runtimes.</p></li><li><p>Deployment is <strong>declarative</strong> &#8211; one YAML per model.</p></li><li><p>It handles <strong>autoscaling</strong>, <strong>networking</strong>, and <strong>versioning</strong> for you.</p></li><li><p>Works with a wide range of <strong>frameworks</strong> and <strong>model types</strong> (tabular, vision, NLP, LLMs).</p></li><li><p>Fits cleanly into <strong>MLOps</strong> workflows with CI/CD, GitOps, etc.</p></li></ul><p><strong>Trade&#8209;offs</strong></p><ul><li><p>You still need a functioning <strong>Kubernetes</strong> cluster and basic knowledge of it.</p></li><li><p>Knative/Istio add complexity; debugging networking issues can be non&#8209;trivial.</p></li><li><p>Serverless features like scale&#8209;to&#8209;zero introduce <strong>cold&#8209;start 
latency</strong> for some workloads.</p></li></ul><p>If your company already uses Kubernetes, leaning into KServe typically reduces the amount of custom serving infrastructure you have to maintain.</p><div><hr></div><h2>6. How to start learning KServe</h2><p>A practical learning path:</p><ol><li><p><strong>Get a small K8s cluster</strong></p><ul><li><p>Use Kind or Minikube locally, or a small managed cluster.</p></li></ul></li><li><p><strong>Install KServe</strong></p><ul><li><p>Follow the official quickstart for your environment.</p></li></ul></li><li><p><strong>Deploy a sample model</strong></p><ul><li><p>Use built&#8209;in examples (e.g., sklearn iris, or a simple Hugging Face model).</p></li><li><p>Call the endpoint from Python and inspect responses.</p></li></ul></li><li><p><strong>Add a transformer</strong></p><ul><li><p>Put simple pre&#8209;processing (e.g., text cleaning) into a transformer container and see how the request flow changes.</p></li></ul></li><li><p><strong>Experiment with versions &amp; traffic splitting</strong></p><ul><li><p>Deploy v1 and v2 of a model and gradually shift traffic.</p></li></ul></li></ol><p>By the time you&#8217;ve done those steps, the architecture in the image will feel much less abstract&#8212;you&#8217;ll see exactly how your <code>InferenceService</code> spec flows through KServe, Knative, and Kubernetes down to the hardware.</p><div><hr></div><h2>7. 
Recap</h2><ul><li><p>KServe is the <strong>model&#8209;inference layer</strong> on top of Kubernetes, built with Knative and often Istio.</p></li><li><p>It standardizes how you <strong>deploy, scale, and manage models</strong> (both predictive and generative).</p></li><li><p>You describe your model endpoint with an <code>InferenceService</code>: predictors, transformers, explainers, resources, and storage URI.</p></li><li><p>Under the hood, KServe wires together model servers, traffic management, autoscaling, and observability.</p></li></ul><p>For data scientists and ML engineers moving from experimentation into production, learning KServe gives you a powerful mental model&#8212;and a practical toolkit&#8212;for serving real models in real systems.</p>]]></content:encoded></item></channel></rss>