<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DSAIEngineering Newsletter]]></title><description><![CDATA[Currently: machine learning, tabular foundation models, PyTorch. Pace: 220±20 workouts-turned-posts per year.]]></description><link>https://newsletter.dsaiengineering.com</link><image><url>https://substackcdn.com/image/fetch/$s_!6Jbj!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f30e9b-f3f3-4f45-9508-cbbfbe476b81_500x500.png</url><title>DSAIEngineering Newsletter</title><link>https://newsletter.dsaiengineering.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 16 Jun 2026 21:26:48 GMT</lastBuildDate><atom:link href="https://newsletter.dsaiengineering.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mohit Saharan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dsaiengineering@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dsaiengineering@substack.com]]></itunes:email><itunes:name><![CDATA[Mohit Saharan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mohit Saharan]]></itunes:author><googleplay:owner><![CDATA[dsaiengineering@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dsaiengineering@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mohit Saharan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[[P31] Architecture of TabICLv2: quantile predictions for regression]]></title><description><![CDATA[How TabICLv2 models regression with 999 conditional quantiles, pinball loss, and distribution 
reconstruction.]]></description><link>https://newsletter.dsaiengineering.com/p/p31-architecture-of-tabiclv2-quantile-predictions-for-regression</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p31-architecture-of-tabiclv2-quantile-predictions-for-regression</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Sat, 06 Jun 2026 12:52:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6oeP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the last post in the miniseries on the architecture of TabICLv2. The previous post covered many-class classification: how TabICLv2 decomposes large label spaces on both the target-aware embedding side and the ICL output side while keeping the native small-class interface learned during pretraining. This post covers quantile predictions for regression: how TabICLv2 models a continuous target as 999 conditional quantiles.</p><p>The TabICLv2 regression head does not directly emit a single point estimate. In a dedicated regression checkpoint trained with pinball loss, it predicts 999 conditional quantiles at probability levels \(\mathcal{A}=\{0.001,0.002,\ldots,0.999\}\), forming a dense grid that approximates the conditional distribution of the target.</p><p>Like classification, regression keeps the same overall backbone structure: repeated feature grouping, target-aware embedding, and the column/row/ICL transformer blocks \(\text{TF}_\text{col}\), \(\text{TF}_\text{row}\), and \(\text{TF}_\text{icl}\); observed targets still enter twice. Classification embeds those targets as class IDs and emits class logits. Regression swaps the task-specific interfaces and loss: scalar linear target embedders replace class lookup tables, and the output MLP emits 999 raw quantiles per test row instead of logits. 
Its checkpoint also uses bias-free LayerNorm, whereas the classification checkpoint uses LayerNorm with bias.</p><p>This post explains what those quantiles represent, how pinball loss trains them, how inference turns the raw grid into a monotone predictive distribution, and how the same outputs support both fast point estimates and probabilistic predictions. It also maps the regression path in NanoTabICL through <code>max_classes=0</code> and <code>out_dim=999</code>, and notes what the compact model leaves outside the forward pass. The following figure shows the architecture of TabICLv2.</p><p>A short quiz at the end lets you check your understanding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6oeP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6oeP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6oeP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png" width="849" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 architecture; this post covers the regression head (quantile outputs).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200884136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 architecture; this post covers the regression head (quantile outputs)." title="TabICLv2 architecture; this post covers the regression head (quantile outputs)." 
srcset="https://substackcdn.com/image/fetch/$s_!6oeP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!6oeP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33cdc56-011f-4444-b1f2-9e487d3c0c63_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">TabICLv2 architecture; this post covers the regression head (quantile outputs).</figcaption></figure></div><h2>Quantile predictions for regression</h2><p>Tabular foundation models adopt different strategies for regression. TabPFNv2 and TabPFN-2.5 model the predictive distribution by discretizing the target space into bins and applying cross-entropy loss. TabICLv2 instead uses a dedicated regression checkpoint that directly predicts quantiles. It retains the same overall backbone structure while changing the task-specific interfaces, loss, and LayerNorm configuration.</p><p>The subsections below build from quantile definitions and pinball loss to training and inference.</p><h3>What quantiles are</h3><p>To see what this regression head is learning, first recall what a quantile represents: the smallest value at which the cumulative probability reaches at least \(\alpha\). Start with the unconditional case: one target \(Y\), no features yet. 
Let \(Y\) be a real-valued target random variable with cumulative distribution function (CDF)</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;F_Y(q)=P(Y\\leq q).&quot;,&quot;id&quot;:&quot;MPHLJVPZUO&quot;}" data-component-name="LatexBlockToDOM"></div><p>For a probability level \(\alpha\in(0,1)\), define the quantile function of \(Y\) as the generalized inverse</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(\\alpha)=\\inf\\{q\\in\\mathbb{R}:F_Y(q)\\geq\\alpha\\}.&quot;,&quot;id&quot;:&quot;KCRCAKDKGX&quot;}" data-component-name="LatexBlockToDOM"></div><p>In words, \(Q(\alpha)\) is the smallest target value whose cumulative probability reaches at least \(\alpha\). This definition remains valid when the CDF has jumps. When the CDF is continuous and strictly increasing, it reduces to the familiar inverse relation \(F_Y(Q(\alpha))=\alpha\). For example, \(Q(0.5)\) is the median and \(Q(0.9)\) is the 90th percentile.</p><p>Now make the distribution depend on the row: in supervised regression, the target distribution differs from one input row to the next. To avoid overloading the table symbol \(X\), write \(Z\) for the random feature vector of a single row and \(x\) for a particular observed row. The conditional quantile function (CQF) is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_\\alpha(x)=Q_x(\\alpha),\n\n\\qquad\n\nQ_x(\\alpha)=\\inf\\{q\\in\\mathbb{R}:F_{Y\\mid Z=x}(q)\\geq\\alpha\\}.&quot;,&quot;id&quot;:&quot;GGYCPDXUMX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(F_{Y\mid Z=x}(q)=P(Y\leq q\mid Z=x)\) is the conditional CDF at row \(x\). The probability level \(\alpha\) selects a location on that row&#8217;s conditional distribution. 
The two notations \(q_\alpha(x)\) and \(Q_x(\alpha)\) denote the same quantity: \(Q_x(\alpha)\) fixes the row \(x\) and treats the quantile as a function of \(\alpha\) (the inverse-CDF view, as with \(Q(\alpha)\) above), while \(q_\alpha(x)\) fixes the level \(\alpha\) and treats the quantile as a function of the row \(x\) (the prediction view, as with \(\hat{q}_\alpha(x)\) below). This post uses whichever notation reads more naturally in context.</p><p>Instead of asking the model for one conditional summary, TabICLv2 asks it for many summaries spread across the distribution. Specifically, it predicts 999 such quantiles at probability levels</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{A}=\\{0.001,0.002,\\ldots,0.999\\}.&quot;,&quot;id&quot;:&quot;VWPDSEURYA&quot;}" data-component-name="LatexBlockToDOM"></div><p>These 999 outputs form a dense grid of estimated points on \(Q_x(\alpha)\). They are not, by themselves, a full predictive distribution; the inference-time distribution wrapper constructs one by making the grid monotone, interpolating between its points, and extrapolating beyond its endpoints.</p><h3>Pinball loss</h3><p>Each output is trained with pinball loss, also called quantile loss or check loss. 
If the model predicts \(\hat{q}_\alpha(x)\) for level \(\alpha\) and the observed target is \(y\), define the residual</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u=y-\\hat{q}_\\alpha(x).&quot;,&quot;id&quot;:&quot;XTKSJHVULX&quot;}" data-component-name="LatexBlockToDOM"></div><p>The pinball loss is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_\\alpha(u)\n\n=\n\n\\begin{cases}\n\n\\alpha u, &amp; u\\geq 0,\\\\\n\n(\\alpha-1)u, &amp; u<0.\n\n\\end{cases}&quot;,&quot;id&quot;:&quot;JAWKEXRTBJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Equivalently,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_\\alpha(y-\\hat{q})\n\n=\n\n(\\alpha-\\mathbf{1}\\{y<\\hat{q}\\})(y-\\hat{q}).&quot;,&quot;id&quot;:&quot;VFKFHTZGRQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(\hat{q}\) is shorthand for \(\hat{q}_\alpha(x)\), and \(\mathbf{1}\{y&lt;\hat{q}\}\) is an indicator. Underprediction means \(y&gt;\hat{q}\), so the loss grows at rate \(\alpha\) as the miss increases. Overprediction means \(y&lt;\hat{q}\), so it grows at rate \(1-\alpha\). 
The result is a tilted absolute-value penalty.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P628!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P628!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 424w, https://substackcdn.com/image/fetch/$s_!P628!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 848w, https://substackcdn.com/image/fetch/$s_!P628!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 1272w, https://substackcdn.com/image/fetch/$s_!P628!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P628!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png" width="1050" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76082,&quot;alt&quot;:&quot;Pinball loss for \\(\\alpha=0.5\\) and \\(\\alpha=0.9\\): asymmetric slopes penalize under- and over-prediction differently.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200884136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pinball loss for \(\alpha=0.5\) and \(\alpha=0.9\): asymmetric slopes penalize under- and over-prediction differently." title="Pinball loss for \(\alpha=0.5\) and \(\alpha=0.9\): asymmetric slopes penalize under- and over-prediction differently." 
srcset="https://substackcdn.com/image/fetch/$s_!P628!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 424w, https://substackcdn.com/image/fetch/$s_!P628!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 848w, https://substackcdn.com/image/fetch/$s_!P628!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 1272w, https://substackcdn.com/image/fetch/$s_!P628!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0734a52a-db97-45cc-a928-5c4a36d8384e_1050x675.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Pinball loss for \(\alpha=0.5\) and \(\alpha=0.9\): asymmetric slopes penalize under- and over-prediction differently.</figcaption></figure></div><p>This asymmetry makes the loss target a specific quantile. For \(\alpha=0.5\),</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_{0.5}(u)=0.5|u|,&quot;,&quot;id&quot;:&quot;EZGVEMQHQA&quot;}" data-component-name="LatexBlockToDOM"></div><p>so minimizing the expected loss recovers a median. For \(\alpha=0.9\), an equally sized underprediction costs nine times as much as an overprediction. The optimum is therefore pushed upward until it represents the conditional 90th percentile.</p><h3>Training and constructing a distribution</h3><p>With pinball loss defined, the regression head is trained to emit one raw scalar \(\hat{q}_\alpha(x)\) per level in \(\mathcal{A}\). Each level has a separate output coordinate, but all levels share the backbone and output MLP hidden representation. For each example \((x,y)\), training averages pinball loss equally across all 999 levels:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{L}(x,y)\n\n=\n\n\\frac{1}{|\\mathcal{A}|}\\sum_{\\alpha\\in\\mathcal{A}}\n\n\\rho_\\alpha\\left(y-\\hat{q}_\\alpha(x)\\right).&quot;,&quot;id&quot;:&quot;ADAIOPWKBG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Predicting each level separately raises one practical issue: the architecture imposes no explicit monotonicity or cross-quantile constraint, and pretraining adds no auxiliary penalty for crossing quantiles. 
The raw outputs can therefore violate the ordering required of a valid quantile function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q_x(\\alpha_1)\\leq Q_x(\\alpha_2)\\quad\\text{when }\\alpha_1<\\alpha_2.&quot;,&quot;id&quot;:&quot;ZXSCKBRIBP&quot;}" data-component-name="LatexBlockToDOM"></div><p>TabICLv2 resolves this issue while turning the grid points into a full predictive distribution at inference. First, it enforces monotonicity (default: sort; alternative: isotonic regression). Second, it piecewise-linearly interpolates between the corrected points. Third, because the grid stops at \(0.001\) and \(0.999\), it extrapolates parametric tails: exponential by default, with a generalized Pareto distribution (GPD) as an option. The reconstructed distribution then exposes a PDF and CDF defined on all of \(\mathbb{R}\), an inverse CDF (ICDF), and analytical moments.</p><h3>Prediction intervals and point estimates</h3><p>Two standard uses of the corrected quantile function are interval construction and point estimation.</p><p>Once the raw outputs have been turned into a monotone quantile function, prediction intervals can be read off directly. 
Continuing to write \(\hat{q}_\alpha(x)\) for the corrected quantile value, a central \((1-\gamma)\) prediction interval&#8212;where \(\gamma\in(0,1)\) is the total tail probability outside the interval&#8212;is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\left[\\hat{q}_{\\gamma/2}(x),\\ \\hat{q}_{1-\\gamma/2}(x)\\right].&quot;,&quot;id&quot;:&quot;KKVWNHVPRX&quot;}" data-component-name="LatexBlockToDOM"></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J8CM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J8CM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 424w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 848w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 1272w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!J8CM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png" width="1200" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21174,&quot;alt&quot;:&quot;90% central prediction interval from the 5th and 95th predicted quantiles.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200884136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="90% central prediction interval from the 5th and 95th predicted quantiles." title="90% central prediction interval from the 5th and 95th predicted quantiles." 
srcset="https://substackcdn.com/image/fetch/$s_!J8CM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 424w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 848w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 1272w, https://substackcdn.com/image/fetch/$s_!J8CM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed454cc-4477-482c-9e69-27ed0bef2d18_1200x330.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">90% central prediction interval from the 5th and 95th predicted quantiles.</figcaption></figure></div><p>For example, a 90% interval uses \(\gamma=0.1\):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\left[\\hat{q}_{0.05}(x),\\ \\hat{q}_{0.95}(x)\\right].&quot;,&quot;id&quot;:&quot;LLSWNICARI&quot;}" data-component-name="LatexBlockToDOM"></div><p>If the interval has calibrated marginal coverage&#8212;that is, empirically, the fraction of held-out targets falling inside it is close to the nominal level (e.g. ~90%)&#8212;it should contain the true target approximately 90% of the time over repeated samples from the same data-generating process. This marginal coverage does not by itself guarantee conditional coverage for every input row or calibration of every individual quantile. It is an empirical calibration property of the predictions, not something guaranteed merely by using pinball loss or by sorting the quantiles.</p><p>Prediction intervals use specific quantile levels; point estimation uses the full grid. For point estimation, TabICLv2&#8217;s fast mean path averages the 999 predicted quantiles. This is motivated by the quantile-function identity</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}[Y\\mid Z=x]=\\int_0^1 Q_x(\\alpha)\\,d\\alpha,&quot;,&quot;id&quot;:&quot;PORPRLKKBC&quot;}" data-component-name="LatexBlockToDOM"></div><p>when the conditional expectation exists. With a dense, evenly spaced grid of quantiles, this integral can be approximated by a simple average. 
Because \(\mathcal{A}\) is an evenly spaced grid on \((0,1)\), averaging the predicted quantiles is a Riemann-sum approximation of the integral:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}[Y\\mid Z=x]\n\n\\approx\n\n\\hat{\\mu}(x)\n\n=\n\n\\frac{1}{|\\mathcal{A}|}\\sum_{\\alpha\\in\\mathcal{A}}\\hat{q}_\\alpha(x).&quot;,&quot;id&quot;:&quot;BUWVAFHNOU&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(\hat{\mu}(x)\) is the fast point prediction and \(|\mathcal{A}|=999\). The grid omits \(\alpha=0\) and \(\alpha=1\); the fast mean therefore uses only the \(0.001\)&#8211;\(0.999\) grid and ignores the parametric tails extrapolated beyond those endpoints (see the distribution-construction steps above).</p><p>The current implementation constructs the monotone distribution before taking this average. Default sorting only reorders values, so it preserves the average of the raw outputs. The current unweighted isotonic-regression alternative can change individual values, but its pooled averages preserve the total sum and therefore the overall average. Monotonicity correction matters for interpreting the outputs as a quantile function and for distribution operations, but neither current correction method changes the simple average.</p><p>The same 999 raw outputs therefore support two inference paths: a fast point estimate through averaging, and richer probabilistic predictions through a reconstructed monotone distribution.</p><h2>Implementation in NanoTabICL</h2><p>The subsections above describe full TabICLv2 regression inference. NanoTabICL exposes only the regression forward path through <code>max_classes=0</code> and <code>out_dim=999</code>. 
The following sections trace the target embedders and output head, explain target scaling, and identify the full TabICLv2 inference steps left outside the compact model.</p><p>These two NanoTabICL constructor arguments make the regression path visible:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def __init__(self, max_classes: int, out_dim: int, ...):
    # classification: max_classes = out_dim (= 10 typically)
    # regression: max_classes = 0, out_dim = n_quantiles</code></pre></div><p>The README combines these regression settings in its example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">model = NanoTabICLv2(
    max_classes=0,
    out_dim=999,
    embed_dim=96,
    col_num_blocks=2,
    row_num_blocks=2,
    icl_num_blocks=4,
    col_nhead=4,
    row_nhead=4,
    icl_nhead=4,
)
y_train = torch.randn(batch_size, n_train)
y_test_pred_quantiles = model(X_train_and_test, y_train)</code></pre></div><p>This example instantiates a randomly initialized model. Its 999 outputs acquire quantile meaning only after compatible pinball-loss pretraining or after loading compatible trained regression weights; NanoTabICL provides neither.</p><p>These arguments control different sides of the model:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pZxi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pZxi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 424w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 848w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 1272w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pZxi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png" width="1456" height="218" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/107033d8-716f-4932-814f-581827a48bf7_1652x247.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:218,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200884136?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pZxi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 424w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 848w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 1272w, https://substackcdn.com/image/fetch/$s_!pZxi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F107033d8-716f-4932-814f-581827a48bf7_1652x247.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>NanoTabICL exposes <code>out_dim</code> directly. 
The full TabICLv2 constructor instead exposes <code>num_quantiles</code> and internally sets <code>out_dim=num_quantiles</code> for regression.</p><p>One more checkpoint-compatibility difference matters: NanoTabICL uses LayerNorm with bias, matching the TabICLv2 classification checkpoint, while the full regression checkpoint uses LayerNorm without bias. The compact model therefore explains the regression architecture, but it is not a drop-in reimplementation of every regression-checkpoint detail.</p><h3>Regression target embeddings</h3><p>The task switch appears first in the two target embedders:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.y_embed_in = (
    ClassEmbedding(max_classes, embed_dim)
    if max_classes &gt; 0
    else nn.Linear(1, embed_dim)
)
self.y_embed_icl = (
    ClassEmbedding(max_classes, icl_dim)
    if max_classes &gt; 0
    else nn.Linear(1, icl_dim)
)</code></pre></div><p>For classification, <code>ClassEmbedding</code> treats each target as an integer class id. For regression, <code>nn.Linear(1, ...)</code> treats each target as one continuous scalar and projects it into the required token space:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">y_train scalar
    -&gt; nn.Linear(1, embed_dim)  for feature-token target-aware embedding
    -&gt; nn.Linear(1, icl_dim)    for row-token ICL embedding</code></pre></div><p>The first projection is added before column-wise processing:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb = self.x_embed(x)
emb[:, :n_train] += self.y_embed_in(y[:, :, None, None])</code></pre></div><p>Here <code>y</code> has shape <code>(batch, n_train)</code>. Adding two singleton dimensions gives <code>(batch, n_train, 1, 1)</code>. The linear layer transforms the final size-one dimension, producing <code>(batch, n_train, 1, embed_dim)</code>, which broadcasts across all grouped feature positions in each training row:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">feature-level target embedding:
(batch, n_train)
    -&gt; (batch, n_train, 1, 1)
    -&gt; (batch, n_train, 1, embed_dim)
    -&gt; broadcast across cols</code></pre></div><p>After row compression, the second projection is added before dataset-wise ICL:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb[:, :n_train] += self.y_embed_icl(y[:, :, None])</code></pre></div><p>At this point <code>emb</code> has shape <code>(batch, rows, icl_dim)</code>. The added singleton dimension lets <code>nn.Linear(1, icl_dim)</code> produce one row-level target vector for each labeled training row:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">row-level target embedding:
(batch, n_train)
    -&gt; (batch, n_train, 1)
    -&gt; (batch, n_train, icl_dim)</code></pre></div><p>Both additions select <code>emb[:, :n_train]</code>, so no target value is injected into test rows.</p><p>Because both embedders consume raw scalar targets, <code>y_train</code> must be standardized before the forward pass and predictions back-transformed afterward. NanoTabICL scales <code>X_train_and_test</code> internally using training rows, but it does not transform <code>y_train</code>; its feature scaling is also asymmetric because it divides by the training-row standard deviation without subtracting the training mean. The README warns:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext"># warning: for regression, you need to standardize y yourself
# (and backtransform the output)</code></pre></div><p>This matters because the same standardized target values are passed into both regression target embedders, and the 999 outputs are produced on that standardized scale. Quantiles are equivariant under positive affine maps: if \(Y' = aY + b\) with \(a&gt;0\), then \(Q_{Y'}(\alpha) = a\,Q_Y(\alpha) + b\). Standardizing and back-transforming therefore preserves each output&#8217;s probability-level meaning.</p><p>A minimal per-table target transformation would look like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">y_mean = y_train.mean(dim=1, keepdim=True)
y_std = y_train.std(dim=1, unbiased=False, keepdim=True).clamp_min(1e-8)

y_train_scaled = (y_train - y_mean) / y_std
q_scaled = model(X_train_and_test, y_train_scaled)
# q_scaled: (batch, n_test, 999); y_mean, y_std: (batch, 1).
# Indexing with [:, :, None] makes them (batch, 1, 1) so they broadcast.
q_original = q_scaled * y_std[:, :, None] + y_mean[:, :, None]</code></pre></div><p>Every predicted quantile uses the same inverse affine transformation shown above. Target standardization is entirely your responsibility.</p><h3>From test-row states to raw quantiles</h3><p>The final ICL block uses labeled training rows as keys and values, while computing outputs only for test-row queries:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb = self.icl_blocks[-1](emb[:, n_train:], emb[:, :n_train])</code></pre></div><p>Its output has shape <code>(batch, n_test, icl_dim)</code>. The output head then maps each test-row state to <code>out_dim</code> values:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.out_mlp = get_mlp(icl_dim, icl_dim * 2, out_dim)
# (the assignment above is set up in __init__; the return below ends forward)
return self.out_mlp(self.out_ln(emb))</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">after final ICL block: (batch, n_test, icl_dim)
after output MLP:      (batch, n_test, out_dim)
with out_dim=999:      (batch, n_test, 999)</code></pre></div><p>There is no softmax, sorting operation, or monotonicity constraint in this output path. The architecture itself also does not attach \(\alpha\) values to the 999 output positions. Their interpretation as the indexed grid</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\{0.001,0.002,\\ldots,0.999\\}&quot;,&quot;id&quot;:&quot;YXAENZRXPL&quot;}" data-component-name="LatexBlockToDOM"></div><p>comes from training each position against its corresponding pinball-loss level. NanoTabICL provides the architecture and raw forward-pass outputs, but it does not provide that pretraining loop.</p><h3>What NanoTabICL leaves outside the model</h3><p>NanoTabICL returns a raw tensor of shape <code>(batch, n_test, 999)</code> directly. To make those outputs meaningful quantile predictions, the model must first be trained with the corresponding pinball-loss levels or loaded with compatible trained weights. Distribution construction and prediction intervals are then implemented downstream of the tensor. 
Beyond the forward pass described above, NanoTabICL does not include:</p><ul><li><p>pinball-loss pretraining;</p></li><li><p>the mapping from output positions to quantile levels as a model object;</p></li><li><p>monotonicity correction for crossing quantiles;</p></li><li><p>interpolation, tail extrapolation, or distribution statistics;</p></li><li><p>a scikit-learn prediction interface.</p></li></ul><h2>Summary</h2><p>TabICLv2 handles regression through a dedicated regression checkpoint that predicts 999 conditional quantiles rather than a single point or a discretized target distribution. Each output level is trained with pinball loss, which targets the corresponding conditional quantile through its asymmetric penalty on under- and over-prediction. The raw 999 scalars are not, by themselves, a valid predictive distribution; at inference time, TabICLv2 sorts or otherwise corrects crossing quantiles, interpolates between grid points, and extrapolates the tails to build a full distribution wrapper.</p><p>The same quantile grid supports two inference paths. A fast point estimate averages the 999 predicted quantile values, approximating the conditional mean through the quantile-function identity. 
Richer probabilistic outputs come from the reconstructed distribution: central \((1-\gamma)\) prediction intervals are read directly from symmetric quantile pairs, where \(\gamma\) is the total tail probability outside the interval, while PDF, CDF, and moment calculations use the interpolated body and parametric tails. Nominal interval coverage is an empirical calibration property, not something guaranteed by pinball loss alone.</p><p>Regression reuses the same overall compression-then-ICL backbone structure as classification, while changing the task-specific interfaces, loss, and LayerNorm configuration. Observed training targets are embedded as continuous scalars through linear maps at both the feature-token and row-token stages, and the output head emits <code>out_dim=999</code> raw values per test row with no softmax or built-in monotonicity constraint. NanoTabICL makes this path visible through <code>max_classes=0</code> and <code>out_dim=999</code>, but it stops at the raw <code>(batch, n_test, 999)</code> output. Users must standardize <code>y</code> themselves, train the model with the corresponding pinball-loss levels or supply compatible trained weights, and apply the TabICLv2 inference steps (monotonicity correction and distribution construction) and any scikit-learn wrapper downstream of that output.</p><p>This regression path completes the six-part walkthrough of TabICLv2&#8217;s architecture: repeated feature grouping, target-aware embedding, compression-then-ICL, QASSMax, many-class classification, and quantile regression. Together, the posts trace one table from grouped feature tokens through row-level in-context learning to either class probabilities or a full predictive distribution over a continuous target.</p><h2>Quiz</h2><p>Take the quiz below to test your understanding, and share your answers and questions in the comments. 
The questions get progressively harder from 1 to 10.</p><ol><li><p>What does the quantile grid \(\mathcal{A}=\{0.001,0.002,\ldots,0.999\}\) represent?</p></li><li><p>What is pinball loss, and how does its asymmetry target a specific quantile?</p></li><li><p>Why are the raw 999 model outputs not, by themselves, a full predictive distribution?</p></li><li><p>How is pinball loss applied during training for the 999 quantile outputs?</p></li><li><p>Why can the raw 999 quantile outputs cross, and how does TabICLv2 correct them at inference time?</p></li><li><p>After monotonicity correction, how does TabICLv2 construct a full predictive distribution from the corrected quantile grid?</p></li><li><p>How is a central \((1-\gamma)\) prediction interval read from the corrected quantile function, and does pinball loss alone guarantee that a nominal 90% interval covers 90% of held-out targets?</p></li><li><p>How does TabICLv2&#8217;s fast point-estimate path approximate the conditional mean, and why do the current monotonicity-correction methods preserve that mean?</p></li><li><p>How does TabICLv2&#8217;s regression strategy differ from TabPFNv2 and TabPFN-2.5, and which NanoTabICL arguments activate its regression path?</p></li><li><p>Suppose you only need a fast point estimate and never use prediction intervals, PDF, CDF, or moments. 
Which inference steps could you skip, and what would you lose?</p></li></ol>]]></content:encoded></item><item><title><![CDATA[[P30] Architecture of TabICLv2: many-class classification]]></title><description><![CDATA[How TabICLv2 handles classification tasks with more than 10 classes by decomposing labels on the embedding side and predictions on the output side.]]></description><link>https://newsletter.dsaiengineering.com/p/p30-architecture-of-tabiclv2-many-class-classification</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p30-architecture-of-tabiclv2-many-class-classification</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Sat, 06 Jun 2026 09:27:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iwuq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The previous post covered query-aware scalable softmax, which lets TabICLv2 scale to more context rows. This post examines a different scaling problem: classification with more classes than the model saw during pretraining. TabICLv2&#8217;s usual classifier checkpoint supports at most <code>max_classes=10</code>, while real tabular targets can contain dozens or hundreds of product categories, diagnosis codes, customer segments, or other labels.</p><p>Supporting more than \(10\) classes is not simply a matter of widening the output head. Observed training labels enter TabICLv2 twice: first through target-aware embedding before \(\text{TF}_\text{col}\), and again at the ICL stage before \(\text{TF}_\text{icl}\). The native model expects small class ids at both points and produces only \(10\) logits per prediction.</p><p>TabICLv2 preserves that pretrained interface by decomposing the large label space on both sides of the architecture. 
Mixed-radix ensembling converts each training label into several small-label views for target-aware column embedding. Hierarchical classification composes several node-local predictions, each with at most \(10\) choices, into probabilities over the original classes. This post explains both mechanisms, how they work together, and the boundary between the full TabICLv2 implementation and NanoTabICL&#8217;s native small-class model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iwuq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iwuq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!iwuq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png" width="849" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 architecture; many-class classification affects target-aware embedding and in-context learning.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200868100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 architecture; many-class classification affects target-aware embedding and in-context learning." title="TabICLv2 architecture; many-class classification affects target-aware embedding and in-context learning." 
srcset="https://substackcdn.com/image/fetch/$s_!iwuq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!iwuq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F995a63d3-bb5a-4cbd-98f6-bfaac7ad5af6_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">TabICLv2 architecture; many-class classification affects target-aware embedding and in-context learning.</figcaption></figure></div><h2>Many-class classification</h2><p>The two mechanisms introduced above operate only on observed training labels; test labels remain the unknown values to be predicted. Throughout this post, class labels are assumed to have been encoded as contiguous integers from \(0\) to \(C-1\). Both many-class mechanisms rely on this assumption: the mixed-radix implementation infers the class count as <code>y_train.max() + 1</code>, and hierarchical decoding uses original class labels as output-column indices.</p><p>The discussion starts with hierarchical classification because the output bottleneck is the easiest one to see. Mixed-radix ensembling then solves the analogous problem on the input-label side.</p><h3>The output bottleneck: more than 10 classes</h3><p>TabICLv2 is pretrained on classification tasks with at most 10 classes. Let \(C\) be the number of downstream classes, \(x\) be the test row representation being classified, and \(y\in\{0,\ldots,C-1\}\) be the true class label. 
A direct \(C\)-class classifier would assign probabilities with a \(C\)-way softmax:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y=c\\mid x)=\\frac{\\exp(s_c(x))}{\\sum_{r=0}^{C-1}\\exp(s_r(x))},&quot;,&quot;id&quot;:&quot;TBURQDTDLD&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(s_c(x)\) is the score, or logit, for class \(c\). This is natural when \(C\leq 10\), but it no longer matches the interface the model was trained to use when \(C\) is much larger.</p><p>TabICLv2&#8217;s solution is to avoid that direct \(C\)-way decision. Instead of training a new large head, it repeatedly asks the native classifier to solve decisions with at most \(10\) choices.</p><h3>Hierarchical classification</h3><p>Let the full class set be</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{Y}=\\{0,1,\\ldots,C-1\\}.&quot;,&quot;id&quot;:&quot;UBDDQLIWET&quot;}" data-component-name="LatexBlockToDOM"></div><p>A hierarchy starts by partitioning \(\mathcal{Y}\) into disjoint groups:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{Y}=\\mathcal{G}_0\\cup\\mathcal{G}_1\\cup\\cdots\\cup\\mathcal{G}_{K-1},\n\n\\qquad\n\n\\mathcal{G}_a\\cap\\mathcal{G}_b=\\varnothing \\quad(a\\ne b),&quot;,&quot;id&quot;:&quot;IGUEIKHVZU&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(K\leq 10\). The root classifier predicts which group contains the true class. If a group still contains more than \(10\) original classes, that group is partitioned again. Repeating this process creates a tree whose internal nodes each have at most \(10\) children. When a node contains at most \(10\) original classes, the native classifier can predict directly among those classes.</p><p>Each original class is then identified by a path through the tree: group choices at internal nodes, followed by a final class choice inside a small leaf node. 
For class \(c\), write \(v_t(c)\) for the node visited at depth \(t\), \(b_t(c)\) for the local branch or local class choice made there, and \(L(c)\) for the number of local decisions needed to identify \(c\). The decision path is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi(c)=\\left((v_0(c),b_0(c)),\\ldots,(v_{L(c)-1}(c),b_{L(c)-1}(c))\\right).&quot;,&quot;id&quot;:&quot;OFMLWARMBR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each local decision stays inside the pretrained class budget:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;b_t(c)\\in\\{0,\\ldots,K_t(c)-1\\},\n\n\\qquad K_t(c)\\leq 10,&quot;,&quot;id&quot;:&quot;URRYHIHNXA&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(K_t(c)\) is the number of available choices at the node reached by class \(c\) at step \(t\). The model never has to solve a \(C\)-way decision directly. It solves several decisions with at most \(10\) choices whose combination identifies one original class.</p><p>Let \(\mathcal{D}\) denote the complete labeled context dataset and \(\mathcal{D}_v\) the subset assigned to node \(v\). At depth \(t\), write the node-local probability as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_{v_t(c)}\\left(b_t(c)\\mid x,\\mathcal{D}_{v_t(c)}\\right).&quot;,&quot;id&quot;:&quot;RIBXPWKNMX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Multiplying the node-local probabilities along the path for class \(c\) gives</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y=c\\mid x,\\mathcal{D})\n\n=\n\n\\prod_{t=0}^{L(c)-1}\n\np_{v_t(c)}\\left(b_t(c)\\mid x,\\mathcal{D}_{v_t(c)}\\right).&quot;,&quot;id&quot;:&quot;RVTWQYONHS&quot;}" data-component-name="LatexBlockToDOM"></div><p>So the probability of an original class is the product of the local probabilities along its path. 
TabICLv2 therefore replaces a single \(C\)-way softmax with a composition of native predictions, each using at most \(10\) labels and logits.</p><h3>Building the tree in TabICLv2</h3><p>TabICLv2 builds the hierarchy from the sorted observed class labels. For the usual <code>max_classes=10</code> classifier checkpoint, if a node contains \(N\) classes and \(N&gt;10\), the number of child groups is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;K=\\min\\left(\\left\\lceil\\frac{N}{10}\\right\\rceil,10\\right),&quot;,&quot;id&quot;:&quot;GEMXRSCGWR&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(N\) is the number of classes at the current node. The \(N\) classes are split into \(K\) nearly equal contiguous groups. Any child group that still contains more than \(10\) classes is split again. This keeps every local classifier within the model&#8217;s native class capacity while keeping the tree reasonably balanced.</p><p>This is a computational hierarchy over contiguous ranges of encoded class ids, not a learned or domain-defined taxonomy. Nearby encoded ids need not be semantically related. When a broader prediction ensemble includes class-id shuffling, different members can also group the original labels differently.</p><p>For example, with \(C=57\), the root node uses</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;K=\\min(\\lceil57/10\\rceil,10)=6&quot;,&quot;id&quot;:&quot;JDKSVIDRJP&quot;}" data-component-name="LatexBlockToDOM"></div><p>groups, with sizes close to \(57/6\). The first three groups contain 10 classes each, and the last three groups contain 9 classes each. Because every group already has at most \(10\) classes, the tree has two prediction levels: the root predicts among six groups, and the child classifiers predict directly among their 9 or 10 original classes. 
For a larger \(C\), some root groups would still contain more than \(10\) classes, so those groups would be split recursively.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gmd2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gmd2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 424w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 848w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gmd2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png" width="1456" height="844" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89110,&quot;alt&quot;:&quot;Hierarchy for \\(C=57\\): one root split into six contiguous groups, followed by direct leaf-level ICL classification.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200868100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hierarchy for \(C=57\): one root split into six contiguous groups, followed by direct leaf-level ICL classification." title="Hierarchy for \(C=57\): one root split into six contiguous groups, followed by direct leaf-level ICL classification." 
srcset="https://substackcdn.com/image/fetch/$s_!Gmd2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 424w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 848w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!Gmd2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafdbc598-4666-45f6-878c-2d1e4a07291e_1785x1035.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Hierarchy for \(C=57\): one root split into six contiguous groups, followed by direct leaf-level ICL classification.</figcaption></figure></div><h3>Inference with the native ICL predictor</h3><p>At inference time, TabICLv2 applies the hierarchy by recursively calling the native small-class ICL predictor. The hierarchy is not a new \(C\)-class output head; it is a wrapper around the pretrained predictor. The implementation does not pass previous branch choices into one autoregressive decoder. Instead, each tree node \(v\) makes a fresh native ICL prediction using its assigned training subset \(\mathcal{D}_v\). For each candidate class, its predefined path identifies the node-specific contexts whose probabilities contribute to its score, and the implementation recursively evaluates every child rather than selecting only one branch at runtime.</p><p>Operationally, the wrapper performs the following steps:</p><ol><li><p>Partition the class set at each node into at most \(10\) groups.</p></li><li><p>At an internal node, select that node&#8217;s training-row subset, relabel those rows by their child-group index, and run the native ICL classifier on the test row to obtain group probabilities.</p></li><li><p>At a leaf node, select its training-row subset, relabel its original classes to contiguous local ids, and run the native classifier directly among those classes.</p></li><li><p>Score every valid original class by multiplying the probabilities along its path.</p></li><li><p>Take the argmax if a hard class prediction is needed.</p></li></ol><p>The key detail is that this is not greedy decoding. 
TabICLv2 does not choose one group at the root and discard the rest. It recursively scores child nodes and combines probabilities, so every valid class receives a probability.</p><h3>Picking the predicted class</h3><p>For every valid class \(c&lt;C\), the path score is the product of the internal group probabilities and the final local class probability along \(\pi(c)\). Because the logarithm is strictly increasing, the same argmax can be computed in log space:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{y}\n\n=\n\n\\arg\\max_{0\\leq c<C}\n\n\\sum_{t=0}^{L(c)-1}\n\n\\log p_{v_t(c)}\\left(b_t(c)\\mid x,\\mathcal{D}_{v_t(c)}\\right).&quot;,&quot;id&quot;:&quot;GYZGCVWKHD&quot;}" data-component-name="LatexBlockToDOM"></div><p>The current implementation performs the recursive probability multiplications directly. The composed probabilities are sufficient for prediction, but callers may still request a logits-shaped output. In that case, the implementation returns derived logits, rather than raw decoder logits, by converting each final composed probability \(p\) as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ell=\\tau\\log(p+\\epsilon),&quot;,&quot;id&quot;:&quot;NVPRAIIWZN&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\tau\) is the softmax temperature and the implementation uses \(\epsilon=10^{-6}\).</p><p>That completes the output-side story: after row representations are built, the model can score every original class by composing native small-class predictions. The other bottleneck happens earlier in the pipeline, before \(\text{TF}_\text{col}\), where labeled context rows still need target-aware embeddings.</p><h3>Mixed-radix ensembling</h3><p>Hierarchical classification fixes the prediction side, but by itself it does not cover how the model embeds labeled context rows with many-class targets. 
Before the model reaches \(\text{TF}_\text{icl}\), those rows have already passed through target-aware embedding and \(\text{TF}_\text{col}\). If \(C&gt;10\), the raw class id is too large for the native target-aware embedding interface. Mixed-radix ensembling (MRE) fixes this input-side problem by turning each large label into several small-label views, running \(\text{TF}_\text{col}\) once per view, and averaging the resulting representations. Hierarchical relabeling then keeps the later ICL-side label embedding within <code>max_classes</code> during recursive local predictions.</p><p>The mixed-radix construction begins by representing one large class id as several small digits. The implementation chooses the smallest possible number of views,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;D=\\left\\lceil\\frac{\\log C}{\\log 10}\\right\\rceil,&quot;,&quot;id&quot;:&quot;OVUXVAALZK&quot;}" data-component-name="LatexBlockToDOM"></div><p>then computes a balanced initial base</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;k=\\min\\left(\\left\\lceil C^{1/D}\\right\\rceil,10\\right).&quot;,&quot;id&quot;:&quot;ZLMJYHHKWG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Starting from the balanced list \([k,\ldots,k]\), the implementation returns \(D\) positive-integer bases, also called radices,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;[k_0,k_1,\\ldots,k_{D-1}]&quot;,&quot;id&quot;:&quot;WAIUENUMYA&quot;}" data-component-name="LatexBlockToDOM"></div><p>such that</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;k_i\\leq 10,\n\n\\qquad\n\n\\prod_{i=0}^{D-1}k_i\\geq C.&quot;,&quot;id&quot;:&quot;PCUAMGBLSB&quot;}" data-component-name="LatexBlockToDOM"></div><p>The product condition ensures that there are enough digit combinations to represent all \(C\) classes, while the per-base upper bound keeps every digit within the 
native class capacity. A base of 1 would add no information, so the useful selected bases are nontrivial. For example, with \(C=25\), the implementation selects the balanced bases \([5,5]\), rather than another valid but less balanced choice such as \([10,3]\).</p><p>For a class label \(y\in\{0,\ldots,C-1\}\), define positional weights</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;w_i=\\prod_{j=i+1}^{D-1}k_j,\n\n\\qquad i=0,\\ldots,D-1,&quot;,&quot;id&quot;:&quot;TKVPVKPBNO&quot;}" data-component-name="LatexBlockToDOM"></div><p>with the convention that an empty product is \(1\), so \(w_{D-1}=1\). Here \(w_i\) is the place value of digit \(i\). The mixed-radix digits are</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y^{(i)}=\\left\\lfloor\\frac{y}{w_i}\\right\\rfloor \\bmod k_i,\n\n\\qquad i=0,\\ldots,D-1.&quot;,&quot;id&quot;:&quot;VNUEQWJFJZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each digit stays within a small class range:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y^{(i)}\\in\\{0,\\ldots,k_i-1\\},&quot;,&quot;id&quot;:&quot;PJWPPHNMCF&quot;}" data-component-name="LatexBlockToDOM"></div><p>so every digit is compatible with the 10-class pretraining regime. For represented labels, the original class id can be reconstructed from its digits:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y=\\sum_{i=0}^{D-1}y^{(i)}w_i,&quot;,&quot;id&quot;:&quot;MWNCEWIPUT&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(y&lt;C\). If \(\prod_i k_i&gt;C\), some digit combinations do not correspond to real downstream classes. Those combinations are simply unused.</p><p>For example, suppose \(C=57\), the same class count used in the hierarchy example. 
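</p><p>For that case, the digit and reconstruction formulas above can be checked with a short sketch. It assumes bases \((8,8)\), the balanced pair that the formulas for \(D\) and \(k\) give when \(C=57\); the helper names <code>to_digits</code> and <code>from_digits</code> are illustrative and are not taken from the TabICLv2 code.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def to_digits(y, bases):
    """Mixed-radix digits of y, high place first: floor(y / w_i) mod k_i."""
    digits = []
    weight = 1  # place value of the lowest digit (the empty product)
    # Walk from the lowest place to the highest, accumulating place values.
    for k in reversed(bases):
        digits.append((y // weight) % k)
        weight *= k
    return tuple(reversed(digits))


def from_digits(digits, bases):
    """Reconstruct y as the sum of digit times place value (Horner form)."""
    y = 0
    for d, k in zip(digits, bases):
        y = y * k + d
    return y


bases = (8, 8)  # assumed balanced bases for C = 57
n_classes = 57
codes = {y: to_digits(y, bases) for y in range(n_classes)}

# Digit pairs that no real class maps to; they stay unused.
all_pairs = {(a, b) for a in range(bases[0]) for b in range(bases[1])}
unused = sorted(all_pairs - set(codes.values()))</code></pre></div><p>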
The two mechanisms decompose those labels differently: hierarchy splits the labels into contiguous ranges for output prediction, while mixed radix splits each class id into digits for input embedding.</p><p>For MRE, the implementation first minimizes \(D\). Two views are sufficient, and \(\lceil\sqrt{57}\rceil=8\), so the balanced bases selected are \([8,8]\), with \(8\cdot8=64\geq57\). The bases \([10,6]\) would also satisfy the capacity constraints because \(10\cdot6=60\geq57\), but they are less balanced. With \([8,8]\), digit \(y^{(0)}\) is the high place and \(y^{(1)}\) is the low place:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y^{(0)}=\\left\\lfloor\\frac{y}{8}\\right\\rfloor \\bmod 8,\n\n\\qquad\n\ny^{(1)}=y\\bmod 8.&quot;,&quot;id&quot;:&quot;DARJHQDBFD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Class \(y=42\) becomes \((5,2)\), because \(42=5\cdot8+2\). Class \(y=56\) becomes \((7,0)\). The combinations \((7,1)\) through \((7,7)\) would represent 57 through 63, so they are unused when the true class set has only 57 classes.</p><p>In TabICLv2, these digits provide several small-label views of the original class. Instead of embedding the large class id \(y\) directly, the model embeds one digit \(y^{(i)}\) at a time. Operationally, TabICLv2 creates one labeled-context view per digit, runs \(\text{TF}_\text{col}\) once per view, and averages the resulting representations.</p><p>Let \(E_1[r,j]\in\mathbb{R}^d\) denote the \(d\)-dimensional feature-group representation for row \(r\) and grouped feature position \(j\) before target-aware embedding. 
For mixed-radix digit view \(i\), define the masked target vector</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u_r^{(i)}\n\n=\n\n\\begin{cases}\n\n\\operatorname{Embed}_\\text{TAE}\\left(y_r^{(i)}\\right), &amp; r\\in\\mathcal{I}_\\text{train},\\\\\n\n\\mathbf{0}_d, &amp; r\\in\\mathcal{I}_\\text{test}.\n\n\\end{cases}&quot;,&quot;id&quot;:&quot;HWHSAUVXBW&quot;}" data-component-name="LatexBlockToDOM"></div><p>The same learned target-aware encoder is reused for every digit view. When writing \(E_1+u^{(i)}\) below, \(u_r^{(i)}\) is broadcast across every grouped feature position \(j\) in row \(r\), so the addition means \(E_1[r,j]+u_r^{(i)}\). A simplified view of the averaged representation is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;O_\\text{avg}\n\n=\\frac{1}{D}\\sum_{i=0}^{D-1}\n\n\\text{TF}_\\text{col}\\left(E_1+u^{(i)}\\right).&quot;,&quot;id&quot;:&quot;ZAKKXIGXRY&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(O_\text{avg}\) is the averaged column-transformer representation across the \(D\) digit views. This is MRE. It exposes information about a large class label through several small-label views, each compatible with the pretrained target-aware embedding interface.</p><p>The important boundary is that MRE is not the final many-class decoder. It prepares representations by making context labels embeddable. Hierarchical classification then handles the final prediction over the original \(C\) classes.</p><h3>Operational implication</h3><p>Both mechanisms require repeated use of the native model: MRE uses multiple label views, and hierarchical classification uses node-specific training subsets. 
Consequently, the current full implementation does not support KV caching for many-class classification because these changing inputs are incompatible with the available caching path.</p><p>With the full many-class orchestration established, NanoTabICL provides a concrete view of the native small-class interface that both mechanisms reuse.</p><h2>Implementation in NanoTabICL</h2><p>NanoTabICL is also a useful boundary marker: it exposes that native interface, but it stops before the many-class orchestration. In particular, the compact repository does not include:</p><ul><li><p>the recursive hierarchical classification wrapper;</p></li><li><p>mixed-radix digit construction;</p></li><li><p>multiple \(\text{TF}_\text{col}\) passes over digit views;</p></li><li><p>path-probability decoding for \(C&gt;10\).</p></li></ul><p>The standard README classification example shows this native interface directly:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">model = NanoTabICLv2(max_classes=10, out_dim=10)
batch_size, n_train, n_test, n_cols = 2, 32, 8, 5  # illustrative sizes, not from the README
X_train_and_test = torch.randn(batch_size, n_train + n_test, n_cols)
y_train = torch.randint(10, size=(batch_size, n_train)).float()
y_test_pred_logits = model(X_train_and_test, y_train)</code></pre></div><p>The two NanoTabICL constructor arguments that matter for this post are <code>max_classes</code> and <code>out_dim</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def __init__(self, max_classes: int, out_dim: int, ...):
    # classification: max_classes = out_dim (= 10 typically)</code></pre></div><p>NanoTabICL exposes these as independent constructor arguments, and they control different sides of the compact model:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!39tQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!39tQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 424w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 848w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 1272w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!39tQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png" width="1456" height="215" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:215,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200868100?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!39tQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 424w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 848w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 1272w, https://substackcdn.com/image/fetch/$s_!39tQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c501775-86cc-4040-b562-8f4a5e18bcb3_1769x261.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In the full TabICLv2 classification constructor, <code>out_dim</code> is not independently exposed: it is set internally to <code>max_classes</code>. 
NanoTabICL&#8217;s explicit separation still helps show the two sides of the native interface.</p><p>The two target embedding tables are initialized in <code>__init__</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.y_embed_in = (
    ClassEmbedding(max_classes, embed_dim)
    if max_classes &gt; 0
    else nn.Linear(1, embed_dim)
)
self.y_embed_icl = (
    ClassEmbedding(max_classes, icl_dim)
    if max_classes &gt; 0
    else nn.Linear(1, icl_dim)
)</code></pre></div><p>For classification, <code>max_classes &gt; 0</code>, so both are <code>ClassEmbedding</code> layers. The first table injects labels into feature tokens before \(\text{TF}_\text{col}\). This is the native interface that MRE repeatedly uses with different digit views:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb = self.x_embed(x)
emb[:, :n_train] += self.y_embed_in(y[:, :, None, None])</code></pre></div><p>At this point <code>emb</code> has shape <code>(batch, rows, cols, embed_dim)</code>. The slice <code>emb[:, :n_train]</code> selects only labeled training rows, and <code>y[:, :, None, None]</code> gives the target embedder singleton axes so the resulting label vector can broadcast across all feature positions in each training row. Test rows are not touched.</p><p>The second table injects labels again after row compression, just before dataset-wise ICL. Hierarchical classification reuses this embedding and the later output head for each node-local prediction:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb[:, :n_train] += self.y_embed_icl(y[:, :, None])
for block in self.icl_blocks[:-1]:
    emb = block(emb, kv_max_idx=n_train)
emb = self.icl_blocks[-1](emb[:, n_train:], emb[:, :n_train])</code></pre></div><p>Now <code>emb</code> has shape <code>(batch, rows, icl_dim)</code>, so <code>self.y_embed_icl(...)</code> returns one row-level label vector per training row. In the loop, <code>kv_max_idx=n_train</code> restricts keys and values to training rows, preventing test rows from being used as labeled context. The ICL blocks use the training rows as labeled context, and the final block computes outputs only for test rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">queries:      emb[:, n_train:]   -&gt; test rows
keys/values:  emb[:, :n_train]   -&gt; training rows</code></pre></div><p>Implementation note: the README stores <code>y_train</code> as a float tensor, but the classification embedder casts labels to integer indices before lookup:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class ClassEmbedding(nn.Embedding):
    def forward(self, y: torch.Tensor) -&gt; torch.Tensor:
        return super().forward(y.squeeze(-1).long())</code></pre></div><p>The classification path is therefore still a lookup-table path. Labels can be carried as floats in the example tensor, but their values must be valid class indices from 0 to 9.</p><p>The output head is a separate MLP:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.out_mlp = get_mlp(icl_dim, icl_dim * 2, out_dim)

return self.out_mlp(self.out_ln(emb))</code></pre></div><p>With <code>out_dim=10</code>, the returned tensor has shape:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, n_test, 10)</code></pre></div><p>Those 10 values are the native small-class logits. This gives the concrete interface that the many-class wrappers depend on:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">training labels 0..9
    -&gt; class embeddings for feature-level and row-level target injection
    -&gt; compression-then-ICL over labeled training rows
    -&gt; 10 logits per test row</code></pre></div><p>For \(C&gt;10\), full TabICLv2 adds orchestration around this interface. Mixed-radix ensembling repeatedly feeds small digit labels into the target-aware embedding side, keeping context-label embeddings compatible with \(\text{TF}_\text{col}\). Hierarchical classification repeatedly asks the native small-class predictor to solve node-local branch decisions during the \(\text{TF}_\text{icl}\) stage. NanoTabICL does not include those wrappers; it makes the reuse point visible.</p><h2>Summary</h2><p>TabICLv2 supports many-class classification without changing the small-class interface learned during pretraining. Because observed labels enter the model before both \(\text{TF}_\text{col}\) and \(\text{TF}_\text{icl}\), it must decompose the problem at both stages rather than merely replace the output head.</p><p>Mixed-radix ensembling handles the target-aware embedding side. It expresses each large class id as several digits with at most \(10\) values, runs the column transformer for each digit view, and averages the resulting representations. Hierarchical classification handles the ICL and output side. 
It organizes the original classes into a balanced tree, makes fresh native predictions among at most \(10\) choices at each node, and multiplies probabilities along each path to score every original class.</p><p>Together, these mechanisms turn one unsupported \(C\)-class task into multiple predictions that remain within the model&#8217;s native label and logit capacity. That reuse requires multiple forward passes over different label views and node-specific contexts, which is also why the current many-class path does not support KV caching. NanoTabICL exposes the small-class interface being reused through <code>max_classes=10</code>, its two class-embedding stages, and <code>out_dim=10</code>, but leaves out the mixed-radix and hierarchical orchestration.</p><p>The next post covers quantile predictions for regression, the regression strategy TabICLv2 uses to model predictive uncertainty without discretizing the target into classification bins.</p><h2><strong>Quiz</strong></h2><p>Take the quiz below to test your understanding, and share your answers and doubts in the comments. 
The questions get progressively harder from 1 to 10.</p><ol><li><p>What are the two many-class bottlenecks TabICLv2 has to handle when \(C&gt;10\)?</p></li><li><p>Which mechanism handles the output side of many-class classification, and which mechanism handles the input-label embedding side?</p></li><li><p>Why does TabICLv2 avoid adding a new direct \(C\)-class output head for many-class classification?</p></li><li><p>In hierarchical classification, what does the path \(\pi(c)\) represent for an original class \(c\)?</p></li><li><p>How is the probability of an original class computed from local hierarchical predictions?</p></li><li><p>For \(C=57\) and <code>max_classes=10</code>, why does the root node use six groups in the example hierarchy?</p></li><li><p>Why is hierarchical decoding in TabICLv2 not greedy decoding?</p></li><li><p>What conditions must the mixed-radix bases \([k_0,k_1,\ldots,k_{D-1}]\) satisfy, and how does TabICLv2 choose among valid bases?</p></li><li><p>With mixed-radix bases \([8,8]\) for \(C=57\), what digits represent class \(y=42\), and which digit combinations are unused?</p></li><li><p>What many-class machinery does NanoTabICL expose, and what does it deliberately leave out?</p></li></ol>]]></content:encoded></item><item><title><![CDATA[[P29] Architecture of TabICLv2: query-aware scalable softmax]]></title><description><![CDATA[How TabICLv2 uses query-aware scalable softmax (QASSMax) to prevent attention fading in tabular in-context learning.]]></description><link>https://newsletter.dsaiengineering.com/p/p29-architecture-of-tabiclv2-query-aware-scalable-softmax</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p29-architecture-of-tabiclv2-query-aware-scalable-softmax</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Fri, 05 Jun 2026 18:22:22 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!2qeK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous post, we looked at compression-then-ICL: the part of TabICLv2 that turns the \(n\times m\) grid of row-feature tokens into row representations, then lets test rows attend to labeled training rows. That design makes the final prediction stage row-level, which is exactly where in-context learning happens.</p><p>But the row-level ICL transformer is not the only place where attention must scale with training-set size. TabICLv2 also uses attention over rows inside the column-wise induced-attention stage. As the number of training rows grows, ordinary softmax attention can become too diffuse. Even when one training row is the best match for a query row, or the most useful row for an inducing vector to summarize, the softmax denominator grows with all the other rows. The best row can then receive only a small fraction of the attention mass. This is the attention-fading problem.</p><p>This post covers Query-Aware Scalable Softmax, or QASSMax, the mechanism TabICLv2 uses to make attention more robust as context size changes. 
We will start with ordinary softmax as a temperature-controlled normalization, show why attention fades when the number of keys grows, then build up from Scalable Softmax (SSMax) to QASSMax.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2qeK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2qeK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2qeK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png" width="849" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 pipeline; QASSMax is applied in the first induced-attention stage of TF_col and in TF_icl.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200794510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 pipeline; QASSMax is applied in the first induced-attention stage of TF_col and in TF_icl." title="TabICLv2 pipeline; QASSMax is applied in the first induced-attention stage of TF_col and in TF_icl." 
srcset="https://substackcdn.com/image/fetch/$s_!2qeK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!2qeK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8e03ef-e9e0-4663-acb1-84852a0ecbe6_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">TabICLv2 pipeline; QASSMax is applied in the first induced-attention stage of TF_col and in TF_icl.</figcaption></figure></div><p>The figure shows the architectural destination; the next sections explain why those two attention sites need length-aware softmax in the first place. The post ends with a quiz that you can take to test your understanding.</p><h2>Query-aware scalable softmax</h2><p>Query-aware scalable softmax, or QASSMax, modifies attention by rescaling the query before the query-key dot products are computed. Because softmax is applied to those dot products, changing the query changes the logits. With scalar scaling, this changes how sharp or broad the attention distribution becomes; with element-wise scaling, it can also change the ranking of keys.</p><p>The purpose is to keep attention selective when the number of context samples becomes much larger than the sequence lengths seen during pretraining. To avoid overloading notation, I will use \(N\) for the number of keys in a generic attention calculation. When we specialize the discussion to TabICLv2, I will use \(n\) for the number of training rows.</p><h3>Softmax as temperature</h3><p>Start with standard scaled dot-product attention for one attention head. 
Let</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q\\in\\mathbb{R}^{d_\\text{head}}&quot;,&quot;id&quot;:&quot;XJHIVLGFYF&quot;}" data-component-name="LatexBlockToDOM"></div><p>be the query vector, and let</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;k_1,\\ldots,k_N\\in\\mathbb{R}^{d_\\text{head}}&quot;,&quot;id&quot;:&quot;MOPTHIOOMI&quot;}" data-component-name="LatexBlockToDOM"></div><p>be the \(N\) key vectors. Here \(d_\text{head}\) is the dimension of one attention head. The unnormalized attention logit for key \(j\) is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_j=\\frac{q^\\top k_j}{\\sqrt{d_\\text{head}}},&quot;,&quot;id&quot;:&quot;WFQPCVMWUO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(j\in\{1,\ldots,N\}\). The attention weight assigned to key \(j\) is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a_j=\\text{softmax}(z)_j=\\frac{\\exp(z_j)}{\\sum_{\\ell=1}^{N}\\exp(z_\\ell)}.&quot;,&quot;id&quot;:&quot;VDPJZVZXDS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(z=(z_1,\ldots,z_N)\), \(a_j\) is the normalized attention weight, and \(\ell\) is only the summation index over keys. 
The attention output is the weighted average</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{j=1}^N a_jv_j,&quot;,&quot;id&quot;:&quot;HLOVWJMQDB&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(v_j\) is the value vector associated with key \(j\).</p><p>Now rescale the query by a positive scalar \(\lambda\):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{q}=\\lambda q,\n\n\\qquad\n\n\\lambda>0.&quot;,&quot;id&quot;:&quot;HKFSFTPCJD&quot;}" data-component-name="LatexBlockToDOM"></div><p>The new logit is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{z}_j=\\frac{\\tilde{q}^\\top k_j}{\\sqrt{d_\\text{head}}}=\\lambda z_j.&quot;,&quot;id&quot;:&quot;VBETBCMPLH&quot;}" data-component-name="LatexBlockToDOM"></div><p>So scalar query scaling is the same as scalar logit scaling. It is also the same as changing the softmax temperature. For temperature \(\tau&gt;0\),</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{softmax}_\\tau(z)_j=\\frac{\\exp(z_j/\\tau)}{\\sum_{\\ell=1}^{N}\\exp(z_\\ell/\\tau)}.&quot;,&quot;id&quot;:&quot;SOMKHIRAEI&quot;}" data-component-name="LatexBlockToDOM"></div><p>Writing \(\lambda=1/\tau\), we get</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{softmax}_\\tau(z)=\\text{softmax}(\\lambda z).&quot;,&quot;id&quot;:&quot;SLTGJOKCAL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Lower temperature, or larger \(\lambda\), makes the distribution sharper. Higher temperature, or smaller \(\lambda\), makes it broader. 
The relative odds between keys \(j\) and \(\ell\) become</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{a_j}{a_\\ell}\n\n=\n\n\\frac{\\exp(\\lambda z_j)}{\\exp(\\lambda z_\\ell)}\n\n=\n\n\\exp(\\lambda(z_j-z_\\ell)).&quot;,&quot;id&quot;:&quot;QBFCZEVAEP&quot;}" data-component-name="LatexBlockToDOM"></div><p>The key point is that scaling by a positive \(\lambda\) does not change the ranking of logits. If \(z_j&gt;z_\ell\), then \(\lambda z_j&gt;\lambda z_\ell\). Scaling changes how decisively softmax turns that ranking into probability mass.</p><h3>Why softmax fades as context grows</h3><p>This matters in long contexts because the softmax denominator grows with the number of keys. Consider a simplified attention calculation with one relevant key and many distractor keys:</p><ul><li><p>the relevant key has logit \(z_\star\);</p></li><li><p>each of the \(N-1\) distractors has lower logit \(z_0\); and</p></li><li><p>the logit gap is \(\Delta=z_\star-z_0&gt;0\).</p></li></ul><p>Under ordinary softmax, the attention weight on the relevant key is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a_\\star\n\n=\n\n\\frac{\\exp(z_\\star)}{\\exp(z_\\star)+(N-1)\\exp(z_0)}\n\n=\n\n\\frac{1}{1+(N-1)\\exp(-\\Delta)}.&quot;,&quot;id&quot;:&quot;VXPTTVDMMM&quot;}" data-component-name="LatexBlockToDOM"></div><p>For a fixed gap \(\Delta\), the denominator grows as \(N\) grows, so</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a_\\star\\rightarrow0\n\n\\qquad\\text{as}\\qquad\n\nN\\rightarrow\\infty.&quot;,&quot;id&quot;:&quot;ZMURYTCQDT&quot;}" data-component-name="LatexBlockToDOM"></div><p>For example, with \(\Delta=5\), the relevant key holds about 99% of the attention mass when \(N=2\) but only about 13% when \(N=1000\). The problem is not that softmax forgets which key has the highest logit. The relevant key still ranks first. 
The problem is that many weaker keys can collectively absorb most of the probability mass.</p><p>In TabICLv2, this matters at the attention sites whose key sequence grows with the number of training rows. In \(\text{TF}_\text{icl}\), a test row may attend over many labeled training rows; in the first induced-attention step of \(\text{TF}_\text{col}\), learned inducing vectors summarize many training-row tokens for a fixed grouped feature position. In both cases, as the number of training rows grows, the ordinary softmax denominator grows with all the other rows too. This is attention fading.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gtWK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gtWK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 424w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 848w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 1272w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gtWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png" width="1050" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50631,&quot;alt&quot;:&quot;Attention mass on the relevant key vs. number of distractors (fixed logit gap \\(\\Delta\\)).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200794510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Attention mass on the relevant key vs. number of distractors (fixed logit gap \(\Delta\))." title="Attention mass on the relevant key vs. number of distractors (fixed logit gap \(\Delta\))." 
srcset="https://substackcdn.com/image/fetch/$s_!gtWK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 424w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 848w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 1272w, https://substackcdn.com/image/fetch/$s_!gtWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd18ed3e1-87ff-4f1a-bc3b-41414cf0e907_1050x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Attention mass on the relevant key vs. number of distractors (fixed logit gap \(\Delta\)).</figcaption></figure></div><h3>SSMax: scale logits with log n</h3><p>Scalable Softmax, or SSMax, addresses this failure mode by making the logit scale grow with context length. It keeps softmax as the normalization function, but rescales the query before computing attention logits.</p><p>Now specialize from \(N\) generic keys to \(n\) training rows. In the TabICLv2 paper&#8217;s notation, let \(q_h=(q_{hi})\) be a query vector at attention head \(h\), where \(i\) indexes the coordinates inside the head. SSMax rescales the query with a learnable per-head scalar \(s_h\):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{q}_{hi}=q_{hi}\\cdot s_h\\log n.&quot;,&quot;id&quot;:&quot;NUQXXOSLMY&quot;}" data-component-name="LatexBlockToDOM"></div><p>This makes each logit</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{z}_j=(s_h\\log n)z_j.\n\n&quot;,&quot;id&quot;:&quot;FFOOXALSFM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Returning to the one-relevant-key example, the attention weight becomes</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a_\\star\n\n=\n\n\\frac{1}{1+(n-1)\\exp(-s_h\\Delta\\log n)}\n\n\\approx\n\n\\frac{1}{1+n^{1-s_h\\Delta}}.&quot;,&quot;id&quot;:&quot;RKWIXWBRAB&quot;}" data-component-name="LatexBlockToDOM"></div><p>The approximation uses \(n-1\approx n\) and</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\exp(-s_h\\Delta\\log 
n)=n^{-s_h\\Delta}.&quot;,&quot;id&quot;:&quot;IRDFCXUQKT&quot;}" data-component-name="LatexBlockToDOM"></div><p>This shows why \(\log n\) appears: the softmax denominator grows with the number of keys, so the effective relevant-vs-distractor gap must also grow with context length. In this simplified example, assuming \(s_h&gt;0\), the relevant key keeps a non-vanishing share of attention as \(n\) becomes large only when \(s_h\Delta\geq1\): its weight tends to 1 if \(s_h\Delta&gt;1\), to \(1/2\) if \(s_h\Delta=1\), and to 0 if \(s_h\Delta&lt;1\).</p><p>That condition is only intuition from the toy setup. Real attention has many different logit values, not one repeated distractor logit. The useful lesson is the scaling law: increasing the context length changes the softmax denominator, so the model benefits from a length-aware logit scale.</p><h3>QASSMax: length base + query gate</h3><p>SSMax gives every query in the same attention head the same length-dependent scale. QASSMax keeps the length-aware idea but makes it more flexible in two ways:</p><ul><li><p>the base scale is a learned vector function of \(\log n\), not one scalar \(s_h\); and</p></li><li><p>the scale is modulated by a bounded gate that depends on the current query.</p></li></ul><p>For head \(h\), let \(q_h=(q_{hi})\) be the query vector, with \(i\) indexing head dimensions. 
QASSMax rescales each query element as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{q}_{hi}\n\n=\n\nq_{hi}\n\n\\underbrace{\\cdot\\text{MLP}_\\text{base}(\\log n)_{hi}}_\\text{base scaling}\n\n\\cdot\n\n\\underbrace{(1+\\tanh(\\text{MLP}_\\text{gate}(q_h)_i))}_\\text{query-aware gating}.&quot;,&quot;id&quot;:&quot;LGORNEMBCS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Equivalently, in vector notation,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{q}_h=q_h\\odot B_h(n)\\odot G_h(q_h),&quot;,&quot;id&quot;:&quot;LSGNUEQLBG&quot;}" data-component-name="LatexBlockToDOM"></div><p>where</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;B_h(n)=\\text{MLP}_\\text{base}(\\log n)_h\\in\\mathbb{R}^{d_\\text{head}},&quot;,&quot;id&quot;:&quot;XOGBMPPZLQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G_h(q_h)=1+\\tanh(\\text{MLP}_\\text{gate}(q_h))\\in(0,2)^{d_\\text{head}}.&quot;,&quot;id&quot;:&quot;KRNACTEVEV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(\odot\) denotes element-wise multiplication. 
\(B_h(n)\) is the length-dependent base vector for head \(h\), and \(G_h(q_h)\) is the query-dependent gate for that same head.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WpKd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WpKd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 424w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 848w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 1272w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WpKd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png" width="1456" height="388" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33413,&quot;alt&quot;:&quot;QASSMax rescales the query with a length-dependent base and a bounded query-dependent gate before query-key dot products form the logits sent to softmax.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200794510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="QASSMax rescales the query with a length-dependent base and a bounded query-dependent gate before query-key dot products form the logits sent to softmax." title="QASSMax rescales the query with a length-dependent base and a bounded query-dependent gate before query-key dot products form the logits sent to softmax." 
srcset="https://substackcdn.com/image/fetch/$s_!WpKd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 424w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 848w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 1272w, https://substackcdn.com/image/fetch/$s_!WpKd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb684399a-4604-4e09-be1b-67c85b4826a8_1800x480.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">QASSMax rescales the query with a length-dependent base and a bounded query-dependent gate before query-key dot products form the logits sent to softmax.</figcaption></figure></div><p>For \(H\) attention heads, the base network has type</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{MLP}_\\text{base}:\\mathbb{R}\\rightarrow\\mathbb{R}^{H\\times d_\\text{head}}.&quot;,&quot;id&quot;:&quot;OHWJETZOCD&quot;}" data-component-name="LatexBlockToDOM"></div><p>It takes \(\log n\) as input and returns one base scale for every head dimension. The gate network has type</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{MLP}_\\text{gate}:\\mathbb{R}^{d_\\text{head}}\\rightarrow\\mathbb{R}^{d_\\text{head}}.&quot;,&quot;id&quot;:&quot;FOWDNFVIIU&quot;}" data-component-name="LatexBlockToDOM"></div><p>It maps the current query to an element-wise gate. This is the element-wise QASSMax variant used by the official default configuration.</p><p>The two terms have different roles. The base term \(B_h(n)\) handles the predictable effect of context length. As \(n\) changes, the model can learn how much the logits should be rescaled before softmax.</p><p>The gate \(G_h(q_h)\) handles query-specific sharpness. Some queries should retrieve one highly relevant row. Other queries should aggregate signal from many rows. 
Because</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tanh(x)\\in(-1,1),&quot;,&quot;id&quot;:&quot;CKAROZNPYH&quot;}" data-component-name="LatexBlockToDOM"></div><p>the gate satisfies</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G_h(q_h)\\in(0,2)^{d_\\text{head}}.&quot;,&quot;id&quot;:&quot;KOGLLLSGST&quot;}" data-component-name="LatexBlockToDOM"></div><p>So the query-dependent part can reduce or increase the magnitude of the base scaling contribution, but the gate itself is bounded. The full scale is still not guaranteed to be positive, because the base MLP output is unconstrained.</p><p>There is one important difference from ordinary temperature scaling. A positive scalar temperature changes attention sharpness while preserving the ranking of logits. QASSMax uses element-wise scaling, and \(\text{MLP}_\text{base}(\log n)\) is not constrained to be a positive scalar, so it can change the direction of the query vector and therefore can change logit rankings as well as sharpness.</p><p>At initialization, QASSMax behaves like length-dependent scaling without extra query modulation. The query-aware part can be learned gradually.</p><h3>Why put the gate on the query, not the output?</h3><p>One design choice remains: the gate is applied to the query before attention, rather than to the output after attention.</p><p>Selective attention needs per-query sharpness. A scalar temperature can express this idea in a simple form. For query row or query token \(r\), let \(z_{rj}\) be the attention logit between query \(r\) and key \(j\). 
With a query-specific temperature \(\tau_r&gt;0\), the attention weight would be</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a_{rj}\n\n=\n\n\\frac{\\exp(z_{rj}/\\tau_r)}\n\n{\\sum_{\\ell=1}^{N}\\exp(z_{r\\ell}/\\tau_r)}.&quot;,&quot;id&quot;:&quot;PXDIWGLAEJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>QASSMax is more expressive than this scalar-temperature view because it scales different head dimensions differently and can change more than sharpness alone. The scalar-temperature equation is still useful as intuition: the query participates in controlling how selective its attention distribution should be.</p><p>Some gated attention mechanisms apply the gate after attention, for example</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{o}=g(q)\\odot \\text{Attn}(q,K,V).&quot;,&quot;id&quot;:&quot;JDCPHWRDWX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(K\) is the matrix of keys, \(V\) is the matrix of values, \(\text{Attn}(q,K,V)\) is the ordinary attention output for query \(q\), and \(\tilde{o}\) is the gated output. This kind of gate changes the output after attention weights have already been computed.</p><p>QASSMax applies the gate earlier. Because the gate changes the query \(\tilde{q}_h\), it changes the logits before softmax:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{z}_j=\n\n\\frac{(q_h\\odot B_h(n)\\odot G_h(q_h))^\\top k_j}\n\n{\\sqrt{d_\\text{head}}}.&quot;,&quot;id&quot;:&quot;GNCRKAWNME&quot;}" data-component-name="LatexBlockToDOM"></div><p>So the gate affects the attention weights themselves, not only the post-attention output. 
This is why QASSMax is a softmax-logit modification rather than an output gating mechanism.</p><p>In short, QASSMax fights fading in two ways: the base term adapts to context length, and the query gate lets each query adjust its attention geometry and selectivity.</p><h3>Where TabICLv2 uses QASSMax</h3><p>TabICLv2 uses QASSMax where the number of training rows directly affects the number of keys. There are two such places in the architecture:</p><ul><li><p>the first induced-attention stage of \(\text{TF}_\text{col}\), where inducing vectors summarize row tokens for a fixed grouped feature position; and</p></li><li><p>\(\text{TF}_\text{icl}\), where row representations attend over the labeled training rows.</p></li></ul><p>It is not used in the row-wise transformer that compresses feature positions within one row. That stage attends across columns, not across a large set of training examples.</p><p>This placement matches the failure mode, and the paper&#8217;s stress test makes it concrete. In the paper&#8217;s needle-in-haystack classification task, the model must focus on one anchor sample among many negative samples. Without scalable softmax, normalized attention entropy rises and accuracy drops as the number of negatives grows. The paper reports entropy divided by \(\log n\) and averaged across heads and layers in \(\text{TF}_\text{icl}\). QASSMax keeps that entropy low and maintains 100% accuracy even with 15K negatives, outperforming SSMax at extreme scales.</p><p>The implementation section below shows how this placement appears in NanoTabICL.</p><h3>Implementation in NanoTabICL</h3><p>NanoTabICL wires QASSMax through the <code>ssmax=True</code> flag. The flag name is broad, but in this implementation it constructs a <code>QASSMax</code> layer. 
The flag is enabled only in the attention stages where the key sequence can grow with the number of training rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.col_blocks = nn.ModuleList([
    InducedTransformerBlock(
        embed_dim=embed_dim,
        num_heads=col_nhead,
        n_inducing=n_cls_rows,
        ssmax=True,
    )
    for _ in range(col_num_blocks)
])

self.icl_blocks = nn.ModuleList([
    TransformerBlock(embed_dim=icl_dim, num_heads=icl_nhead, ssmax=True)
    for _ in range(icl_num_blocks)
])</code></pre></div><p>The row-wise transformer blocks do not enable QASSMax:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.row_blocks = nn.ModuleList([
    TransformerBlock(embed_dim=embed_dim, num_heads=row_nhead, use_rope=True)
    for _ in range(row_num_blocks)
])</code></pre></div><p>So the placement is:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HZGH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HZGH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 424w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 848w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 1272w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HZGH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png" width="1456" height="379" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89381,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200794510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HZGH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 424w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 848w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 1272w, https://substackcdn.com/image/fetch/$s_!HZGH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc57ff6-200b-4895-ba57-e2723504ecb4_1717x447.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This matches the motivation from the previous sections. QASSMax is used when the number of keys is tied to training-set size. It is not used merely because a transformer block exists.</p><h4>Column-wise induced attention</h4><p>The column-wise stage has one important detail. <code>InducedTransformerBlock</code> contains two transformer calls:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.tfm1 = TransformerBlock(embed_dim=embed_dim, num_heads=num_heads, ssmax=ssmax)
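self.tfm2 = TransformerBlock(embed_dim=embed_dim, num_heads=num_heads)

# Excerpt from the forward pass (the two lines above come from the constructor):
# tfm1 lets the inducing vectors summarize the rows, then tfm2 lets rows read that summary.
kv = self.tfm1(
    self.inducing_vectors.expand(q.shape[0], -1, -1),
    q if kv is None else kv,
    kv_max_idx=kv_max_idx,
)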
return self.tfm2(q, kv, q_max_idx=q_max_idx)</code></pre></div><p>Only the first transformer, <code>tfm1</code>, receives <code>ssmax=ssmax</code>. The second transformer, <code>tfm2</code>, is constructed without QASSMax because it attends over the fixed number of inducing vectors, not directly over all training rows.</p><p>The model then calls each column block with the number of training rows as the key/value limit:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">for block in self.col_blocks:
    emb = block.col_attn(emb, kv_max_idx=n_train)</code></pre></div><p>The <code>col_attn</code> wrapper transposes the table so each grouped feature position is processed as a separate sequence over rows. Inside the attention module, the effective shape is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch * cols, rows, embed_dim)</code></pre></div><p>The argument <code>kv_max_idx=n_train</code> slices the key/value sequence to the labeled training rows before attention runs. Therefore, when QASSMax is called in <code>tfm1</code>, its length argument is the number of training rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">n = k.size(-2)  # equals n_train here, because keys were sliced to the training rows</code></pre></div><p>This is exactly the long-context setting QASSMax is meant for. The learned inducing vectors query a potentially large set of training-row tokens for a fixed grouped feature position.</p><h4>Dataset-wise ICL attention</h4><p>The second QASSMax placement is the dataset-wise ICL stack. After row compression, <code>emb</code> has shape:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, rows, icl_dim)</code></pre></div><p>NanoTabICL adds target information to training row tokens, then applies the ICL blocks:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb[:, :n_train] += self.y_embed_icl(y[:, :, None])
for block in self.icl_blocks[:-1]:
    emb = block(emb, kv_max_idx=n_train)
emb = self.icl_blocks[-1](emb[:, n_train:], emb[:, :n_train])</code></pre></div><p>For the intermediate ICL blocks, <code>kv_max_idx=n_train</code> means every row can query the sequence, but keys and values are restricted to training rows. For the final ICL block, the query sequence is explicitly test rows and the key/value sequence is explicitly training rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">queries:      emb[:, n_train:]   -&gt; test rows
keys/values:  emb[:, :n_train]   -&gt; training rows</code></pre></div><p>In both cases, the key length passed to QASSMax is the training context size. So the ICL transformer receives the same length-aware query scaling discussed above.</p><h4>Query scaling inside <code>TransformerBlock</code></h4><p>Inside <code>TransformerBlock</code>, the <code>ssmax=True</code> flag creates a <code>QASSMax</code> layer:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.ssmax_layer = (
    QASSMax(num_heads=num_heads, head_dim=embed_dim // num_heads)
    if ssmax
    else None
)</code></pre></div><p>The attention method first projects query, key, and value vectors, then reshapes them into multi-head form:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, seq_len, embed_dim)
    -&gt; (batch, heads, seq_len, head_dim)</code></pre></div><p>Then QASSMax is applied directly to the projected query tensor:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">q = q if self.ssmax_layer is None else self.ssmax_layer(q=q, n=k.size(-2))
q, k = (t if self.rope is None else self.rope(t) for t in [q, k])
attn_output = nn.functional.scaled_dot_product_attention(...)</code></pre></div><p>This placement matters. In NanoTabICL, the QASSMax-enabled blocks do not use RoPE; RoPE is enabled in the row-wise transformer, where QASSMax is not used. In this implementation, QASSMax changes the projected query before <code>scaled_dot_product_attention</code> computes the logits \(q^\top k\), so it changes the attention weights themselves, not only the post-attention output.</p><h4>The QASSMax module</h4><p>The module has two learned pieces: <code>base_mlp</code> for length-dependent scaling and <code>query_mlp</code> for query-dependent modulation.</p><p>Both pieces are two-layer MLPs with 64 hidden neurons and GELU activation. GELU stands for Gaussian Error Linear Unit; one common definition is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{GELU}(x)=x\\Phi(x),&quot;,&quot;id&quot;:&quot;PFHMQIUHOO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\Phi(x)\) is the standard normal CDF.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class QASSMax(nn.Module):
    def __init__(self, num_heads: int, head_dim: int, n_hidden: int = 64):
        super().__init__()
        self.base_mlp = get_mlp(1, n_hidden, num_heads * head_dim)
        self.query_mlp = get_mlp(head_dim, n_hidden, head_dim)
        nn.init.zeros_(self.query_mlp[-1].weight)
        nn.init.zeros_(self.query_mlp[-1].bias)

    def forward(self, q: torch.Tensor, n: int) -&gt; torch.Tensor:
        batch_size, num_heads, seq_len, head_dim = q.shape
        logn = q.new_tensor(math.log(max(1, n))).view(1, 1)
        return (
            self.base_mlp(logn).view(1, num_heads, 1, head_dim)
            * (1 + torch.tanh(self.query_mlp(q)))
            * q
        )</code></pre></div><p>The forward pass starts from the projected query tensor:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, heads, query_len, head_dim)</code></pre></div><p>The length input is converted to a one-element tensor:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">logn: (1, 1)</code></pre></div><p>After <code>base_mlp(logn)</code> and <code>.view(1, num_heads, 1, head_dim)</code>, the base scale has shape:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(1, heads, 1, head_dim)</code></pre></div><p>This is the implementation of \(B_h(n)\). The singleton batch and query dimensions make the same length-dependent scale broadcast across all examples and all query positions.</p><p>The query-dependent gate is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">1 + torch.tanh(self.query_mlp(q))</code></pre></div><p>It has the same shape as <code>q</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, heads, query_len, head_dim)</code></pre></div><p>This is the implementation of \(G_h(q_h)\). Because <code>tanh</code> lies in \((-1,1)\), the multiplicative gate lies in \((0,2)\). 
The final return statement multiplies the original query by both terms:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{q}_h=q_h\\odot B_h(n)\\odot G_h(q_h).&quot;,&quot;id&quot;:&quot;BBYOJWCXEM&quot;}" data-component-name="LatexBlockToDOM"></div><p>The zero initialization of the last <code>query_mlp</code> layer controls the starting behavior:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">nn.init.zeros_(self.query_mlp[-1].weight)

nn.init.zeros_(self.query_mlp[-1].bias)</code></pre></div><p>At initialization, <code>query_mlp(q)</code> is zero, so the gate starts as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;1+\\tanh(0)=1.&quot;,&quot;id&quot;:&quot;ZOXVKABGRR&quot;}" data-component-name="LatexBlockToDOM"></div><p>So QASSMax initially behaves like length-dependent scaling without extra query modulation. The query-aware part is learned gradually rather than perturbing attention sharply at initialization.</p><p>The implementation therefore mirrors the mathematical decomposition from the previous section:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Je45!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Je45!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 424w, https://substackcdn.com/image/fetch/$s_!Je45!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 848w, https://substackcdn.com/image/fetch/$s_!Je45!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 1272w, https://substackcdn.com/image/fetch/$s_!Je45!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Je45!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png" width="1456" height="368" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:368,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106483,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200794510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Je45!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 424w, https://substackcdn.com/image/fetch/$s_!Je45!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 848w, https://substackcdn.com/image/fetch/$s_!Je45!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 1272w, https://substackcdn.com/image/fetch/$s_!Je45!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5e1049-9e90-4f9c-aee1-97fe63376a31_1751x442.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That is the whole implementation idea: QASSMax is not a separate attention kernel. 
It is a learned query rescaling step inserted immediately before ordinary scaled dot-product attention.</p><h2>Summary</h2><p>Ordinary softmax attention can fade as the number of keys grows: even a clearly relevant row can lose attention mass to many individually weaker distractors. SSMax addresses this by scaling logits with a learned factor proportional to \(\log n\), matching the way the softmax denominator grows with context length.</p><p>QASSMax makes that idea more flexible. It rescales the query with a learned length-dependent base term \(B_h(n)\) and a bounded query-dependent gate \(G_h(q_h)\), so attention can adapt both to the number of training rows and to the specific query being processed. In NanoTabICL, this appears exactly where long row contexts matter: the first induced-attention step of the column-wise transformer and the dataset-wise ICL transformer.</p><p>The next post covers many-class classification, where TabICLv2 extends a model pretrained with at most 10 classes to settings with many more labels.</p><h2>Quiz</h2><p>Take the quiz below to test your understanding, and share your answers and doubts in the comments. 
The questions increase in difficulty from 1 to 10.</p><ol><li><p>What practical attention problem does QASSMax address in TabICLv2?</p></li><li><p>Why is multiplying the query vector by a positive scalar equivalent to changing softmax temperature?</p></li><li><p>In the one-relevant-key example, why does the relevant key's attention weight go to zero as \(N\) grows under ordinary softmax?</p></li><li><p>What is the basic scaling idea behind SSMax?</p></li><li><p>Why does \(\log n\) appear in scalable softmax rather than just \(n\)?</p></li><li><p>What are the two multiplicative components QASSMax adds to the query?</p></li><li><p>What does the bounded query gate \(1+\tanh(\text{MLP}_\text{gate}(q_h))\) allow the model to do?</p></li><li><p>Why can QASSMax change more than just the sharpness of the attention distribution?</p></li><li><p>Where is QASSMax used in NanoTabICL, and where is it not used?</p></li><li><p>In the NanoTabICL <code>QASSMax.forward</code> method, why does <code>base_mlp(logn)</code> have shape <code>(1, heads, 1, head_dim)</code> after reshaping, while the query gate has the same shape as <code>q</code>?</p></li></ol>]]></content:encoded></item><item><title><![CDATA[[P28] Architecture of TabICLv2: compression-then-ICL]]></title><description><![CDATA[How TabICLv2 compresses target-aware feature tokens into row representations, then uses dataset-wise in-context learning to predict test rows.]]></description><link>https://newsletter.dsaiengineering.com/p/p28-architecture-of-tabiclv2-compression-then-icl</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p28-architecture-of-tabiclv2-compression-then-icl</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Thu, 04 Jun 2026 18:22:41 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!7S5J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous post, we looked at target-aware embedding: the step where TabICLv2 injects observed labels into the feature tokens of training rows while keeping test rows unlabeled. At that point, the model has a target-aware tensor with one token for every row and every grouped feature position.</p><p>This post covers the next architectural step: compression-then-ICL. TabICLv2 does not run dataset-level in-context learning directly over all \(n\times m\) row-feature tokens. It first contextualizes grouped feature positions across rows, compresses each row into a smaller row representation, and then runs in-context learning over those row tokens.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7S5J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7S5J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!7S5J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 848w, 
https://substackcdn.com/image/fetch/$s_!7S5J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!7S5J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7S5J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png" width="849" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 pipeline; this post covers the compression-then-ICL stack.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200640745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 pipeline; this post covers the compression-then-ICL stack." title="TabICLv2 pipeline; this post covers the compression-then-ICL stack." 
srcset="https://substackcdn.com/image/fetch/$s_!7S5J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!7S5J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!7S5J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!7S5J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9305236e-1f67-4295-88ce-8aa99cd3c09d_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">TabICLv2 pipeline; this post covers the compression-then-ICL stack.</figcaption></figure></div><p>In this post, we start from the target-aware tensor \(E_2\), pass it through column-wise and row-wise transformer blocks, and end with dataset-wise ICL, where test row representations attend to labeled training row representations. A short quiz at the end lets you check whether the architecture and label-flow details are clear.</p><h2>Compression-then-ICL</h2><h3>Starting point: target-aware tokens \(E_2\)</h3><p>Before compression begins, TabICLv2 has already constructed a target-aware token tensor</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2\\in\\mathbb{R}^{n\\times m\\times d}.&quot;,&quot;id&quot;:&quot;LPBHPYYKBI&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(d\) is the token embedding dimension. Each token \(E_2[i,j]\in\mathbb{R}^d\) represents row \(i\) and grouped feature position \(j\). After repeated feature grouping, \(m\) denotes the number of grouped feature positions being processed by the transformer stack.</p><p>The previous post derived target-aware embedding in full. 
For training rows, the update is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]=E_1[i,j]+u_i,&quot;,&quot;id&quot;:&quot;WWOOJOWNBZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(E_1\) is the feature-only tensor from repeated feature grouping and \(u_i\) is the row-level target vector from target-aware embedding (see the previous post for \(\operatorname{Embed}_\text{TAE}\) and the test-row masking rule). For a labeled row, the same target vector is broadcast across all grouped feature positions.</p><p>The important point is that compression does not operate on raw feature tokens. It operates on feature tokens that already contain training-label information. For training rows, each grouped feature token has access to the observed target through \(u_i\). For test rows, the target component is absent, because the target is what the model must predict.</p><p>The compression-then-ICL pipeline explains what happens next. TabICLv2 must convert the \(n\times m\) grid of row-feature tokens into row-level representations that can be used for prediction. 
It does this in three stages:</p><ol><li><p>column-wise embedding applies a Set Transformer \(\text{TF}_\text{col}\) to each grouped feature position;</p></li><li><p>row-wise interaction uses \(\text{TF}_\text{row}\) with learned \([\text{CLS}]\) tokens to compress grouped feature embeddings within each row;</p></li><li><p>dataset-wise ICL uses \(\text{TF}_\text{icl}\) so test row representations can attend to labeled training row representations.</p></li></ol><p>The diagram below shows the same three-stage path, with tensor shapes at each step:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lm_c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lm_c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 424w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 848w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 1272w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Lm_c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png" width="995" height="1007" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1007,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43424,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200640745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lm_c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 424w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 848w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 1272w, https://substackcdn.com/image/fetch/$s_!Lm_c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be7264c-822b-4516-a966-cd7aa5be4a55_995x1007.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Stage 1: Column-wise embedding (\(\text{TF}_\text{col}\))</h3><p>The first stage processes one grouped feature position at a time, across rows. 
For a fixed grouped feature index \(j\), the model sees the row sequence</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(E_2[1,j],E_2[2,j],\\ldots,E_2[n,j]).&quot;,&quot;id&quot;:&quot;QQLRVRBRGC&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is &#8220;column-wise&#8221; in the table sense: the grouped feature position is fixed, and the row index varies.</p><p>The column-wise block uses a Set Transformer-style variant of attention. Here, Set Transformer means attention through learned inducing tokens rather than direct all-pairs row attention.</p><p>Because the training-row tokens already contain target information, this stage can learn how a grouped feature behaves across labeled examples. For example, through the induced summaries, \(\text{TF}_\text{col}\) can relate a test row&#8217;s token at position \(j\) to training-row tokens at the same position \(j\), where the training tokens encode both feature values and observed targets.</p><p>TabICLv2 does this with induced attention rather than full row-by-row attention. In the default no-leakage setting, a small set of learned inducing tokens first summarizes the training rows for each grouped feature position. Then all rows, including test rows, attend back to those summaries. Informally, for each \(j\),</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[:,j]\\quad\\longrightarrow\\quad \\text{training-row summaries for position }j\\quad\\longrightarrow\\quad \\tilde{E}[:,j].&quot;,&quot;id&quot;:&quot;NHFCXUZIVI&quot;}" data-component-name="LatexBlockToDOM"></div><p>The train-only summary step matters in this default setting. Test rows can receive context built from labeled examples, but they do not contribute to the summaries themselves. This keeps the direction of information aligned with supervised prediction: labeled rows inform unlabeled rows, not the other way around. 
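</p><p>The two-step flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TabICLv2 or NanoTabICL code: <code>attend</code> and <code>induced_column_attention</code> are hypothetical helpers, the attention is single-head with no learned projections, residuals, or normalization, and the inducing tokens are hard-coded stand-ins for learned parameters.</p>

```python
import math


def attend(queries, keys_values):
    """Single-head dot-product attention where keys also serve as values."""
    outputs = []
    for q in queries:
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * kv[dim] for w, kv in zip(weights, keys_values)) / total
            for dim in range(len(q))
        ])
    return outputs


def induced_column_attention(column_tokens, inducing_tokens, n_train):
    """Summarize the training rows, then let every row read the summaries."""
    # Step 1: inducing tokens are the queries; only training rows are keys/values.
    summaries = attend(inducing_tokens, column_tokens[:n_train])
    # Step 2: all rows (train and test) are queries; the summaries are keys/values.
    updated = attend(column_tokens, summaries)
    return summaries, updated


# Hypothetical tokens for one grouped feature position j: 3 training rows, 2 test rows, d = 2.
column = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]
inducing = [[0.3, -0.2], [-0.1, 0.4]]

summaries, updated = induced_column_attention(column, inducing, n_train=3)

# Swap in different test rows: the summaries do not change.
perturbed = column[:3] + [[9.0, 9.0], [-9.0, 0.0]]
summaries_perturbed, _ = induced_column_attention(perturbed, inducing, n_train=3)
print(summaries == summaries_perturbed)
print(len(updated), len(updated[0]))
```

<p>Because the summaries are computed from the first <code>n_train</code> rows only, replacing the test rows leaves them unchanged, which is the no-leakage property in miniature.</p><p>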
Let</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{E}\\in\\mathbb{R}^{n\\times m\\times d}&quot;,&quot;id&quot;:&quot;OGKXJOOQUF&quot;}" data-component-name="LatexBlockToDOM"></div><p>denote the output of the column-wise stage.</p><h3>Stage 2: Row-wise compression (\(\text{TF}_\text{row}\))</h3><p>The second stage changes the axis of attention. After column-wise embedding, each token \(\tilde{E}[i,j]\) has context from other rows at the same grouped feature position. Now the model must combine the \(m\) grouped feature positions inside each row.</p><p>For a fixed row \(i\), the row-wise transformer receives</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(\\tilde{E}[i,1],\\tilde{E}[i,2],\\ldots,\\tilde{E}[i,m]),&quot;,&quot;id&quot;:&quot;LVUUXNORCM&quot;}" data-component-name="LatexBlockToDOM"></div><p>along with learned \([\text{CLS}]\) tokens. The grouped feature tokens provide the content of the row. The \([\text{CLS}]\) tokens provide learned readout positions whose outputs are kept after attention.</p><p>If there are \(c\) learned \([\text{CLS}]\) tokens, the row-wise transformer produces \(c\) summary vectors for row \(i\). These are flattened into a single row representation</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;h_i\\in\\mathbb{R}^{d_\\text{row}},&quot;,&quot;id&quot;:&quot;BVTHMDOFSN&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(d_\text{row}=c\cdot d\) in the simple flattened-CLS view. 
This is the main compression step:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{E}\\in\\mathbb{R}^{n\\times m\\times d}\n\n\\quad\\longrightarrow\\quad\n\nH=(h_1,\\ldots,h_n)\\in\\mathbb{R}^{n\\times d_\\text{row}}.&quot;,&quot;id&quot;:&quot;ITDVQJHDPS&quot;}" data-component-name="LatexBlockToDOM"></div><p>The model has moved from one token per row-feature position to one representation per row.</p><h3>Stage 3: Dataset-wise ICL (\(\text{TF}_\text{icl}\))</h3><p>The row representations</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;h_1,\\ldots,h_n&quot;,&quot;id&quot;:&quot;NKCRSWEOGR&quot;}" data-component-name="LatexBlockToDOM"></div><p>become the tokens over which \(\text{TF}_\text{icl}\) operates. This is the stage that most directly resembles in-context learning: each row is now an example token, and test examples can use labeled training examples as context.</p><p>Before \(\text{TF}_\text{icl}\), TabICLv2 injects target information a second time. For training rows, the model adds another target embedding to the row representation. For test rows, no true target is supplied. This second embedding has a different role from the target-aware embedding in the previous post:</p><ul><li><p>the first target embedding entered before column-wise and row-wise processing, so labels could shape feature and row representations;</p></li><li><p>the second target embedding enters after row compression, so the ICL transformer receives label values on context rows while test/query rows remain unlabeled.</p></li></ul><p>To keep the same row-ordering convention as the previous post, let \(n_\text{train}\) be the number of training rows. 
In NanoTabICL, training rows are placed first, so</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{I}_\\text{train}=\\{1,\\ldots,n_\\text{train}\\},\n\n\\qquad\n\n\\mathcal{I}_\\text{test}=\\{n_\\text{train}+1,\\ldots,n\\}.&quot;,&quot;id&quot;:&quot;ZTALCDDWAI&quot;}" data-component-name="LatexBlockToDOM"></div><p>Define </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_i =\n\n\\begin{cases}\n\n\\displaystyle h_i+\\operatorname{Embed}_\\text{ICL}(y_i), &amp; i\\in\\mathcal{I}_\\text{train},\\\\\n\nh_i, &amp; i\\in\\mathcal{I}_\\text{test},\n\n\\end{cases}\n\n&quot;,&quot;id&quot;:&quot;MZSEUCNHXF&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(z_i\in\mathbb{R}^{d_\text{row}}\) is the row token passed to \(\text{TF}_\text{icl}\), and \(\operatorname{Embed}_\text{ICL}\) maps an observed target into \(\mathbb{R}^{d_\text{row}}\). The ICL transformer lets test rows attend to labeled training rows and outputs contextualized test-row states; the prediction head then maps those states to \(\hat{y}_i\), the predicted target for test row \(i\).</p><h2>Why this design?</h2><p>Now that the data path is defined, the design choice becomes clearer: TabICLv2 separates representation building from dataset-level prediction.</p><p>Column-wise and row-wise processing build representations. The column-wise stage relates the same grouped feature position across rows through induced summaries. The row-wise stage combines grouped feature positions into one row representation. Dataset-wise ICL then performs the final train-test interaction over row tokens.</p><p>The computational reason is that full attention over every row-feature token would make the dataset-level interaction much larger than necessary. The model would be reasoning over \(n\times m\) tokens when the final prediction problem is naturally row-level. 
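</p><p>A rough count makes the gap concrete. The sketch below assumes full self-attention, where the number of query-key pairs grows with the square of the sequence length, and uses a hypothetical table size. <code>attention_pairs</code> is an illustrative helper, and the count ignores that row tokens are wider than feature tokens and that the ICL blocks restrict keys to training rows.</p>

```python
def attention_pairs(n_tokens):
    """Query-key pairs scored by full self-attention over n_tokens tokens."""
    return n_tokens * n_tokens


# Hypothetical table: n rows and m grouped feature positions.
n_rows, n_positions = 1_000, 20

grid_pairs = attention_pairs(n_rows * n_positions)  # one token per row-feature cell
row_pairs = attention_pairs(n_rows)                  # one token per row after compression

print(grid_pairs, row_pairs, grid_pairs // row_pairs)
```

<p>Under these assumptions the ratio is \(m^2\), independent of the number of rows. </p><p>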
Compression changes the final ICL input from a grid of feature tokens to a sequence of row tokens.</p><p>The representational reason is that the row tokens are not plain feature summaries. They have already passed through target-aware column-wise processing and row-wise aggregation. By the time \(\text{TF}_\text{icl}\) runs, each test row token can attend to labeled training row tokens that carry both feature context and target information.</p><h2>At a glance</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PkMo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PkMo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 424w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 848w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 1272w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PkMo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png" width="1456" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96389,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200640745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PkMo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 424w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 848w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 1272w, https://substackcdn.com/image/fetch/$s_!PkMo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfe3cbc-31ea-429d-8b6b-c807b6ba7c41_1689x432.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Implementation in NanoTabICL</h2><p>NanoTabICL implements a compact educational version of the compression-then-ICL stack in the same order as the architecture above. 
Production TabICLv2 adds engineering details such as inference managers, caching/offloading, and reserved CLS slots, but the same three-stage structure is visible in <code>NanoTabICLv2.forward</code>.</p><p>After repeated feature grouping and the first target-aware embedding, the working tensor is called <code>emb</code> and has shape:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, rows, cols, embed_dim)</code></pre></div><p>Here <code>cols</code> is the number of grouped feature positions in NanoTabICL. At this point, <code>emb[:, :n_train]</code> has already received the first target embedding, so the compression stack starts from the implementation counterpart of \(E_2\).</p><h3>Row and column attention helper</h3><p>The snippets below call <code>row_attn</code> and <code>col_attn</code>, two thin wrappers defined by the helper <code>TableAttnBase</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def row_attn(self, q, kv=None, **kwargs):
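    n_batch, n_rows, n_cols, embed_dim = q.shape
    # Fold rows into the batch axis: (batch, rows, cols, dim) -&gt; (batch * rows, cols, dim).
    q, kv = (None if t is None else t.flatten(0, 1) for t in [q, kv])
    # Attend across the column axis, then restore the separate batch and row axes.
    return self(q, kv, **kwargs).reshape(n_batch, n_rows, -1, embed_dim)

def col_attn(self, q, kv=None, **kwargs):
    # Swap the row and column axes, reuse row_attn, then swap the axes back.
    return self.row_attn(
        q.transpose(1, 2),
        None if kv is None else kv.transpose(1, 2),
        **kwargs,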
    ).transpose(1, 2)</code></pre></div><p><code>row_attn</code> flattens <code>(batch, rows)</code> into one larger batch dimension, so standard sequence attention can run across columns inside each row. <code>col_attn</code> swaps the row and column axes, reuses <code>row_attn</code>, and swaps the axes back. This is the implementation trick that lets the same transformer block operate over either axis of a 2D table without writing separate attention kernels for row-wise and column-wise attention.</p><h3>Column-wise block</h3><p>First, \(\text{TF}_\text{col}\) applies induced attention within each grouped feature position:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">for block in self.col_blocks:
    emb = block.col_attn(emb, kv_max_idx=n_train)</code></pre></div><p>The call to <code>col_attn</code> means: keep the grouped feature position fixed and attend across rows. Internally, it transposes the table so each grouped feature position becomes its own row sequence, applies the same sequence module on that axis, and then transposes back. For the attention module, one fixed grouped feature position is seen as:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch * cols, rows, embed_dim)</code></pre></div><p>The argument <code>kv_max_idx=n_train</code> is the leakage-control detail. Inside <code>TransformerBlock.forward</code>, it slices the key/value sequence to the first <code>n_train</code> rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">if kv_max_idx is not None: kv = kv[..., :kv_max_idx, :]</code></pre></div><p>In the column-wise induced block, this means the learned inducing vectors summarize only training rows in the default no-leakage path. The original row tokens then attend back to those train-derived summaries. Test rows can receive context from labeled training examples, but they do not contribute to the column-wise context unless an implementation path explicitly enables embedding with test rows.</p><p>The block itself is <code>InducedTransformerBlock</code>, which implements induced attention:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.tfm1 = TransformerBlock(embed_dim=embed_dim, num_heads=num_heads, ssmax=ssmax)
self.tfm2 = TransformerBlock(embed_dim=embed_dim, num_heads=num_heads)
self.inducing_vectors = nn.Parameter(0.02 * torch.randn(1, n_inducing, embed_dim))

# The inducing vectors query the rows; kv_max_idx limits the keys/values to the leading rows.
kv = self.tfm1(self.inducing_vectors.expand(q.shape[0], -1, -1), q if kv is None else kv, kv_max_idx=kv_max_idx)
return self.tfm2(q, kv, q_max_idx=q_max_idx)</code></pre></div><p>The first transformer call lets the learned inducing vectors query the training rows and produce a compact set of latent summaries in the default path:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">training-row tokens -&gt; n_inducing latent summaries</code></pre></div><p>The second transformer call lets the original row tokens attend back to those summaries. The output still has shape <code>(batch, rows, cols, embed_dim)</code>. So this block is not the final \(n\times m\) to \(n\) compression step; it is the column-wise contextualization step that prepares better tokens for row compression.</p><p>The full TabICLv2 implementation exposes an <code>embed_with_test</code> option for cases where one deliberately allows the column embedder&#8217;s inducing points to attend to train and test rows together. NanoTabICL follows the default no-leakage path shown above.</p><h3>Row-wise block</h3><p>Second, \(\text{TF}_\text{row}\) adds learned row-level \([\text{CLS}]\) tokens and attends across feature positions within each row:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb = torch.cat([self.row_cls_tokens.expand(n_batch, n_rows, -1, -1), emb], dim=2)
for block in self.row_blocks[:-1]:
    emb = block.row_attn(emb)
# Final block: only the leading CLS positions act as queries, so only their outputs are computed.
emb = self.row_blocks[-1].row_attn(emb, q_max_idx=self.row_cls_tokens.size(-2))
emb = self.row_ln(emb).flatten(-2, -1)</code></pre></div><p>Here <code>row_attn</code> keeps the row fixed and attends across the column axis. The prepended <code>row_cls_tokens</code> are learned summary positions. They are not labels and they are not additional input features; they are readout slots that can attend to the grouped feature tokens in the same row.</p><p>NanoTabICL prepends these CLS tokens with <code>torch.cat</code>, which gives the following shape transition:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6mbD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6mbD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 424w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 848w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 1272w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6mbD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png" width="1456" height="374" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:374,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110685,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200640745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6mbD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 424w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 848w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 1272w, https://substackcdn.com/image/fetch/$s_!6mbD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19456ca7-2730-4dc0-b8b6-50a143df12d1_1709x439.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The final row block is called with <code>q_max_idx=self.row_cls_tokens.size(-2)</code>, which means the model only computes outputs for the CLS positions in that last row-attention pass. 
The grouped feature-token outputs are no longer needed after they have contributed to the CLS summaries.</p><p>After <code>row_ln</code>, <code>flatten(-2, -1)</code> merges the CLS outputs of each row into a single vector. Its size is <code>icl_dim</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">icl_dim = embed_dim * n_cls_cols</code></pre></div><p>So the row-wise block is the implementation&#8217;s main compression point:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, rows, cols, embed_dim)
    -&gt; (batch, rows, icl_dim)</code></pre></div><p>The full <code>tabicl</code> implementation avoids the explicit <code>torch.cat</code> by reserving CLS slots during column embedding and then replacing those reserved slots with learned CLS tokens before row attention. In that implementation, the shape before row interaction is closer to <code>(batch, rows, grouped_feature_positions + reserved_cls_slots, embed_dim)</code>. The mathematical role is the same: these positions are learned readout slots whose final outputs are concatenated into the row representation.</p><h3>ICL block</h3><p>Third, \(\text{TF}_\text{icl}\) performs in-context learning over row tokens. This is where NanoTabICL injects targets a second time:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb[:, :n_train] += self.y_embed_icl(y[:, :, None])
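# Intermediate blocks: every row attends only to the labeled training rows.
for block in self.icl_blocks[:-1]:
    emb = block(emb, kv_max_idx=n_train)
# Final block: test rows are the queries, training rows the keys/values.
emb = self.icl_blocks[-1](emb[:, n_train:], emb[:, :n_train])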
return self.out_mlp(self.out_ln(emb))</code></pre></div><p>At this point, before the second target injection, <code>emb</code> has shape:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, rows, icl_dim)</code></pre></div><p>The call <code>self.y_embed_icl(y[:, :, None])</code> returns a tensor with shape <code>(batch, n_train, icl_dim)</code>. Adding it to <code>emb[:, :n_train]</code> marks the training row tokens as labeled context examples. Test rows are left without a target embedding.</p><p>The intermediate ICL blocks again use <code>kv_max_idx=n_train</code>, so every row is updated by attending only to training rows as keys and values. In the final ICL block, NanoTabICL avoids computing outputs for training rows because only test predictions are needed:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">queries:      emb[:, n_train:]   -&gt; test rows
keys/values:  emb[:, :n_train]   -&gt; training rows</code></pre></div><p>So the final output contains predictions only for the test rows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">(batch, n_test, out_dim)</code></pre></div><p>That last-block shortcut is a NanoTabICL simplification for the hands-on implementation. The architectural statement is broader: \(\text{TF}_\text{icl}\) uses labeled training row representations as context to produce test-row states, and the output head decodes those states into predictions.</p><h2>Summary</h2><p>Compression-then-ICL is the part of TabICLv2 that turns a grid of target-aware feature tokens into row-level predictions. The column-wise transformer first lets each grouped feature position learn from labeled examples across rows. The row-wise transformer then uses learned \([\text{CLS}]\) tokens to compress each row from \(m\) feature-position tokens into a fixed-dimensional row representation. 
Finally, the ICL transformer operates over row tokens, adding target information to training rows and letting test rows attend to that labeled context.</p><p>The main idea is separation of roles: feature and row processing happen before dataset-wise prediction, so the final ICL stage does not need to attend over every cell-level token.</p><p>One detail this post has mostly treated as a black box is query-aware scalable softmax (QASSMax). The actual TabICLv2 architecture uses QASSMax in the induced column attention and the ICL transformer for long-context scaling; the label-flow story above is unchanged. The next post covers QASSMax, the attention-scaling mechanism TabICLv2 uses to preserve selective attention as the number of context samples grows.</p><h2>Quiz</h2><p>Take the quiz below to test your understanding, and share your answers and doubts in the comments. The questions increase in difficulty from 1 to 10.</p><ol><li><p>What are the three main stages in TabICLv2&#8217;s compression-then-ICL pipeline?</p></li><li><p>What does the tensor \(E_2\in\mathbb{R}^{n\times m\times d}\) represent before compression begins?</p></li><li><p>Why are target embeddings added to training-row feature tokens but not to test-row feature tokens?</p></li><li><p>In \(\text{TF}_\text{col}\), which axis is attended over for a fixed grouped feature position \(j\)?</p></li><li><p>What role do learned inducing tokens play in the column-wise Set Transformer block?</p></li><li><p>Why does the default no-leakage setting build column-wise summaries from training rows only?</p></li><li><p>How do the learned \([\text{CLS}]\) tokens in \(\text{TF}_\text{row}\) help convert an \(n\times m\times d\) tensor into row-level representations?</p></li><li><p>Explain why NanoTabICL injects target information twice: once before column-wise processing and once before the ICL block.</p></li><li><p>In the final ICL block, why can the implementation use test 
rows as queries and training rows as keys and values instead of computing outputs for every row?</p></li><li><p>Suppose \(\text{TF}_\text{icl}\) attended over every cell-level token instead of compressed row tokens. What computational and modeling tradeoffs would this create compared with the compression-then-ICL design?</p></li></ol>]]></content:encoded></item><item><title><![CDATA[ [P27] Architecture of TabICLv2: target-aware embedding]]></title><description><![CDATA[How TabICLv2 uses target-aware embedding to add training labels to tabular in-context learning tokens while preventing label leakage in test rows.]]></description><link>https://newsletter.dsaiengineering.com/p/p27-architecture-of-tabiclv2-target-aware-embedding</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p27-architecture-of-tabiclv2-target-aware-embedding</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Tue, 02 Jun 2026 18:00:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4oFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous post, we looked at the first step in TabICLv2&#8217;s architecture: repeated feature grouping. That step gives each feature position a small amount of neighboring-column context, helping the model avoid collapsing similar-looking columns into nearly identical representations while still preserving \(m\) effective feature positions.</p><p>But repeated feature grouping is still feature-only. It tells the model more about how columns sit next to other columns, but it does not yet tell the model which rows produced which outcomes. The next architectural step, target-aware embedding, adds that supervised signal for training rows.</p><p>The following figure shows the full TabICLv2 architecture. 
This post focuses on the target-aware embedding block, immediately after repeated feature grouping.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4oFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4oFt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4oFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png" width="849" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 architecture; this post covers target-aware embedding.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200333302?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 architecture; this post covers target-aware embedding." title="TabICLv2 architecture; this post covers target-aware embedding." 
srcset="https://substackcdn.com/image/fetch/$s_!4oFt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!4oFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18176d47-adcb-4ec4-a6a5-8530e4485650_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">TabICLv2 architecture; this post covers target-aware embedding.</figcaption></figure></div><p>In this post, we start from the feature-only tensor \(E_1\), add observed targets only to training-row tokens, and leave test rows unlabeled.</p><h2>Target-aware embedding</h2><h3>Starting point: feature-only tokens \(E_1\)</h3><p>Start with the feature matrix after preprocessing/normalization:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;X=(x_{ij})\\in\\mathbb{R}^{n\\times m},&quot;,&quot;id&quot;:&quot;BCPBDOAOAM&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(n\) is the number of rows, \(m\) is the number of feature columns, and \(x_{ij}\) is the value of feature \(j\) in row \(i\). In the previous post, repeated feature grouping produced a feature-token tensor</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_1\\in\\mathbb{R}^{n\\times m\\times d},&quot;,&quot;id&quot;:&quot;MXBZJZLTBI&quot;}" data-component-name="LatexBlockToDOM"></div><p>containing one \(d\)-dimensional embedding for each row \(i\in\{1,\ldots,n\}\) and each group position \(j\in\{1,\ldots,m\}\). 
The token</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_1[i,j]\\in\\mathbb{R}^d&quot;,&quot;id&quot;:&quot;ZMKWBRSDRS&quot;}" data-component-name="LatexBlockToDOM"></div><p>therefore represents row \(i\) at grouped feature position \(j\). At this point the representation is still feature-only. It encodes the input values \(X\), including the local feature context introduced by repeated feature grouping, but it does not yet include the observed targets of the training rows.</p><h3>The operation: add a target embedding to every training-row token</h3><p>Target-aware embedding changes \(E_1\) into a target-aware tensor</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2\\in\\mathbb{R}^{n\\times m\\times d}&quot;,&quot;id&quot;:&quot;CQJIXEOKOP&quot;}" data-component-name="LatexBlockToDOM"></div><p>by adding an embedding of the observed target \(y_i\) to each feature token in training row \(i\). This is an elementwise vector addition in the same \(d\)-dimensional token space.</p><p>To write the operation cleanly, let \(n_\text{train}\) be the number of training rows. In the code implementation, training rows are placed first, so</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{I}_\\text{train}=\\{1,\\ldots,n_\\text{train}\\}&quot;,&quot;id&quot;:&quot;YEPFOHLDSK&quot;}" data-component-name="LatexBlockToDOM"></div><p>is the set of row indices whose targets are observed. The test rows are</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{I}_\\text{test}=\\{n_\\text{train}+1,\\ldots,n\\},&quot;,&quot;id&quot;:&quot;CYFURIWUQL&quot;}" data-component-name="LatexBlockToDOM"></div><p>whose targets must be predicted. For \(i\in\mathcal{I}_\text{train}\), \(y_i\in\mathcal{Y}\) denotes the observed target for row \(i\), where \(\mathcal{Y}\) is the target space. 
In classification, \(\mathcal{Y}\) is a finite set of class labels; in regression, \(\mathcal{Y}\subseteq\mathbb{R}\).</p><p>Define the target-aware embedding map</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\operatorname{Embed}_\\text{TAE}:\\mathcal{Y}\\rightarrow\\mathbb{R}^d.&quot;,&quot;id&quot;:&quot;MRTGXGIUVM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then define a row-level vector</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u_i=\n\n\\begin{cases}\n\n\\operatorname{Embed}_\\text{TAE}(y_i), &amp; i\\in \\mathcal{I}_\\text{train},\\\\\n\n\\mathbf{0}_d, &amp; i\\notin \\mathcal{I}_\\text{train},\n\n\\end{cases}&quot;,&quot;id&quot;:&quot;IKCAFTFIVR&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\mathbf{0}_d\in\mathbb{R}^d\) is the zero vector. The target-aware representation is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]=E_1[i,j]+u_i,\n\n\\qquad i=1,\\ldots,n,\\quad j=1,\\ldots,m,&quot;,&quot;id&quot;:&quot;DZIREYSGKE&quot;}" data-component-name="LatexBlockToDOM"></div><p>where for a training row,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]=E_1[i,j]+\\operatorname{Embed}_\\text{TAE}(y_i),\n\n\\qquad i\\in\\mathcal{I}_\\text{train},&quot;,&quot;id&quot;:&quot;RTZXSHCVQC&quot;}" data-component-name="LatexBlockToDOM"></div><p>while for a test row,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]=E_1[i,j],\n\n\\qquad i\\in\\mathcal{I}_\\text{test}.&quot;,&quot;id&quot;:&quot;YBDODKTOZS&quot;}" data-component-name="LatexBlockToDOM"></div><p>For each training row \(i\), TabICLv2 computes one target vector and broadcasts it across all \(m\) feature tokens in that row. Test rows receive the zero vector instead. 
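</p><p>As a small worked illustration with made-up numbers (the values are assumptions chosen only to keep the arithmetic easy, not quantities from the model): let \(d=2\), \(m=2\), and \(n_\text{train}=1\) with one test row, so \(n=2\). Suppose \(\operatorname{Embed}_\text{TAE}(y_1)=(0.5,-1)\), \(E_1[1,1]=(1,2)\), \(E_1[1,2]=(0,3)\), and \(E_1[2,1]=(4,4)\). Then \(u_1=(0.5,-1)\) and \(u_2=\mathbf{0}_2\), so \(E_2[1,1]=(1.5,1)\), \(E_2[1,2]=(0.5,2)\), and \(E_2[2,1]=E_1[2,1]=(4,4)\). The same offset shifts both tokens of the training row, while the test-row token is untouched.</p><p>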
This boundary is essential: for test rows, \(y_i\) is the value the model must infer, so adding \(\operatorname{Embed}_\text{TAE}(y_i)\) would leak the answer.</p><h3>Classification vs regression implementations</h3><p>The embedding map depends on the prediction task. For classification, it maps discrete class labels to label vectors; for regression, it maps a scalar target to the token space.</p><p>For classification with \(K\leq K_{\max}\) classes, assume labels have been encoded as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_i\\in\\{0,\\ldots,K-1\\}.&quot;,&quot;id&quot;:&quot;CIRWYWHVEG&quot;}" data-component-name="LatexBlockToDOM"></div><p>In TabICLv2, \(K_{\max}=10\) for the pretrained small-class target encoder. The embedding map can be viewed as a learned lookup table</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;W_\\text{cls}\\in\\mathbb{R}^{K_{\\max}\\times d},\n\n\\qquad\n\n\\operatorname{Embed}_\\text{TAE}(y_i)=W_\\text{cls}[y_i].&quot;,&quot;id&quot;:&quot;QBJGSMCEYM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(W_\text{cls}[k]\in\mathbb{R}^d\) is the target vector associated with class \(k\). The full TabICLv2 implementation realizes this with a one-hot-plus-linear layer, which is equivalent to selecting a learned vector for each supported class. NanoTabICL uses an <code>nn.Embedding</code> wrapper for the same lookup-table idea.</p><p>Tasks with more than \(K_{\max}=10\) classes need extra handling. The full TabICLv2 implementation uses mixed-radix ensembling for the target-aware column-embedding stage and hierarchical classification later in the ICL stage. 
That many-class machinery is outside this post&#8217;s small-class view; here we focus on the standard \(K\leq 10\) case.</p><p>For regression, where \(y_i\in\mathbb{R}\), the target embedding is a learned linear layer, which can be written as an affine map from the scalar target to the \(d\)-dimensional token space:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\operatorname{Embed}_\\text{TAE}(y_i)=a y_i+b,\n\n\\qquad a,b\\in\\mathbb{R}^d,&quot;,&quot;id&quot;:&quot;LMODGPRKOL&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(a\) and \(b\) are learned vectors. In both classification and regression, the output of \(\operatorname{Embed}_\text{TAE}\) lives in \(\mathbb{R}^d\), the same space as the feature token \(E_1[i,j]\). That is why direct addition is well-defined.</p><h3>Why add to tokens instead of appending a target column?</h3><p>If the target were appended as a separate token, the feature-token count would change from \(m\) to \(m+1\). Target-aware addition keeps the shape fixed:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{shape}(E_2)=\\text{shape}(E_1)=n\\times m\\times d.&quot;,&quot;id&quot;:&quot;ZZEUMPJNMB&quot;}" data-component-name="LatexBlockToDOM"></div><p>The label information is therefore available at every grouped feature token before the column-wise and row-wise transformer stages, but the architecture still processes \(m\) feature positions rather than \(m+1\). This is different from the TabPFNv2-style target-column appending, where the target is represented as an additional column alongside the feature columns.</p><h3>Why this helps before the transformers run</h3><p>Keeping the shape fixed is the computational benefit. 
The representational benefit connects back to representation collapse, but from a different angle than repeated feature grouping.</p><p>The next stage is column-wise processing: for each grouped feature position \(j\), the transformer processes that position across many rows. Because training rows now carry embedded targets, the column-wise transformer can see feature patterns together with observed outcomes.</p><p>Let \(Y\) denote the target random variable. In the previous post, we considered two feature random variables \(X_a\) and \(X_b\) with similar marginal distributions,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P_{X_a}\\approx P_{X_b},&quot;,&quot;id&quot;:&quot;OZHTLMEKUO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(P_{X_j}\) denotes the marginal distribution of feature \(X_j\). Similar marginals do not imply similar predictive roles. The two features can still have different target relationships:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(Y\\mid X_a=x)\\neq P(Y\\mid X_b=x)&quot;,&quot;id&quot;:&quot;CBSBIVJFAO&quot;}" data-component-name="LatexBlockToDOM"></div><p>for some values \(x\). Repeated feature grouping helps with the feature-feature part of this problem: a token is not built from one isolated scalar. Target-aware embedding helps with the feature-target part: during column-wise processing, training-row tokens carry both a feature representation and the observed outcome for that row.</p><p>The important nuance is that target-aware embedding does not distinguish feature positions within the same row by itself. The same vector \(\operatorname{Embed}_\text{TAE}(y_i)\) is added to every grouped feature token in row \(i\). The feature-position information still comes from \(E_1[i,j]\); the target-aware term supplies row-level supervised context.</p><p>This row-level supervised context becomes useful across examples. 
For training rows \(i\) and \(r\) with different targets,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]-E_1[i,j]=\\operatorname{Embed}_\\text{TAE}(y_i),\n\n\\qquad\n\nE_2[r,j]-E_1[r,j]=\\operatorname{Embed}_\\text{TAE}(y_r).&quot;,&quot;id&quot;:&quot;GQBSTIKCZR&quot;}" data-component-name="LatexBlockToDOM"></div><p>So, for each grouped feature position \(j\), the column-wise transformer receives examples of the form &#8220;this feature pattern occurred in a row with this target.&#8221; Across many rows, that makes feature-target association available earlier than it would be in a purely feature-only embedding.</p><p>Informally, for a training row with feature vector \(x_i=(x_{i1},\ldots,x_{im})\), the representation changes from a feature-only encoding to a feature-target encoding:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_1[i,\\cdot]\\approx \\phi(x_i),\n\n\\qquad\n\nE_2[i,\\cdot]\\approx \\psi(x_i,y_i).&quot;,&quot;id&quot;:&quot;JWJIPXVFOO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(E_1[i,\cdot]\) and \(E_2[i,\cdot]\) denote all grouped feature tokens for row \(i\), while \(\phi\) and \(\psi\) are informal names for feature-only and feature-target representation functions. The feature-target encoding applies only to training rows; test rows still carry \(E_1\) at this stage. This is the first target injection in TabICLv2. 
A second target embedding is added later, after row aggregation, before dataset-wise ICL.</p><h3>Implementation in NanoTabICL</h3><p>NanoTabICL implements target-aware embedding with two pieces: a task-dependent target embedder and one masked in-place addition to the training rows.</p><p>During initialization, the target embedder depends on whether the model is configured for classification or regression:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">self.y_embed_in = (
    ClassEmbedding(max_classes, embed_dim)  # classification: learned lookup table
    if max_classes &gt; 0
    else nn.Linear(1, embed_dim)  # regression: affine map a*y + b
)</code></pre></div><p>For classification, <code>ClassEmbedding</code> is a learnable lookup table:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class ClassEmbedding(nn.Embedding):
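    def reset_parameters(self) -&gt; None:
        # Uniform in [-1/sqrt(K), 1/sqrt(K)] with K = num_embeddings, the same
        # range nn.Linear uses by default for a layer with K inputs.
        nn.init.uniform_(self.weight, -1/math.sqrt(self.num_embeddings), 1/math.sqrt(self.num_embeddings))

    def forward(self, y: torch.Tensor) -&gt; torch.Tensor:
        # Drop the trailing singleton axis, then cast labels to integer indices.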
        return super().forward(y.squeeze(-1).long())</code></pre></div><p>The call to <code>long()</code> turns labels such as <code>0</code>, <code>1</code>, or <code>2</code> into embedding-table indices. The custom initialization matches the <code>nn.Linear</code> weight initialization scale. For regression, <code>nn.Linear(1, embed_dim)</code> implements the affine scalar-to-vector map \(a y_i+b\).</p><p>The actual target-aware update is one line in <code>forward</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">emb[:, :n_train] += self.y_embed_in(y[:, :, None, None])</code></pre></div><p>The shape logic is the key to reading this line:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nxxx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nxxx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 424w, https://substackcdn.com/image/fetch/$s_!nxxx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 848w, https://substackcdn.com/image/fetch/$s_!nxxx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nxxx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nxxx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png" width="1412" height="592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200333302?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nxxx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 424w, https://substackcdn.com/image/fetch/$s_!nxxx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 848w, 
https://substackcdn.com/image/fetch/$s_!nxxx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 1272w, https://substackcdn.com/image/fetch/$s_!nxxx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d506-f9ac-4001-801e-4bf744675396_1412x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>PyTorch broadcasting adds that target vector across all <code>cols</code> grouped feature positions in the corresponding training row. 
This is the code version of:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_2[i,j]=E_1[i,j]+\\operatorname{Embed}_\\text{TAE}(y_i),\n\\qquad i\\in\\mathcal{I}_\\text{train}.&quot;,&quot;id&quot;:&quot;GTRWQLMFUS&quot;}" data-component-name="LatexBlockToDOM"></div><p>The masking boundary is the important safety rule. NanoTabICL never indexes <code>emb[:, n_train:]</code> in this addition, so test rows remain feature-only at this stage:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">training rows: feature token + target embedding
test rows:     feature token only</code></pre></div><p>That single slice, <code>:n_train</code>, is what prevents target leakage in the implementation.</p><h2><strong>Summary</strong></h2><p>After target-aware embedding, training-row feature tokens carry supervised context, while test-row feature tokens remain unlabeled. TabICLv2 gets this effect by adding one target embedding to every grouped feature token in each labeled row, without changing the \(n\times m\times d\) tensor shape produced by repeated feature grouping.</p><p>The result is \(E_2\): a target-aware feature-token tensor ready for the compression-then-ICL pipeline. 
The next post covers how TabICLv2 compresses these tokens into row representations, adds target information again at the row-token level for labeled rows, and then performs dataset-wise in-context learning.</p>]]></content:encoded></item><item><title><![CDATA[[P26] Architecture of TabICLv2: repeated feature grouping]]></title><description><![CDATA[A technical guide to TabICLv2 repeated feature grouping: why similar columns confuse encoders, how circular shifts add context, with NanoTabICL implementation.]]></description><link>https://newsletter.dsaiengineering.com/p/p26-architecture-of-tabiclv2-repeared-feature-grouping</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p26-architecture-of-tabiclv2-repeared-feature-grouping</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 01 Jun 2026 13:05:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_8so!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With this post, I am starting a six-part miniseries on the architecture of TabICLv2. The goal is to cover the architecture one subsection at a time, so each post can focus on the details needed to understand that component without making a single article too long. The reference for all posts in this miniseries is the <a href="https://arxiv.org/pdf/2602.11139">TabICLv2 paper (arXiv)</a>. For a hands-on demo, I will use the <a href="https://github.com/soda-inria/nanotabicl/blob/main/model.py">NanoTabICL implementation</a> as the code companion for this miniseries. It is a short (~170 lines of code), self-contained implementation of the TabICLv2 architecture for educational and experimental purposes. It&#8217;s a good place to start before diving into the full model code.</p><p>The following figure illustrates the architecture of TabICLv2. 
This post covers repeated feature grouping. Later posts will cover target-aware embedding, column/row transformers, QASSMax, and the prediction heads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_8so!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_8so!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!_8so!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!_8so!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!_8so!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_8so!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png" width="849" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:849,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204801,&quot;alt&quot;:&quot;TabICLv2 architecture; this post covers only repeated feature grouping.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200113120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TabICLv2 architecture; this post covers only repeated feature grouping." title="TabICLv2 architecture; this post covers only repeated feature grouping." 
srcset="https://substackcdn.com/image/fetch/$s_!_8so!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 424w, https://substackcdn.com/image/fetch/$s_!_8so!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 848w, https://substackcdn.com/image/fetch/$s_!_8so!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 1272w, https://substackcdn.com/image/fetch/$s_!_8so!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9188e3c-be97-44af-b42a-3bc549bc4094_849x878.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">TabICLv2 architecture; this post covers only repeated feature grouping.</figcaption></figure></div><h2>Repeated feature grouping</h2><p>Let&#8217;s assume a dataset with the following properties and notation:</p><ul><li><p>\(n\) rows and \(m\) columns, </p></li><li><p>feature random variables denoted as \((X_1,\ldots,X_m),\)  </p></li><li><p>a target random variable denoted as \(Y\),  </p></li><li><p>the value of feature \(j\in\{1,\ldots,m\}\) in row \(i\in\{1,\ldots,n\}\) denoted as \(x_{ij}\), and</p></li><li><p>the full \(j\)-th column denoted as \(x_{\cdot j}=(x_{1j},\ldots,x_{nj})\).</p></li></ul><p>TabICLv2 first asks how to represent the features in a table.</p><h3>The problem: similar columns, different roles</h3><p>In tabular data, two features can have similar marginal distributions,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P_{X_a}\\approx P_{X_b},&quot;,&quot;id&quot;:&quot;LSUWFUZNSB&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(P_{X_j}\) denotes the marginal distribution of feature \(X_j\). But similar marginals do not imply similar predictive roles. 
For example, <code>days_since_signup</code> and <code>days_since_last_purchase</code> may both be positive, right-skewed variables, but their relationships to churn can be very different. This creates a representation problem: before the model can reason over feature interactions, its initial feature embeddings must preserve enough information to tell features apart.</p><h4>Independent column embedding collapses</h4><p>TabICL-style feature embedding can process each feature with the same encoder. A simplified way to write such an independent column encoder is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\phi:\\mathbb{R}^n\\rightarrow\\mathbb{R}^d,&quot;,&quot;id&quot;:&quot;GRWNVIHZPF&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(d\) is the embedding dimension. The column vector \(x_{\cdot j}\) is mapped to a feature representation</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;e_j=\\phi(x_{\\cdot j}).&quot;,&quot;id&quot;:&quot;DIEDBXKNKB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each column is embedded on its own, without looking at neighboring columns. But if \(\phi\) mostly sees each feature through its own values, then two columns with similar empirical distributions may be mapped to similar embeddings:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\|e_a-e_b\\|_2 \\approx 0\n\n\\quad \\text{or} \\quad\n\n\\cos(e_a,e_b)\\approx 1.&quot;,&quot;id&quot;:&quot;WXTNKQBIJM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(\|\cdot\|_2\) is the Euclidean norm, so \(\|e_a-e_b\|_2\) is the Euclidean distance between the two embeddings, and \(\cos(e_a,e_b)\) is their cosine similarity. This is the representation-collapse problem: distinct features become nearly indistinguishable in representation space even though their semantics, correlations, or target relationships differ.</p><p>The issue here is that the marginal distribution is not the entire identity of a feature. 
In reality, a feature is also characterized by its joint behavior with other features and with the target. Embedding each feature individually can underuse this context. Repeated feature grouping addresses the feature-feature part of that context; the next post covers how target information enters the representation.</p><h4>Grouping neighboring columns can lead to information loss</h4><p>TabPFNv2 and TabPFN-2.5 mitigate this collapse by grouping multiple columns into single tokens. Although grouping gives each feature token some neighboring-feature context, it also reduces the number of effective feature tokens, which can discard fine-grained feature information. The tradeoff is that one token now represents several original columns, so the model has fewer distinct feature positions to work with.</p><p>Downstream attention layers then operate on coarser tokens. If fine-grained feature identity is weakened early, later layers have less direct information with which to distinguish individual columns. TabICLv2 uses repeated feature grouping to keep the contextualization benefit while preserving \(m\) effective feature positions.</p><h3>TabICLv2&#8217;s fix: grouping with circular shifts</h3><p>For a table with \(m\) feature columns, TabICLv2 creates \(m\) groups, one anchored at each feature, by applying the offset pattern \((0,1,3)\) with circular shifts. Every feature anchors its own group and reappears in other groups, which is why the method is called repeated feature grouping.</p><p>To understand this mathematically, let us define the mapper</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho(t)=1+((t-1)\\bmod m),&quot;,&quot;id&quot;:&quot;RWJXGHBNPK&quot;}" data-component-name="LatexBlockToDOM"></div><p>so \(\rho(t)\) maps any integer \(t\) back into the column index set \(\{1,\ldots,m\}\). 
Then, the group anchored at feature \(j\) contains columns</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\big(j,\\rho(j+1),\\rho(j+3)\\big).\n\n&quot;,&quot;id&quot;:&quot;XROMVJOVXM&quot;}" data-component-name="LatexBlockToDOM"></div><p>For example, for \(m=5\) columns:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8dJ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8dJ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 424w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 848w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8dJ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png" width="1410" height="420" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33617,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200113120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8dJ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 424w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 848w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c819d9b-aed7-4e3d-9ec8-2009cc7a63a9_1410x420.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here, feature 1 appears as anchor in group 1, as offset \(+1\) in group 5, and as offset \(+3\) in group 3.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bm8P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bm8P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 424w, 
https://substackcdn.com/image/fetch/$s_!bm8P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 848w, https://substackcdn.com/image/fetch/$s_!bm8P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 1272w, https://substackcdn.com/image/fetch/$s_!bm8P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bm8P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png" width="401" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:401,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103858,&quot;alt&quot;:&quot;*Repeated feature grouping by TabICLv2.*&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200113120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="*Repeated feature grouping by TabICLv2.*" title="*Repeated feature grouping by TabICLv2.*" 
srcset="https://substackcdn.com/image/fetch/$s_!bm8P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 424w, https://substackcdn.com/image/fetch/$s_!bm8P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 848w, https://substackcdn.com/image/fetch/$s_!bm8P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 1272w, https://substackcdn.com/image/fetch/$s_!bm8P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e61cf08-a349-456b-88f0-d2fcc6862aee_401x878.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Repeated feature grouping by TabICLv2.</figcaption></figure></div><p>For row \(i\), the grouped input for position \(j\) is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;g_j(i)=\\left(x_{i,j},x_{i,\\rho(j+1)},x_{i,\\rho(j+3)}\\right),&quot;,&quot;id&quot;:&quot;MFBKEWMNYC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where the vector \(g_j(i)\in\mathbb{R}^3\) contains three scalar feature values from the same row. For example, with \(j=1\), \(g_1(i)=(x_{i,1},x_{i,2},x_{i,4})\) takes the values in columns \(1\), \(2\), and \(4\) of row \(i\). Each group is encoded by a shared linear map</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Lin}: \\mathbb{R}^3\\rightarrow\\mathbb{R}^d,&quot;,&quot;id&quot;:&quot;ORSKCQSVUZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>producing</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E_1[i,j]=\\text{Lin}(g_j(i)).&quot;,&quot;id&quot;:&quot;AJPACJBHUT&quot;}" data-component-name="LatexBlockToDOM"></div><p>The resulting tensor \(E_1\in\mathbb{R}^{n\times m\times d}\) contains one \(d\)-dimensional embedding for each row \(i\) and each group position \(j\). Put simply, one shared linear map turns each 3-value group into a \(d\)-dimensional token.</p><p>With the shift pattern \((0,1,3)\) and \(m \geq 7\) columns, no pair of columns appears together in more than one group, because the within-group index differences \(\pm 1,\pm 2,\pm 3\) are then all distinct modulo \(m\). This gives each feature several contextual views without repeatedly coupling the same feature pairs. 
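</p><p>The pair-uniqueness claim can be checked by brute force. The following is a minimal sketch and not part of NanoTabICL: the helper names <code>groups</code> and <code>max_pair_count</code> are hypothetical, and the code uses the 1-based column indices of the formulas above rather than the 0-based indices of the implementation.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from collections import Counter
from itertools import combinations

OFFSETS = (0, 1, 3)  # shift pattern described in the text


def groups(m, offsets=OFFSETS):
    """Return the m circular groups as tuples of 1-based column indices."""
    def rho(t):
        # Wrap any integer t back into the index set 1..m.
        return 1 + ((t - 1) % m)

    return [tuple(rho(j + o) for o in offsets) for j in range(1, m + 1)]


def max_pair_count(m):
    """Largest number of groups shared by any unordered pair of columns."""
    counts = Counter(
        frozenset(pair)
        for group in groups(m)
        for pair in combinations(group, 2)
    )
    return max(counts.values())


print(groups(5))
# [(1, 2, 4), (2, 3, 5), (3, 4, 1), (4, 5, 2), (5, 1, 3)]

print({m: max_pair_count(m) for m in range(4, 10)})
# {4: 2, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1}</code></pre></div><p>For \(m=5\), the first print lists the five groups of the running example. The second suggests that, over the small range tested, some column pair is shared by two groups for \(m<7\), while from \(m=7\) onward every pair appears in at most one group.</p><p>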
The result is a representation that helps prevent feature symmetries between neighboring columns while preserving \(m\) effective feature positions.</p><h3>Implementation in NanoTabICL</h3><p>In NanoTabICL, repeated feature grouping happens at the start of <code>NanoTabICLv2.forward</code>. The relevant model parameters are set during initialization:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"> self.feature_group_size = feature_group_size
 self.x_embed = nn.Linear(feature_group_size, embed_dim)</code></pre></div><p>The default <code>feature_group_size</code> is 3, so <code>self.x_embed</code> is a shared linear map from a 3-value feature group into the token dimension. This corresponds to the mathematical map</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Lin}: \\mathbb{R}^3\\rightarrow\\mathbb{R}^d.&quot;,&quot;id&quot;:&quot;JQBDECQCBO&quot;}" data-component-name="LatexBlockToDOM"></div><p>The grouping itself is implemented by indexing shifted versions of the column axis:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"> idxs = torch.arange(n_cols, dtype=torch.long, device=x.device)
 x = torch.stack(
     [x[:, :, (idxs + (2 ** i - 1)) % n_cols]
      for i in range(self.feature_group_size)],
     dim=-1,  # new trailing axis of size feature_group_size
 )
 emb = self.x_embed(x)</code></pre></div><p>The expression <code>(2 ** i - 1)</code> is the code version of the offset pattern. With <code>feature_group_size=3</code>, the loop uses:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_1rp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_1rp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 424w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 848w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 1272w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_1rp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png" width="1410" height="315" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:315,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200113120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_1rp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 424w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 848w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 1272w, https://substackcdn.com/image/fetch/$s_!_1rp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20fc3911-4edc-4287-a664-e61c8dc60ebd_1410x315.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The modulo operation <code>% n_cols</code> is the circular wraparound. 
If <code>idxs = [0, 1, 2, 3, 4]</code>, then the offset <code>3</code> gives <code>[3, 4, 0, 1, 2]</code>, so the last columns wrap back to the beginning.</p><p>The shape transition is the main thing to notice:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XzFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XzFo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 424w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 848w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 1272w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XzFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png" width="1410" height="372" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d41693bd-c972-492d-aed0-415d3fddf874_1410x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68267,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/200113120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XzFo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 424w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 848w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 1272w, https://substackcdn.com/image/fetch/$s_!XzFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41693bd-c972-492d-aed0-415d3fddf874_1410x372.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So NanoTabICL keeps the same number of column positions, <code>cols</code>, but each position has already looked at a small circular group of neighboring columns. 
That is the implementation counterpart of preserving \(m\) effective feature slots while giving each slot local feature context.</p><h2>Summary</h2><p>Repeated feature grouping addresses a core weakness of independently embedding tabular features: columns with similar value distributions can become hard to distinguish. TabICLv2 groups each feature with shifted companion features, giving the model multiple contextual views while keeping the number of effective feature positions unchanged. 
The next post covers target-aware embedding, the step where TabICLv2 injects observed targets into the feature representations of training rows.</p>]]></content:encoded></item><item><title><![CDATA[[P25] SAM Platform: Orchestration + ML for Trading PyPI Libraries]]></title><description><![CDATA[Developing a production-oriented trading system on top of ML4T libraries.]]></description><link>https://newsletter.dsaiengineering.com/p/p25-sam-platform-orchestration-and-ml-for-trading-pypi-lib</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p25-sam-platform-orchestration-and-ml-for-trading-pypi-lib</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 25 May 2026 17:57:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!smaU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://newsletter.dsaiengineering.com/p/p23-introducing-sam">P23</a>, I introduced SAM as a personal quantitative research and engineering platform for applying the ML and quantitative finance concepts discussed in the newsletter. In that post, I got SAM to generate a simple daily brief. After that, I tried implementing the volatility-regime forecasting workflow from <a href="https://newsletter.dsaiengineering.com/p/p19-volatility-regime-forecasting-tabpfn-tabicl-classical-ml-models-2">P19</a> in SAM, but I wasn&#8217;t satisfied with the results. 
I felt I lacked the domain knowledge to make it work realistically.</p><p>Yesterday, I <a href="https://www.linkedin.com/posts/msaharan_machinelearning-quantitativefinance-algorithmictrading-activity-7464368821412282368-mwld?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">came across</a> the <a href="https://ml4trading.io/">ML for Trading book</a> by <a href="https://www.linkedin.com/in/applied-ai/">Stefan Jansen</a> through a <a href="http://substack.com/@ml4trading">note he posted on Substack</a>. It caught my attention because I had been looking for something like it to learn the field and was also trying to find coding resources to build SAM on. The book offered both.</p><p>The following picture shows a screenshot from the book&#8217;s official website. The book covers much of the classical ML material that I was already planning to discuss in this newsletter, in addition to tabular foundation models. 
Moreover, it covers many of the production and workflow pieces I wanted to build into SAM to make it a quant research and engineering platform for personal and educational use.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!smaU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!smaU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 424w, https://substackcdn.com/image/fetch/$s_!smaU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 848w, https://substackcdn.com/image/fetch/$s_!smaU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 1272w, https://substackcdn.com/image/fetch/$s_!smaU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!smaU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:801775,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/199218829?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!smaU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 424w, https://substackcdn.com/image/fetch/$s_!smaU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 848w, https://substackcdn.com/image/fetch/$s_!smaU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 1272w, https://substackcdn.com/image/fetch/$s_!smaU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4967b8f1-c28a-4797-8b43-ef2021d57cb3_1495x782.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The book also comes with several more add-ons that are highly relevant to today's AI-dominant workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OPG8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OPG8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 424w, 
https://substackcdn.com/image/fetch/$s_!OPG8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 848w, https://substackcdn.com/image/fetch/$s_!OPG8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 1272w, https://substackcdn.com/image/fetch/$s_!OPG8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OPG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png" width="1218" height="854" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1218,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:162892,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/199218829?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!OPG8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 424w, https://substackcdn.com/image/fetch/$s_!OPG8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 848w, https://substackcdn.com/image/fetch/$s_!OPG8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 1272w, https://substackcdn.com/image/fetch/$s_!OPG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd7dec3d-09d9-4e34-8dc4-98aa1dafd8ea_1218x854.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>The part I found immediately useful was the following set of <a href="https://github.com/orgs/ml4t/repositories">libraries</a>. The core libraries I looked at are MIT licensed and already contain much of the wiring that I would otherwise have to write myself, most likely with mistakes given my limited domain knowledge.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a9tg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a9tg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 424w, https://substackcdn.com/image/fetch/$s_!a9tg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 848w, https://substackcdn.com/image/fetch/$s_!a9tg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 1272w, 
https://substackcdn.com/image/fetch/$s_!a9tg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a9tg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png" width="964" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:964,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/199218829?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a9tg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 424w, https://substackcdn.com/image/fetch/$s_!a9tg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 848w, 
https://substackcdn.com/image/fetch/$s_!a9tg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 1272w, https://substackcdn.com/image/fetch/$s_!a9tg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c8a38d8-ada5-4329-8a28-ea8075fbf7a9_964x655.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>It&#8217;s rarely a good idea to reinvent the wheel. 
So, today, I rebuilt SAM around ML4T libraries.</p><p>The implementation details will change as SAM develops further, but the architectural decision is the durable part. For now, it&#8217;s enough to know that SAM is a thin orchestration layer on top of ML4T: SAM orchestrates; ML4T computes, validates, backtests, and executes. SAM owns the repeatable workflow layer: configuration, CLI routing, strategy selection, pipeline sequencing, manifests, promotion gates, and operator controls. ML4T owns the specialized trading machinery: market data access, feed specifications, feature engineering, signal diagnostics, model training surfaces, backtesting, live execution, broker wrappers, and risk guards.</p><p>If you are interested in looking at the code and playing with it, you can find today&#8217;s version <a href="https://github.com/msaharan/sam/tree/d00cd413522805dfd436e3a6efe0dd1d3ff7f540">here</a>. The current repo is still small, but the important change is visible in the structure: configs, pipelines, manifests, live runners, ops tools, and tests now sit around ML4T rather than replacing it. 
The screenshot below is SAM after the rebuild.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jvZs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jvZs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 424w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 848w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 1272w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jvZs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png" width="1252" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1252,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:167215,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/199218829?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jvZs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 424w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 848w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 1272w, https://substackcdn.com/image/fetch/$s_!jvZs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e518cec-3fca-49e4-85dd-752698c8f01c_1252x810.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This version is still early, but it gives SAM the right shape for further development. 
Check out the repo and let me know what you think.</p>]]></content:encoded></item><item><title><![CDATA[[P24] YouTube channel for explainer videos]]></title><description><![CDATA[Subscribe for videos about tabular foundation models, machine learning, data science, and quantitative finance.]]></description><link>https://newsletter.dsaiengineering.com/p/p24-youtube-channel-for-explainer</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p24-youtube-channel-for-explainer</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Fri, 22 May 2026 15:59:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/oEYl7P7MJiY" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I will use my YouTube channel to present the content of this newsletter in video format.</p><p>Here&#8217;s a video describing how I came to know about tabular foundation models, why I talk exclusively about them rather than other developments in the broader field of AI, and what I will be doing in my videos. 
</p><p>Subscribe to the channel to receive updates, and share it with other data, AI, and finance professionals who might be interested.</p><div id="youtube2-oEYl7P7MJiY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;oEYl7P7MJiY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/oEYl7P7MJiY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div>]]></content:encoded></item><item><title><![CDATA[[P23] Introducing SAM: a personal quantitative research and engineering platform]]></title><description><![CDATA[A companion to the [DS, AI, Engineering] Newsletter.]]></description><link>https://newsletter.dsaiengineering.com/p/p23-introducing-sam</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p23-introducing-sam</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Thu, 21 May 2026 05:39:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!f3mC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past month, I developed this series into something I had wanted it to become: an engineering-first exploration of the frontier of tabular ML, with a focus on quantitative finance for real-world use cases.</p><p>In my posts, I have tried my best to make the examples as close to reality as possible because I am interested in applying these lessons, conceptually and programmatically, to real-world problems.<br><br>I am happy with the depth of my posts, and I have several more planned. However, after P22, I took a strategic break to reflect on the quality of the posts and code so I could improve future posts. One of the things I wanted to do was extract reusable concepts and save them on a self-contained page for future reference.<br><br>But concepts by themselves are of limited use. 
That led me to think it would be better to have a place where I could practice them regularly and solve a real problem at the same time.<br><br>This week, I took a step toward that and started building a quantitative research and engineering platform to analyze financial markets continuously and in an increasingly sophisticated manner over time. I am calling it SAM, and it is available under the Apache 2.0 license.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f3mC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f3mC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 424w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 848w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 1272w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!f3mC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png" width="1259" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1259,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153979,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/198610017?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f3mC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 424w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 848w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 1272w, https://substackcdn.com/image/fetch/$s_!f3mC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F889de304-0637-40bf-ba8f-9234b22ca677_1259x813.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><blockquote><p>README:</p><p>SAM is an open-core personal quant research and engineering platform and is a companion to the <a href="https://newsletter.dsaiengineering.com/">DSAIEngineering Newsletter</a>. The workflows and primitives described in the newsletter are implemented in SAM. Currently, it focuses on production-style US-listed ETF allocation, volatility/risk scoring, and future US cross-sectional equity ranking workflows.
More functionality will be integrated from the newsletter into SAM to make it more capable over time.</p></blockquote><p>The idea for creating this platform has been on my mind since last year. I have always envisioned using it in combination with Perplexity Finance: SAM could do comprehensive data analysis, modeling, simulations, and related quantitative workflows; Perplexity Finance could do broader market and geopolitical analysis; and together, they could be used to develop investment or trading strategies that a person could trade with paper money to learn the process and check their understanding of the field.</p><p>Currently, SAM can generate a daily report with the following contents:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown"># Daily SAM Brief - 2026-05-21

Research only. Not investment advice.

## Data Freshness and Validation

- Price last date: 2026-05-20
- Data freshness days: 1
- Validation failures: 0
- Model status: reused

## SPY Risk Regime

- Risk level: normal
- Realized volatility 20d: 0.1058
- Threshold: 0.2074
- Volatility / threshold: 0.5103
- Drawdown 20d: -0.0106
- MA distance 20d: 0.0165

## ETF Ranking

| score_rank | symbol | prediction | selected |
| --- | --- | --- | --- |
| 1 | SLV | 0.1835 | True |
| 2 | GLD | 0.0254 | True |
| 3 | SPY | 0.0237 | True |
| 4 | TLT | 0.0200 | True |
| 5 | EFA | 0.0187 | True |
| 6 | XLU | 0.0164 | False |
| 7 | DIA | 0.0146 | False |
| 8 | XLF | 0.0126 | False |
| 9 | LQD | 0.0123 | False |
| 10 | HYG | 0.0098 | False |

## Target Research Weights

| symbol | weight | score | score_rank |
| --- | --- | --- | --- |
| SLV | 0.2000 | 0.1835 | 1 |
| GLD | 0.2000 | 0.0254 | 2 |
| SPY | 0.2000 | 0.0237 | 3 |
| TLT | 0.2000 | 0.0200 | 4 |
| EFA | 0.2000 | 0.0187 | 5 |

## Turnover and Cost Diagnostics

- Previous weight date: 2026-05-20 00:00:00
- Turnover: 0.4000
- Estimated cost: 0.0002
- Gross exposure: 1.0000
- Net exposure: 1.0000
- Turnover breach: False

## Limitations

- Public market data can be revised and may differ from institutional data.
- Costs are simple basis-point estimates, not a market-impact model.
- The allocation view is a daily research snapshot for a monthly-horizon ETF workflow.
- TabPFN and TabICL are not required for this CPU-first daily brief.
</code></pre></div><p>Over time, I want SAM to perform more sophisticated analyses using traditional quantitative finance methods and ML methods.<br><br>If you are interested in playing with it, I invite you to check out the repository: <a href="https://github.com/msaharan/sam">github.com/msaharan/sam</a>, and let me know what you think.</p>]]></content:encoded></item><item><title><![CDATA[[P22] Tactical asset allocation with TabPFN, TabICL, and XGBoost - 3]]></title><description><![CDATA[Given information available at a monthly signal date, can TabPFN, TabICL, XGBoost, and simple allocation rules rank assets in a public ETF universe by next-month relative attractiveness?]]></description><link>https://newsletter.dsaiengineering.com/p/p22-tactical-asset-allocation-with-tabpfn-tabicl-xgboost-3</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p22-tactical-asset-allocation-with-tabpfn-tabicl-xgboost-3</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Tue, 12 May 2026 18:31:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZJDl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://open.substack.com/pub/dsaiengineering/p/p20-tactical-asset-allocation-with-tabpfn-tabicl-xgboost?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P20</a>, I converted tactical asset allocation into a supervised tabular ranking problem. 
Each row was an asset-month observation, the label indicated whether that asset belonged to the next-month top group, and TabPFN, TabICL, XGBoost, and deterministic allocation rules were evaluated as allocation scorers rather than price forecasters.</p><p>In <a href="https://open.substack.com/pub/dsaiengineering/p/p21-tactical-asset-allocation-with-tabpfn-tabicl-xgboost-2?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P21</a>, I made that first workflow more demanding. The universe expanded from 9 ETFs to 25 ETFs, the target changed from top 3 of 9 to top 5 of 25, the base positive rate fell from about 0.333 to 0.200, the return convention moved from close-to-close to next-open-to-next-open, and the main feature set removed static ticker identity features.</p><p>That follow-up made the workflow more credible, but it also made the remaining gaps easier to see. The liquidity diagnostic path was thin. Feature sensitivity was mostly a lightweight linear-model check. Alternative objectives were still closer to audits of the one-month score than to retrained objective-specific experiments. The portfolio section had improved, but it still needed a cleaner distinction between score quality and allocation quality.</p><p>So this post is the final follow-up for this tactical-allocation mini-series. I keep the same 25-ETF, top-5, next-open-to-next-open problem from P21, but I strengthen the experimental design around the remaining issues that are addressable in a public notebook. I leave three limitations outside today&#8217;s scope: TabICL remains disabled, the dataset is still public rather than institutional point-in-time data, and tabular foundation model (TFM) embeddings are not tested. 
Those are real limitations, but this post is about making the rest of the testbench more complete.</p><p>I am treating this as a learning-in-public research notebook and worked example, not as a claim to have the final word on tactical allocation or tabular foundation models. I am learning as I go, and I want that to make the write-up more careful, not more tentative. The value of the post is in making the workflow inspectable enough that TFM labs, ML practitioners, data and AI teams, and quant readers can see the setup, the potential benefits, the tradeoffs, and the limitations clearly.</p><p>The empirical result is mixed in an informative way. TabPFN has the best completed row-level ranking result on the main target in this run. The TabPFN calibration-base scorer has holdout Average Precision of 0.302, and the final direct TabPFN scorer has AP 0.298, against a 0.200 base rate. XGBoost is above base rate but lower on this row-level view, with AP 0.252. However, the highest headline portfolio Sharpe is not TabPFN. It is the 12-month momentum top-k rule, with Sharpe 1.11. TabPFN portfolios are competitive and often close to the top, but they do not settle the allocation question against compact finance rules in this workflow.</p><p>For me, the conclusion is methodological rather than promotional. Direct TabPFN scoring still looks useful after the P21 workflow is strengthened, while the best deterministic allocation rule remains highly competitive at the portfolio layer. 
That is why this kind of notebook is worth building: it separates the modeling question, &#8220;can a pretrained tabular model rank asset-month rows?&#8221;, from the allocation question, &#8220;does that score become the best portfolio rule after turnover, risk, and portfolio construction are included?&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZJDl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZJDl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZJDl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png" width="1230" height="862" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:170019,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZJDl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!ZJDl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff504790d-f06d-429b-b9e6-d2c5ad459b90_1230x862.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The equity-curve figure previews the central tension in the post. The learned scores are useful, but the portfolio paths are close enough that the allocation layer has to be evaluated on its own terms.
I discuss the Sharpe, drawdown, turnover, and uncertainty details after setting up the notation and experimental changes.</p><p>You can find the notebook in my GitHub repository <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260512-tabpfn-tabicl-tactical-asset-allocation-3.assets/saved-versions/tabpfn-tabicl-tactical-asset-allocation-20260512-v2.ipynb">here</a> or you can <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-tactical-asset-allocation-20260512">clone it directly</a> on Kaggle.</p><h2><strong>Minimal notation for this follow-up</strong></h2><p>P20 gives the full formulation, and P21 restates the notation for the broader universe and next-open target. I will only repeat the pieces needed to read this post, so the later tables and diagnostics have a common language.</p><p>Each row is an asset-month pair \((i,t)\). The index \(i\) denotes an ETF, and \(t\) denotes a monthly signal date. The information set available at the signal date is \(\mathcal{F}_t\).
The feature vector is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x_{i,t} = \\phi_i(\\mathcal{F}_t),&quot;,&quot;id&quot;:&quot;JQQBCDBQCW&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\phi_i(\cdot)\) is the feature-generation process for asset \(i\). In this notebook, the main features include trailing returns, volatility, downside volatility, drawdown, moving-average distance, beta, volume and dollar-volume proxies, high-low range proxies, VIX features, Treasury features, and other public market-state features. The main feature policy is identity-ablated, meaning static ticker identity and asset-group metadata are excluded from the headline predictive matrix.</p><p>The target uses the next-open-to-next-open convention introduced in P21. Let \(O^+_{i,t+1}\) denote the adjusted first tradable open after signal month \(t\), and let \(O^+_{i,t+2}\) denote the adjusted first tradable open after the next month. The one-month forward return used for the main target is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^{\\text{open}}_{i,t+1}\n\n=\n\n\\frac{O^+_{i,t+2}}{O^+_{i,t+1}} - 1.&quot;,&quot;id&quot;:&quot;GNMNJGKUME&quot;}" data-component-name="LatexBlockToDOM"></div><p>The label is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{i,t}\n\n=\n\n\\mathbf{1}\n\n\\left\\{\n\nR^{\\text{open}}_{i,t+1}\n\n\\text{ is among the top } K\n\n\\text{ returns in month } t+1\n\n\\right\\}.&quot;,&quot;id&quot;:&quot;QYAEOZQSSD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathbf{1}\{\cdot\}\) is the indicator function. It equals 1 when the condition inside the braces is true and 0 otherwise. Let \(N_t\) be the number of assets with a usable forward return in month \(t\), and let \(K\) be the number selected as positives. 
In the full holdout months, \(N_t=25\) and \(K=5\), so the monthly positive-class rate is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\bar{y}_t = \\frac{K}{N_t} = \\frac{5}{25} = 0.20.&quot;,&quot;id&quot;:&quot;LBUVVNJZQJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>A scorer then produces:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_{i,t}\n\n\\approx\n\n\\mathbb{P}(y_{i,t}=1 \\mid x_{i,t}, \\mathcal{D}_{\\text{train}}),&quot;,&quot;id&quot;:&quot;UQBHJKHUNM&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(s_{i,t}\) is the model score and \(\mathcal{D}_{\text{train}}\) is the labelled training or context data available to the scorer. This is a top-group membership score, not a price forecast. A high score does not mean the asset must go up next month. It means the model ranks the asset as more likely to belong to the next-month top group within the available ETF cross-section.</p><p>The supervised table is familiar, but TabPFN and XGBoost use it differently. XGBoost learns task-specific parameters from the labelled allocation table. 
Conceptually, it selects a scoring function from a model family \(\mathcal{G}\):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n\n=\n\n\\arg\\min_{f \\in \\mathcal{G}}\n\n\\sum_{(x_{i,t},y_{i,t}) \\in \\mathcal{D}_{\\text{train}}}\n\n\\ell(y_{i,t}, f(x_{i,t})).&quot;,&quot;id&quot;:&quot;PHNQCCBKVS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{G}\) is the candidate family of scoring functions, \(f\) is one candidate scorer, \(\ell\) is the loss function, and \(\hat{f}\) is the fitted scorer selected from that family.</p><p>Direct TabPFN uses a pretrained tabular model and labelled rows as task context:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}_{\\theta}\n\n\\left(\n\ny_\\ast = 1\n\n\\mid\n\nx_\\ast,\n\nX_{\\text{context}},\n\ny_{\\text{context}}\n\n\\right),&quot;,&quot;id&quot;:&quot;KAZUZQYPOJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\theta\) denotes the pretrained TabPFN parameters, \(x_\ast\) is a query row, \(y_\ast\) is its unknown label, and \((X_{\text{context}}, y_{\text{context}})\) is the labelled context. This is the TFM use case I am testing: whether a pretrained in-context tabular learner can produce useful allocation scores under the same chronology, leakage discipline, and portfolio diagnostics used for the classical supervised ML baseline.</p><p>With that notation in place, the rest of the post is about what changed in the testbench and how those changes affect the interpretation.</p><h2><strong>What is new in this experiment</strong></h2><h3><strong>Liquidity and spread are now part of the feature and diagnostic story</strong></h3><p>P21 added portfolio stress diagnostics, but the liquidity path still needed work. A strategy that looks good on returns alone can become less interesting once turnover, capacity, trading frictions, and liquidity are considered. 
Even in a public ETF notebook, I want the workflow to start carrying these concerns through the experiment.</p><p>Today&#8217;s notebook adds dollar-volume and high-low range proxy features to the model frame. This is both a feature-set change and a diagnostic improvement. The final main feature matrix grows to 1,060 features once retained numeric columns and missingness indicators are counted. That matters because P22 is not only rerunning P21 with extra reports; it is also giving the models additional liquidity-related inputs.</p><p>The portfolio diagnostics now include three liquidity proxies: average weighted ADV (average daily dollar volume), a high-low range proxy reported in basis points, and turnover-weighted participation at a USD 1 million notional portfolio size. These are proxies, not institutional execution data. The high-low proxy is based on intraday range and should not be read as a quoted bid-ask spread. The participation calculation is not a market-impact model. Still, the diagnostic is now an interpretable liquidity screen rather than a placeholder.</p><h3><strong>Feature-policy sensitivity is now a model-family check</strong></h3><p>P21 used Logistic Regression as a lightweight sensitivity diagnostic.
That was useful, but it did not answer whether the feature policy mattered similarly for TabPFN and for a tree-based classical tabular model.</p><p>Today&#8217;s notebook reruns direct TabPFN and a fixed GPU XGBoost sensitivity model across five feature policies:</p><ul><li><p>full;</p></li><li><p>ticker-ablated;</p></li><li><p>identity-ablated;</p></li><li><p>metadata-only;</p></li><li><p>strict-time-series.</p></li></ul><p>In this naming scheme, &#8220;full&#8221; keeps the complete candidate feature set, &#8220;ticker-ablated&#8221; removes ticker dummy features, &#8220;identity-ablated&#8221; removes ticker identity and asset-group metadata, &#8220;metadata-only&#8221; keeps only those static identity and metadata fields, and &#8220;strict-time-series&#8221; keeps the non-identity time-series and market-state features. In this run, the strict-time-series and identity-ablated matrices end up with the same retained feature count after filtering.</p><p>This is a fixed XGBoost sensitivity model rather than a second full XGBoost search. It uses a fixed GPU XGBoost configuration for the feature-policy comparison, while the main XGBoost row still uses the searched model. That is a reasonable compromise for this notebook because the purpose is to test whether the main identity-ablated conclusion is fragile, not to run a full nested model-selection study for every feature variant.</p><h3><strong>Alternative objectives are retrained, not only audited</strong></h3><p>P21 made a useful start on alternative objectives, but the more direct version is to retrain and evaluate models on those objectives. 
Today&#8217;s notebook does that for three targets:</p><ul><li><p>1-month benchmark-relative outperformance versus SPY;</p></li><li><p>3-month top-k membership;</p></li><li><p>6-month top-k membership.</p></li></ul><p>For the benchmark-relative target:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y^{\\text{bench}}_{i,t}\n\n=\n\n\\mathbf{1}\n\n\\left\\{\n\nR^{\\text{open}}_{i,t+1}\n\n-\n\nR^{\\text{open}}_{\\text{SPY},t+1}\n\n> 0\n\n\\right\\}.&quot;,&quot;id&quot;:&quot;JNRQZEKQIJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(R^{\text{open}}_{\text{SPY},t+1}\) is SPY&#8217;s one-month forward open-to-open return under the same convention. This benchmark-relative target uses SPY as the label benchmark. Later, the portfolio benchmark-relative table uses a 60/40 SPY/TLT benchmark for active-risk diagnostics; those are related but distinct benchmark choices.</p><p>For an \(h\)-month top-k target, where \(h\) is the horizon length in months:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^{(h)}_{i,t}\n\n=\n\n\\prod_{m=0}^{h-1}\n\n\\left(1 + R^{\\text{open}}_{i,t+1+m}\\right)\n\n- 1.&quot;,&quot;id&quot;:&quot;UCUXXOLDGL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(m\) indexes the monthly return offsets inside the horizon. The \(h\)-month label then applies the same top-\(K\) cross-sectional rule to \(R^{(h)}_{i,t}\), with \(K=5\) in this experiment. This matters because a model trained to predict one-month top-k membership is not necessarily the right model for a six-month target or a benchmark-relative target.</p><p>The notebook also changes how multi-horizon results are interpreted. Overlapping 3-month and 6-month forward returns are not compounded as if they were independent monthly portfolio returns. They are reported as horizon-aware score-to-return diagnostics. 
This is less flashy, but it is conceptually cleaner.</p><h3><strong>The portfolio layer is treated as a separate object</strong></h3><p>The score \(s_{i,t}\) is not the portfolio. A portfolio requires a selection rule, a weighting rule, turnover accounting, transaction costs, and some view of risk. The same row-level score can look better or worse after this translation.</p><p>Let \(w_{i,t}\) be the portfolio weight assigned to asset \(i\) after the signal at month \(t\). In the main top-k equal-weight portfolio, selected assets receive weight \(1/K\), unselected assets receive weight 0, and the portfolio return is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^p_{t+1}\n\n=\n\n\\sum_i w_{i,t} R^{\\text{open}}_{i,t+1},&quot;,&quot;id&quot;:&quot;IKMNZGREQX&quot;}" data-component-name="LatexBlockToDOM"></div><p>with turnover:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau_t\n\n=\n\n\\sum_i |w_{i,t} - w_{i,t-1}|,&quot;,&quot;id&quot;:&quot;RGFZUYTUYD&quot;}" data-component-name="LatexBlockToDOM"></div><p>and net diagnostic return:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\widetilde{R}^p_{t+1}\n\n=\n\nR^p_{t+1} - c \\tau_t,&quot;,&quot;id&quot;:&quot;PCCAWGLWVG&quot;}" data-component-name="LatexBlockToDOM"></div><p>where the summation is over the available assets, \(R^p_{t+1}\) is the pre-cost portfolio return, \(\tau_t\) is the traded-weight turnover implied by the change in portfolio weights, \(\widetilde{R}^p_{t+1}\) is the cost-adjusted diagnostic return, and \(c=0.0005\), corresponding to 5 basis points per unit of traded weight. 
Under this convention, opening a fully invested long-only portfolio from cash has \(\tau_t=1\), and replacing the entire portfolio with non-overlapping long-only holdings has \(\tau_t=2\).</p><p>Today&#8217;s notebook also adds benchmark-relative diagnostics, tax-drag sensitivity, turnover constraints, liquidity and participation proxies, drawdown summaries, and a trailing-covariance ex ante risk proxy. These do not make the notebook a production allocator. They keep the interpretation from resting only on a row-level model metric.</p><p>The next section walks through the resulting data frame, score quality, robustness checks, and portfolio translation in that order.</p><h2><strong>Code discussion and experimental results</strong></h2><h3><strong>Dataset and chronological splits</strong></h3><p>I start with the data shape because the rest of the metrics depend on this cross-section and chronology.
The final model frame contains 6,059 asset-month rows across 243 months and 25 ETFs. The holdout period contains 1,875 rows across 75 months, from January 2020 through March 2026. The holdout positive rate is 0.200 because the target selects five assets out of twenty-five in each full holdout month.</p><p>The ETF universe is:</p><p>SPY, QQQ, DIA, IWM, EFA, EEM, TLT, IEF, SHY, LQD, HYG, GLD, SLV, DBC, VNQ, IYR, XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY.</p><p>This is still a public ETF universe, not an institutional production universe. It is broad enough to include US equity styles, international equity, rates, credit, commodities, real estate, and US sectors, while remaining small enough that the full notebook is inspectable. The rows are then split chronologically as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_FuQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_FuQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 424w, https://substackcdn.com/image/fetch/$s_!_FuQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 848w, https://substackcdn.com/image/fetch/$s_!_FuQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_FuQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_FuQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png" width="1209" height="354" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:1209,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51290,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_FuQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 424w, https://substackcdn.com/image/fetch/$s_!_FuQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!_FuQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 1272w, https://substackcdn.com/image/fetch/$s_!_FuQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcf4a2ca-f3ca-49c4-bc1b-719ba9ef8f86_1209x354.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Chronological split matters because random splits in market data can leak future regimes into model selection and make a score look 
more stable than it would be in a real research process. Feature filtering and imputation are fitted on the model-selection history before being applied to calibration and holdout rows.</p><h3><strong>Main row-level score quality</strong></h3><p>The first empirical question is row-level: can a scorer separate future top-five asset-month rows from the other rows in the holdout set? Because the holdout label selects 5 assets out of 25 each month, a non-informative scorer has an Average Precision reference level of 0.200.</p><p>I keep the same row-level metrics used in P20 and P21. Average Precision summarizes precision-recall ranking quality, ROC AUC summarizes pairwise positive-versus-negative ranking, Brier score measures squared probability error, log loss penalizes confident wrong probability estimates, and ECE means expected calibration error, computed here with 10 quantile bins.</p><p>For comparability, the TabPFN runs use model version 2.6, the same model line used for P20 and P21. A newer TabPFN version was released while this post was being prepared, so pinning the version keeps the experiment from mixing model-version change with testbench change.</p><p>There are also three learned-score variants to keep separate. &#8220;Direct&#8221; means the final scorer fitted or contextualized on all pre-holdout rows and then evaluated on holdout. &#8220;Calibration base&#8221; means the scorer excludes the calibration-window labels; this gives a cleaner object for calibration-window evaluation and for checking what changes when the calibration window is not used as labelled context. &#8220;Calibrated&#8221; means a sigmoid calibration layer was fitted on the calibration window. The calibration-base score series is a distinct diagnostic series, not a separate model family. 
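</p><p>The three probability-error metrics named above can be sketched in plain Python. This is an illustrative sketch, not the notebook&#8217;s code: the function names are hypothetical, and equal-count binning over sorted scores (with tied scores kept in input order) is only one common reading of &#8220;quantile bins&#8221;, so the notebook&#8217;s exact ECE may differ. The four-row toy input is made up and uses 2 bins instead of 10 to stay readable.</p>

```python
import math


def brier_score(y_true, p_hat):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def log_loss(y_true, p_hat, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def ece_quantile_bins(y_true, p_hat, n_bins=10):
    """Expected calibration error with equal-count bins over sorted predictions.

    Assumption: bins are formed by sorting on the predicted probability and
    cutting the sorted rows into n_bins nearly equal chunks.
    """
    pairs = sorted(zip(p_hat, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    ece = 0.0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        mean_y = sum(y for _, y in chunk) / len(chunk)
        # Weight each bin's calibration gap by its share of the rows.
        ece += len(chunk) / n * abs(mean_p - mean_y)
    return ece


# Hypothetical toy data: two negatives scored low, two positives scored high.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.3, 0.7, 0.9]

toy_brier = brier_score(y_toy, p_toy)                 # about 0.05
toy_log_loss = log_loss(y_toy, p_toy)
toy_ece = ece_quantile_bins(y_toy, p_toy, n_bins=2)   # about 0.2
print(toy_brier, toy_log_loss, toy_ece)
```

<p>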
The deterministic rules are raw ranking scores, not calibrated probability models, so Brier score, log loss, and ECE are not reported for those rows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mfyy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mfyy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 424w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 848w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 1272w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mfyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png" width="1208" height="660" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1208,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102999,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mfyy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 424w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 848w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 1272w, https://substackcdn.com/image/fetch/$s_!mfyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7d6367e-622b-4bb0-a15b-23edf8911f66_1208x660.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The main row-level result is that TabPFN has clear lift over the 0.200 base rate. The calibration-base TabPFN row has AP 0.302, and direct TabPFN has AP 0.298. I do not treat that small difference as a meaningful contest between two TabPFN variants; the important point is that both are materially above the non-informative reference.</p><p>XGBoost is also above base rate, but lower than TabPFN on AP and ROC AUC in this run. The momentum rules are close to XGBoost on AP, which reinforces the value of deterministic finance baselines even before the portfolio layer is introduced.</p><p>The next two figures show the row-level ranking and calibration diagnostics visually, but I keep them to a small set of headline curves for readability. 
The tables carry the fuller variant comparison, including TabPFN calibration base.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iZBk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iZBk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iZBk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png" width="1213" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1213,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99672,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iZBk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!iZBk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63b85f91-1969-433e-b9b8-dbdaf000ffee_1213x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The precision-recall figure should be read against the 0.200 base-rate reference. TabPFN does not need to produce a perfect ranking to be useful. 
It needs to keep the ranked list above the non-informative reference in the part of the list used for top-five selection.</p><p>Calibration is a separate question, so I read it from the calibration curve and the probability-error columns rather than from AP alone.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CcLE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CcLE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 424w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 848w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 1272w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CcLE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png" width="1159" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1159,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101538,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CcLE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 424w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 848w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 1272w, https://substackcdn.com/image/fetch/$s_!CcLE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf4e0664-ddc3-4bff-a1d7-b8e78c4b3477_1159x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The calibration result tells a different story from the ranking result. XGBoost calibrated has a better Brier score and expected calibration error than XGBoost final, but it gives up AP. TabPFN direct has stronger ranking quality and reasonable calibration. This is a common supervised ML tradeoff: a score can become more probability-like without becoming a better top-k ranker.</p><p>The XGBoost calibration curve can look visually odd because the uncalibrated XGBoost probabilities are poorly scaled for this holdout. That is not the same as an invalid ranker. The score sanity checks show valid probabilities with no NaNs and no values outside \([0,1]\), and the XGBoost final row still has AP 0.252 versus the 0.200 base rate. 
The issue is probability reliability: the uncalibrated XGBoost score ranks some rows usefully, but its probability scale is not trustworthy. The sigmoid-calibrated XGBoost row fixes much of that probability-scale problem, as shown by the lower Brier score and ECE, but its AP falls to 0.240.</p><h3><strong>Feature-policy sensitivity</strong></h3><p>Before translating scores into portfolios, I first check whether the main row-level result depends on a fragile feature policy. The main run is identity-ablated, so the model does not receive ticker identity or asset-group metadata in the headline feature matrix. The sensitivity study asks whether this choice changes the result sharply.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Zk_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Zk_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 424w, https://substackcdn.com/image/fetch/$s_!_Zk_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 848w, https://substackcdn.com/image/fetch/$s_!_Zk_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_Zk_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Zk_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png" width="1216" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86125,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_Zk_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 424w, https://substackcdn.com/image/fetch/$s_!_Zk_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 848w, 
https://substackcdn.com/image/fetch/$s_!_Zk_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 1272w, https://substackcdn.com/image/fetch/$s_!_Zk_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a63505f-59a9-4359-aede-bb7533066a42_1216x544.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The identity-ablated TabPFN result does not appear to depend heavily on that feature policy. 
Full features improve TabPFN AP only slightly, from 0.298 to 0.302. That suggests the main conclusion is not just a static-identity artifact.</p><p>The metadata-only result is still important. Metadata-only TabPFN reaches AP 0.290, and metadata-only fixed XGBoost reaches AP 0.282. This does not mean metadata-only is a sufficient allocation strategy. It means persistent asset-class structure is meaningful in this fixed ETF universe. That is worth noticing rather than treating feature design as an implementation detail. Feature design is part of the scientific result.</p><h3><strong>Objective-specific retraining</strong></h3><p>The next sensitivity is the target itself. The objective-specific reruns ask whether the models still find useful structure when the label changes. This is more direct than the P21 audit because the models are retrained for the alternative labels.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U5fj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U5fj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 424w, https://substackcdn.com/image/fetch/$s_!U5fj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 848w, https://substackcdn.com/image/fetch/$s_!U5fj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 1272w, 
https://substackcdn.com/image/fetch/$s_!U5fj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U5fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png" width="1218" height="204" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:204,&quot;width&quot;:1218,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32975,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!U5fj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 424w, https://substackcdn.com/image/fetch/$s_!U5fj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 848w, 
https://substackcdn.com/image/fetch/$s_!U5fj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 1272w, https://substackcdn.com/image/fetch/$s_!U5fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d532307-7b1d-4ecd-a453-b8bb63d6dece_1218x204.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The benchmark-relative target has a higher base rate of 0.422, so AP 0.466 is only a modest lift, about 1.10 times the base rate. The 3-month and 6-month targets are more encouraging: their base rates remain 0.200, and TabPFN direct reaches AP 0.325 and 0.337, roughly 1.6 to 1.7 times the base rate. The table above shows the best holdout scorer for each objective, which is why calibration variants are not listed there unless they are the best row-level scorer. The table below is different: it shows selected score-to-return diagnostics, so it includes calibration-base variants when they produce a relevant portfolio diagnostic.</p><p>The return columns in the next table depend on the objective. For the benchmark-outperform rows, selected and universe horizon returns are SPY-relative one-month returns, and active horizon return is selected SPY-relative return minus universe SPY-relative return. 
For the 3-month and 6-month top-k rows, selected and universe horizon returns are raw cumulative open-to-open returns over the stated horizon.</p><p>The horizon-aware score-to-return diagnostics are:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qhzP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qhzP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 424w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 848w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 1272w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qhzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png" width="1217" height="895" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:895,&quot;width&quot;:1217,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125307,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qhzP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 424w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 848w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 1272w, https://substackcdn.com/image/fetch/$s_!qhzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b5ab35b-d445-48fd-b919-5d01c4718add_1217x895.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>These are not independent tradable monthly portfolio returns. They are horizon-aware diagnostics of selected forward returns. That is the right interpretation because 3-month and 6-month forward windows overlap. The positive active horizon returns therefore say that the selected baskets had better subsequent horizon returns than the universe average in this historical holdout; they do not define an immediately compounding monthly strategy.</p><h3><strong>Monthly cross-sectional ranking</strong></h3><p>After the row-level robustness checks, I return to the main one-month target and ask how the scores behave inside each monthly cross-section. Row-level AP pools all holdout rows together, but tactical allocation is applied month by month. 
The monthly rank diagnostic asks whether high scores line up with high next-month returns inside each monthly cross-section.</p><p>For month \(t\), the Spearman information coefficient is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_t\n\n=\n\n\\operatorname{corr}_{\\text{Spearman}}\n\n\\left(\n\n\\{s_{i,t}\\}_{i=1}^{N_t},\n\n\\{R^{\\text{open}}_{i,t+1}\\}_{i=1}^{N_t}\n\n\\right).&quot;,&quot;id&quot;:&quot;VXXOITCDXV&quot;}" data-component-name="LatexBlockToDOM"></div><p>The table below reports the mean and median of this monthly rank correlation, the top-k hit rate, and selected-minus-universe returns. The top-k hit rate is the fraction of selected assets that actually land in the realized top-five group, so a non-informative top-five selection would be expected to sit near 0.20 in a 25-asset month.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!spXO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!spXO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 424w, https://substackcdn.com/image/fetch/$s_!spXO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 848w, https://substackcdn.com/image/fetch/$s_!spXO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 1272w, 
https://substackcdn.com/image/fetch/$s_!spXO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!spXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png" width="1211" height="796" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4612450b-b363-434e-b307-416854d66bf8_1211x796.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:796,&quot;width&quot;:1211,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:118159,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!spXO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 424w, https://substackcdn.com/image/fetch/$s_!spXO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 848w, 
https://substackcdn.com/image/fetch/$s_!spXO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 1272w, https://substackcdn.com/image/fetch/$s_!spXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4612450b-b363-434e-b307-416854d66bf8_1211x796.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This view supports the TabPFN result, but it also keeps the interpretation grounded. 
TabPFN calibration-base and direct TabPFN have positive monthly rank diagnostics and positive active selected returns. The 12-month momentum rule is very close, with the highest mean IC in this table. XGBoost final is positive, but its allocation translation is less favorable than its row-level AP alone might suggest.</p><p>The low-volatility rule scores poorly under this target in this holdout. I read that as period-specific. The 2020-2026 holdout includes the COVID crash, a rapid recovery, an inflation and rate shock, and a strong growth-equity period. A defensive rule can look less favorable when the target rewards relative top-five returns in a risk-on or momentum-dominated path.</p><h3><strong>Portfolio translation</strong></h3><p>The monthly ranking view is still not the same as a portfolio. The main portfolio diagnostic converts each scorer into a monthly top-five equal-weight portfolio and subtracts 5 basis points per unit of traded-weight turnover. This is not a production backtest. It is a controlled score-to-allocation diagnostic. CAGR means compound annual growth rate. Sharpe is the annualized mean monthly net return divided by the annualized monthly volatility, with the risk-free rate set to zero. Because it uses an arithmetic mean rather than compounded growth, it does not have to equal CAGR divided by annual volatility exactly.</p><p>Because the calibration-base score series is distinct, I also include its portfolio translation when it is part of the headline strategy set. 
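</p><p>The portfolio conventions described above (monthly top-five equal weight, 5 basis points per unit of traded-weight turnover, CAGR, and a zero-risk-free-rate Sharpe) can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code. The function names are hypothetical, and three conventions are assumptions: turnover is the sum of absolute weight changes, weights are not drifted between rebalances, and the first month trades in from cash.</p>

```python
import numpy as np

COST_PER_UNIT_TURNOVER = 0.0005  # 5 basis points, as stated in the text
TOP_K = 5                        # top-five basket, as stated in the text


def top_k_weights(scores, k=TOP_K):
    """Equal-weight the k highest-scoring assets; every other weight is zero."""
    scores = np.asarray(scores, dtype=float)
    weights = np.zeros(scores.shape[0])
    weights[np.argsort(-scores)[:k]] = 1.0 / k
    return weights


def net_monthly_returns(score_matrix, return_matrix, k=TOP_K):
    """Row t of score_matrix selects the basket that earns row t of return_matrix."""
    prev = np.zeros(np.asarray(score_matrix).shape[1])  # assumed start from cash
    net = []
    for scores, rets in zip(score_matrix, return_matrix):
        w = top_k_weights(scores, k)
        turnover = np.abs(w - prev).sum()  # assumed: buys plus sells, no weight drift
        net.append(float(w @ rets) - COST_PER_UNIT_TURNOVER * turnover)
        prev = w
    return np.array(net)


def cagr(net):
    """Compound annual growth rate from monthly net returns."""
    years = len(net) / 12.0
    return float(np.prod(1.0 + np.asarray(net)) ** (1.0 / years) - 1.0)


def sharpe_zero_rf(net):
    """Annualized arithmetic mean over annualized volatility, risk-free rate of zero."""
    net = np.asarray(net, dtype=float)
    return float((12.0 * net.mean()) / (np.sqrt(12.0) * net.std(ddof=1)))
```

<p>Because <code>sharpe_zero_rf</code> uses an arithmetic mean while <code>cagr</code> compounds, the two are not tied together by a fixed ratio.</p><p>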
I do not draw every such variant in the PR and calibration figures because that would make the figures harder to read; the tables are the authoritative place for the variant-level numbers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VnQ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VnQ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 424w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 848w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 1272w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VnQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png" width="1215" height="748" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:748,&quot;width&quot;:1215,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131742,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VnQ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 424w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 848w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 1272w, https://substackcdn.com/image/fetch/$s_!VnQ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1cdc531-9d69-466c-a437-b38502ced02b_1215x748.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is the key table for interpreting the experiment because it prevents the row-level metric from becoming the whole headline. TabPFN has the best row-level ranking result, but the 12-month momentum rule has the highest portfolio Sharpe point estimate. TabPFN calibration-base has the highest final equity in this subset, but it also has higher volatility and a larger drawdown than the 12-month momentum rule.</p><p>This is a plausible finance result, not a warning sign by itself. In ETF allocation, deterministic rules such as 6-month or 12-month momentum are meaningful baselines. They are compact expressions of persistent return continuation effects that many tactical-allocation systems compare against after costs, turnover, and risk are included. 
A supervised model or TFM score can improve row-level ranking and still not translate into the best portfolio result if the score creates more turnover, selects a more volatile basket, or concentrates in assets with less favorable drawdown timing.</p><p>This is not a contradiction. The row-level classification problem and the portfolio problem are connected but not identical. A model can identify more top-group rows overall and still produce a less attractive portfolio after monthly selection, turnover, risk concentration, and drawdown enter the picture.</p><h3><strong>Portfolio realism and uncertainty</strong></h3><p>The score and allocation results need to be checked against broader portfolio diagnostics. The benchmark-relative view compares strategies against a 60/40 SPY/TLT benchmark. This is a portfolio-reporting benchmark, not the SPY benchmark used in the benchmark-outperformance label above. Annual active return is the annualized strategy-minus-benchmark return. Tracking error is the annualized volatility of that active return, and information ratio is annual active return divided by tracking error. 
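</p><p>The three benchmark-relative quantities just defined can be written out as a short sketch. It assumes monthly return series, a 60/40 SPY/TLT benchmark rebalanced back to target every month, and arithmetic annualization (12 times the mean monthly active return); the post does not state these conventions, and the function names are hypothetical rather than taken from the notebook.</p>

```python
import numpy as np


def benchmark_60_40(spy, tlt):
    """Monthly 60/40 SPY/TLT return, assuming a rebalance to 60/40 every month."""
    return 0.6 * np.asarray(spy, dtype=float) + 0.4 * np.asarray(tlt, dtype=float)


def active_stats(strategy, benchmark):
    """Annual active return, tracking error, and information ratio from monthly returns."""
    active = np.asarray(strategy, dtype=float) - np.asarray(benchmark, dtype=float)
    annual_active = 12.0 * active.mean()                 # assumed arithmetic annualization
    tracking_error = np.sqrt(12.0) * active.std(ddof=1)  # annualized volatility of active return
    return {
        "annual_active_return": float(annual_active),
        "tracking_error": float(tracking_error),
        "information_ratio": float(annual_active / tracking_error),
    }
```

<p>Passing a strategy's monthly net returns together with <code>benchmark_60_40(spy, tlt)</code> to <code>active_stats</code> yields the three columns in one call.</p><p>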
This is not benchmark-relative optimization, but it is closer to how allocation results are often discussed in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G5yC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G5yC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 424w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 848w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 1272w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G5yC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png" width="1214" height="349" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G5yC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 424w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 848w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 1272w, https://substackcdn.com/image/fetch/$s_!G5yC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F383980d6-8d00-4fb9-b183-a0f6b797d956_1214x349.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The liquidity proxy also now carries useful information. I include both TabPFN direct and TabPFN calibration-base here because they are both headline portfolio strategies in the main portfolio table, and liquidity is a property of the resulting traded weights rather than only a property of the model family. 
The spread-proxy column is the weighted high-low range proxy reported in basis points, not an estimate of realized transaction cost:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dNQU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dNQU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 424w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 848w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 1272w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dNQU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png" width="1214" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66821,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dNQU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 424w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 848w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 1272w, https://substackcdn.com/image/fetch/$s_!dNQU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40fad602-b3ad-46dc-ad33-da130b04bcff_1214x411.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At the USD 1 million diagnostic notional, the participation values are small. That does not prove the strategies are scalable. It says the notebook now has a working path for asking the capacity question instead of ignoring it. 
A real capacity study would need actual bid-ask quotes, order-book depth, creation-redemption mechanics, fund-specific spreads, and market-impact modeling.</p><p>The trailing-covariance ex ante risk proxy is also populated:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fry1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fry1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 424w, https://substackcdn.com/image/fetch/$s_!fry1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 848w, https://substackcdn.com/image/fetch/$s_!fry1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 1272w, https://substackcdn.com/image/fetch/$s_!fry1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fry1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png" width="1215" height="415" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:1215,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fry1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 424w, https://substackcdn.com/image/fetch/$s_!fry1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 848w, https://substackcdn.com/image/fetch/$s_!fry1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 1272w, https://substackcdn.com/image/fetch/$s_!fry1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35ca3013-e74a-4026-9218-4fd063ce0bea_1215x415.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is a proxy, not a production risk model. It still checks whether the portfolio construction is creating risk differences that a row-level metric would miss. 
In this table, the TabPFN direct portfolio&#8217;s higher realized volatility is visible before looking at return alone, while the 12-month momentum portfolio realized less volatility than this trailing-covariance proxy expected.</p><p>The month-block bootstrap intervals remain wide:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dM8u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dM8u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 424w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 848w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 1272w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dM8u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png" width="1214" height="316" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dM8u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 424w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 848w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 1272w, https://substackcdn.com/image/fetch/$s_!dM8u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3166352-2412-4e0d-94d8-a1c5ba0c479b_1214x316.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Selected portfolio intervals are similarly wide:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j_f4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j_f4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 424w, 
https://substackcdn.com/image/fetch/$s_!j_f4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 848w, https://substackcdn.com/image/fetch/$s_!j_f4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 1272w, https://substackcdn.com/image/fetch/$s_!j_f4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j_f4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png" width="1216" height="303" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:303,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51783,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!j_f4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 424w, https://substackcdn.com/image/fetch/$s_!j_f4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 848w, https://substackcdn.com/image/fetch/$s_!j_f4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 1272w, https://substackcdn.com/image/fetch/$s_!j_f4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34240a2a-4a1a-4fb1-bfd2-a3a7a0b223c5_1216x303.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>These intervals are the reason I keep the conclusion modest. TabPFN is competitive in this run and has the best row-level ranking result on the main target, but its portfolio-level uncertainty intervals overlap with those of strong baselines. This is evidence from one public historical path, not a claim of stable model-family superiority.</p><h3><strong>Runtime and research cost</strong></h3><p>After accuracy, portfolio behavior, and uncertainty, there is one more practical dimension: research cost. The final direct TabPFN scorer takes about 78 seconds. The final XGBoost run takes about 463 seconds after the early-stopping changes. 
This is not a pure speed comparison because TabPFN and XGBoost are doing different things, but it is relevant for notebook-first research workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c_lJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c_lJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 424w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 848w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 1272w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c_lJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png" width="1330" height="915" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1330,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82917,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197373041?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c_lJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 424w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 848w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 1272w, https://substackcdn.com/image/fetch/$s_!c_lJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5010174e-bd17-45b2-b60d-82c5b9fd83b1_1330x915.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The practical lesson is not that one should replace XGBoost with TabPFN everywhere. XGBoost remains useful when a team wants task-specific fitting, tree-based diagnostics, feature-importance workflows, and controlled hyperparameter search. TabPFN is useful when direct pretrained tabular scoring is informative enough to justify its inclusion as a baseline or component in the research workflow.</p><h2><strong>Known limitations</strong></h2><p>Those results still need boundaries. TabICL is disabled in today&#8217;s empirical comparison. That is an engineering limitation of this public notebook run, not a statement about TabICL as a model family. I still want TabICL in the broader DSAIEngineering testbench because open-source usability matters for practical commercial work.</p><p>The data is still public. 
yfinance, public Cboe VIX history, and public FRED series make the notebook reproducible, but they are not a licensed point-in-time institutional data stack. A production version would need stronger data lineage, corporate-action controls, vendor quality checks, point-in-time macro release handling, and operational monitoring.</p><p>TFM embeddings are not tested. This post evaluates direct TabPFN scoring. A separate experiment could ask whether TabPFN or TabICL embeddings improve downstream XGBoost, a ranking model, a portfolio optimizer, or a strategy-selection meta-model.</p><p>The liquidity and high-low range diagnostics are proxies. The high-low range proxy is not a real quoted bid-ask spread. The participation diagnostic is useful for scale awareness, but it is not a market-impact model.</p><p>The objective-specific reruns are more direct than P21, but they are still focused diagnostics. A production benchmark-relative allocator would optimize active risk directly. A production multi-horizon allocator would define how overlapping signals map into actual position changes and rebalance timing.</p><p>The portfolio diagnostics are not production backtests. They do not include real execution, tax lots, mandate constraints, borrow constraints, capacity limits, market-impact estimation, a production ex ante risk model, model registry, or live monitoring. They are useful for testing whether model scores survive controlled allocation translation, not for approving live capital.</p><p>Finally, the holdout is one historical path. The month-block bootstrap intervals are wide. I read the result as evidence that direct TabPFN scoring is useful in this workflow, not as evidence that this TFM-based allocation workflow has a stable edge.</p><h2><strong>Conclusion</strong></h2><p>This final follow-up puts the tactical asset-allocation notebook in a more complete and better-qualified state. P20 introduced the supervised ranking framing. 
P21 made the problem more demanding with a larger universe, a lower base-rate target, a next-open convention, and identity-ablated main features. Today&#8217;s version strengthens the remaining experimental layers: liquidity, feature-policy sensitivity, objective-specific retraining, portfolio realism, and uncertainty.</p><p>The result is intentionally not a single-winner model-family story. Among the models that completed, TabPFN has the best row-level ranking result on the main target in this run. It also remains useful in monthly rank and portfolio diagnostics. But the best headline portfolio Sharpe in this run belongs to the 12-month momentum rule, not to TabPFN. XGBoost remains an important classical supervised baseline, even though its main holdout AP is lower than TabPFN&#8217;s.</p><p>I read the deterministic-rule result as the normal bar to clear in this domain. If an ML or TFM allocator does not improve on strong deterministic finance rules after chronology, transaction costs, turnover, drawdown, and uncertainty are carried through the workflow, then the appropriate conclusion is that the model is useful in some parts of the workflow but is not the leading allocation rule in that experiment.</p><p>That mixed result is the point. Tabular foundation models can be evaluated as useful components inside quantitative workflows, while still being compared with classical ML baselines, deterministic financial rules, calibration checks, leakage controls, uncertainty intervals, and portfolio translation.</p><p>This closes the tactical-allocation miniseries for now. 
The next step is to carry the same evaluation discipline into a new quantitative workflow.</p>]]></content:encoded></item><item><title><![CDATA[[P21] Tactical asset allocation with TabPFN, TabICL, and XGBoost - 2]]></title><description><![CDATA[Given information available at a monthly signal date, can TabPFN, TabICL, XGBoost, and simple allocation rules rank assets in a public ETF universe by next-month relative attractiveness?]]></description><link>https://newsletter.dsaiengineering.com/p/p21-tactical-asset-allocation-with-tabpfn-tabicl-xgboost-2</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p21-tactical-asset-allocation-with-tabpfn-tabicl-xgboost-2</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 11 May 2026 20:10:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7JJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://open.substack.com/pub/dsaiengineering/p/p20-tactical-asset-allocation-with-tabpfn-tabicl-xgboost?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P20</a>, I converted tactical asset allocation into a supervised tabular ranking problem. Each row was an asset-month observation, the label indicated whether that asset landed in the next-month top group, and TabPFN, TabICL, XGBoost, and deterministic allocation rules were evaluated as allocation scorers rather than price forecasters.</p><p>That first version was intentionally controlled. It used a nine-ETF universe, a top-three target, ticker identity features, a close-to-close return convention, and a simple transaction-cost-aware portfolio diagnostic. The result was useful, but the limitations were also clear. A nine-asset universe makes the top-group label coarse. 
Static ticker identity can make the model learn persistent ETF identity rather than reusable market-state relationships. Close-to-close evaluation is easy to audit, but it is not the same as asking what would happen if a signal were acted on after month-end information is available. The portfolio diagnostic also needed more stress checks.</p><p>This post is the follow-up. I keep the same supervised tabular framing, but I make the allocation testbench more demanding in the directions that mattered most after P20:</p><ul><li><p>The universe is expanded from 9 ETFs to 25 ETFs.</p></li><li><p>The target changes from top 3 of 9 to top 5 of 25.</p></li><li><p>The base positive rate falls from about 0.333 to about 0.200.</p></li><li><p>The target return convention changes from close-to-close to next-open-to-next-open.</p></li><li><p>The main feature set removes ticker identity, asset-group metadata, and risk-bucket metadata.</p></li><li><p>Feature-set sensitivity is checked with lightweight Logistic Regression diagnostics.</p></li><li><p>The notebook adds multi-horizon and benchmark-relative audits.</p></li><li><p>The portfolio section adds turnover, tax-drag, drawdown, benchmark-relative, and constrained-weight diagnostics.</p></li></ul><p>The empirical result changes in an interesting way. In P20, direct TabPFN and TabICL had the strongest row-level Average Precision point estimates, around 0.409 against a 0.333 base rate, while XGBoost had stronger portfolio diagnostic point estimates. In this follow-up, TabICL is not part of the final empirical comparison because the full TabICL run hit RAM limits on Kaggle. Direct TabPFN remains the strongest row-level ranker among the models I completed, with holdout Average Precision of 0.313 against a 0.200 base rate. It also leads the main portfolio diagnostic in this run. XGBoost and the 12-month momentum rule remain serious baselines.</p><p>The most important interpretation is not that one model family wins. 
The useful result is that a more demanding allocation workflow changes the comparison while preserving the central lesson from P20: tabular foundation model scoring can be evaluated inside a realistic supervised ML workflow, but the conclusion depends on ranking quality, monthly cross-sectional behavior, allocation translation, uncertainty, and known limitations. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7JJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7JJp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7JJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png" width="1230" height="862" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166900,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7JJp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!7JJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F189ff9ef-d2af-4ac0-bd8e-1e3baba80728_1230x862.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The equity-curve figure gives the high-level result before the tables. TabPFN direct finishes with the highest final equity among the plotted score-driven portfolios, but the path should not be read as a trading recommendation. It is a visual summary of how the monthly score-to-portfolio diagnostic behaved after the same top-five allocation rule and transaction-cost assumption were applied across scorers.</p><p>You can find the notebook in my GitHub repository <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260511-tabpfn-tabicl-tactical-asset-allocation-2.assets/saved-versions/tabpfn-tabicl-tactical-asset-allocation-20260511-v3.ipynb">here</a> or you can <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-tactical-asset-allocation-20260511">clone it directly</a> on Kaggle. 
The results discussed in this post come from the latest notebook version and its generated artifacts.</p><h2>Minimal notation for this follow-up</h2><p>P20 gives the full formulation, so I will only restate the notation needed to read this follow-up. Each row is still an asset-month pair \((i,t)\). The feature vector \(x_{i,t}\) contains information available at or before the monthly signal date, and the score \(s_{i,t}\) is used to rank assets within that month. The target \(y_{i,t}\) is 1 when the asset belongs to the next-period top group and 0 otherwise.</p><p>The core object is unchanged:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x_{i,t} = \\phi_i(\\mathcal{F}_t),&quot;,&quot;id&quot;:&quot;QVDWBTRYUR&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\mathcal{F}_t\) is the information set available at the signal date and \(\phi_i(\cdot)\) is the feature-generation process for asset \(i\). 
A scorer then produces:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_{i,t}\n\n\\approx\n\n\\mathbb{P}(y_{i,t}=1 \\mid x_{i,t}, \\mathcal{D}_{\\text{train}}),&quot;,&quot;id&quot;:&quot;MVMHBQZETT&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\mathcal{D}_{\text{train}}\) is either the training data for a fitted supervised model or the labelled context for a direct tabular foundation model scorer. This is the only TFM recap needed here. XGBoost learns task-specific parameters from the allocation table. TabPFN uses a pretrained tabular model and the labelled context rows to score query rows:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}_{\\theta}\n\n\\left(\n\ny_{\\ast}=1\n\n\\mid\n\nx_{\\ast},\n\nX_{\\text{context}},\n\ny_{\\text{context}}\n\n\\right),&quot;,&quot;id&quot;:&quot;MPAQGIUMYE&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\theta\) denotes the pretrained TFM parameters, \(x_{\ast}\) is a query row, and \((X_{\text{context}}, y_{\text{context}})\) is the labelled context. The new capability I am exercising is this pretrained in-context scoring mechanism. The old discipline still applies: the target, chronology, leakage controls, baselines, calibration, and portfolio translation remain part of the experiment.</p><h2>What is new in this experiment</h2><h3>A broader universe and a less coarse target</h3><p>The P20 universe had 9 ETFs:</p><p>SPY, QQQ, IWM, TLT, IEF, GLD, HYG, EEM, VNQ.</p><p>This follow-up uses 25 ETFs:</p><p>SPY, QQQ, DIA, IWM, EFA, EEM, TLT, IEF, SHY, LQD, HYG, GLD, SLV, DBC, VNQ, IYR, XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY.</p><p>This adds more US equity style exposure, developed and emerging international equity, rates, credit, commodities, real estate, and US sectors. 
It is still not a production-grade institutional universe, but it is a more demanding cross-sectional ranking problem than the one in the first notebook.</p><p>Let \(N_t\) be the number of available assets in month \(t\). In the holdout period, \(N_t=25\) for the usable months. The target selects \(K=5\) assets. The monthly base rate is therefore approximately:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\bar{y}_t = \\frac{K}{N_t} = \\frac{5}{25} = 0.20.&quot;,&quot;id&quot;:&quot;HKMCJISRST&quot;}" data-component-name="LatexBlockToDOM"></div><p>This matters for Average Precision, because the expected AP of a non-informative scorer is roughly the positive rate. In P20, a non-informative scorer had a reference level near 0.333 because three out of nine assets were positive each month. In this post, the reference level is near 0.200. Raw AP numbers from P20 and from this post should therefore not be compared directly: the two targets have different base rates.</p><h3>A next-open target convention</h3><p>P20 used a close-to-close convention. That was easy to audit, but it left an execution caveat: if features are computed at month-end, at what price could a trade actually be executed once the signal is formed?</p><p>P20 used \(P_{i,t}\) for the adjusted month-end close price. I use a new symbol here because the execution convention changes from close-to-close to open-to-open. Let \(O^+_{i,t+1}\) denote the adjusted first tradable open after signal month \(t\), and let \(O^+_{i,t+2}\) denote the adjusted first tradable open after the next month. 
The selected one-month target return is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^{\\text{open}}_{i,t+1}\n\n=\n\n\\frac{O^+_{i,t+2}}{O^+_{i,t+1}} - 1.&quot;,&quot;id&quot;:&quot;GCXENCAFEC&quot;}" data-component-name="LatexBlockToDOM"></div><p>The label is then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{i,t}\n\n=\n\n\\mathbf{1}\n\n\\left\\{\n\nR^{\\text{open}}_{i,t+1}\n\n\\text{ is among the top } K\n\n\\text{ cross-sectional returns}\n\n\\right\\}.&quot;,&quot;id&quot;:&quot;TCKAEKRTMY&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here \(\mathbf{1}\{\cdot\}\) is the indicator function: it equals 1 when the condition inside the braces is true and 0 otherwise. The phrase &#8220;top \(K\)&#8221; means that assets are ranked within the same signal month by their forward return under the selected convention.</p><p>The notebook still saves both close-to-close and next-open returns for audit. This is important because a change in execution convention can change the cross-sectional rankings, and therefore the labels. The goal is not to claim that this is now a complete execution model. It is a more realistic diagnostic convention than the P20 close-to-close target, but it still does not model intraday slippage, bid-ask spreads, order timing, market impact, or data release timing in production detail.</p><h3>Identity-ablated main features</h3><p>The P20 notebook included ticker identity indicators and asset-group metadata. That choice was defensible for a fixed-universe allocation exercise, but it raised a genuine interpretation question. If the model performs well, is it learning reusable relationships between market-state features and future relative returns, or is it learning persistent differences among named ETFs?</p><p>This post makes the main run identity-ablated. 
If \(z_i\) denotes static identity and metadata features for asset \(i\), the main feature policy is approximately:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x^{\\text{ablated}}_{i,t}\n\n=\n\n\\phi_i(\\mathcal{F}_t) \\setminus z_i.&quot;,&quot;id&quot;:&quot;WZDDXUYIUP&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notebook excludes 34 static identity or metadata columns from the main model matrix. The final main feature matrix has 956 model features, counting both the retained base features and the missingness indicators.</p><p>This does not mean identity information is unimportant. In fact, one diagnostic suggests it may be very important. The metadata-only Logistic Regression sensitivity model has a holdout AP of 0.282, which is above the 0.200 base rate and also slightly above the final XGBoost AP of 0.277 in this run. That does not mean metadata-only is a better allocation system. It means persistent asset identity and group structure contain meaningful information in this historical sample. For a practitioner, that is a useful warning: removing identity features makes the main experiment more conservative, but studying identity sensitivity remains part of the interpretation.</p><h3>Feature sensitivity as a diagnostic, not a full rerun</h3><p>The notebook compares these feature policies using a lightweight, CPU-only Logistic Regression:</p><ul><li><p>full</p></li><li><p>ticker-ablated</p></li><li><p>identity-ablated</p></li><li><p>metadata-only</p></li><li><p>strict time-series</p></li></ul><p>The important phrase is &#8220;lightweight diagnostic.&#8221; I did not rerun the full TabPFN and XGBoost workflows for every feature variant because the practical constraint in this iteration was compute time. I had to rerun the notebook several times while working around TabICL memory failures and XGBoost tuning costs; each full attempt took close to two hours, with roughly 1.5 hours spent in XGBoost tuning. 
A full model-by-feature-variant study would be stronger, but that is a separate experiment. The current sensitivity check is useful for inspecting whether feature-family choices matter, but it should not be presented as a complete model-family sensitivity study.</p><p>The holdout feature sensitivity table is:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rj8D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rj8D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 424w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 848w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 1272w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rj8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png" width="1413" height="412" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:1413,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75138,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rj8D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 424w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 848w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 1272w, https://substackcdn.com/image/fetch/$s_!Rj8D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525be6e-6fb3-4e1f-be91-e0abf0cba3e9_1413x412.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The metadata-only result is the surprising line. It suggests that persistent group and identity information can rank the holdout rows better than a weak linear model using the larger noisy feature set. This is not a reason to abandon time-series features. It is a reason to be careful with interpretation and to treat feature design as part of the scientific result.</p><h3>Alternative-objective audits</h3><p>P20 noted that the top-group target was only one possible allocation objective. A production allocator might care about benchmark-relative active return, multi-month horizons, drawdown risk, tax-aware turnover, or risk-adjusted return.</p><p>This notebook does not retrain the main models for all of those objectives. 
Instead, it asks a narrower question: how do the existing one-month scores relate to alternative labels?</p><p>For a 3-month or 6-month horizon \(h\), the notebook compounds the one-month returns computed under the selected convention. In this run, that convention is next-open-to-next-open:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^{(h)}_{i,t}\n\n=\n\n\\prod_{m=0}^{h-1}\n\n\\left(1 + R^{\\text{open}}_{i,t+1+m}\\right)\n\n- 1,&quot;,&quot;id&quot;:&quot;KISEHEUYJA&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(m\) indexes the one-month return offset inside the \(h\)-month compounding window. The notebook then assigns a top-\(K\) label using the same within-month cross-sectional ranking, applied to \(R^{(h)}_{i,t}\). It also creates a benchmark-relative label against SPY:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y^{\\text{bench}}_{i,t}\n\n=\n\n\\mathbf{1}\n\n\\left\\{\n\nR^{\\text{open}}_{i,t+1} - R^{\\text{open}}_{\\text{SPY},t+1} > 0\n\n\\right\\}.&quot;,&quot;id&quot;:&quot;EAJJMFGWCG&quot;}" data-component-name="LatexBlockToDOM"></div><p>These are audits of the same scores, not new objective-specific training runs. This distinction matters. If a reader wants to know whether TabPFN is best for a six-month allocation target, the correct experiment would retrain and validate on that target. The current notebook asks only whether the one-month next-open score contains some information about those alternative outcomes.</p><h2>Experimental results</h2><h3>Dataset and splits</h3><p>The final dataset contains 6,059 rows across 243 months and 25 assets. The holdout period contains 1,875 rows across 75 months, from January 2020 through March 2026. 
The holdout positive rate is exactly 0.200 because the target selects five assets out of twenty-five in each full holdout month.</p><p>The chronological split follows the same design logic as P20:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PQzi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PQzi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 424w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 848w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 1272w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PQzi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png" width="1413" height="409" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:409,&quot;width&quot;:1413,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PQzi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 424w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 848w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 1272w, https://substackcdn.com/image/fetch/$s_!PQzi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20a23eda-42e7-44ee-af05-9b62c76f0066_1413x409.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a time-ordered split, not a random split, so holdout rows are always later in time than the rows used for fitting.</p><h3>Row-level classification</h3><p>P20 used Average Precision, ROC AUC, Brier score, calibration diagnostics, monthly rank diagnostics, and portfolio diagnostics because the target is a top-group ranking label rather than an ordinary balanced classification target. Average Precision summarizes precision-recall ranking quality, ROC AUC summarizes pairwise positive-versus-negative ranking, and the Brier score measures the mean squared error of the predicted probabilities. I keep the same metrics here so that the two posts can be read on a consistent basis. 
The new pieces in this post are not new row-level metrics; they are the lower base-rate target, the next-open target convention, identity-ablated features, feature-policy sensitivity, alternative-objective audits, and the extra portfolio stress diagnostics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TMBp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TMBp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 424w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 848w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 1272w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TMBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png" width="1408" height="869" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TMBp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 424w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 848w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 1272w, https://substackcdn.com/image/fetch/$s_!TMBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee4f44a-1beb-4e65-9640-d1635c22315d_1408x869.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The direct TabPFN scorer has the highest holdout AP and ROC AUC among the completed main models. The comparison with P20 needs care. P20's TabPFN and TabICL AP values were around 0.409, but the base rate was 0.333. Here the TabPFN AP is 0.313, but the base rate is 0.200. 
On the new, less coarse target, TabPFN sits about 0.113 above the non-informative reference (0.313 versus 0.200), compared with about 0.076 in P20 (0.409 versus 0.333), so it is further above the reference than the raw AP alone would suggest.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ADpf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ADpf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ADpf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png" width="1213" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1213,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ADpf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!ADpf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa166d844-8393-4cc3-ab5a-cd946bd5d1a3_1213x911.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The result is useful for the TFM question because the main TabPFN run is identity-ablated. The model is not being handed ticker identity or asset-group labels in the main feature matrix. It is using the labelled context rows and the time-varying feature table to score holdout asset-month rows.</p><p>In the precision-recall plot, the useful comparison is against the lower horizontal base-rate reference. The curves do not need to reach extreme precision to be useful in this setting; they need to stay meaningfully above the 0.200 reference over the part of the ranked list that matters for top-group selection. TabPFN direct does that most clearly among the completed main models.</p><p>That does not prove that TabPFN has discovered a stable allocation law. 
It shows that, in this public-data experiment, direct TabPFN scoring remains competitive after making the target harder, removing static identity features, and changing the execution convention.</p><h3>Monthly cross-sectional ranking</h3><p>P20 showed why monthly rank diagnostics matter: TabPFN and TabICL had the strongest row-level AP point estimates, while XGBoost looked stronger in the monthly rank and portfolio views. That same separation of questions matters here. Row-level AP pools all holdout rows together, while allocation is applied month by month. A scorer can have a useful pooled AP and still produce a weaker within-month ordering.</p><p>For each month \(t\), the Spearman information coefficient is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_t\n\n=\n\n\\operatorname{corr}_{\\text{Spearman}}\n\n\\left(\n\n\\{s_{i,t}\\}_{i=1}^{N_t},\n\n\\{R^{\\text{open}}_{i,t+1}\\}_{i=1}^{N_t}\n\n\\right).&quot;,&quot;id&quot;:&quot;FZTDFMJQPZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The mean of \(\rho_t\) across holdout months summarizes whether higher scores tended to align with higher next-open returns within the same monthly cross-section.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U02Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U02Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 424w, 
https://substackcdn.com/image/fetch/$s_!U02Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 848w, https://substackcdn.com/image/fetch/$s_!U02Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 1272w, https://substackcdn.com/image/fetch/$s_!U02Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U02Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png" width="1408" height="980" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:980,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147238,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!U02Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 424w, https://substackcdn.com/image/fetch/$s_!U02Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 848w, https://substackcdn.com/image/fetch/$s_!U02Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 1272w, https://substackcdn.com/image/fetch/$s_!U02Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b371534-ba7f-45ed-b0c5-461b341de96e_1408x980.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>TabPFN direct leads on selected return and active return. XGBoost is close. The 12-month momentum rule has the highest mean IC among these rows, but a lower top-k hit rate and slightly lower selected return. This continues the P20 lesson: mean rank correlation, top-k hit rate, selected return, and portfolio performance are related but not identical.</p><p>The low-volatility rule performs poorly in this holdout under this target. That should be read as a period-specific result, not as a general claim against low-volatility investing. The 2020 to 2026 holdout contains a sharp crash, a rapid recovery, an inflation and rates shock, and a strong mega-cap growth period. A defensive rule can look weak under a relative top-five target if the period rewards risk-on or momentum-like exposures.</p><h3>Portfolio translation</h3><p>The portfolio diagnostic converts scores into monthly top-five equal-weight portfolios and subtracts a 5 basis point cost per unit of one-way turnover. 
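</p><p>As a rough illustration of that construction, here is a minimal Python sketch of a top-five equal-weight portfolio with a 5 basis point charge per unit of one-way turnover. The function names, tickers, scores, and returns are made up for the example and are not taken from the notebook.</p>

```python
def top_k_equal_weights(scores, k=5):
    """Give equal weight to the k highest-scoring assets and nothing to the rest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = ranked[:k]
    return {asset: 1.0 / len(chosen) for asset in chosen}


def one_way_turnover(new_weights, old_weights):
    """Sum of absolute weight changes over every asset held before or after."""
    assets = set(new_weights) | set(old_weights)
    return sum(
        abs(new_weights.get(a, 0.0) - old_weights.get(a, 0.0)) for a in assets
    )


def net_portfolio_return(weights, next_returns, turnover, cost_per_unit=0.0005):
    """Weighted next-period return minus the turnover charge."""
    gross = sum(w * next_returns[a] for a, w in weights.items())
    return gross - cost_per_unit * turnover


# Hypothetical scores for two consecutive signal months (illustrative only).
scores_prev = {"AAA": 0.9, "BBB": 0.8, "CCC": 0.7, "DDD": 0.6, "EEE": 0.5, "FFF": 0.1}
scores_now = {"AAA": 0.9, "BBB": 0.8, "CCC": 0.7, "DDD": 0.6, "EEE": 0.2, "FFF": 0.5}
# Hypothetical next-period returns for each asset.
next_returns = {"AAA": 0.02, "BBB": 0.01, "CCC": -0.01, "DDD": 0.03, "EEE": -0.02, "FFF": 0.04}

w_prev = top_k_equal_weights(scores_prev)
w_now = top_k_equal_weights(scores_now)
turnover = one_way_turnover(w_now, w_prev)
net = net_portfolio_return(w_now, next_returns, turnover)
print(f"turnover={turnover:.2f}, net return={net:.4f}")
```

<p>In this toy case one of the five names is swapped, so one-way turnover is 0.4 and the charge is 2 basis points for the month.</p><p>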
If \(w_{i,t}\) is the portfolio weight assigned to asset \(i\) after signal month \(t\), the pre-cost diagnostic portfolio return is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^p_{t+1}\n\n=\n\n\\sum_i w_{i,t} R^{\\text{open}}_{i,t+1}.&quot;,&quot;id&quot;:&quot;AQFKWPVHCB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Turnover is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau_t\n\n=\n\n\\sum_i |w_{i,t} - w_{i,t-1}|.&quot;,&quot;id&quot;:&quot;VVWLJJZBUR&quot;}" data-component-name="LatexBlockToDOM"></div><p>In these two equations, the summation is over the available assets in the monthly universe. The turnover \(\tau_t\) is one-way turnover: a move from 20 percent in one asset to 0 percent contributes 20 percentage points, and a move from 0 percent to 20 percent in another asset contributes another 20 percentage points.</p><p>The net return used in the main portfolio table is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\widetilde{R}^p_{t+1}\n\n=\n\nR^p_{t+1} - c \\tau_t,&quot;,&quot;id&quot;:&quot;LUNQQXPITP&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(c=0.0005\) for the 5 basis point transaction-cost assumption.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4GsM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4GsM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 424w, 
https://substackcdn.com/image/fetch/$s_!4GsM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 848w, https://substackcdn.com/image/fetch/$s_!4GsM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 1272w, https://substackcdn.com/image/fetch/$s_!4GsM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4GsM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png" width="1411" height="938" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:938,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!4GsM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 424w, https://substackcdn.com/image/fetch/$s_!4GsM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 848w, https://substackcdn.com/image/fetch/$s_!4GsM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 1272w, https://substackcdn.com/image/fetch/$s_!4GsM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d6969d-47ef-4260-890f-e401313e43c2_1411x938.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is the biggest empirical change from P20. In P20, XGBoost had the strongest portfolio diagnostic point estimates. In this follow-up, TabPFN direct leads the main portfolio table by Sharpe and final equity. It also remains ahead after the basic 5 basis point turnover cost.</p><p>I do not treat this as a stable trading result. The holdout is one historical path. The confidence intervals are wide. The portfolio is still a simple monthly top-five diagnostic, not a production optimizer. But the result is useful because it shows that the TabPFN row-level advantage is not only a row-level AP artifact in this run. It also survives a simple allocation translation.</p><h3>Turnover, tax drag, and weight constraints</h3><p>The notebook adds several stress views that were missing or incomplete in P20.</p><p>The turnover-cap diagnostic blends previous weights toward new target weights. With a 50 percent turnover cap, the resulting portfolio can hold more than five assets because old positions decay rather than disappearing immediately. This is not a pure top-five portfolio. 
It is better described as post-score turnover-smoothed weights.</p><p>Selected turnover-cap results:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qqUE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qqUE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 424w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 848w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 1272w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qqUE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png" width="1411" height="608" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qqUE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 424w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 848w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 1272w, https://substackcdn.com/image/fetch/$s_!qqUE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d62d15-aaf7-44d8-a536-82d5bc498968_1411x608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The stress result is favorable for TabPFN and XGBoost, but it changes what the portfolio means. A turnover-smoothed portfolio is not the same object as a strict top-five monthly portfolio. That is why the notebook saves the turnover-cap outputs separately.</p><p>The tax-drag sensitivity applies additional drag per unit of turnover. It is not a tax engine. It is a way to ask whether high-turnover strategies are fragile to additional frictions.</p><p>For TabPFN direct, CAGR falls from 18.3 percent with no extra tax drag to 14.8 percent with 50 basis points of tax drag per unit of turnover. For XGBoost final, CAGR falls from 16.0 percent to 11.0 percent under the same stress. 
The larger drop for XGBoost is consistent with its higher annual turnover.</p><p>The weight-constraint artifact confirms that the main top-five equal-weight portfolios have a maximum asset weight of 20 percent and gross leverage of 1.0. It also shows that turnover is not trivial. XGBoost final has 52 months above 50 percent turnover and 17 months above 100 percent turnover. TabPFN direct has 24 months above 50 percent turnover and 5 months above 100 percent turnover. This is one reason portfolio diagnostics are necessary even when row-level AP looks good.</p><h3>Benchmark-relative diagnostics</h3><p>The notebook also compares strategy returns against benchmark portfolios such as 60/40 SPY/TLT. Against 60/40, TabPFN direct has an annualized active return of about 10.5 percent, a tracking error of about 11.8 percent, and an information ratio (active return divided by tracking error) of about 0.89. XGBoost final has an annualized active return of about 8.9 percent, a tracking error of about 12.5 percent, and an information ratio of about 0.71.</p><p>These numbers are diagnostics, not mandate-ready active-risk controls. A real benchmark-relative allocation process would usually include explicit tracking-error budgets, sector or asset-class constraints, capacity assumptions, and ex ante risk models. Still, adding the benchmark-relative view is useful because it moves the evaluation closer to how allocation results are often discussed in practice.</p><h3>Calibration</h3><p>Calibration remains separate from ranking. If a score is only used to sort assets within a month, a monotonic calibration transform may leave the selected top-five set unchanged, because it does not reorder the assets. If a score is instead described as a probability of top-group membership, probability quality matters. 
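</p><p>To make the distinction concrete, here is a small Python sketch with made-up numbers. It applies a strictly increasing transform to one month of scores and checks two things: whether the top-five set changes, and whether the Brier score changes. The asset names, scores, labels, and the <code>soften</code> transform are all assumptions for illustration, not the notebook's calibration method.</p>

```python
import math


def top_k_set(scores, k=5):
    """Return the set of the k highest-scoring assets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def brier_score(probs, labels):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((probs[a] - labels[a]) ** 2 for a in probs) / len(probs)


def soften(p, temperature=3.0):
    """Strictly increasing map on (0, 1): divide the logit by the temperature."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / temperature))


# Hypothetical one-month scores and top-group labels (illustrative only).
raw = {"A": 0.97, "B": 0.93, "C": 0.88, "D": 0.81, "E": 0.74,
       "F": 0.66, "G": 0.52, "H": 0.41, "I": 0.30, "J": 0.12}
labels = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0,
          "F": 0, "G": 0, "H": 0, "I": 0, "J": 0}

softened = {a: soften(p) for a, p in raw.items()}

# Ranking view: the selected set depends only on the ordering of scores.
same_selection = top_k_set(raw) == top_k_set(softened)
# Probability view: the Brier score depends on the values themselves.
brier_raw = brier_score(raw, labels)
brier_softened = brier_score(softened, labels)
print(same_selection, round(brier_raw, 4), round(brier_softened, 4))
```

<p>The selection is identical under both score versions, while the Brier score differs, which is the sense in which ranking and probability quality are separate questions.</p><p>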
ECE means expected calibration error; in the notebook artifact used here, it is computed with 10 quantile bins.</p><p>The main holdout calibration view is:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!clWe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!clWe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 424w, https://substackcdn.com/image/fetch/$s_!clWe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 848w, https://substackcdn.com/image/fetch/$s_!clWe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 1272w, https://substackcdn.com/image/fetch/$s_!clWe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!clWe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png" width="1412" height="407" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfde929f-e676-4241-82b9-009085c44e1c_1412x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61505,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!clWe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 424w, https://substackcdn.com/image/fetch/$s_!clWe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 848w, https://substackcdn.com/image/fetch/$s_!clWe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 1272w, https://substackcdn.com/image/fetch/$s_!clWe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfde929f-e676-4241-82b9-009085c44e1c_1412x407.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>TabPFN direct has the best AP among the listed scorers, while XGBoost final has the lowest expected calibration error among the listed learned scorers. That is not a contradiction. 
Ranking quality and probability reliability are different properties.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uye5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uye5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 424w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 848w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 1272w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uye5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png" width="1159" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1159,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110758,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uye5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 424w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 848w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 1272w, https://substackcdn.com/image/fetch/$s_!Uye5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df4f867-6fcf-458e-87b7-afffee9a3fcb_1159x911.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The metadata-only logistic score is a useful caution. Its AP is good, but its Brier score, log loss, and ECE are poor. That means it can rank some positives above negatives while still producing poor probability estimates. A practitioner familiar with supervised ML will recognize this pattern: a model can be directionally useful as a score and still poorly calibrated as a probability model.</p><h3>Runtime and research cost</h3><p>Runtime is part of the research result. The direct TabPFN holdout scorer takes about 55 seconds in this run. XGBoost takes about 4,352 seconds because the workflow performs a GPU-accelerated XGBoost model-selection process. This is not a perfectly fair runtime contest because the workflows are doing different things. 
It is still practically relevant.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mIMR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mIMR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 424w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 848w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 1272w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mIMR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png" width="1337" height="915" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1337,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mIMR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 424w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 848w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 1272w, https://substackcdn.com/image/fetch/$s_!mIMR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f534186-0377-4f3a-a907-60aeae7a6c93_1337x915.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>For a practitioner, the TFM capability is attractive when quick direct scoring is valuable. XGBoost remains attractive when task-specific fitting, feature importance workflows, and controlled hyperparameter searches are important. I do not see these as mutually exclusive tools. In a serious research workflow, I would want both kinds of baselines.</p><h3>Uncertainty</h3><p>The notebook uses month-block bootstrap intervals. 
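</p><p>A minimal sketch of a month-block bootstrap, written under stated assumptions rather than taken from the notebook: the row layout <code>(month, y_true, score)</code>, the function names <code>month_block_bootstrap</code> and <code>average_precision</code>, the percentile interval, and the toy data are all hypothetical. The Average Precision helper ignores score ties.</p>

```python
import random
from collections import defaultdict


def average_precision(pairs):
    """Mean of precision-at-rank over the positive rows, ranked by score."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    hits, total = 0, 0.0
    for rank, (y, _) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def month_block_bootstrap(rows, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for `metric`, resampling whole months with replacement.

    rows: iterable of (month, y_true, score) tuples (assumed layout).
    """
    by_month = defaultdict(list)
    for month, y_true, score in rows:
        by_month[month].append((y_true, score))
    months = sorted(by_month)

    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw months, not rows, so each monthly cross-section stays intact.
        sample = rng.choices(months, k=len(months))
        pairs = [pair for m in sample for pair in by_month[m]]
        stats.append(metric(pairs))

    stats.sort()
    low = stats[int((alpha / 2) * (n_boot - 1))]
    high = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return low, high


# Toy data: 24 months, 5 assets per month, exactly one positive per month.
data_rng = random.Random(42)
rows = []
for month in range(24):
    winner = data_rng.randrange(5)
    for asset in range(5):
        y = int(asset == winner)
        # Scores are noisy but tilted toward the true positive.
        rows.append((month, y, 0.3 * y + data_rng.random()))

low, high = month_block_bootstrap(rows, average_precision, n_boot=500, seed=1)
```

<p>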
It resamples months rather than individual rows because rows from the same month share the same market environment and are tied together by the cross-sectional top-\(K\) target.</p><p>Selected row-level bootstrap intervals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AeHS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AeHS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 424w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 848w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 1272w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AeHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png" width="1413" height="671" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:671,&quot;width&quot;:1413,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116497,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AeHS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 424w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 848w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 1272w, https://substackcdn.com/image/fetch/$s_!AeHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96de461d-3c9f-4311-be1d-d2f45a771e06_1413x671.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption"></figcaption></figure></div><p>Selected portfolio bootstrap intervals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A562!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A562!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 424w, 
https://substackcdn.com/image/fetch/$s_!A562!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 848w, https://substackcdn.com/image/fetch/$s_!A562!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 1272w, https://substackcdn.com/image/fetch/$s_!A562!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A562!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png" width="1412" height="410" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/197259706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!A562!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 424w, https://substackcdn.com/image/fetch/$s_!A562!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 848w, https://substackcdn.com/image/fetch/$s_!A562!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 1272w, https://substackcdn.com/image/fetch/$s_!A562!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1ec462-8498-4f38-8248-7a7215b42157_1412x410.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The intervals explain the cautious language. TabPFN has the best point estimates in the main diagnostics, but the uncertainty intervals overlap with strong baselines. This is evidence of competitiveness in one public-data holdout path, not proof of stable superiority.</p><h2>Known limitations</h2><p>This is still a public-data research workflow. It uses yfinance, public Cboe VIX history, and public FRED series. That is useful for reproducibility, but it is not equivalent to a licensed point-in-time institutional market data system. Vendor corrections, corporate-action controls, survivorship handling, data release timing, and operational monitoring are outside the scope of this post.</p><p>The liquidity, spread, and market-impact proxy artifact should not be interpreted in this run. The notebook created <code>portfolio_liquidity_impact_proxy.csv</code>, but the artifact values are all zero because the underlying dollar-volume and high-low-spread proxy fields were missing. I am leaving the issue documented rather than fixing it today because the main post can stand without that artifact. Any claim about liquidity, spread, or participation should wait for a corrected rerun.</p><p>The feature sensitivity section is not a full model-family sensitivity study. It uses Logistic Regression diagnostics to inspect feature-policy effects. A stronger version would rerun TabPFN, XGBoost, and any enabled TFM scorers across all feature variants under the same chronological validation design.</p><p>The multi-horizon and benchmark-relative sections are audits of existing one-month scores. 
They do not retrain the main models for 3-month, 6-month, or benchmark-relative objectives. A proper alternative-objective study would define those labels as the main targets and repeat model selection and validation.</p><p>The portfolio diagnostics are not production backtests. They do not include a real execution model, tax lots, mandate constraints, borrow constraints, capacity, market-impact estimation, ex ante risk models, or benchmark-relative optimization. They are useful for testing whether model scores survive simple allocation translation, not for approving live capital.</p><p>TabICL is not part of the final empirical comparison in this post. That is an engineering limitation of my current public notebook run, not a statement about TabICL as a model family. The guarded TabICL code path remains in the notebook, but the completed results here should be read as TabPFN, XGBoost, Logistic Regression diagnostics, and deterministic-rule results.</p><p>TFM embeddings are still out of scope. This post evaluates direct TabPFN scoring. It does not test whether TabPFN or TabICL embeddings improve a downstream XGBoost, neural ranking, or portfolio optimization workflow.</p><p>Finally, my own goal in this series is learning by building. I am using the notebook to make assumptions visible and testable. The claims are intentionally limited to what this public experiment can support.</p><h2>Conclusion</h2><p>This follow-up makes the P20 tactical allocation experiment more demanding. The universe grows from 9 to 25 ETFs, the target becomes top 5 of 25, the execution convention changes to next-open-to-next-open, static identity features are removed from the main model matrix, and the portfolio diagnostics become more realistic.</p><p>Within this configuration, direct TabPFN produces the strongest completed row-level ranking result: AP 0.313 against a 0.200 base rate. 
It also leads the main portfolio diagnostic, with 18.3 percent CAGR, 1.12 Sharpe, and final equity of 2.86 over the 2020 to 2026 holdout. XGBoost and 12-month momentum remain competitive. The bootstrap intervals overlap enough that I would not claim stable superiority for any model family.</p><p>The most useful conclusion is methodological. A tabular foundation model can be inserted into a supervised financial tabular workflow as a direct scorer, but it still has to be evaluated like any serious model: with chronological splits, leakage checks, baseline rules, calibration diagnostics, uncertainty intervals, feature sensitivity, execution assumptions, and portfolio translation. The newer capability is the pretrained in-context scoring mechanism. The older discipline of supervised ML and financial validation is still necessary.</p>]]></content:encoded></item><item><title><![CDATA[[P20] Tactical asset allocation with TabPFN, TabICL, and XGBoost]]></title><description><![CDATA[Given information available at a monthly signal date, can TabPFN, TabICL, XGBoost, and simple allocation rules rank assets in a public ETF universe by next-month relative attractiveness?]]></description><link>https://newsletter.dsaiengineering.com/p/p20-tactical-asset-allocation-with-tabpfn-tabicl-xgboost</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p20-tactical-asset-allocation-with-tabpfn-tabicl-xgboost</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Fri, 08 May 2026 21:30:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fofE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post continues the quantitative-finance part of my tabular foundation model (TFM) series. 
After the volatility-regime forecasting notebooks in <a href="https://open.substack.com/pub/dsaiengineering/p/p18-volatility-regime-forecasting-tabpfn-tabicl--classical-tabular-models?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P18</a> and <a href="https://open.substack.com/pub/dsaiengineering/p/p19-volatility-regime-forecasting-tabpfn-tabicl-classical-ml-models-2?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P19</a>, I am moving from single-market risk scoring to a cross-asset allocation question. The practical question is:</p><blockquote><p>Given information available at a monthly signal date, can TabPFN, TabICL, XGBoost, and simple allocation rules rank assets in a public ETF universe by next-month relative attractiveness?</p></blockquote><p>More precisely, I build a supervised tabular workflow that converts monthly tactical asset allocation into a cross-sectional top-group classification problem. Each row is an asset-month observation. The label is whether that asset is among the next-month top three assets in a nine-ETF universe. The models are evaluated as allocation scorers, not as price forecasters. A useful score should help rank assets within each month, remain credible under chronological validation, and survive comparison with simple momentum and benchmark allocation rules.</p><p>This is a useful place to test direct TFM scoring because the input-output table is familiar to a supervised ML practitioner, but the workflow mechanism is different. XGBoost learns task-specific parameters from the labelled allocation table. TabPFN and TabICL use labelled rows as context for a pretrained tabular learner and then score later query rows. The notebook therefore asks whether that context-based prediction capability remains useful once the workflow includes chronological splits, leakage checks, calibration diagnostics, deterministic finance rules, and portfolio diagnostics.</p><p>The main result is deliberately modest. 
Direct TabICL and TabPFN are competitive row-level rankers, with holdout Average Precision around 0.409 against a 0.333 base rate. XGBoost has lower row-level Average Precision in this run but stronger monthly portfolio diagnostic point estimates. The 12-month momentum rule remains a serious baseline. The uncertainty intervals overlap enough that I do not treat the exercise as a model-ranking contest. The useful conclusion is that direct tabular foundation model scoring can be evaluated seriously inside a tactical allocation workflow, but classical GPU-accelerated tabular ML and simple finance rules still need to be part of the comparison.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fofE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fofE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!fofE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!fofE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!fofE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fofE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png" width="1230" height="862" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:182574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fofE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!fofE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!fofE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fofE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61a4c042-8fc0-47b8-9604-cea4839c7000_1230x862.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I treat this as a learning-by-building research workflow. The goal is to make the modeling choices explicit enough that a data or AI practitioner can inspect them, and modest enough that a quant finance professional would recognize what is still outside scope. I am not trying to present a finished allocation system. 
I am trying to build a careful public example of how TFMs can be placed inside a realistic validation and diagnostic structure.</p><p>You can find the notebook in my GitHub repository <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260508-tabpfn-tabicl-tactical-asset-allocation.assets/tabpfn-tabicl-tactical-asset-allocation-20260508.ipynb">here</a>, or you can <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-tactical-asset-allocation-20260508">clone it directly</a> on Kaggle. The notebook is meant to be run with GPU enabled, and the results discussed in this post come from the full research configuration.</p><h2>Background and scope</h2><h3>Series concepts reused here</h3><p>The most useful prerequisites from earlier posts are P4, P12, P14, P15 to P17, and P18 to P19. I use three ideas from those posts here. First, TabPFN and TabICL can be viewed operationally through a posterior-predictive or in-context prediction lens: labelled rows provide task context for later query rows. Second, time-indexed market problems should be converted into supervised tables without breaking chronological validation. 
Third, model quality in an applied workflow is not one number; ranking, calibration, runtime, leakage, drift, and downstream decision diagnostics can disagree.</p><h3>The allocation task</h3><p>This post keeps the same supervised tabular framing but changes the finance task. Volatility-regime forecasting asked whether a single market object, SPY, was likely to enter a high-risk state. Tactical asset allocation asks a relative allocation question across several assets:</p><blockquote><p>Given information available at a monthly signal date, can a model rank assets in a public ETF universe by their next-month attractiveness?</p></blockquote><p>In the notebook, &#8220;attractiveness&#8221; is defined narrowly and mechanically. An asset is attractive for a given month if its next-month total return is in the top group within the configured ETF universe. This is not the only way to define tactical allocation. A production team might care about excess return over cash, drawdown risk, risk-adjusted return, benchmark-relative active return, mandate constraints, or tax-aware turnover. I use the top-group definition because it turns the allocation problem into a clear supervised tabular classification task while preserving the cross-asset ranking structure that matters for allocation.</p><p>The shift from volatility-regime forecasting to tactical allocation changes the interpretation of the supervised task. In volatility-regime forecasting, each row was a market date and the positive class was &#8220;future volatility is high.&#8221; The score was naturally an alert-prioritization or risk-attention score. In tactical allocation, each row is an asset-month pair. The positive class is not &#8220;this asset has a good absolute return.&#8221; It is &#8220;this asset belongs to the top group among the assets available in that month.&#8221; This makes the target relative. The model is not asked to forecast the whole market direction. 
It is asked to help rank assets within the same market environment.</p><p>This is closer to many practical allocation and cross-sectional equity workflows than a single-asset price forecast. A portfolio process often needs to decide how to distribute capital across alternatives, not only whether one asset&#8217;s return will be positive. Ranking, relative strength, turnover, and allocation stability therefore become central parts of the evaluation.</p><h3>Mathematical setup</h3><p>Let \(t\) index monthly signal dates, let \(i\) index assets in a universe of size \(N\), and let \(N_t\) denote the number of assets available in month \(t\). Let \(P_{i,t}\) be the adjusted close price of asset \(i\) at the end of month \(t\). The next-month simple return is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R_{i,t+1}\n\n= \\frac{P_{i,t+1}}{P_{i,t}} - 1.&quot;,&quot;id&quot;:&quot;XOCPZYALLZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>At signal date \(t\), \(R_{i,t+1}\) is unknown. It is the future outcome used to define the training label and later evaluate the score. The feature vector for asset \(i\) at date \(t\) is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x_{i,t}\n\n= \\phi_i(\\mathcal{F}_t),&quot;,&quot;id&quot;:&quot;AQYXHWUYKX&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\mathcal{F}_t\) denotes the information available at or before the signal date, and \(\phi_i(\cdot)\) is the feature-generation process for that asset and market state. In the notebook, examples include recent asset returns, realized volatility, downside volatility, drawdown, moving-average distance, volume z-scores, beta to SPY, VIX features, Treasury yields, yield-curve features, availability-checked public macro features, and static asset metadata.</p><p>For each month, the assets are ranked by their next-month returns. 
If \(K\) is the number of assets selected as the positive group, the label is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{i,t}\n\n= \\mathbf{1}\\{\n\nR_{i,t+1}\n\n\\text{ is among the top } K\n\n\\text{ returns in month } t+1\n\n\\}.&quot;,&quot;id&quot;:&quot;LTEDXEDHMG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(y_{i,t}=1\) means that asset \(i\) was in the next-month top group, and \(y_{i,t}=0\) means it was not. The indicator \(\mathbf{1}\{\cdot\}\) returns 1 when the condition is true and 0 otherwise.</p><p>The supervised learning table is then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}\n\n= \\{(x_{i,t}, y_{i,t})\\}_{i,t}.&quot;,&quot;id&quot;:&quot;UKICQYYRHT&quot;}" data-component-name="LatexBlockToDOM"></div><p>The label construction implies:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{i=1}^{N_t} y_{i,t} = K,\n\n\\quad\n\n\\bar{y}_t = \\frac{K}{N_t}.&quot;,&quot;id&quot;:&quot;DOMXMXHEVQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\bar{y}_t\) is the positive-class rate in month \(t\). In the full holdout run, \(N_t=9\) and \(K=3\), so the monthly base rate is \(3/9\approx 0.333\). This is why the notebook treats 0.333 as the non-informative Average Precision reference.</p><p>This notation is close to earlier posts, but there is one important difference. In the volatility-regime posts, the row index was essentially one date \(t\), because the target was a single market-level event for SPY. In this notebook, the row index is a pair \((i,t)\), because each month contributes one row per asset. 
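</p><p>The return, ranking, and label definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the notebook's code: the tickers <code>AAA</code>, <code>BBB</code>, and <code>CCC</code>, the toy prices, the helper names <code>simple_return</code> and <code>top_k_labels</code>, and the alphabetical tie-break are all hypothetical.</p>

```python
from collections import defaultdict


def simple_return(p_now, p_next):
    """Next-month simple return: P_{i,t+1} / P_{i,t} - 1."""
    return p_next / p_now - 1.0


def top_k_labels(next_returns, k):
    """Map each (asset, month) row to 1 if its return is in that month's top k.

    next_returns: dict {(asset, month): next-month simple return}.
    Ties are broken alphabetically by asset name (an arbitrary assumption).
    """
    by_month = defaultdict(list)
    for (asset, month), ret in next_returns.items():
        by_month[month].append((asset, ret))

    labels = {}
    for month, pairs in by_month.items():
        # Rank the month's cross-section by return, highest first.
        ranked = sorted(pairs, key=lambda p: (-p[1], p[0]))
        winners = {asset for asset, _ in ranked[:k]}
        for asset, _ in pairs:
            labels[(asset, month)] = int(asset in winners)
    return labels


# Toy month-end prices for three hypothetical tickers over three months.
prices = {
    "AAA": [100.0, 110.0, 99.0],
    "BBB": [50.0, 51.0, 54.0],
    "CCC": [20.0, 19.0, 20.0],
}

# One row per (asset, signal month); the last month has no next-month return.
next_returns = {
    (asset, t): simple_return(series[t], series[t + 1])
    for asset, series in prices.items()
    for t in range(len(series) - 1)
}

labels = top_k_labels(next_returns, k=1)

# Each month should contain exactly k positive rows.
positives_per_month = defaultdict(int)
for (asset, month), y in labels.items():
    positives_per_month[month] += y
```

<p>With \(K=1\) and three assets, each month in this toy table has exactly one positive row, so its monthly base rate is \(1/3\), which mirrors the \(K/N_t\) relation above.</p><p>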
This turns the problem into a panel-style tabular dataset: many assets observed through time, with a target that is defined relative to other assets in the same month.</p><h3>What the score means</h3><p>A price forecast would ask for a prediction such as: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{P}_{i,t+1}\n\n\\quad \\text{or} \\quad\n\n\\hat{R}_{i,t+1}.&quot;,&quot;id&quot;:&quot;EDKQDAOZTO&quot;}" data-component-name="LatexBlockToDOM"></div><p>That is not what this notebook does. The model produces a score: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_{i,t}\n\n\\approx\n\n\\mathbb{P}(y_{i,t}=1 \\mid x_{i,t}, \\mathcal{D}_{\\text{train}}).&quot;,&quot;id&quot;:&quot;LQMALSMQVU&quot;}" data-component-name="LatexBlockToDOM"></div><p>The score can be read as a probability estimate if calibration is adequate, or more conservatively as a ranking score. In the allocation diagnostic, the practical use is ranking: for each month, sort assets by \(s_{i,t}\), select the top \(K\), and allocate equal weight to those selected assets.</p><p>This distinction matters. A high score does not say that an asset must go up next month. It says that, according to the fitted workflow, the asset has features associated with membership in the next-month top group within the configured universe. An asset can have a high score in a month where every asset loses money. Conversely, an asset can have a positive next-month return and still not be in the top group if several other assets did better.</p><p>That is why this notebook evaluates the workflow through both prediction diagnostics and allocation diagnostics. Average Precision and ROC AUC summarize row-level classification quality. Monthly rank correlation and top-\(K\) hit rate ask whether the scores are useful within each monthly cross-section. 
Portfolio diagnostics ask what happens when those scores are translated into a simple allocation rule with transaction costs.</p><h3>Chronological validation and leakage</h3><p>Financial data is ordered in time, so random train-test splits are inappropriate. Random splits can leak future regimes into model selection and make a model look more stable than it would be in a real research process.</p><p>The core rule is the same as in P18 and P19:</p><blockquote><p>At signal date \(t\), features may use information available at or before \(t\), but the label uses returns after \(t\).</p></blockquote><p>The notebook uses chronological windows. Earlier data is used for model selection, a later pre-holdout window is used for calibration diagnostics, and the final period is reserved for holdout evaluation. Feature filtering and imputation are fitted only on data that precedes the calibration and holdout windows. That matters because even preprocessing can leak information if it is fitted on the full dataset.</p><p>The target also has a cross-sectional leakage risk. The label for asset \(i\) in month \(t\) is defined by comparing \(R_{i,t+1}\) with the other assets&#8217; returns \(R_{j,t+1}\) in the same future month. That future cross-sectional ranking is allowed for label construction, but none of those future returns can enter the feature vector \(x_{i,t}\). The notebook therefore excludes forward returns, forward excess returns, future ranks, and target columns from model features.</p><p>This design is still a public-data research workflow, not a production market-data system. Public ETF data from yfinance, public VIX history, and public FRED series are useful for reproducibility, but they do not provide the same guarantees as a licensed point-in-time institutional data store. 
That limitation is part of the reason the notebook includes an explicit leakage and reuse checklist.</p><h3>XGBoost versus direct TFMs</h3><p>The input-output table is the same for XGBoost and the direct TFM scorers: rows contain features \(x_{i,t}\), and labels contain top-\(K\) membership \(y_{i,t}\). The difference is how each model uses that table. For a classical supervised model such as XGBoost, fitting means learning task-specific parameters from the current labelled dataset. Conceptually, the model is selected from a function class \(\mathcal{G}\) by minimizing a training objective:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n= \\arg\\min_{f \\in \\mathcal{G}}\n\\sum_{(x_{i,t},y_{i,t}) \\in \\mathcal{D}_{\\text{train}}}\n\\ell(y_{i,t}, f(x_{i,t})).&quot;,&quot;id&quot;:&quot;RAVSFFWUNW&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{G}\) is the candidate class of scoring functions, \(\ell\) is the loss function, and \(\hat{f}\) is the fitted allocation scorer. Hyperparameter tuning then selects a configuration using chronological validation folds. In the notebook, XGBoost is the main classical supervised benchmark because it can run on GPU and is a strong practical baseline for tabular data.</p><p>For TabPFN and TabICL, the workflow is different. The large model is already pretrained. The labelled rows supplied through <code>.fit()</code> are better understood as task context rather than as the same kind of task-specific weight training used by XGBoost. 
In notation similar to P4 and P18, the prediction is closer to:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}(y_{\\text{new}} \\mid x_{\\text{new}}, X_{\\text{context}}, y_{\\text{context}}),&quot;,&quot;id&quot;:&quot;HZPVZUHHWF&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(x_{\text{new}}\) is a query row, \(y_{\text{new}}\) is the unknown label, \(X_{\text{context}}\) is the context feature matrix, and \(y_{\text{context}}\) is the corresponding vector of context labels.</p><p>This difference is the main reason to test tabular foundation models in this kind of notebook. A standard model such as XGBoost asks whether a task-specific learner can find a useful mapping from engineered market features to the top-\(K\) label. A tabular foundation model asks a related but different question: whether a pretrained in-context tabular learner can use the labelled allocation history as context and produce useful scores for later asset-month rows without the same task-specific training process.</p><p>The new capability being exercised here is therefore not that TFMs remove the need for careful data construction. They do not. The capability being tested is whether a pretrained tabular learner can use a relatively small labelled context and produce useful task-specific scores without the same hyperparameter-search and task-specific fitting procedure used by XGBoost. That is why the notebook still includes deterministic rules, chronological validation, calibration checks, and portfolio diagnostics. The TFM component is only one part of the workflow.</p><p>This is not a claim that tabular foundation models are automatically better for allocation. The value has to be demonstrated empirically against strong baselines, simple finance rules, runtime costs, and validation constraints. In this notebook, TabPFN and TabICL are evaluated as direct scorers. They are not used as embedding generators in the default path. 
That keeps today&#8217;s comparison focused on a simpler operational question: can direct TFM scoring be useful in a monthly allocation workflow?</p><h3>Evaluation scope</h3><p>The row-level classification target is imbalanced because only the top \(K\) assets in each month receive label 1. If the universe has \(N\) assets and all of them are available in a month, that month&#8217;s base rate is exactly \(K/N\); with missing assets it is \(K/N_t\). Accuracy is therefore not a very informative primary metric. A classifier that labels every row 0 would be about 66.7 percent accurate at the 0.333 base rate and still be useless for allocation.</p><p>Average Precision is useful because it evaluates the ranked list of asset-month rows by summarizing precision as recall increases. Precision is the fraction of selected or flagged rows that are true positives, and recall is the fraction of all true positives that have been selected or flagged. For a non-informative ranking, the expected precision is close to the positive-class base rate, which is why the 0.333 top-\(K\) label rate is the main reference point here. ROC AUC is also reported because it asks how often a randomly chosen positive row is ranked above a randomly chosen negative row. For allocation, however, row-level metrics are not sufficient. The model is used month by month, so the cross-sectional ordering inside each month matters.</p><p>A monthly rank diagnostic asks whether high scores correspond to higher next-month returns within the same month. One summary is Spearman rank correlation between scores and realized next-month returns:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_t\n\n= \\operatorname{corr}_{\\text{Spearman}}\n\n\\left(\n\n\\{s_{i,t}\\}_{i=1}^{N_t},\n\n\\{R_{i,t+1}\\}_{i=1}^{N_t}\n\n\\right).\n\n&quot;,&quot;id&quot;:&quot;UTJPCAVCEZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(N_t\) is the number of available assets in month \(t\). 
A positive value means that, in that month, higher model scores tended to align with higher next-month returns. This is only one diagnostic; it does not include transaction costs or portfolio constraints.</p><p>The allocation diagnostic converts scores into portfolio weights. In the simple top-\(K\) equal-weight rule used here:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;w_{i,t}\n\n=\n\n\\begin{cases}\n\n1/K, &amp; \\text{if asset } i \\text{ is selected in the top } K \\text{ by score at } t, \\\\\n\n0, &amp; \\text{otherwise.}\n\n\\end{cases}&quot;,&quot;id&quot;:&quot;SAYORNQLGJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The next-month portfolio return before costs is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;R^{p}_{t+1}\n\n= \\sum_i w_{i,t} R_{i,t+1}.&quot;,&quot;id&quot;:&quot;ELITOGESOG&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notebook also subtracts a simple transaction-cost term based on absolute weight turnover. If \(c\) is the transaction-cost rate and:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau_t\n\n= \\sum_i |w_{i,t} - w_{i,t-1}|,&quot;,&quot;id&quot;:&quot;TIMUZHUFVW&quot;}" data-component-name="LatexBlockToDOM"></div><p>then the net diagnostic return is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\widetilde{R}^{p}_{t+1}\n\n= R^{p}_{t+1} - c \\tau_t.\n\n&quot;,&quot;id&quot;:&quot;FDKLINIXCO&quot;}" data-component-name="LatexBlockToDOM"></div><p>The first month is treated as an initial allocation in the notebook. This does not make the diagnostic a production backtest. It is a controlled way to ask whether the score remains interesting after a basic cost penalty and monthly reallocation mechanics are included.</p><p>Calibration is a separate question. 
If the score is used only to rank assets within a month, a strictly increasing transformation of the score leaves the within-month ordering, and therefore the top-\(K\) selection, unchanged; a transformation that is only non-decreasing can introduce ties. If the score is interpreted as: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}(y_{i,t}=1 \\mid x_{i,t}),&quot;,&quot;id&quot;:&quot;DHBYWQGZFN&quot;}" data-component-name="LatexBlockToDOM"></div><p>then probability quality matters. The notebook therefore reports Brier score, log loss, expected calibration error, and reliability-bin artifacts. Brier score is the mean squared error between predicted probabilities and the 0/1 labels, log loss penalizes confident wrong probabilities, and expected calibration error compares binned predicted probabilities with observed frequencies. As in previous posts, I treat calibration diagnostics as complementary to ranking diagnostics rather than as replacements for them.</p><p>With these diagnostics defined, the scope of the post is intentionally narrow. This is a first tactical asset-allocation workflow test. It is in scope to ask whether deterministic allocation rules, XGBoost, direct TabPFN, and direct TabICL produce useful next-month top-group scores under chronological validation.</p><p>It is also in scope to inspect row-level classification quality, monthly rank quality, calibration, runtime, month-block bootstrap uncertainty, feature drift, leakage checks, and a simple transaction-cost-aware allocation diagnostic.</p><p>It is not in scope to claim a trading strategy, forecast exact asset prices, optimize a production portfolio, account for taxes, model liquidity in detail, or produce a definitive benchmark for financial ML. 
</p><p>The useful contribution is narrower: a reproducible workflow for testing tabular foundation models as components inside a realistic, but still public-data-based, tactical allocation research pipeline.</p><h2>Code discussion and results</h2><h3>Dataset and chronological splits</h3><p>The notebook builds a monthly panel from nine liquid ETFs: SPY, QQQ, IWM, TLT, IEF, GLD, HYG, EEM, and VNQ. Each row is an asset-month observation. The target is whether that asset is among the next-month top three assets in the universe. With nine assets and three positives per full month, the natural class base rate is close to 33.3 percent. This base rate is important when reading the results: an Average Precision near 0.33 is approximately what a non-informative scorer should achieve, while values above that level indicate some ranking information.</p><p>The final modeling frame contains 2,172 asset-month rows across 243 monthly signal dates. The holdout window contains 675 rows across 75 months from January 2020 through March 2026. This period is a useful stress period for evaluation because it includes the COVID shock, the 2022 inflation and rate-hiking cycle, the subsequent recovery, and several large cross-asset rotations. 
It is also a difficult period for stable model selection, so the results should be interpreted as evidence from one demanding holdout window rather than as a universal ranking of methods.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JuGb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JuGb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 424w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 848w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 1272w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JuGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png" width="1411" height="407" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71616,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JuGb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 424w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 848w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 1272w, https://substackcdn.com/image/fetch/$s_!JuGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db06637-69fb-40fd-8d44-75fbbbc4fad6_1411x407.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The split design is deliberately chronological. Model selection uses rolling validation folds inside the pre-holdout history, calibration is evaluated after the selection window, and the final holdout remains untouched until the final scoring step. The feature policy is also fitted before calibration and holdout evaluation. That detail matters because feature filtering, missingness decisions, and imputation can otherwise leak holdout distribution information into the model.</p><h3>Feature construction and data availability</h3><p>The full run starts from market, volatility, rate, volume, relative-strength, drawdown, beta, and asset-identity features. 
The feature policy keeps candidate features only if they have enough non-missing observations in the model-selection source window and do not exceed the configured missingness threshold. After the FRED availability gate, 209 numeric candidate features are reviewed, all 209 base features are retained, 187 missingness indicators are added, and the final model matrix contains 396 columns.</p><p>This is a practical compromise. Public market and macro series are useful for reproducibility, but they are not equivalent to a licensed point-in-time data warehouse. A concrete example is credit-spread data. In this run, five FRED series have enough selection-window coverage and are included in features. The two public OAS series attempted through FRED, <code>BAMLH0A0HYM2</code> and <code>BAMLC0A0CM</code>, start only on 2023-05-09 in the downloaded file and have zero non-missing observations in the model-selection window. The notebook records their availability and excludes them from the model matrix rather than silently imputing unavailable credit-spread history.</p><p>The static asset-identity signal is encoded through ticker indicators rather than an ordinal numeric <code>asset_id</code>. This avoids giving the model an artificial numeric ordering among ETFs. A fixed identity signal is still a modeling choice: it lets the model learn persistent differences between asset classes, but it also means the result is tied to the configured universe rather than to a fully universe-agnostic allocation rule.</p><h3>Scorers compared</h3><p>The notebook compares five scorer families. The deterministic rules provide finance-domain references, including momentum and low-volatility rules. The dummy prior provides a non-informative classification reference at the 0.333 base rate, but it is excluded from portfolio diagnostics because constant scores create arbitrary top-k tie-breaking. 
XGBoost is the GPU-accelerated classical supervised learner with rolling chronological model selection and calibration variants. TabPFN and TabICL are evaluated as direct allocation scorers: they receive labelled asset-month rows as context and score later holdout rows without the same task-specific fitting process used by XGBoost.</p><h3>Row-level ranking results</h3><p>The first result view is row-level classification performance on the holdout period. Average Precision is the main metric because the target is a top-group label with a 33.3 percent base rate. ROC AUC, Brier score, expected calibration error, and runtime are shown here because they answer different questions. The notebook also saves log loss in the full artifact table.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JdAL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JdAL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 424w, https://substackcdn.com/image/fetch/$s_!JdAL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 848w, https://substackcdn.com/image/fetch/$s_!JdAL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JdAL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JdAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png" width="1412" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112092,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JdAL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 424w, https://substackcdn.com/image/fetch/$s_!JdAL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 848w, 
https://substackcdn.com/image/fetch/$s_!JdAL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 1272w, https://substackcdn.com/image/fetch/$s_!JdAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04fe68ea-217c-43c6-a70e-234f9416d2e3_1412x760.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The direct TabICL and TabPFN scores have the highest holdout Average Precision point estimates, both around 0.409. 
This is above the 0.333 base rate and above the deterministic momentum rules. The magnitude should be read carefully, though. The lift is meaningful for a difficult monthly cross-sectional task, but it is not large enough to support a strong claim of stable model-family superiority.</p><p>XGBoost illustrates why several metrics are needed. Its calibrated variant has a lower Average Precision point estimate than TabPFN and TabICL, but its ROC AUC is slightly higher. Average Precision focuses more directly on the positive class and on early ranking quality, while ROC AUC averages pairwise ranking behavior over the whole score distribution. In an allocation workflow that selects only the top few assets each month, Average Precision and monthly top-k diagnostics are usually more directly relevant than ROC AUC alone.</p><p>The deterministic rules are important baselines. The 12-month momentum rule is competitive with the learned models, which is not surprising in an asset-allocation setting. A learned model that cannot improve on simple relative-strength rules would not be very persuasive. The low-volatility rule performs poorly in this particular holdout window, which is also informative: it shows that not every simple financial heuristic is rewarded in the 2020-2026 regime.</p><p>The dummy prior is retained for classification calibration context, but it is not used as a portfolio scorer. 
A constant-prior score creates ties across all assets; converting those ties into a top-k portfolio would depend on arbitrary tie-breaking and can produce misleading deterministic selections.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TL-7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TL-7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TL-7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png" width="1213" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1213,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119747,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TL-7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 424w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 848w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 1272w, https://substackcdn.com/image/fetch/$s_!TL-7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58b75b20-2a41-4e2e-9624-db8c0982f813_1213x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The precision-recall figure shows the same result visually. The horizontal reference line is the holdout base rate. Curves that stay above this line are ranking positives better than a non-informative scorer. The TabPFN and TabICL curves sit modestly above the baseline over useful parts of the recall range, but they are not separated enough from the other competitive methods to justify strong superiority language.</p><h3>Monthly cross-sectional ranking results</h3><p>Row-level classification metrics pool all asset-month rows together. Tactical allocation is implemented month by month, so the notebook also evaluates whether each scorer ranks assets well inside each monthly cross-section. 
The monthly diagnostics include mean Spearman information coefficient, top-k hit rate, average selected next-month return, average universe next-month return, and the difference between selected and universe returns.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!63Zx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!63Zx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 424w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 848w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 1272w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!63Zx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png" width="1414" height="877" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:877,&quot;width&quot;:1414,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134875,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!63Zx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 424w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 848w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 1272w, https://substackcdn.com/image/fetch/$s_!63Zx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcac8e11-4585-48d6-8dd2-79221bf366ae_1414x877.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This view changes the emphasis. XGBoost final has the strongest monthly rank diagnostic point estimates in the full run, even though TabPFN and TabICL have stronger row-level Average Precision point estimates. That is not a contradiction. Average Precision pools rows across the full holdout. Monthly rank diagnostics ask a narrower allocation question: in each month, did the scorer put the better next-month assets closer to the top of that month&#8217;s list?</p><p>The 12-month momentum rule remains a serious baseline here as well. Its mean monthly IC and top-k hit rate are close to the direct TabPFN result. For a professional reader, this is an important result rather than an inconvenience. 
In tactical allocation, a model has to be compared against simple, interpretable allocation rules that already encode domain structure.</p><p>The negative low-volatility result is also useful. It indicates that this holdout period rewarded relative strength and risk-on selection more than defensive low-volatility selection. That interpretation is period-specific; it should not be generalized into a permanent statement about low-volatility investing.</p><h3>Portfolio translation</h3><p>The portfolio diagnostic converts each score into a simple monthly top-three equal-weight portfolio and subtracts a transaction-cost assumption of 5 basis points per unit of absolute weight turnover. This is not a production backtest. It does not model taxes, market impact, execution timing beyond the close-to-close convention, fund-specific trading costs, mandate constraints, or liquidity rules. It is a controlled diagnostic for asking whether a score remains useful after it is translated into a repeatable allocation rule.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RKUa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RKUa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 424w, https://substackcdn.com/image/fetch/$s_!RKUa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 848w, 
https://substackcdn.com/image/fetch/$s_!RKUa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 1272w, https://substackcdn.com/image/fetch/$s_!RKUa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RKUa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png" width="1210" height="905" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:905,&quot;width&quot;:1210,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142876,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RKUa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 424w, 
https://substackcdn.com/image/fetch/$s_!RKUa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 848w, https://substackcdn.com/image/fetch/$s_!RKUa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 1272w, https://substackcdn.com/image/fetch/$s_!RKUa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe621e633-bd2d-480d-8800-d9108cbdccf4_1210x905.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The portfolio table gives a different perspective from the row-level model table. XGBoost final has the strongest portfolio diagnostic point estimates in this run, with an 18.1 percent CAGR, a 1.03 Sharpe ratio, and final equity of 2.83 over the holdout period. SPY-only has a high final equity of 2.44, but it is a concentrated single-asset benchmark and has a lower Sharpe ratio than the XGBoost final diagnostic portfolio. The 12-month momentum rule remains strong and is a serious reference point for the learned models.</p><p>The direct TabPFN and TabICL portfolios finish above initial capital and above equal weight in final equity, but they do not dominate the stronger classical and deterministic allocation baselines. Their higher row-level Average Precision does not automatically translate into the strongest allocation outcome. This is why the notebook reports several diagnostic layers. A scorer can rank positives well in aggregate, yet still produce a higher-turnover or less stable top-k portfolio.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tfrn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tfrn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 424w, https://substackcdn.com/image/fetch/$s_!Tfrn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 848w, 
https://substackcdn.com/image/fetch/$s_!Tfrn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!Tfrn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tfrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png" width="1230" height="862" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:182574,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tfrn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 424w, 
https://substackcdn.com/image/fetch/$s_!Tfrn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 848w, https://substackcdn.com/image/fetch/$s_!Tfrn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 1272w, https://substackcdn.com/image/fetch/$s_!Tfrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67704a77-5b44-4984-9a4f-d0f242c87aa1_1230x862.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The equity-curve figure helps separate level, volatility, and drawdown behavior. The relevant reading is not only which line ends highest. A professional allocation review would also ask whether the path is stable, whether performance is concentrated in a few months, how much turnover is required, and whether a simpler benchmark offers a similar result with less operational complexity. The figure plots the final XGBoost scorer as well as its calibrated counterpart because the portfolio table shows that the distinction matters. The notebook also excludes the constant-prior dummy and duplicated deterministic rule portfolios from this score-driven portfolio view. In the post table, I present one calibrated XGBoost portfolio view when calibration-base and calibrated selections produce the same monthly allocation path.</p><h3>Calibration and probability quality</h3><p>Calibration is evaluated separately from ranking. If scores are used only to sort assets within a month, a monotonic calibration transform preserves the within-month ordering up to ties, so it may leave the top-k selection unchanged. If scores are presented as probabilities of next-month top-group membership, then calibration becomes part of the model quality assessment.</p><p>The XGBoost calibrated score lowers the Brier score from 0.21875 to 0.21797 and the expected calibration error from 0.0520 to 0.0366, while Average Precision stays at 0.3877, unchanged from the calibration-base score. This is the expected behavior for a monotonic probability calibration method: it can improve probability reliability without changing the ranking order materially.</p><p>TabPFN direct and TabICL direct have similar Brier scores around 0.219. Their expected calibration errors are about 0.045 and 0.061, respectively. These are not poor numbers for this task, but they also do not support interpreting the TFM scores as production-grade probabilities. 
In this workflow, the most defensible interpretation of the scores is as ranking scores, with calibration diagnostics reported for transparency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R70C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R70C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 424w, https://substackcdn.com/image/fetch/$s_!R70C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 848w, https://substackcdn.com/image/fetch/$s_!R70C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 1272w, https://substackcdn.com/image/fetch/$s_!R70C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R70C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png" width="1143" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1143,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R70C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 424w, https://substackcdn.com/image/fetch/$s_!R70C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 848w, https://substackcdn.com/image/fetch/$s_!R70C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 1272w, https://substackcdn.com/image/fetch/$s_!R70C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F024321bf-411d-4f66-8508-b8799c2e227c_1143x911.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The calibration figure compares predicted probability bins with observed positive rates. A perfectly calibrated model would lie on the diagonal. The zoomed view is useful because most probabilities are concentrated in a relatively narrow range rather than across the full 0 to 1 interval. That concentration is reasonable in this task: with three positives out of nine assets each month, the base rate is one third, so even useful scores should not usually produce extreme probabilities.</p><h3>Runtime and research cost</h3><p>Runtime is part of the research result, not just an engineering detail. XGBoost performs a broad randomized hyperparameter search under rolling chronological validation, so its workflow time is much longer than that of the direct TFM scorers in this run. 
TabPFN direct takes about 25 seconds and TabICL direct about 13 seconds, while the full XGBoost search takes about 39 minutes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RCkJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RCkJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 424w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 848w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 1272w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RCkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png" width="1337" height="915" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1337,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88494,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RCkJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 424w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 848w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 1272w, https://substackcdn.com/image/fetch/$s_!RCkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4a447f-0dee-4acc-a693-9259482916b4_1337x915.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The runtime versus Average Precision figure shows the main engineering tradeoff. Direct TabPFN and TabICL provide competitive Average Precision at much lower workflow time. XGBoost is slower because it is being tuned more extensively, but in this run its final scorer produces the strongest portfolio diagnostic point estimates. The practical implication is not that one method is always better. It is that the right model choice depends on the research objective: quick direct scoring, probability quality, portfolio behavior, interpretability, or a more exhaustive supervised tuning process.</p><h3>Uncertainty, drift, and leakage</h3><p>The notebook uses month-block bootstrap intervals so that uncertainty is resampled at the month level rather than by treating individual asset rows as independent observations. 
This matters because rows from the same month share the same market regime and are linked by the top-k target definition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FkcV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FkcV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 424w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 848w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 1272w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FkcV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png" width="1212" height="716" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1212,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117185,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196948489?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FkcV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 424w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 848w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 1272w, https://substackcdn.com/image/fetch/$s_!FkcV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28d7729-be8c-4aaa-b11e-fd5b2282ad5d_1212x716.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The intervals are the main reason to avoid strong superiority language. TabICL and TabPFN have the highest AP point estimates, but their AP intervals overlap with the intervals for momentum and XGBoost. XGBoost final has the highest monthly IC and top-k hit-rate point estimates, and its IC interval is positive in this run, but it still should be read as holdout evidence rather than proof of a stable trading edge.</p><p>The low-volatility rule has the least favorable diagnostic profile in this comparison. Its Average Precision interval is below most other methods, and its mean monthly IC is negative. 
Even there, the correct interpretation is period-specific: the holdout regime did not favor this defensive rule under the notebook&#8217;s top-k target and universe.</p><p>The feature-drift report shows substantial distribution changes between the model-selection history and the holdout. The largest population stability index (PSI) values are concentrated in rate, yield-curve, beta, and volatility-related features. Examples include the beta of EEM to SPY, the 10-year minus 2-year yield-curve spread, changes in the fed funds rate, Treasury yields, and other ETF beta features.</p><p>This is not a defect in the notebook. It is the market environment the model has to face. A holdout from 2020 to 2026 is materially different from much of the 2006 to 2017 selection period. The implication is that model performance should be interpreted together with drift: a scorer that looks useful here has been evaluated through a demanding regime shift, but the same drift also warns against assuming stationarity.</p><p>The leakage checklist passes the main mechanical checks: target columns are excluded from the feature matrix, chronological splitting is preserved, the feature policy and imputation are fitted before calibration and holdout evaluation, and next-month returns are used only as labels. The notebook also records the close-to-close convention: features are computed from month-end values, and next-month returns are measured from that same month-end adjusted close to the next month-end adjusted close. This is a diagnostic convention rather than a fully executable live trading assumption. A production implementation would need an explicit trade-date and execution-price model.</p><h3>Result interpretation</h3><p>The empirical story is mixed. Direct TabPFN and TabICL have the strongest row-level Average Precision point estimates, which is useful evidence that direct TFM scoring can be competitive in this allocation framing. 
XGBoost final has the strongest monthly rank and portfolio diagnostic point estimates, which shows why a classical supervised baseline remains essential. The 12-month momentum rule remains close enough to the learned models that it should be treated as a serious domain benchmark rather than a weak baseline. The bootstrap intervals overlap materially, so the right reading is not that one model family wins. The right reading is that direct TFM scoring, GPU XGBoost, and simple allocation rules each answer part of the workflow question, and the final judgment depends on ranking quality, allocation behavior, calibration, runtime, uncertainty, and data limitations together.</p><h2>Known limitations</h2><p>The most important limitation is that this is a public-data research workflow, not an institutional production allocation system. The notebook uses yfinance, public Cboe VIX history, and public FRED files. That makes the workflow reproducible, but it does not provide the data lineage, point-in-time guarantees, vendor corrections, corporate-action controls, or operational monitoring that a production investment process would require.</p><p>The execution convention is also simplified. Features are computed from month-end close information, and the allocation diagnostic applies scores to the next close-to-close monthly return. The notebook documents this assumption, but it does not model when the month-end data becomes available, what price could actually be traded, whether a next-open execution convention would change the result, or how intraday slippage and market impact would affect rebalancing.</p><p>The universe is intentionally small. A nine-ETF universe is useful for a controlled notebook because the target is easy to inspect and the model behavior can be reviewed asset by asset. It is not large enough to represent the full complexity of a professional tactical allocation universe. 
The top-three label is also coarse: with nine assets, one third of the universe is labelled positive each month. A broader universe would make the ranking problem more realistic and would likely change the relative value of different models.</p><p>The portfolio diagnostic should not be read as a deployable backtest. It uses equal-weight top-k selection, a simple turnover-cost assumption, and a small number of benchmark portfolios. It does not include taxes, borrow constraints, fund-specific bid-ask spreads, market impact, capacity, mandate constraints, benchmark-relative risk budgets, turnover penalties during model selection, or drawdown-aware optimization. The diagnostic is useful for testing whether scores have allocation relevance; it is not sufficient for live portfolio approval.</p><p>The feature design includes ticker identity indicators and asset-group metadata. That is defensible for a fixed-universe allocator because the model may need to distinguish persistent asset-class behavior. It also limits the interpretation. Some of the signal may come from learning persistent ETF identities rather than from fully reusable market-state relationships. A stronger follow-up should include an identity-ablated run and compare how much performance remains when ticker identity is removed.</p><p>The uncertainty analysis is helpful but not definitive. Month-block bootstrap intervals are more appropriate than row-level independent resampling for this panel, but they still summarize only the observed holdout period. The 2020-2026 holdout contains important market regimes, yet it is still one historical path. The overlapping confidence intervals across TabPFN, TabICL, XGBoost, and momentum rules mean that the results should be interpreted as evidence of competitiveness, not as proof of stable model superiority.</p><p>Finally, the notebook evaluates direct TabPFN and TabICL scoring, not every possible way to use tabular foundation models. 
It does not test TFM embeddings, hybrid TFM-plus-XGBoost pipelines, larger universes, multi-horizon targets, benchmark-relative objectives, or constrained portfolio optimization. Those are natural extensions, but they are outside the scope of this post.</p><h2>Summary and conclusion</h2><p>This post treated tactical asset allocation as a supervised tabular ranking problem. The objective was not to forecast exact prices or claim a deployable trading strategy. The objective was narrower: build a reproducible public-data workflow that asks whether different scorers can identify the next-month top group within a small ETF universe under chronological validation.</p><p>The empirical picture is mixed in a useful way. Direct TabICL and TabPFN produce the strongest holdout Average Precision point estimates, around 0.409 against a base rate of 0.333 (three of the nine ETFs are labelled positive each month). That is consistent with the pretrained tabular foundation models extracting some cross-sectional ranking information from the labelled asset-month context. At the same time, the portfolio diagnostic favors the final XGBoost scorer, and the 12-month momentum rule remains highly competitive. This is the kind of result I would expect from a serious workflow test: the newer model class can be useful in this setup, but it does not make strong classical baselines or simple domain rules irrelevant.</p><p>The uncertainty analysis also matters. The confidence intervals overlap materially across the main competitive methods. The final XGBoost scorer has the strongest monthly IC and top-k hit-rate point estimates in the holdout, while TabICL and TabPFN have the strongest row-level Average Precision point estimates. These are not contradictory findings; they show that different evaluation views answer different questions. 
For allocation work, row-level ranking, monthly cross-sectional ranking, portfolio behavior, turnover, calibration, and drift should be read together.</p><p>My conclusion is that direct TabPFN and TabICL scoring deserves attention in applied financial tabular workflows, but the evidence should be presented carefully. In this notebook, TFMs are credible components in a tactical allocation research pipeline. They are not shown to be universally better than XGBoost or momentum rules, and the notebook does not establish live trading suitability. The stronger contribution is the workflow itself: chronological splits, leakage checks, GPU-first model comparison, saved artifacts, calibration diagnostics, transaction-cost-aware portfolio diagnostics, month-block uncertainty, and explicit documentation of public-data limitations.</p><h2>Outlook</h2><p>The next step is to make the allocation testbench more demanding. The current universe has nine ETFs and a top-three monthly target. That is enough for a controlled blog example, but it is still a small universe. A broader ETF universe, sector funds, international exposures, commodities, currencies, and duration buckets would create a more realistic cross-sectional ranking problem and make top-k selection less coarse.</p><p>The second direction is model-design sensitivity. The current notebook uses ticker identity indicators and asset-group metadata. That is defensible for a fixed-universe allocator, but it should be tested. A useful follow-up would compare the current feature set with an identity-ablated version, a metadata-only version, and a strictly time-series feature version. If performance depends heavily on persistent ticker identity, the interpretation is different from a model that generalizes mainly from market-state and asset-behavior features.</p><p>The third direction is portfolio realism. The current diagnostic includes monthly turnover costs, but it still uses a simple close-to-close convention. 
A stronger research version would include next-open or volume-weighted execution assumptions, wider transaction-cost sensitivity, liquidity filters, concentration limits, turnover constraints, drawdown-aware objectives, and benchmark-relative risk controls. Those additions would move the notebook closer to an institutional research prototype, although still not to a production trading system.</p><p>Finally, there is room to compare direct TFM scoring with the embedding workflows from earlier posts. In this notebook, TabPFN and TabICL are used as direct allocation scorers. A later version could test whether TFM embeddings improve a supervised XGBoost or neural ranking pipeline, and whether that hybrid approach is more stable than direct TFM scoring under regime drift. That would connect this allocation workflow back to the embedding-based fraud workflows from P16 and P17, while keeping the evaluation discipline from the volatility-regime and allocation posts.</p>]]></content:encoded></item><item><title><![CDATA[[P19] Volatility-Regime Forecasting with TabPFN, TabICL, and Classical Tabular Models - 2]]></title><description><![CDATA[Given the information available at the close of date t, can a model score whether SPY's realized volatility over the next 20 trading days will fall into a high-volatility regime?]]></description><link>https://newsletter.dsaiengineering.com/p/p19-volatility-regime-forecasting-tabpfn-tabicl-classical-ml-models-2</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p19-volatility-regime-forecasting-tabpfn-tabicl-classical-ml-models-2</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Fri, 08 May 2026 14:48:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aLF7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a 
href="https://open.substack.com/pub/dsaiengineering/p/p18-volatility-regime-forecasting-tabpfn-tabicl--classical-tabular-models?utm_campaign=post-expanded-share&amp;utm_medium=web">P18</a>, I discussed volatility-regime forecasting, where the question was:</p><blockquote><p>Given the information available at the close of date t, can a model score whether SPY&#8217;s realized volatility over the next 20 trading days will fall into a high-volatility regime?</p></blockquote><p>That kind of score can be useful in a business workflow in several ways. A portfolio team might use it to decide when to review gross exposure. A risk team might use it to prioritize stress checks. A model-risk team might use it to monitor whether market conditions are moving outside the regime where a strategy was validated. A product team building investing tools might use it as a risk-state feature rather than as a trading signal. The value is in improving the timing and quality of risk attention.</p><p>To explore this question, I built a supervised tabular workflow that used information available at the close date t to score whether SPY&#8217;s realized volatility over the next 20 trading days would be high. In that workflow, I compared the performance and workflow-related pros and cons of direct TabPFN and TabICL scoring, XGBoost on raw features, XGBoost on raw features plus TabPFN or TabICL embeddings, and simple volatility-domain rules.</p><p>The main result was that direct TabPFN and TabICL looked more useful as scoring models than the embedding-enhanced XGBoost workflows, while raw XGBoost and volatility-domain rules formed competitive baselines.</p><p>This post follows up on P18 while focusing on the business aspects. At the end of that post, I highlighted several limitations of the workflow. 
In this version, I addressed the most important ones: the validation window covers more history, the volatility-domain baselines are stronger, and the diagnostics now look at threshold sensitivity and market regimes more directly. Some limitations remain, especially around public data, overlapping targets, and regime-dependent validation.</p><p>Because the volatility-domain rules were so competitive in P18, I added four more rules to learn how much simple finance structure can explain before a learned model is worth the added complexity. This is important for business evaluation. If a simple volatility-domain rule gives almost the same risk-ranking value as a learned model at almost no cost, the learned model has to justify itself through a clearer alert queue, better robustness, better probability quality, better complementarity, or better downstream decision impact.</p><p>I also removed the embedding-enhanced XGBoost workflows from this version. Across P16, P17, and P18, those embedding-enhanced workflows did not consistently improve performance. 
My guess is that I was doing something wrong there, so I took them out until I understand the reason.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aLF7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aLF7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 424w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 848w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 1272w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aLF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png" width="1214" height="733" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:733,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:228266,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aLF7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 424w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 848w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 1272w, https://substackcdn.com/image/fetch/$s_!aLF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3b84b7-cfe2-468e-a39e-7821ca2f12ca_1214x733.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You can find the notebook in my GitHub repository <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260507-tabpfn-tabicl-volatility-regime-forecasting-2.assets/tabpfn-tabicl-volatility-regime-forecasting-20260507.ipynb">here</a> or you can also <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-volatility-forecasting-20260507">clone it directly</a> on Kaggle. 
The notebook is meant to be run with GPU enabled.</p><h2>Discussion and results</h2><h3>Technical changes</h3><p>The target and final holdout are kept stable. The target is still future 20-trading-day SPY realized volatility crossing an 80th percentile threshold estimated only from data through 2017-12-29. The threshold is 0.200603 annualized realized volatility, and the final holdout is still 2020-01-02 to 2026-04-02.</p><p>The important changes are:</p><ul><li><p>The post-context tuning window now starts in 2010 instead of 2012. That gives optional benchmark tuning more high-volatility examples.</p></li><li><p>The volatility-domain baseline set is stronger. P18 used a narrower baseline set; this version reports VIX close, VIX z-score, close-to-close realized volatility, Parkinson realized volatility, Garman-Klass realized volatility, and a VIX-minus-realized-volatility gap rule. These rules have the following definitions:</p><ul><li><p><code>VIX close</code>: the market&#8217;s implied volatility gauge for the S&amp;P 500. It is a natural forward-looking risk baseline.</p></li><li><p><code>VIX z-score 252d</code>: VIX relative to its own one-year history. 
This asks whether implied volatility is unusually high compared with its recent context.</p></li><li><p><code>SPY realized volatility 20d</code>: recent close-to-close volatility of SPY. This asks whether volatility has already been elevated.</p></li><li><p><code>SPY Parkinson realized volatility 20d</code>: a volatility estimate based on the high-low range. It uses intraday range information rather than only close-to-close moves.</p></li><li><p><code>SPY Garman-Klass realized volatility 20d</code>: another range-based volatility estimate that uses open, high, low, and close information.</p></li><li><p><code>VIX/realized volatility gap 20d</code>: the spread between implied and recently realized volatility. In this run, that particular construction is not well aligned with the future high-volatility target.</p></li></ul></li><li><p>The comparison now covers only direct scorers: TabPFN, TabICL, raw all-history XGBoost, and domain rules.</p></li><li><p>The notebook reports threshold sensitivity, named market-regime behavior, score-decile checks, runtime, calibration, uncertainty, drift/leakage checks, and a risk-control diagnostic.</p></li><li><p>The feature policy stays close to P18 but expands slightly to include the additional volatility features.</p></li></ul><p>The split is:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1o7Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1o7Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 424w, 
https://substackcdn.com/image/fetch/$s_!1o7Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 848w, https://substackcdn.com/image/fetch/$s_!1o7Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 1272w, https://substackcdn.com/image/fetch/$s_!1o7Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1o7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png" width="1408" height="713" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:713,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129071,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1o7Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 424w, https://substackcdn.com/image/fetch/$s_!1o7Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 848w, https://substackcdn.com/image/fetch/$s_!1o7Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 1272w, https://substackcdn.com/image/fetch/$s_!1o7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6c4c71f-9f59-484c-a823-eb52eabbdfb9_1408x713.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This split matters because a model that only works in a quiet 2012-2017 slice is of little use to a business user. The score has to survive a holdout that includes COVID, the inflation/rate shock, quiet years, and the more recent 2024-2026 period.</p><h3>Main holdout result</h3><p>The holdout base rate is 23.3%. Average Precision is the main ranking metric because this is an alert-prioritization problem: sort dates from highest expected risk to lowest expected risk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OBpd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OBpd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 424w, https://substackcdn.com/image/fetch/$s_!OBpd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 848w, https://substackcdn.com/image/fetch/$s_!OBpd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OBpd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OBpd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png" width="1411" height="805" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OBpd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 424w, https://substackcdn.com/image/fetch/$s_!OBpd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 848w, 
https://substackcdn.com/image/fetch/$s_!OBpd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 1272w, https://substackcdn.com/image/fetch/$s_!OBpd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e9ce7f4-2322-49d9-83e8-d098ea8b27c3_1411x805.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The business interpretation is:</p><ol><li><p>Direct TabPFN has the best broad ranking point estimate.</p></li><li><p>VIX close is 
almost as useful and costs essentially nothing.</p></li><li><p>TabICL is competitive and faster than TabPFN in this run.</p></li><li><p>XGBoost is respectable but expensive because the workflow includes randomized hyperparameter search.</p></li><li><p>The VIX-realized-volatility gap is a useful negative control: plausible finance features still need validation.</p></li></ol><p>Although the foundation models and the classical ML benchmark perform well at this task, performance alone would not justify their use in production if simple domain rules provide nearly the same ranking value at almost no additional cost. The learned models need to improve the alert queue, add complementary coverage, improve probability quality, or support a better downstream decision.</p><h3>Precision-recall curve</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p1Lw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p1Lw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 424w, https://substackcdn.com/image/fetch/$s_!p1Lw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 848w, https://substackcdn.com/image/fetch/$s_!p1Lw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p1Lw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p1Lw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png" width="1456" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p1Lw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 424w, https://substackcdn.com/image/fetch/$s_!p1Lw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 848w, 
https://substackcdn.com/image/fetch/$s_!p1Lw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 1272w, https://substackcdn.com/image/fetch/$s_!p1Lw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5348d48-5d03-4f08-a8ce-fb77fcb84734_1511x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The precision-recall curve shows the performance visually. 
Direct TabPFN, VIX close, Parkinson volatility, Garman-Klass volatility, TabICL direct, XGBoost, close-to-close realized volatility, and VIX z-score all sit well above the random baseline for much of the useful recall range. The curve shows that there are several viable risk-scoring candidates; the operating-point table shows what the queue would actually look like.</p><h3>Alert queue impact</h3><p>The strongest business view is the alert queue.</p><p>At 50% recall, direct TabPFN captures half of future high-volatility rows with the shortest queue:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!goA8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!goA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 424w, https://substackcdn.com/image/fetch/$s_!goA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 848w, https://substackcdn.com/image/fetch/$s_!goA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 1272w, https://substackcdn.com/image/fetch/$s_!goA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!goA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png" width="1412" height="408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!goA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 424w, https://substackcdn.com/image/fetch/$s_!goA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 848w, https://substackcdn.com/image/fetch/$s_!goA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 1272w, https://substackcdn.com/image/fetch/$s_!goA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e49e250-ef71-401b-a733-7ab3ce358c55_1412x408.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That is a practical point in TabPFN&#8217;s favor. 
A smaller queue means fewer dates requiring review for the same captured share of future high-volatility events.</p><p>At 80% recall, the story changes:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nnSk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nnSk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 424w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 848w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 1272w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nnSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png" width="1410" height="539" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:539,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nnSk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 424w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 848w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 1272w, https://substackcdn.com/image/fetch/$s_!nnSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F339a5988-7ef9-4af9-bc9e-6e1af2770757_1410x539.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At this higher recall level, VIX close is slightly more efficient than TabPFN. The difference is only seven alerts, so I would not overstate it. The practical message is that the simple implied-volatility signal remains extremely hard to beat when the business goal is broad risk coverage.</p><p>At 90% recall, VIX z-score has the shortest queue: 751 alerts with precision 0.439. That is a reminder that &#8220;best model&#8221; depends on the operating requirement. If the business goal is to capture nearly all high-volatility periods, a normalized VIX signal can be more useful than the model with the best overall AP.</p><h3>Threshold sensitivity</h3><p>A production definition of &#8220;high volatility&#8221; is not sacred. 
Risk teams may care about moderately elevated volatility, severe tail volatility, or multiple tiers.</p><p>The notebook tests this by holding scores fixed and changing the high-volatility threshold from the 70th to the 90th percentile. This is not a refit. It asks whether the same scores still rank risk well under alternate definitions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kExJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kExJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 424w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 848w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 1272w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kExJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png" width="1411" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kExJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 424w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 848w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 1272w, https://substackcdn.com/image/fetch/$s_!kExJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91a3e76-e507-4a51-87f8-48c68a817a89_1411x411.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is valuable from a business perspective. TabPFN is robust across the 70th to 85th percentile definitions. TabICL becomes strongest at the stricter 90th percentile definition. That suggests TabICL may be especially worth inspecting for more severe tail-risk definitions, even though TabPFN is stronger at the primary threshold.</p><h3>Market regime behavior</h3><p>The 2020-forward holdout is not one market. 
It contains different regimes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BKDO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BKDO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 424w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 848w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 1272w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BKDO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png" width="1410" height="521" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:521,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87301,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BKDO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 424w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 848w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 1272w, https://substackcdn.com/image/fetch/$s_!BKDO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a5ed00d-b9f8-4b9d-9adc-5e5b352a5f79_1410x521.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The quiet 2023 regime is intentionally absent from this table because it has no positive high-volatility rows under the primary target definition, so AP and ROC AUC are undefined. That absence is itself informative: some market blocks do not contain enough target events to support a meaningful model comparison.</p><p>This is probably the most important risk-management lesson in the run. Model quality is regime-dependent.</p><p>In 2020, a VIX z-score is highly informative because market-implied volatility was extreme relative to its own history. In 2022, range-based realized-volatility estimators work very well because the high-volatility label dominates the year. 
In the later 2024-2026 period, where the positive rate is much lower, direct TabPFN has the best AP.</p><p>For a business user, this means the model should not be monitored only with one full-holdout AP number. It should be monitored by market state. A risk score that works in crisis conditions may not be the best score in quiet conditions, and the reverse can also be true.</p><h3>Score-decile sanity check</h3><p>A risk score should concentrate future high-volatility events in its highest score buckets. The decile check makes that visible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zy2U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zy2U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 424w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 848w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 1272w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Zy2U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png" width="1413" height="476" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:476,&quot;width&quot;:1413,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66790,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zy2U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 424w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 848w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 1272w, https://substackcdn.com/image/fetch/$s_!Zy2U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe07fb1c2-179f-4627-af77-603d37cec1a9_1413x476.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is the kind of diagnostic one might want in a risk dashboard. The top bucket should be meaningfully different from the bottom bucket. Most of the strong models pass that test. 
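</p><p>As a rough illustration of how such a decile check can be computed, here is a minimal Python sketch. It is not the notebook's code: the helper name <code>decile_event_rates</code> and the synthetic scores and labels are assumptions made only for this example. It sorts rows by score, splits them into ten near-equal buckets, and reports the share of positive labels in each bucket.</p><pre><code>import random


def decile_event_rates(scores, labels, n_buckets=10):
    """Event rate per score bucket, lowest-score bucket first.

    Assumes there are at least n_buckets rows so no bucket is empty.
    """
    # Sort row indices by score so bucket 0 holds the lowest scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rates = []
    for b in range(n_buckets):
        # Integer arithmetic splits the rows into near-equal buckets.
        start = b * len(order) // n_buckets
        stop = (b + 1) * len(order) // n_buckets
        bucket = [labels[i] for i in order[start:stop]]
        rates.append(sum(bucket) / len(bucket))
    return rates


# Synthetic data only: each label is 1 with probability equal to its score.
rng = random.Random(0)
scores = [rng.random() for _ in range(2000)]
labels = [rng.choices([0, 1], weights=[1.0 - s, s])[0] for s in scores]

for decile, rate in enumerate(decile_event_rates(scores, labels), start=1):
    print(f"decile {decile:2d}: event rate {rate:.3f}")
</code></pre><p>On a score that ranks well, the event rate should rise from the first printed decile to the last, and the gap between the top and bottom buckets is the quantity this sanity check looks at.</p><p>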
The VIX-realized-volatility gap fails it badly in this construction, which explains its poor AP and ROC AUC.</p><h3>Runtime and complexity</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KaGB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KaGB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 424w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 848w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 1272w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KaGB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png" width="1427" height="872" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:872,&quot;width&quot;:1427,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94796,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KaGB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 424w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 848w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 1272w, https://substackcdn.com/image/fetch/$s_!KaGB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fb572c-a2a0-4054-b095-b14d0af3c75a_1427x872.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Runtime should be interpreted carefully. This is notebook workflow time, not a controlled benchmark. Still, the cost signal is useful.</p><p>The domain rules are effectively free. VIX close gives AP 0.633 with no model fitting. Parkinson volatility gives AP 0.628 with no model fitting. That is a serious business baseline.</p><p>Direct TabICL takes 32.0 seconds in this run and gives AP 0.627. Direct TabPFN takes 69.4 seconds and gives AP 0.664. Those are reasonable workflow costs for an offline research notebook.</p><p>The raw all-history XGBoost incumbent takes 1592.9 seconds because it includes randomized hyperparameter search. It gives AP 0.611. That does not mean XGBoost is bad. 
It means that in this notebook configuration, the extra tuning cost does not buy a stronger holdout ranking than the simpler domain rules or direct TabPFN/TabICL.</p><h3>Calibration and probability quality</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Jpn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Jpn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 424w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 848w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 1272w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Jpn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png" width="1369" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1369,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98891,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Jpn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 424w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 848w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 1272w, https://substackcdn.com/image/fetch/$s_!7Jpn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e599daf-4cbd-4ed6-818b-2fde429122d4_1369x911.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Ranking and probability quality are different. 
A model can sort high-risk dates well while still producing probabilities that are too high or too low.</p><p>For the selected model configurations that output probabilities:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eRlP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eRlP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 424w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 848w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 1272w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eRlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png" width="1410" height="275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10e06a9a-0141-4918-be56-606d702024c3_1410x275.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:275,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38241,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eRlP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 424w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 848w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 1272w, https://substackcdn.com/image/fetch/$s_!eRlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10e06a9a-0141-4918-be56-606d702024c3_1410x275.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Lower is better for these diagnostics. The probability diagnostics are reasonably close across the learned models. 
I would still treat these scores primarily as ranking and risk-prioritization tools unless a downstream workflow explicitly needs calibrated probabilities.</p><h3>Risk-control diagnostic</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jFz6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jFz6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 424w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 848w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 1272w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jFz6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png" width="1456" height="876" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230871,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jFz6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 424w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 848w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 1272w, https://substackcdn.com/image/fetch/$s_!jFz6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4693c8f7-f02b-4272-a032-3bf4d73ee8e5_1515x911.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This section is not a trading strategy. It is a decision diagnostic.</p><p>The notebook asks: if a score marks a date as high risk, what happens if exposure is reduced according to a pre-specified rule? The selected policy uses a 20% target action rate and a 0.5 high-risk weight. The threshold is chosen from the calibration window, so the realized action rate in the holdout can differ from 20%. 
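</p><p>A rule of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the notebook's implementation: the function names <code>threshold_from_calibration</code> and <code>run_policy</code>, the toy scores and returns, the cost model (cost charged on the change in exposure), and the alignment of each score with the return it is meant to protect are all hypothetical choices made for illustration.</p><pre><code>def threshold_from_calibration(cal_scores, action_rate=0.20):
    """Score cutoff that flags about `action_rate` of the calibration rows."""
    ranked = sorted(cal_scores, reverse=True)
    n_flagged = max(1, round(action_rate * len(ranked)))
    return ranked[n_flagged - 1]


def run_policy(scores, returns, threshold, high_risk_weight=0.5, cost_bps=1.0):
    """Apply the exposure rule; return final equity, max drawdown, action rate."""
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    prev_weight, n_actions = 1.0, 0
    for score, ret in zip(scores, returns):
        # Assumption: the score is known before this return is realized.
        flagged = score &gt;= threshold
        weight = high_risk_weight if flagged else 1.0
        n_actions += flagged
        # Assumed cost model: pay cost_bps on the change in exposure.
        cost = abs(weight - prev_weight) * cost_bps / 10_000
        equity *= 1.0 + weight * ret - cost
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)
        prev_weight = weight
    return equity, max_drawdown, n_actions / len(scores)


# Toy numbers only, chosen to show the mechanics.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.9]
threshold = threshold_from_calibration(cal_scores)
holdout_scores = [0.2, 0.95, 0.92, 0.1]
holdout_returns = [0.01, -0.04, -0.02, 0.01]

equity, drawdown, action_rate = run_policy(holdout_scores, holdout_returns, threshold)
print(f"threshold={threshold}, realized action rate={action_rate:.0%}")
print(f"final equity={equity:.4f}, max drawdown={drawdown:.2%}")
</code></pre><p>In this toy example the realized holdout action rate differs from the 20% target because the threshold is fixed on the calibration rows and then applied unchanged to the holdout scores.</p><p>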
Transaction costs are included at 1 basis point.</p><p>The result shows the tradeoff clearly:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M7UZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M7UZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 424w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 848w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 1272w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M7UZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png" width="1409" height="539" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:539,&quot;width&quot;:1409,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85521,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M7UZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 424w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 848w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 1272w, https://substackcdn.com/image/fetch/$s_!M7UZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29c835cf-51d7-4237-b861-fbcae8e8ca3a_1409x539.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Buy-and-hold has the highest final equity in this period, but it also has the largest drawdown. Several risk-score policies reduce drawdown materially while giving up some final equity. That is the actual business tradeoff: risk control versus participation.</p><p>The best Sharpe in the selected policy comes from VIX z-score, not from TabPFN. 
TabPFN gives a more compact classification queue at 50% recall and the highest AP point estimate, but risk-policy utility is not identical to classification ranking.</p><h3>Uncertainty</h3><p>The notebook uses 300 calendar-month block-bootstrap samples because the 20-day target windows overlap and market regimes cluster in time.</p><p>The AP confidence intervals overlap heavily:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PxPg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PxPg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 424w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 848w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 1272w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PxPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png" width="1411" height="474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74250,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196908084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PxPg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 424w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 848w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 1272w, https://substackcdn.com/image/fetch/$s_!PxPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a104354-c433-49c3-88a1-c499ec873c61_1411x474.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This limits the claim. I would say TabPFN has the best AP point estimate in this run. I would not say it is conclusively better than VIX, TabICL, XGBoost, or the strongest realized-volatility rules.</p><h2>Summary and conclusion</h2><p>This post is less about proving that one model wins and more about understanding how a volatility-regime score could create business value.</p><p>Direct TabPFN looks promising as a compact risk-ranking model. It has the strongest holdout AP point estimate, the most compact 50% recall queue, and good score-decile separation. TabICL is also worth attention: it is competitive at the primary threshold, faster than TabPFN in this run, and strongest at the 90th percentile sensitivity threshold.</p><p>XGBoost is a useful classical incumbent but not the best business tradeoff in this notebook configuration. 
Its AP is respectable, but the randomized-search workflow cost is high relative to the value added over domain rules and direct TabPFN/TabICL scoring.</p><p>VIX and realized-volatility rules remain essential baselines. VIX is transparent, economically meaningful, free to compute, and nearly tied with TabPFN at the 80% recall operating point. VIX z-score is especially useful at very high recall and in the risk-control diagnostic.</p><p>The risk-control diagnostic shows that useful classification scores do not automatically translate into the best portfolio-style tradeoff, and the regime diagnostics show that the full holdout is not one stable market state. Moreover, no single metric is enough. AP, alert queues, threshold sensitivity, regime behavior, calibration, runtime, and risk-control diagnostics each answer a different business question.</p><p>The conclusion is that tabular foundation models can add value in a finance risk-scoring workflow, but only when they are evaluated next to simple domain rules, operating constraints, regime shifts, and decision diagnostics.</p>]]></content:encoded></item><item><title><![CDATA[[P18] Volatility-Regime Forecasting with TabPFN, TabICL, and Classical Tabular Models]]></title><description><![CDATA[Can we identify market states where future risk is unusually high?]]></description><link>https://newsletter.dsaiengineering.com/p/p18-volatility-regime-forecasting-tabpfn-tabicl--classical-tabular-models</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p18-volatility-regime-forecasting-tabpfn-tabicl--classical-tabular-models</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Wed, 06 May 2026 22:28:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PR7X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This 
post continues my series on tabular foundation models (TFMs). So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo's classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN's predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>, using TabPFN for causal inference in <a href="https://open.substack.com/pub/dsaiengineering/p/p13-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P13</a>, comparing TabPFN, TabICL, and supervised ML models in <a href="https://open.substack.com/pub/dsaiengineering/p/p14-tabular-foundation-models-comparing?utm_campaign=post-expanded-share&amp;utm_medium=web">P14</a>, using TabPFN and TabICL directly for fraud detection in <a href="https://open.substack.com/pub/dsaiengineering/p/p15-tabpfn-and-tabicl-for-fraud-detection?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P15</a>, and using TabPFN and TabICL embeddings in an XGBoost fraud workflow in <a href="https://open.substack.com/pub/dsaiengineering/p/p16-tabpfn-and-tabicl-embeddings-fraud-detection-workflows?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P16</a> and in <a href="https://open.substack.com/pub/dsaiengineering/p/p17-tabpfn-and-tabicl-embeddings-for-fraud-detection-workflows?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P17</a>.</p><p>In this post, I am turning the direction of this series toward quantitative finance. I am starting with volatility-regime forecasting because it is a natural bridge between supervised tabular ML and a practical finance workflow. 
The question is:</p><blockquote><p>Can we identify market states where future risk is unusually high?</p></blockquote><p>More precisely, I build a supervised tabular workflow that uses information available at the close of date <code>t</code> to score whether SPY&#8217;s realized volatility over the next 20 trading days will be high. This is not a price forecast, a trading signal, or an investment recommendation. It is a workflow test: can TabPFN, TabICL, XGBoost, TFM embeddings, and simple volatility-domain rules produce useful high-volatility rankings under a chronological market-regime split?</p><p>The main result is that direct TabPFN and TabICL look more useful as scoring models than the appended-embedding workflows look as feature generators for XGBoost. In this run, direct TabPFN gives the strongest holdout Average Precision point estimate, VIX remains a very strong domain baseline, and the downstream XGBoost embedding variants do not improve ranking under substantial market-regime shift.</p><p>The second point is about evaluation. Tabular foundation models should be evaluated as workflow components, not only as model names on a leaderboard. Direct prediction, offline embeddings, calibration, runtime, and risk-control thresholds can tell different stories. 
This notebook is part of my learning-by-building process, and the goal is to make those workflow tradeoffs visible in a quantitative-finance setting.</p><p>Because this is the first version of the workflow, I frame the results as exploratory evidence under regime shift, not as a definitive benchmark or trading claim.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PR7X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PR7X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 424w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 848w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 1272w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PR7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png" width="1094" height="784" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1094,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:199446,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PR7X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 424w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 848w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 1272w, https://substackcdn.com/image/fetch/$s_!PR7X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6495d8c7-4248-48bc-85ea-aed623da6bf6_1094x784.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>You can find the notebook in my GitHub repository <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260506-tabpfn-tabicl-volatility-regime-forecasting.assets/saved-versions/tabpfn-tabicl-volatility-forecasting-20260506-v3.ipynb">here</a> or you can also <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-volatility-forecasting-20260506-v3">clone it directly</a> on Kaggle. 
The notebook is meant to be run with GPU enabled.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.dsaiengineering.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading DSAIEngineering Newsletter! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Background and scope</h2><p>To follow this notebook, the most useful references from earlier posts are P4, P10, P12, P14, P15, P16, and P17.</p><p>P4 introduced the posterior predictive distribution viewpoint. That remains useful here because TabPFN and TabICL are not trained from scratch in the same way as XGBoost. They use labelled rows as context and produce predictions for query rows. P10 introduced row embeddings, which are reused here as offline features. P12 showed how a time-indexed problem can be converted into a tabular prediction problem. P14 compared TabPFN, TabICL, and standard supervised ML models. P15 to P17 moved the discussion toward practical workflow tests: chronological splits, rare-event ranking, alert queues, calibration, runtime, and embedding-enhanced XGBoost workflows.</p><p>Today&#8217;s notebook keeps that workflow-testing style but changes the domain. Instead of fraud detection, the task is volatility-regime forecasting. The models are still used in supervised tabular form, but the data now comes from public market, volatility, macro, and credit series. 
The evaluation is also more finance-specific: VIX, the Cboe Volatility Index that summarizes market-implied 30-day volatility expectations for the S&amp;P 500, and recent realized volatility become substantive domain baselines. The split must respect time, and the final diagnostic asks whether scores can support a simple risk-control rule. That rule is only a diagnostic. It is not a trading recommendation.</p><h3>Volatility-regime forecasting as tabular classification</h3><p>The raw financial object is a price series. Let \(P_t\) be the adjusted close price of SPY on trading day \(t\). A common daily log return is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;r_t = \\log(P_t) - \\log(P_{t-1}).&quot;,&quot;id&quot;:&quot;TNLNBZCQWM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Volatility is the variability of returns. In this notebook, the target is based on future realized volatility over the next \(H=20\) trading days. A simple way to write the future realized volatility from signal date \(t\) is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;RV_{t,H}\n\n= \\sqrt{252}\n\n\\sqrt{\n\n\\frac{1}{H-1}\n\n\\sum_{h=1}^{H}\n\n\\left(r_{t+h} - \\bar{r}_{t,H}\\right)^2\n\n}.&quot;,&quot;id&quot;:&quot;XTDAQQIWZC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(H=20\) is the forecast horizon, and \(\bar{r}_{t,H}\) is the average of the future returns \(r_{t+1}, \ldots, r_{t+H}\). The multiplier \(\sqrt{252}\) annualizes daily volatility under the usual convention of about 252 trading days per year. This matches the notebook&#8217;s use of a rolling standard deviation of future log returns. Some finance texts define realized volatility from sums of squared returns without demeaning; this post uses the rolling-standard-deviation convention because that is what the notebook computes. Because this formula uses future returns, \(RV_{t,H}\) is not a feature. 
It is the target we are trying to forecast.</p><p>The notebook turns this into a binary classification problem. Let \(q\) be a high-volatility threshold estimated only from pre-calibration history. Then the label is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t =\n\n\\mathbf{1}\\{RV_{t,20} \\geq q\\}.&quot;,&quot;id&quot;:&quot;VYFCKBLVWG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(y_t=1\) means that the next 20 trading days fall into a high-volatility regime, and \(y_t=0\) means they do not. The indicator \(\mathbf{1}\{\cdot\}\) returns 1 when the condition is true and 0 otherwise. The notebook uses \(\geq q\), so rows exactly at the threshold are included in the high-volatility class.</p><p>The supervised learning table then has the familiar form:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}\n\n= \\{(x_t, y_t)\\}_{t=1}^{n}.&quot;,&quot;id&quot;:&quot;OYRKFYYMTW&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{D}\) is the labelled tabular dataset and \(n\) is the number of rows. The row \(x_t\) contains features available at or before the close of date \(t\). Examples include recent returns, recent realized volatility, drawdowns, moving-average distances, VIX, interest rates, credit spreads, and cross-asset relationships. The label \(y_t\) uses future SPY realized volatility and is only used for training and evaluation.</p><p>This is similar to the time-series framing in P12: the sequence problem is converted into a supervised tabular problem. 
The difference is that P12 predicted future values directly, while today&#8217;s notebook predicts whether a future risk measure crosses a regime threshold.</p><h3>Why this is not a price forecast</h3><p>A price forecast asks for a predicted future price or return, for example: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{P}_{t+H}\n\n\\quad \\text{or} \\quad\n\n\\hat{r}_{t+H}.&quot;,&quot;id&quot;:&quot;LYWMKSLIJK&quot;}" data-component-name="LatexBlockToDOM"></div><p>That is not what this notebook does. It asks for a score: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_t \\approx \\mathbb{P}(y_t=1 | x_t, \\mathcal{D}_{\\text{train}}).&quot;,&quot;id&quot;:&quot;OIQFDFQPQM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(s_t\) is the model score for date \(t\), \(\mathbb{P}(\cdot)\) denotes probability, and \(\mathcal{D}_{\text{train}}\) is the training subset of the labelled table. The score is an estimate of the probability, or at least the rank-order risk, that the next 20 trading days will be high-volatility. This distinction matters. A high score does not say that SPY will go up or down. It says that the row has features associated with higher future realized volatility in the context of the fitted workflow.</p><p>This is why the evaluation focuses on ranking, calibration, and operating points rather than directional price accuracy. A useful volatility-regime score should put many future high-volatility dates near the top of the ranked list, and if it is interpreted as a probability, its probability scale should also be checked.</p><h3>Chronological validation and leakage</h3><p>In ordinary tabular demos, it is common to use random train-test splits. That would be inappropriate here. 
Financial data is time ordered, market regimes change, and future information must not leak into past decisions.</p><p>The core rule is:</p><blockquote><p>At signal date \(t\), features may use information available at or before \(t\), but the label uses returns after \(t\).</p></blockquote><p>So the notebook uses chronological windows rather than random splits. The earlier windows provide context, tuning data, and calibration data. The final 2020-forward period is held out as the future-facing evaluation window.</p><p>This design handles several separate questions:</p><ul><li><p>Can the model learn from earlier market conditions and generalize to later conditions?</p></li><li><p>Does the high-volatility threshold come from pre-holdout history rather than from the final holdout?</p></li><li><p>Are TabPFN and TabICL context labels kept separate from downstream XGBoost labels when embeddings are used?</p></li><li><p>Does calibration use a separate window rather than the same rows used for final holdout evaluation?</p></li></ul><p>This does not remove every limitation. The future 20-day realized-volatility targets overlap across nearby dates. For example, the target for date \(t\) and the target for date \(t+1\) share many future return observations. That is why independent-and-identically-distributed, or IID, assumptions are weak here and why the notebook later uses calendar-month block bootstrap for uncertainty.</p><h3>Ranking metrics and operating points</h3><p>The positive class is not fraud-rare in the same way as the credit-card fraud dataset in P15 to P17, but this is still an alert-style problem. 
The model produces a score for each date, and we sort dates from highest predicted risk to lowest predicted risk.</p><p>Precision asks:</p><blockquote><p>Among dates flagged as high risk, what fraction were actually followed by high realized volatility?</p></blockquote><p>Recall asks:</p><blockquote><p>Among all future high-volatility dates, what fraction did the score identify within the flagged set?</p></blockquote><p>Average Precision, or AP, summarizes the precision-recall curve. It is useful because it evaluates the ranked list across many possible thresholds. ROC AUC is also reported. ROC AUC asks how often a randomly chosen high-volatility row is ranked above a randomly chosen low-volatility row. I still treat AP and operating-point metrics as more directly relevant here because the output is naturally used as a ranked risk queue.</p><p>Operating points ask more concrete questions:</p><ul><li><p>If I inspect the top 10% or top 20% highest-risk dates, how much future high volatility do I capture?</p></li><li><p>If I want 80% recall, how many dates must be placed into the alert queue?</p></li><li><p>If I want 90% recall, does the queue become too broad to be useful?</p></li></ul><p>These questions are important because two models can have similar AP but behave differently at the threshold a practitioner actually uses.</p><h3>Calibration is different from ranking</h3><p>Ranking quality and probability quality are different. A model can sort high-risk dates above low-risk dates while still producing probability values that are too high or too low.</p><p>If the score is only used for ordering, monotonic transformations do not matter much. If the score is interpreted as: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}(y_t=1 | x_t),&quot;,&quot;id&quot;:&quot;FTEIJXPPOO&quot;}" data-component-name="LatexBlockToDOM"></div><p>then calibration matters. 
A calibrated model has the property that, among rows receiving scores near 0.30, roughly 30% are actually positive. The notebook therefore reports probability diagnostics such as Brier score, log loss, and expected calibration error.</p><p>Brier score is the mean squared error of predicted probabilities: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{n}\\sum_{t=1}^{n}(s_t - y_t)^2.&quot;,&quot;id&quot;:&quot;RTKCWKCINW&quot;}" data-component-name="LatexBlockToDOM"></div><p>Log loss penalizes confident wrong probabilities more sharply: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;-\\frac{1}{n}\\sum_{t=1}^{n}\n\n\\left[\n\ny_t \\log(s_t) + (1-y_t)\\log(1-s_t)\n\n\\right].&quot;,&quot;id&quot;:&quot;OUBCHBQWKM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Expected calibration error, or ECE, bins predictions by score and compares average predicted probability with observed event frequency inside each bin. The notebook reports ECE with 10 score bins. These are probability-quality diagnostics, not replacements for AP.</p><p>For Brier score, log loss, and ECE, lower values are better. For AP and ROC AUC, higher values are better.</p><h3>Classical supervised ML versus tabular foundation models</h3><p>For a classical supervised model such as XGBoost, fitting means learning a task-specific model from the current training table. Conceptually, this is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n\n= \\arg\\min_{f \\in \\mathcal{F}}\n\n\\sum_{(x_t,y_t)\\in \\mathcal{D}_{\\text{train}}}\n\n\\ell(y_t, f(x_t)).&quot;,&quot;id&quot;:&quot;NELVQKZSFF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{F}\) is the model class, \(\ell\) is the training loss, and \(\hat{f}\) is the fitted model. 
Hyperparameter tuning then selects a configuration using validation data.</p><p>For TabPFN and TabICL, the mental model is different. The large model is pretrained. The labelled rows in <code>.fit()</code> are better understood as task context rather than as a full weight-training dataset. Using notation closer to P4, the prediction is closer to:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y_{\\text{new}} | x_{\\text{new}}, X_{\\text{context}}, y_{\\text{context}}),&quot;,&quot;id&quot;:&quot;QAIAJPKALU&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(x_{\text{new}}\) is a new query row, \(y_{\text{new}}\) is its unknown label, \(X_{\text{context}}\) is the context feature matrix, and \(y_{\text{context}}\) is the corresponding context-label vector. This is the same posterior-predictive-distribution idea discussed in P4, adapted to the volatility-regime label used here.</p><p>This gives two possible workflow roles for tabular foundation models:</p><ul><li><p>Direct scorer: TabPFN or TabICL receives labelled context rows and directly predicts high-volatility probabilities or scores for later rows.</p></li><li><p>Representation generator: TabPFN or TabICL produces row embeddings, and those embeddings are appended to raw features for a downstream model such as XGBoost.</p></li></ul><p>The second role is the same integration pattern explored in P16 and P17. It is not guaranteed to help. Embeddings may add useful information, but they may also be redundant with the raw features, interact poorly with a small or quiet tuning window, or make the downstream feature space harder to tune. That is why this notebook evaluates direct TFM scoring and embedding-enhanced XGBoost as separate workflow components.</p><h3>What is in scope today</h3><p>This post is a first volatility-regime workflow test. 
It is in scope to ask whether direct TabPFN, direct TabICL, raw XGBoost, embedding-enhanced XGBoost, VIX, and recent realized volatility produce useful high-volatility rankings under a chronological holdout.</p><p>It is also in scope to inspect probability calibration, runtime, uncertainty, feature drift, leakage checks, and a simple risk-control diagnostic.</p><p>It is not in scope to claim a trading strategy, forecast exact future prices, optimize a portfolio, or produce a definitive benchmark for all financial datasets. The useful contribution is narrower: a reproducible workflow for testing tabular foundation models as components in a realistic volatility-regime forecasting pipeline.</p><h2>Code and results</h2><p>With the scope in place, this section walks through the notebook outputs. The practical question is:</p><blockquote><p>Given the information available at the close of date <code>t</code>, can a model score whether SPY&#8217;s realized volatility over the next 20 trading days will fall into a high-volatility regime?</p></blockquote><p>I discuss the outputs as a workflow test under regime shift. 
The notebook compares direct TabPFN and TabICL scoring, XGBoost on raw features, XGBoost on raw features plus TabPFN or TabICL embeddings, simple volatility-domain rules, calibration diagnostics, operating-point behavior, uncertainty, runtime, risk-control diagnostics, drift, and leakage checks.</p><h3>1. Data and target</h3><p>The target asset is SPY. The feature set uses public daily data from yfinance, VIX history from Cboe, and macro/credit series from FRED. The market features include SPY, QQQ, IWM, TLT, IEF, GLD, HYG, EEM, and VNQ. The macro and credit inputs include 10-year Treasury yield, 2-year Treasury yield, the 10y-2y yield curve, high-yield OAS, investment-grade OAS, and the effective federal funds rate.</p><p>This public-data choice matters. It makes the notebook reproducible, but it is not the same thing as an institutional point-in-time market data system. So I read the results as a research workflow demonstration rather than as a deployable market-data pipeline.</p><p>The target is a binary high-volatility label. The notebook computes SPY&#8217;s future 20-trading-day realized volatility, annualizes it, and labels a row as high-volatility if that future realized volatility is at least a pre-holdout threshold. The threshold is estimated only from history ending on 2017-12-29. In this run, the threshold is the 80th percentile of pre-calibration history, which gives:</p><ul><li><p>Target asset: SPY.</p></li><li><p>Forecast horizon: 20 trading days.</p></li><li><p>High-volatility threshold: 0.200602 annualized realized volatility.</p></li><li><p>Threshold source window: data available through 2017-12-29.</p></li><li><p>Rows after target filtering: 5,094.</p></li></ul><p>This thresholding policy is important because it prevents the 2020-forward holdout from defining what &#8220;high volatility&#8221; means. The holdout can then be used as a future-facing test period rather than as part of the label-design procedure. 
This is a practical approximation for a public-data notebook, not a claim that this threshold would be the right operational definition in every volatility workflow.</p><h3>2. Feature table</h3><p>The feature table is daily and indexed by signal date. The notebook builds features that would be available at or before the signal date: returns, realized volatility, downside volatility, drawdowns, moving-average distances, volume changes, VIX features, rates, credit-spread features, and cross-asset correlations.</p><p>The feature policy is intentionally broad but audited. The notebook starts with 297 candidate features. Seven sparse credit-spread-derived features are dropped because they do not have enough usable pre-calibration history. The retained feature table has:</p><ul><li><p>290 base features.</p></li><li><p>284 missingness indicators.</p></li><li><p>574 final model features.</p></li></ul><p>The missingness indicators are part of the feature policy rather than a cosmetic detail. Public macro and credit series do not all start at the same time or update with the same density. If missingness itself is informative, the model can use those indicators. If missingness is mainly a data artifact, later leakage and drift checks help keep that visible.</p><h3>3. 
Chronological split design</h3><p>The split design is central to the notebook because volatility forecasting is not an iid random-split problem.</p><p>The notebook uses these chronological windows:</p><ul><li><p>TabPFN/TabICL representation context: 2006-01-03 to 2011-12-30, with 1,511 rows and 36.0% high-volatility rows.</p></li><li><p>Fair downstream tuning window: 2012-01-03 to 2017-12-29, with 1,509 rows and 4.0% high-volatility rows.</p></li><li><p>Calibration window: 2018-01-02 to 2019-12-31, with 503 rows and 22.3% high-volatility rows.</p></li><li><p>Holdout: 2020-01-02 to 2026-04-02, with 1,571 rows and 23.3% high-volatility rows.</p></li><li><p>Raw all-history final training window: 2006-01-03 to 2019-12-31, with 3,523 rows and 20.3% high-volatility rows.</p></li></ul><p>The split rates already tell us why this is a hard experiment. The fair downstream tuning window is unusually quiet: only 60 of 1,509 rows are high-volatility rows. The holdout then includes COVID, the 2022 inflation/rate shock, the post-2022 lower-volatility period, and the 2025/early-2026 episodes. A model selected in a quiet 2012-2017 validation setting may not transfer cleanly into a noisier 2020-forward regime.</p><p>The representation-context split also keeps the embedding comparison honest. TabPFN and TabICL use the early context labels to condition their representations. Therefore, the fair downstream XGBoost workflows using those embeddings do not reuse that same context window as downstream XGBoost training labels. The raw all-history XGBoost model is reported separately because it answers a different practical question: what if a classical model simply uses all pre-holdout labels instead of reserving some of them for a TFM context?</p><h3>4. 
Model configurations</h3><p>The notebook compares three kinds of scorers.</p><p>First, it includes simple domain rules:</p><ul><li><p><code>Rule[VIX close]</code>, which uses VIX as a volatility-domain score.</p></li><li><p><code>Rule[SPY realized volatility 20d]</code>, which uses recent SPY realized volatility as a score.</p></li></ul><p>Second, it includes XGBoost workflows:</p><ul><li><p><code>XGBoost[Raw]</code>, trained on the fair raw feature set.</p></li><li><p><code>XGBoost[Raw all-history incumbent]</code>, a stronger raw-feature reference that uses all pre-holdout labels.</p></li><li><p><code>XGBoost[Raw + TabPFN embeddings]</code>, with 574 raw features plus 192 TabPFN embedding columns, for 766 total features.</p></li><li><p><code>XGBoost[Raw + TabICL embeddings]</code>, with 574 raw features plus 512 TabICL embedding columns, for 1,086 total features.</p></li></ul><p>Third, it includes direct TFM scoring:</p><ul><li><p><code>TabPFN[Direct all-history]</code>.</p></li><li><p><code>TabICL[Direct all-history]</code>.</p></li></ul><p>This distinction is important. Direct TFM scoring and TFM embeddings answer different workflow questions. Direct scoring asks whether TabPFN or TabICL can act as the classifier. The embedding path asks whether a TFM representation improves a downstream XGBoost model. This run gives different answers to those two questions.</p><h3>5. Tuning and validation caveat</h3><p>The XGBoost workflows use randomized hyperparameter search with chronological validation folds. This is better aligned with the time-ordered nature of the problem than random cross-validation. However, the fair raw-vs-embedding path has only one usable chronological fold after the positive-count checks. That fold has 920 training rows with 26 high-volatility rows and 196 validation rows with 33 high-volatility rows.</p><p>That is the main experimental caveat in this section. The run is internally coherent, but the fair XGBoost model-selection signal is fragile. 
I would not read the XGBoost rankings as a definitive statement about all possible XGBoost configurations. I read them as the result of this specific, audited workflow under a difficult regime shift.</p><h3>6. Main holdout results</h3><p>The final holdout has 1,571 rows and 366 high-volatility rows, so the high-volatility base rate is 23.3%. Average Precision is the main ranking metric because this is naturally an alert-style problem: the score is useful if it pushes future high-volatility periods toward the top of the ranked list.</p><p>The main holdout results are:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZUro!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZUro!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 424w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 848w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 1272w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZUro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png" width="1412" height="826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133008,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZUro!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 424w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 848w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 1272w, https://substackcdn.com/image/fetch/$s_!ZUro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d57230-2ca0-4370-86e3-3884735bf7ca_1412x826.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The result is not &#8220;TFM embeddings improve XGBoost.&#8221; In this run, they do not. The more useful result is that direct TabPFN, direct TabICL, and simple volatility-domain scores are competitive with the tuned XGBoost workflows, and in some views they are ahead of them.</p><p>The clearest positive result is direct TabPFN&#8217;s ranking quality. It has the highest AP and the highest Top 20% Recall. The most important baseline result is VIX. VIX has the highest ROC AUC and the most efficient 80% recall queue overall. 
This is a useful reminder that, in a finance workflow, a simple domain signal can be a substantive baseline.</p><p>The direct TabICL result is also useful, but less dominant. Its AP is essentially tied with the raw all-history XGBoost incumbent, while its runtime is much lower. The embedding-enhanced XGBoost result is the clearest non-improvement result: adding TabPFN or TabICL row embeddings to raw features lowers downstream XGBoost AP in this holdout.</p><h3>7. Precision-recall curves</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MyXA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MyXA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 424w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 848w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 1272w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!MyXA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png" width="1423" height="911" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1423,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MyXA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 424w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 848w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 1272w, https://substackcdn.com/image/fetch/$s_!MyXA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7b78152-9b21-47f9-8b24-c08e87612a59_1423x911.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This figure shows precision on the vertical axis and recall on the horizontal axis. The dashed horizontal line is the holdout base rate, about 0.233. A model above that line is ranking high-volatility rows better than random ordering.</p><p>The first thing to notice is that all models are above the base-rate line for much of the curve. This suggests that the scores contain useful ranking signal in this holdout, rather than behaving like random ordering.</p><p>The second thing to notice is the grouping. 
Direct TabPFN, VIX, raw all-history XGBoost, TabICL direct, and recent SPY realized volatility form the stronger group. They are not separated by a huge margin, but they sit above the fair raw XGBoost and the embedding-enhanced XGBoost workflows through much of the useful recall range.</p><p>The third thing to notice is the embedding result. If TabPFN or TabICL embeddings were improving XGBoost as general-purpose features in this workflow, I would expect their curves to move above raw XGBoost or at least approach the stronger group. Instead, the embedding curves sit lower. The TabPFN embedding workflow has some early precision strength, but it does not hold up through the middle of the curve. The TabICL embedding workflow has the lowest AP in this run.</p><p>This figure is why I would frame the experiment as a workflow-component test. Direct TFM scores can be useful even when offline TFM embeddings do not help a downstream tree model.</p><h3>8. Operating points</h3><p>AP summarizes the whole ranked list, but an actual alerting or risk-control workflow usually operates at a finite queue size. That is why the notebook also reports target-recall and fixed-top-rate views.</p><p>At 50% recall, the most efficient queues are:</p><ul><li><p>TabPFN direct: 271 alerts, precision 0.675.</p></li><li><p>XGBoost raw all-history: 273 alerts, precision 0.670.</p></li><li><p>VIX rule: 282 alerts, precision 0.649.</p></li></ul><p>At this moderate-recall operating point, TabPFN direct, raw all-history XGBoost, and VIX are very close. 
A practitioner choosing among them would probably care about stability, calibration, cost, and interpretability, not only the two-alert difference between the first two rows.</p><p>At 80% recall, the ordering changes:</p><ul><li><p>VIX rule: 611 alerts, precision 0.480.</p></li><li><p>TabPFN direct: 639 alerts, precision 0.459.</p></li><li><p>SPY realized-volatility rule: 659 alerts, precision 0.445.</p></li><li><p>XGBoost raw all-history: 725 alerts, precision 0.404.</p></li></ul><p>Here VIX is the most efficient score. This is an important practical result. Direct TabPFN has the best AP, but VIX gives the shortest 80% recall queue.</p><p>At 90% recall, VIX has the shortest queue among the reported scores:</p><ul><li><p>VIX rule: 865 alerts, precision 0.382.</p></li><li><p>XGBoost raw all-history: 1,037 alerts, precision 0.318.</p></li><li><p>TabPFN direct: 1,061 alerts, precision 0.311.</p></li><li><p>TabICL direct: 1,065 alerts, precision 0.310.</p></li></ul><p>The fixed top-rate view tells a similar story but emphasizes TabPFN&#8217;s broad ranking strength:</p><ul><li><p>At the top 5% of holdout rows, TabPFN direct, TabICL direct, and XGBoost raw plus TabPFN embeddings each reach 18.0% recall.</p></li><li><p>At the top 10%, TabPFN direct leads with 33.1% recall.</p></li><li><p>At the top 20%, TabPFN direct leads with 56.0% recall, narrowly ahead of VIX and raw all-history XGBoost.</p></li><li><p>At the top 30%, TabPFN direct leads with 71.9% recall, narrowly ahead of the recent-SPY-realized-volatility rule.</p></li></ul><p>So the operating-point conclusion is more nuanced than the AP ranking. Direct TabPFN has the best broad-ranking point estimate in this run. VIX is the strongest high-recall queue baseline. The best model depends on the operating point.</p><h3>9. Uncertainty</h3><p>The notebook uses 300 calendar-month block-bootstrap iterations. 
This is more appropriate than IID row bootstrap here because the 20-day target windows overlap and market regimes are temporally clustered.</p><p>For AP, the point estimates and 95% intervals are:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q55J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q55J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 424w, https://substackcdn.com/image/fetch/$s_!q55J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 848w, https://substackcdn.com/image/fetch/$s_!q55J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 1272w, https://substackcdn.com/image/fetch/$s_!q55J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q55J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png" width="1456" height="639" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8556524b-394d-4512-9e05-794429b380b7_1609x706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:639,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q55J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 424w, https://substackcdn.com/image/fetch/$s_!q55J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 848w, https://substackcdn.com/image/fetch/$s_!q55J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 1272w, https://substackcdn.com/image/fetch/$s_!q55J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8556524b-394d-4512-9e05-794429b380b7_1609x706.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The intervals overlap heavily among the stronger models. That does not erase the point-estimate story, but it limits the claim. I would not say that TabPFN &#8220;wins&#8221; as a definitive benchmark result. The more defensible statement is that TabPFN direct has the strongest point AP in this run, while VIX, TabICL direct, and raw all-history XGBoost are close enough that the result should be treated as exploratory.</p><p>The embedding result is still meaningful in a workflow sense. The embedding-enhanced XGBoost variants do not show a clear downstream ranking benefit here, and they cost more than raw XGBoost.</p><h3>10. 
Year-by-year behavior</h3><p>The yearly breakdown explains why a single holdout AP number is not enough.</p><p>High-volatility labels are concentrated unevenly:</p><ul><li><p>2020: 131 high-volatility rows out of 253.</p></li><li><p>2021: 3 high-volatility rows out of 252.</p></li><li><p>2022: 178 high-volatility rows out of 251.</p></li><li><p>2023: 0 high-volatility rows out of 250.</p></li><li><p>2024: 14 high-volatility rows out of 252.</p></li><li><p>2025: 38 high-volatility rows out of 250.</p></li><li><p>2026 partial: 2 high-volatility rows out of 63.</p></li></ul><p>This is not a stable IID classification problem. Entire years can be mostly high-volatility, mostly quiet, or impossible to evaluate with AP and ROC AUC because there are no positive labels.</p><p>The yearly AP leaders also change:</p><ul><li><p>In 2020, XGBoost raw plus TabICL embeddings has the highest AP at 0.778.</p></li><li><p>In 2021, there are only 3 positives; raw XGBoost has the highest AP at 0.071, but the sample is too small for a strong conclusion.</p></li><li><p>In 2022, raw XGBoost leads AP at 0.842, with VIX close behind at 0.826.</p></li><li><p>In 2023, there are no positives, so AP and ROC AUC are undefined.</p></li><li><p>In 2024, there are only 14 positives; VIX has the highest AP at 0.062.</p></li><li><p>In 2025, raw all-history XGBoost has the highest AP at 0.563.</p></li><li><p>In the partial 2026 window, raw all-history XGBoost has the highest AP at 0.171, but there are only 2 positives.</p></li></ul><p>This is one of the most useful scientific lessons from the notebook. The full-holdout AP aggregates several different market regimes. A model can look useful overall while behaving differently across the COVID shock, the 2022 inflation/rate shock, and quieter years. That is why I read this as a regime-sensitivity workflow, not a final model ranking.</p><h3>11. 
Runtime versus ranking quality</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CBHr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CBHr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 424w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 848w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 1272w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CBHr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png" width="1428" height="872" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:872,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CBHr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 424w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 848w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 1272w, https://substackcdn.com/image/fetch/$s_!CBHr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7083e540-039a-46a3-8e3b-8554522b22c6_1428x872.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The runtime plot is a workflow-cost diagnostic, not a controlled timing benchmark. It puts holdout AP on the vertical axis and total workflow seconds on the horizontal axis, so the most attractive region is the upper-left: higher ranking quality with lower measured workflow cost.</p><p>The caveat is that the timings are not all produced under the same validation workload. In this run, the raw all-history XGBoost incumbent used three chronological validation folds, while the fair raw-vs-embedding XGBoost workflows had only one usable fold after the positive-count checks. Direct TabPFN and direct TabICL were not tuned through rolling CV in this notebook; each was fitted once on the all-history pre-holdout context and then scored on holdout. A fair timing benchmark would rerun comparable workflows under a matched evaluation policy. 
For this post, I read the figure as a record of what this specific notebook execution cost, alongside the ranking results.</p><p>Under that interpretation, the domain rules remain important. They cost essentially zero workflow time and rank well, especially the VIX rule, with AP 0.633 and ROC AUC 0.832. This reinforces the earlier point that a simple market-implied volatility signal is a substantive baseline.</p><p>The direct TFM points are also informative in this view, as long as they are read as single-fit timings. TabICL direct finishes in 25.6 seconds with AP 0.612. TabPFN direct takes 61.0 seconds and has the highest AP point estimate at 0.655. These numbers are not meant to be a universal speed comparison against every possible XGBoost setup, but they show that direct TFM scoring is not expensive in this particular workflow.</p><p>The embedding-enhanced XGBoost points are less attractive in this run because they add feature width without improving holdout ranking. TabPFN embeddings add 192 columns, and TabICL embeddings add 512 columns. The final feature sets grow to 766 and 1,086 columns respectively, but the larger feature matrices do not improve AP in the 2020-forward holdout.</p><p>So the practical takeaway is measured rather than absolute: in this volatility-regime workflow, direct TFM scoring looks more promising than using TFM row embeddings as appended features for downstream XGBoost, while the exact runtime comparison should be revisited with a matched fold policy.</p><h3>12. 
Calibration and probability quality</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O0cX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O0cX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 424w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 848w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 1272w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O0cX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png" width="1382" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1382,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:187402,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O0cX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 424w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 848w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 1272w, https://substackcdn.com/image/fetch/$s_!O0cX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69a6ce-d8de-4f8c-b0d4-85bae27fb56e_1382x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The calibration figure plots mean predicted probability against observed high-volatility rate by score bin. 
The diagonal line marks perfect calibration: if the model says 30%, about 30% of those rows should be positive.</p><p>The direct TFM probability diagnostics are among the cleanest of the reported configurations (lower is better for all three metrics):</p><ul><li><p>TabPFN direct: Brier 0.125, log loss 0.406, ECE 0.039.</p></li><li><p>TabICL direct: Brier 0.134, log loss 0.427, ECE 0.041.</p></li><li><p>XGBoost raw all-history: Brier 0.133, log loss 0.424, ECE 0.040.</p></li><li><p>XGBoost raw: Brier 0.188, log loss 0.601, ECE 0.154.</p></li><li><p>XGBoost raw plus TabPFN embeddings: Brier 0.167, log loss 0.701, ECE 0.105.</p></li><li><p>XGBoost raw plus TabICL embeddings: Brier 0.194, log loss 0.638, ECE 0.163.</p></li></ul><p>The main calibration issue is the probability scale of the fair XGBoost workflows. The calibration-quality view shows that sigmoid calibration improves their probability diagnostics. For raw XGBoost, ECE improves from 0.203 to 0.093 and log loss improves from 0.735 to 0.432 on the holdout rows in the calibration-quality comparison. For raw plus TabPFN embeddings, ECE improves from 0.162 to 0.094. For raw plus TabICL embeddings, ECE improves from 0.189 to 0.095.</p><p>However, sigmoid calibration is a monotonic transformation of the scores, so it does not change the ranking. It can make the scores more interpretable as probabilities, but it cannot turn a lower AP into a higher one. This is another example of why the notebook evaluates the workflow through multiple lenses: ranking quality, probability quality, runtime, and operating thresholds can tell different stories.</p><h3>13. 
Risk-control diagnostic</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!62d_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!62d_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 424w, https://substackcdn.com/image/fetch/$s_!62d_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 848w, https://substackcdn.com/image/fetch/$s_!62d_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 1272w, https://substackcdn.com/image/fetch/$s_!62d_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!62d_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png" width="1456" height="868" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:868,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:237428,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196713235?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!62d_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 424w, https://substackcdn.com/image/fetch/$s_!62d_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 848w, https://substackcdn.com/image/fetch/$s_!62d_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 1272w, https://substackcdn.com/image/fetch/$s_!62d_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c9bc184-43ea-4984-9901-2236e4a79502_1528x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This figure needs careful framing. It is not evidence of a trading strategy. It is a diagnostic that asks whether a high-volatility score can support a simple pre-specified risk-control rule.</p><p>The policy uses calibration-window score quantiles to define high-risk periods. On holdout, the policy reduces SPY exposure during periods scored as high risk. The selected view discussed below uses the notebook&#8217;s 20% target action-rate setting and a 0.5 high-risk weight, meaning the policy holds half SPY exposure during periods classified as high risk. 
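</p><p>To make the rule concrete, here is a minimal sketch of this kind of policy. It is not the notebook&#8217;s implementation: the function name, the one-period lag between score and exposure, and the proportional cost model are assumptions made here for illustration, and the inputs in the usage lines are toy values.</p>

```python
import numpy as np


def risk_control_growth(calib_scores, holdout_scores, holdout_returns,
                        action_rate=0.20, high_risk_weight=0.5,
                        cost_per_turnover=0.0):
    """Growth of 1.0 for a policy that cuts exposure when the score is high.

    Illustrative only: the lag and the cost model are assumptions,
    not the notebook's accounting.
    """
    scores = np.asarray(holdout_scores, dtype=float)
    returns = np.asarray(holdout_returns, dtype=float)
    # The threshold comes from the calibration window only, so holdout data
    # never influences which periods count as high risk.
    threshold = np.quantile(np.asarray(calib_scores, dtype=float),
                            1.0 - action_rate)
    target = np.where(scores >= threshold, high_risk_weight, 1.0)
    # Assumed lag: the score seen in period t sets exposure for period t + 1.
    weights = np.concatenate(([1.0], target[:-1]))
    # Assumed cost model: cost proportional to the change in exposure.
    turnover = np.abs(np.diff(weights, prepend=1.0))
    net_returns = weights * returns - cost_per_turnover * turnover
    return threshold, weights, np.cumprod(1.0 + net_returns)


# Toy inputs, not notebook data.
calib = np.linspace(0.0, 1.0, 101)
threshold, weights, growth = risk_control_growth(
    calib, [0.1, 0.9, 0.9, 0.2], [0.10, -0.10, 0.10, 0.10]
)
print(threshold, weights, growth[-1])
```

<p>With a 20% action rate, the threshold in this sketch is the 80th percentile of the calibration-window scores, so roughly one calibration period in five would have been flagged as high risk.</p><p>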
The figure compares growth of an initial 1.0 before taxes, with transaction costs included in the notebook&#8217;s policy accounting.</p><p>For learned models in this diagnostic, the selected policy rows use calibration-base score variants where needed because the calibration window is reserved for threshold selection. I shorten the labels below for readability.</p><p>Buy-and-hold has the highest final equity at 2.219, but it also has the largest max drawdown at -0.337 and a Sharpe of 0.726, lower than every policy listed below. Several score-driven risk-control policies have lower final equity but smoother paths:</p><ul><li><p>XGBoost raw plus TabICL embeddings: Sharpe 0.888, final equity 2.155, max drawdown -0.277.</p></li><li><p>XGBoost raw plus TabPFN embeddings: Sharpe 0.838, final equity 2.098, max drawdown -0.263.</p></li><li><p>XGBoost raw: Sharpe 0.824, final equity 2.030, max drawdown -0.255.</p></li><li><p>TabPFN direct: Sharpe 0.817, final equity 1.843, max drawdown -0.199.</p></li><li><p>TabICL direct: Sharpe 0.807, final equity 1.835, max drawdown -0.199.</p></li><li><p>VIX rule: Sharpe 0.742, final equity 1.690, max drawdown -0.199.</p></li></ul><p>This is interesting because the best high-volatility classifiers are not automatically the best risk-control policies. The policy result depends on threshold timing, exposure reduction, transaction costs, and the path of SPY returns, not only on classification AP. The right conclusion is not &#8220;use this as a strategy.&#8221; The right conclusion is that score utility can differ from classification ranking, so a production research workflow should test both.</p><h3>14. Drift and leakage checks</h3><p>The drift table confirms that the holdout is a real distribution-shift test. The notebook reports the population stability index (PSI) as a drift diagnostic. Larger PSI values indicate larger differences between the pre-holdout and holdout feature distributions. 
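</p><p>The notebook&#8217;s exact PSI implementation is not shown in this post, so the following is only a sketch of one common formulation: bin a feature by quantiles of the pre-holdout sample, then sum, over the bins, the difference between the holdout share and the pre-holdout share multiplied by the log of their ratio. The function name, the ten bins, the floor on empty bins, and the assumption of no missing values are illustrative choices.</p>

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, floor=1e-6):
    """PSI of one feature between a reference and a current sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Assumed binning: quantile edges taken from the reference sample.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    # Only interior edges are used, so values outside the reference range
    # fall into the first or last bin instead of being dropped.
    interior = edges[1:-1]
    n_cells = interior.size + 1
    ref_share = np.bincount(np.searchsorted(interior, reference, side="right"),
                            minlength=n_cells) / reference.size
    cur_share = np.bincount(np.searchsorted(interior, current, side="right"),
                            minlength=n_cells) / current.size
    # Floor the shares so empty bins do not produce log(0).
    ref_share = np.clip(ref_share, floor, None)
    cur_share = np.clip(cur_share, floor, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


# Synthetic example, not notebook data: a shifted mean inflates PSI.
rng = np.random.default_rng(0)
pre_holdout = rng.normal(loc=1.3, scale=0.8, size=4000)
holdout = rng.normal(loc=0.2, scale=0.8, size=1500)
print(population_stability_index(pre_holdout, pre_holdout))
print(population_stability_index(pre_holdout, holdout))
```

<p>PSI is computed one feature at a time, so a drift table like the one in the notebook would repeat this calculation column by column.</p><p>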
The largest drift features include:</p><ul><li><p>Computed 10y-2y yield curve: pre-holdout mean 1.333, holdout mean 0.240, PSI 5.42.</p></li><li><p>FRED 10y-2y yield curve: pre-holdout mean 1.333, holdout mean 0.240, PSI 5.41.</p></li><li><p>SPY-TLT 126-day correlation: pre-holdout mean -0.393, holdout mean -0.043, PSI 3.02.</p></li><li><p>Federal funds rate: pre-holdout mean 1.300, holdout mean 2.783, PSI 2.43.</p></li><li><p>QQQ-TLT 126-day correlation: pre-holdout mean -0.340, holdout mean -0.003, PSI 2.36.</p></li></ul><p>This drift is not incidental. The 2020-forward holdout contains a different macro environment from much of the pre-holdout period: COVID, the zero-rate aftermath, inflation, rate hikes, and changed stock-bond correlation behavior. That helps explain why validation and holdout behavior can diverge, and why simple market-implied signals like VIX remain strong.</p><p>The leakage checks are also mostly clean:</p><ul><li><p>Target columns are excluded from the feature set.</p></li><li><p>The high-volatility threshold is estimated on data that precedes the calibration and holdout windows.</p></li><li><p>The chronological split order passes.</p></li><li><p>The TabPFN/TabICL representation context is not reused as downstream XGBoost training labels in the fair embedding path.</p></li><li><p>The raw all-history XGBoost incumbent is reported separately from the fair raw-vs-embedding comparison.</p></li><li><p>Overlapping 20-day labels are recorded as a known limitation, which is why the uncertainty view uses calendar-month block bootstrap.</p></li><li><p>The risk-policy section is explicitly marked as educational and diagnostic, not an investment recommendation.</p></li></ul><p>This is the final reason I would keep the claims cautious. The notebook passes the checks needed for a clean public-data experiment, but the holdout is a hard regime-shift period and the target windows overlap. 
That makes the result scientifically useful, but not definitive.</p><h2>Known limitations</h2><p>The issues are experimental rather than mechanical:</p><ol><li><p>The fair raw-vs-embedding XGBoost path has only one usable chronological CV fold. This makes the downstream XGBoost tuning less stable than ideal.</p></li><li><p>The fair tuning window has a much lower high-volatility rate than the later calibration and holdout windows. This can create model-selection mismatch.</p></li><li><p>The raw all-history incumbent is not strictly fair to embedding workflows because it uses more labels, but it is an important practical baseline. I keep that distinction explicit rather than treating it as the same comparison as the fair raw-vs-embedding path.</p></li><li><p>Direct all-history TFM models and embedding-enhanced XGBoost models answer different questions. Direct TFM scores can be strong even if embeddings do not help XGBoost.</p></li><li><p>The risk-control section is path-dependent and should stay framed as a diagnostic. It should not be described as investment evidence.</p></li><li><p>Year-level metrics are unstable in years with very few positives. 2023 has zero positives, so AP and ROC AUC are undefined. 2021 and 2026 have too few positives for strong model comparisons.</p></li></ol><h2>Summary and conclusion</h2><p>This first volatility-regime notebook gives a useful but cautious result. Direct tabular foundation model scoring looks promising for this workflow, especially direct TabPFN. On the 2020-forward holdout, TabPFN direct has the strongest Average Precision point estimate and the strongest Top 20% Recall. Direct TabICL is also competitive with the raw all-history XGBoost incumbent while being much faster in this run.</p><p>The result is not a simple &#8220;foundation models beat classical ML&#8221; story. VIX remains a very strong domain baseline and is the most efficient score at some high-recall operating points. That is an important practical anchor. If a market-implied volatility signal is already strong, then any ML workflow has to justify its added complexity against that baseline.</p><p>The clearest non-improvement result is the embedding path. In this run, appending TabPFN or TabICL row embeddings to raw features does not improve downstream XGBoost ranking. That does not mean TFM embeddings are useless in general. It means that, in this volatility-regime setup, useful direct TFM scores do not automatically translate into useful offline features for a downstream tree model.</p><p>The broader lesson is that tabular foundation models should be evaluated as components inside a workflow. Direct scoring, embedding generation, calibration, runtime, alert thresholds, risk-policy diagnostics, and drift checks answer different questions. A model can look strong in one view and weaker in another. 
For this reason, I read today&#8217;s result as an exploratory workflow test under market-regime shift, not as a definitive benchmark or trading claim.</p><h2>Outlook</h2><p>This is the first attempt at this volatility-regime workflow, and it needs more work. The next step is to make the experiment more robust before drawing stronger conclusions.</p><p>The first improvement is validation depth. The fair raw-vs-embedding XGBoost path had only one usable chronological CV fold after the positive-count checks. I want to redesign the split or validation policy so that the downstream XGBoost comparison has a stronger model-selection signal.</p><p>The second improvement is regime analysis. The yearly results show that 2020, 2022, quiet years, and partial 2026 behave very differently. I want to inspect performance by market phase more carefully rather than relying mainly on full-holdout aggregates.</p><p>The third improvement is workflow realism. I want to add stronger feature-lag checks, richer drift diagnostics, sensitivity tests for the high-volatility threshold, and comparisons against more finance-standard volatility baselines. 
I also want to keep the risk-control section clearly diagnostic while making it more rigorous.</p><p>My plan is to continue iterating on this example until it becomes a reusable volatility-regime testbench for TabPFN, TabICL, and classical supervised tabular models.</p>]]></content:encoded></item><item><title><![CDATA[[P17] TabPFN and TabICL embeddings for fraud-detection workflows]]></title><description><![CDATA[Can offline row embeddings from TabPFN or TabICL improve a production-style XGBoost fraud-detection workflow enough to justify the extra representation step?]]></description><link>https://newsletter.dsaiengineering.com/p/p17-tabpfn-and-tabicl-embeddings-for-fraud-detection-workflows</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p17-tabpfn-and-tabicl-embeddings-for-fraud-detection-workflows</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Tue, 05 May 2026 17:47:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uuyR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post continues my series on tabular foundation models. 
So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo's classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN's predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>, using TabPFN for causal inference in <a href="https://open.substack.com/pub/dsaiengineering/p/p13-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P13</a>, comparing TabPFN, TabICL, and supervised ML models in <a href="https://open.substack.com/pub/dsaiengineering/p/p14-tabular-foundation-models-comparing?utm_campaign=post-expanded-share&amp;utm_medium=web">P14</a>, using TabPFN and TabICL directly for fraud detection in <a href="https://open.substack.com/pub/dsaiengineering/p/p15-tabpfn-and-tabicl-for-fraud-detection?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P15</a>, and using TabPFN and TabICL embeddings in an XGBoost fraud workflow in <a href="https://open.substack.com/pub/dsaiengineering/p/p16-tabpfn-and-tabicl-embeddings-fraud-detection-workflows?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P16</a>.</p><p>This post builds on that last idea. The question is not whether TabPFN or TabICL should replace XGBoost as the fraud model. 
The question is more practical:</p><blockquote><p>Can offline row embeddings from TabPFN or TabICL improve a production-style XGBoost fraud-detection workflow enough to justify the extra representation step?</p></blockquote><p>I am learning these models by building examples and testing workflow patterns, so I treat this as a workflow demonstration rather than a benchmark claim. The goal is to make the integration pattern, the possible benefits, and the caveats visible. That can be useful both for labs building tabular foundation models and for data and AI practitioners who want to understand how these models might fit into workflows they already know.</p><p>You can find the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/main/blog/20260505-tabpfn-tabicl-embeddings-fraud-det-workflows.assets/tabpfn-tabicl-fraud-detection-20260505.ipynb">in my GitHub repository</a>, and you can also <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-fraud-detection-20260505-v3">clone it directly on Kaggle</a>. The notebook is meant to be run on Kaggle with GPU enabled. 
It installs cuDF for pandas acceleration, uses CUDA for TabPFN and TabICL, and uses a CuPy-backed XGBoost path so that the downstream scorer also runs on GPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uuyR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uuyR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 424w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 848w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 1272w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uuyR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png" width="1285" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1285,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:181661,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196569214?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uuyR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 424w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 848w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 1272w, https://substackcdn.com/image/fetch/$s_!uuyR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F462c573d-380b-4c15-9e2c-dc7a4fe948b8_1285x764.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h2>Background and scope</h2><p>To follow the work presented here, a few concepts from earlier posts are useful. I have discussed them in more detail before, so I will use this section to connect the current notebook to that background.</p><p>If you need a refresher on row embeddings from TabPFN, you can refer to <a href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>. If you want the broader comparison between TabPFN, TabICL, and standard supervised ML models, <a href="https://open.substack.com/pub/dsaiengineering/p/p14-tabular-foundation-models-comparing?utm_campaign=post-expanded-share&amp;utm_medium=web">P14</a> is the relevant reference. For the fraud-detection setup, rare-event metrics, and the first direct use of TabPFN and TabICL as fraud scorers, see <a href="https://open.substack.com/pub/dsaiengineering/p/p15-tabpfn-and-tabicl-for-fraud-detection?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P15</a>. 
For the earlier version of the embedding-plus-XGBoost workflow, including the motivation for using embeddings as offline representation features, see <a href="https://open.substack.com/pub/dsaiengineering/p/p16-tabpfn-and-tabicl-embeddings-fraud-detection-workflows?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P16</a>.</p><h3>What changed in this notebook</h3><p>The main integration pattern is the same one introduced in P16: TabPFN and TabICL are still used as offline representation generators, and XGBoost remains the downstream fraud scorer. The models themselves have not changed: the notebook still compares raw XGBoost, a raw all-history XGBoost incumbent, raw + TabPFN embeddings, and raw + TabICL embeddings.</p><p>The main changes are in the comparison design and the notebook structure.</p><p>First, the combined <code>Raw + TabPFN + TabICL</code> feature set is no longer part of the main comparison. This version evaluates one embedding source at a time. That makes the adoption question cleaner because using both embedding systems together would add a different level of operational cost and complexity.</p><p>Second, the tuning setup has been strengthened. Yesterday&#8217;s notebook intended to use chronological cross-validation, but the fair raw-vs-embedding path effectively had only one usable fold after the fraud-count checks. That made the tuning signal weaker than the design intended. In this version, the fair raw-vs-embedding comparison uses the downstream tuning history after excluding the representation-context rows, and it has five valid chronological folds. The raw all-history incumbent also uses five chronological folds, so the main model configurations are selected under a more consistent tuning protocol before the holdout results are inspected.</p><p>Third, the calibration section is organized more carefully. 
In this workflow, a calibrated model cannot use the calibration window for base model fitting, because that window is held out for fitting the post-hoc sigmoid calibration map. If I compare a calibrated configuration only against a fully trained uncalibrated configuration, two things change at once: the score transformation and the amount of data used to fit the base XGBoost model. This version therefore keeps calibration-base configurations separate from sigmoid-calibrated configurations. The calibration-base configuration shows the uncalibrated model trained on the same pre-calibration data as the calibrated model. The calibrated configuration then shows what changes after applying the sigmoid calibration step. That makes calibration easier to inspect as its own diagnostic.</p><p>Fourth, the notebook now leaves behind cleaner review artifacts: curated result tables, bootstrap uncertainty tables, provenance information, embedding matrix summaries, CUDA memory summaries, and final figures. These files are saved in the Kaggle output directory and are available to download as a zip file after the run completes. That makes the run easier to audit after execution instead of relying only on displayed notebook output.</p><h3>Evaluation setup reminders</h3><p>The evaluation logic follows the same principles discussed in P15 and P16. The public fraud dataset has a <code>Time</code> column, so the notebook uses chronological windows rather than random splitting. The percentage partitioning is the same as yesterday&#8217;s embedding workflow: earliest 20% as the source window for the TabPFN/TabICL representation context, next 40% for downstream XGBoost training, next 10% for validation, next 10% for calibration, and final 20% for holdout evaluation. The actual TabPFN/TabICL context passed to the embedding models is sampled from the earliest window, while the fair downstream comparison still excludes that full earliest window from XGBoost tuning and fitting. 
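</p><p>The partitioning above can be sketched as follows. This is a minimal illustration, not the notebook&#8217;s code: the function name <code>chronological_windows</code>, the window names, the floor-based rounding, and the total of 284,807 rows are my assumptions. I picked that total only because, under floor rounding, it reproduces the 56,962-row holdout and the 170,884-row final training embedding matrix reported later in this post (reading that matrix as the training, validation, and calibration windows combined).</p><pre>
```python
def chronological_windows(n_rows, percents=(20, 40, 10, 10, 20)):
    """Split time-sorted row positions into consecutive (start, stop) windows.

    Assumes rows are already sorted by the Time column. Integer percentages
    avoid float rounding; the last window absorbs any remainder.
    """
    names = ("context_source", "train", "validation", "calibration", "holdout")
    cuts, cumulative = [0], 0
    for pct in percents[:-1]:
        cumulative += pct
        cuts.append(n_rows * cumulative // 100)  # floor rounding (assumption)
    cuts.append(n_rows)
    return {name: (cuts[i], cuts[i + 1]) for i, name in enumerate(names)}


windows = chronological_windows(284_807)  # assumed total row count
for name, (start, stop) in windows.items():
    print(f"{name:15s} rows {start:7d} to {stop:7d} ({stop - start} rows)")
```
</pre><p>Because every window is a contiguous slice of the time-sorted table, no row in a later window is earlier than any row in the context-source window.</p><p>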
The important point is that embeddings for later rows are generated only from earlier labelled context rows.</p><p>The metric logic is also the same as before. Accuracy is not useful for this rare-event fraud dataset, so the notebook reports Average Precision and alert-queue metrics: top-alert recall, top-percent recall, and the number of alerts needed to reach target recall levels.</p><p>Calibration remains a diagnostic rather than the main comparison target. The difference in this notebook is not the definition of calibration, but the cleaner organization of calibration-base and sigmoid-calibrated configurations described above. The results section reports Brier score, log loss, ECE 10, and reliability artifacts where they help interpret probability quality. Here, ECE 10 means expected calibration error computed with 10 bins.</p><p>With that scope in place, the rest of the post focuses on the experimental results rather than re-explaining the earlier fraud-detection setup.</p><h2>Results</h2><h3>Comparison design</h3><p>The notebook compares four model configurations. <code>Raw XGBoost</code> is the standard supervised-learning baseline using the 29 raw model features. <code>Raw all-history XGBoost</code> is a stronger incumbent-style baseline: it still uses only raw features, but it can use all pre-holdout labelled history because it does not need to reserve an earlier window for TabPFN or TabICL context. <code>Raw + TabPFN</code> adds 192 TabPFN embedding features to the raw features, and <code>Raw + TabICL</code> adds 512 TabICL embedding features to the raw features.</p><p>This setup separates two practical questions. The raw baselines ask whether standard XGBoost is already strong enough, especially when it can use all available pre-holdout raw history. 
The embedding configurations ask whether a context-conditioned representation from TabPFN or TabICL adds useful information beyond those raw features.</p><p>The tuning design is meant to keep that comparison clear. The embedding configurations cannot use the representation-context labels again as downstream training labels, because those labels were already used to condition the embedding model. The fair raw-vs-embedding comparison therefore excludes the representation-context window from downstream tuning. The all-history raw incumbent is reported separately because it answers a different question: how competitive is a strong raw-feature XGBoost workflow if it simply uses more labelled history instead of adding a foundation-model representation step?</p><p>Compared with yesterday&#8217;s notebook, the important tuning change is qualitative rather than just numeric: model selection is no longer based on a single usable chronological fold. The fair raw-vs-embedding path and the raw all-history incumbent both use five valid chronological folds before the final holdout is inspected.</p><p>The next sections read the results through several views: full-holdout ranking, alert-queue behavior, uncertainty, runtime and memory, and calibration. I use those views together because no single metric captures the whole workflow tradeoff.</p><h3>Full-holdout results</h3><p>The full holdout is the deployment-facing view because it keeps the final-window fraud base rate. It contains 56,962 transactions and only 75 fraud cases, so small movements in the top of the ranking can change the interpretation.</p><p>The first question is whether the embedding features improve the overall fraud ranking. Average Precision is the main summary for that question. 
The compact result list is:</p><ul><li><p>Raw all-history XGBoost: AP 0.8097, workflow time 173.3 seconds, Top 0.5% Recall 0.8533, Top 1% Recall 0.8533, Brier 0.000424, ECE 10 0.000104.</p></li><li><p>Raw XGBoost: AP 0.8034, workflow time 147.0 seconds, Top 0.5% Recall 0.8400, Top 1% Recall 0.8667, Brier 0.000403, ECE 10 0.000231.</p></li><li><p>Raw + TabICL: AP 0.7970, workflow time 1147.8 seconds, Top 0.5% Recall 0.8267, Top 1% Recall 0.9067, Brier 0.000451, ECE 10 0.000485.</p></li><li><p>Raw + TabPFN: AP 0.7909, workflow time 840.2 seconds, Top 0.5% Recall 0.8000, Top 1% Recall 0.8400, Brier 0.000381, ECE 10 0.000184.</p></li></ul><p>I read these numbers in three layers.</p><p>First, the point AP ranking favors the raw-feature baselines. Raw all-history XGBoost is highest, and standard raw XGBoost is close behind. If the TabPFN or TabICL embeddings were adding a broadly useful ranking signal, I would expect one of the embedding configurations to move above the raw baselines on AP. That does not happen in this run.</p><p>Second, runtime changes the practical bar. Raw XGBoost finishes in about 147 seconds, while the embedding workflows take 840.2 seconds (Raw + TabPFN) and 1147.8 seconds (Raw + TabICL), roughly 5.7 and 7.8 times as long, because they add representation generation and wider downstream feature matrices. For an embedding workflow to be attractive in this kind of setting, I would want to see either a clear ranking improvement, a meaningful operating-point improvement, or some other production value that justifies the additional representation step.</p><p>Third, the metrics measure different things. TabICL has the best Top 1% recall even though it does not have the best AP. TabPFN has the best Brier score among these four configurations, but weaker ranking and alert behavior. So no configuration is best on every view, and the result cannot be reduced to a single metric. 
It separates ranking quality, alert-queue behavior, probability diagnostics, and engineering cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PoAt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PoAt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 424w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 848w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 1272w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PoAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png" width="1374" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1374,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75333,&quot;alt&quot;:&quot;Full-holdout precision-recall curves.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196569214?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full-holdout precision-recall curves." title="Full-holdout precision-recall curves." srcset="https://substackcdn.com/image/fetch/$s_!PoAt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 424w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 848w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 1272w, https://substackcdn.com/image/fetch/$s_!PoAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e7e5d5-790f-471c-8655-597f7622ccd3_1374x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Full-holdout precision-recall curves.</figcaption></figure></div>
<p>The precision-recall figure shows the same story visually. The dashed horizontal line is the fraud base rate. Because fraud is rare, that line sits very low; a useful model should lift precision far above that line for the transactions it ranks near the top.</p><p>The four curves are all well above the base-rate line in the high-score region, which means all four XGBoost-based configurations are learning useful fraud rankings. But the curves are close to each other. There is no clean visual separation where an embedding curve stays clearly above both raw baselines across most of the recall range. 
If that had happened, it would be stronger evidence that the TabPFN or TabICL representation was improving the ranking broadly.</p><p>Instead, the figure suggests a narrower interpretation. The raw all-history and raw XGBoost curves are already strong. The embedding curves are competitive, but they do not visibly dominate. When curves are this close, the figure should not be read alone; the alert-budget results below are needed because fraud teams operate at specific review capacities, not across the whole precision-recall curve at once.</p><h3>Alert-budget interpretation</h3><p>The alert-budget view asks a more operational question than AP. Instead of summarizing the whole precision-recall curve, it asks how many transactions a team would need to review to recover a target share of the known fraud cases.</p><p>On this full holdout, 80% recall means finding 60 of the 75 fraud cases. At that target:</p><ul><li><p>Raw all-history XGBoost needs 94 alerts to find 60 frauds, with precision 0.6383.</p></li><li><p>Raw XGBoost needs 116 alerts to find 60 frauds, with precision 0.5172.</p></li><li><p>Raw + TabICL needs 127 alerts to find 60 frauds, with precision 0.4724.</p></li><li><p>Raw + TabPFN needs 199 alerts to find 60 frauds, with precision 0.3015.</p></li></ul><p>At this operating point, the raw all-history incumbent is the most efficient queue. This is what I would expect if the standard raw-feature workflow already captures most of the easy-to-rank fraud cases. If TabPFN or TabICL embeddings had helped strongly at this level, they would have reduced the alert count needed to find those same 60 frauds. They do not do that here.</p><p>The picture changes at 90% recall. 
Here the target is to find 68 of the 75 fraud cases, so the model has to rank deeper into the difficult part of the fraud set:</p><ul><li><p>Raw + TabICL needs 490 alerts to find 68 frauds, with precision 0.1388.</p></li><li><p>Raw all-history XGBoost needs 1,064 alerts to find 68 frauds, with precision 0.0639.</p></li><li><p>Raw XGBoost needs 1,177 alerts to find 68 frauds, with precision 0.0578.</p></li><li><p>Raw + TabPFN needs 1,598 alerts to find 68 frauds, with precision 0.0426.</p></li></ul><p>This is the main operational nuance. AP favors the raw all-history incumbent, but the 90% recall operating point favors TabICL. One way to read this is that TabICL does not improve the whole ranking enough to lead on AP, but it may move some additional fraud cases into a useful part of the high-recall review queue.</p><p>If the AP leader and the 90% recall leader had been the same model, the conclusion would be simpler. Here they differ. That is why I would not summarize this notebook with only AP. A fraud team targeting a compact 80% recall queue might prefer the raw all-history incumbent. A team targeting very high recall could reasonably investigate the TabICL queue behavior further.</p><p>For fixed alert budgets on the full holdout, the leading configuration at each budget is:</p><ul><li><p>At 100 alerts, Raw all-history XGBoost finds 60 frauds and reaches 80.0% recall.</p></li><li><p>At 500 alerts, Raw + TabICL finds 68 frauds and reaches 90.7% recall.</p></li><li><p>At 1,000 alerts, Raw + TabICL still finds 68 frauds and remains at 90.7% recall.</p></li><li><p>At the top 0.5% of transactions, Raw all-history XGBoost finds 64 frauds and reaches 85.3% recall.</p></li><li><p>At the top 1% of transactions, Raw + TabICL finds 68 frauds and reaches 90.7% recall.</p></li></ul><p>These fixed-budget numbers give the same intuition from the opposite direction. If the team can only review about 100 transactions, raw all-history XGBoost gives the best result. 
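</p><p>The two alert-queue views used in this section can be computed from a ranked label list. The sketch below is a minimal pure-Python illustration under my own assumptions: the helper names (<code>rank_labels</code>, <code>alerts_for_recall</code>, <code>recall_at_budget</code>) and the toy scores are hypothetical, tied scores keep their input order, and the notebook may compute these metrics differently.</p><pre>
```python
import math


def rank_labels(scores, labels):
    """Return 0/1 labels ordered from highest to lowest score (stable on ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def alerts_for_recall(ranked, target_recall):
    """Shortest queue reaching the target recall: (alerts, frauds_found, precision)."""
    total = sum(ranked)
    # The small tolerance guards against float noise in products such as 0.8 * 75.
    needed = max(1, math.ceil(target_recall * total - 1e-9))
    found = 0
    for alerts, label in enumerate(ranked, start=1):
        found += label
        if found == needed:
            return alerts, found, found / alerts
    raise ValueError("target recall is not reachable")


def recall_at_budget(ranked, budget):
    """Share of all frauds that appear in the top `budget` alerts."""
    return sum(ranked[:budget]) / sum(ranked)


# Toy example: 10 transactions, 5 frauds (hypothetical data).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
ranked = rank_labels(scores, labels)

alerts, found, precision = alerts_for_recall(ranked, 0.80)
print(alerts, found, round(precision, 4))
print(recall_at_budget(ranked, 4))
```
</pre><p>The same arithmetic gives the precision values quoted above: 60 frauds in 94 alerts is 60/94, about 0.6383, and 68 frauds in 490 alerts is 68/490, about 0.1388.</p><p>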
If the team can review around 500 transactions or the top 1% of this holdout, TabICL finds more of the hard-to-catch fraud cases. AP, fixed alert budgets, and target-recall alert counts each answer a different question.</p><h3>Uncertainty</h3><p>The notebook uses bootstrap resampling on the full holdout to estimate uncertainty. This matters because the full holdout has only 75 fraud cases. With so few positives, a small number of transactions moving up or down the ranked list can change AP or alert-count estimates.</p><p>For full-holdout AP:</p><ul><li><p>Raw XGBoost has point AP 0.8034, bootstrap median 0.8032, and 95% interval from 0.6972 to 0.8709.</p></li><li><p>Raw all-history XGBoost has point AP 0.8097, bootstrap median 0.8140, and 95% interval from 0.7189 to 0.8947.</p></li><li><p>Raw + TabPFN has point AP 0.7909, bootstrap median 0.7905, and 95% interval from 0.7061 to 0.8687.</p></li><li><p>Raw + TabICL has point AP 0.7970, bootstrap median 0.7973, and 95% interval from 0.7063 to 0.8723.</p></li></ul><p>These intervals overlap heavily. That does not make the point estimates useless, but it does change the strength of the claim. My reading is that raw all-history XGBoost has the best AP point estimate, not that it is clearly separated from every other configuration. 
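</p><p>A percentile bootstrap of Average Precision can be sketched as follows. This is an illustration under my own assumptions rather than the notebook&#8217;s implementation: the helper names, the number of resamples, the percentile-style interval, and the toy data are all hypothetical. AP is computed in threshold form, so duplicated rows created by resampling are handled as tied scores.</p><pre>
```python
import math
import random
from itertools import groupby


def average_precision(scores, labels):
    """Threshold-form AP: sum over distinct scores of recall gain times precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        return float("nan")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = seen = 0
    ap = 0.0
    for _, group in groupby(pairs, key=lambda p: p[0]):
        members = list(group)
        hits = sum(label for _, label in members)
        tp += hits
        seen += len(members)
        ap += (hits / total_pos) * (tp / seen)
    return ap


def bootstrap_interval(scores, labels, metric, n_boot=500, seed=0):
    """Resample rows with replacement; return the (2.5%, 50%, 97.5%) points."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([scores[i] for i in idx], [labels[i] for i in idx])
        if not math.isnan(value):  # drop resamples that drew no positives
            stats.append(value)
    stats.sort()

    def pick(q):
        return stats[min(len(stats) - 1, int(q * len(stats)))]

    return pick(0.025), pick(0.5), pick(0.975)


# Toy data (hypothetical): 200 rows, 10 positives that tend to score higher.
rng = random.Random(42)
labels = [1] * 10 + [0] * 190
scores = [rng.random() + (0.6 if y else 0.0) for y in labels]
low, median, high = bootstrap_interval(scores, labels, average_precision)
print(round(average_precision(scores, labels), 4), (low, median, high))
```
</pre><p>With only a few dozen positives, each resample can contain a noticeably different set of fraud cases, which is why intervals of this kind tend to be wide on rare-event holdouts.</p><p>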
If the intervals had been well separated, I would be more comfortable making a stronger ranking claim.</p><p>For alerts needed at 90% recall:</p><ul><li><p>Raw XGBoost has point estimate 1,177, bootstrap median 1,230, and 95% interval from 212 to 9,822 alerts.</p></li><li><p>Raw all-history XGBoost has point estimate 1,064, bootstrap median 1,046, and 95% interval from 129 to 7,311 alerts.</p></li><li><p>Raw + TabPFN has point estimate 1,598, bootstrap median 1,598, and 95% interval from 451 to 7,972 alerts.</p></li><li><p>Raw + TabICL has point estimate 490, bootstrap median 495, and 95% interval from 254 to 3,449 alerts.</p></li></ul><p>The TabICL point estimate remains interesting because it is much lower than the raw baselines, but the intervals are still wide. I would not read this as a deployment recommendation. I read it as a signal that this operating point is worth a more careful follow-up test on richer and larger fraud data.</p><h3>Runtime and memory</h3><p>The runtime plot shows the engineering tradeoff:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D3Z6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D3Z6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 424w, https://substackcdn.com/image/fetch/$s_!D3Z6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 848w, 
https://substackcdn.com/image/fetch/$s_!D3Z6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 1272w, https://substackcdn.com/image/fetch/$s_!D3Z6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D3Z6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png" width="1419" height="872" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:872,&quot;width&quot;:1419,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72367,&quot;alt&quot;:&quot;Full-holdout runtime versus Average Precision.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196569214?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full-holdout runtime versus Average Precision." title="Full-holdout runtime versus Average Precision." 
srcset="https://substackcdn.com/image/fetch/$s_!D3Z6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 424w, https://substackcdn.com/image/fetch/$s_!D3Z6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 848w, https://substackcdn.com/image/fetch/$s_!D3Z6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 1272w, https://substackcdn.com/image/fetch/$s_!D3Z6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5e9d229-6dcc-4b6f-8bd4-f0ac2b916317_1419x872.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Full-holdout runtime versus Average Precision.</figcaption></figure></div>
<p>In this figure, the vertical axis is AP and the horizontal axis is workflow time. The most attractive region is the upper-left: high AP with low runtime. A point far to the right needs a meaningful quality improvement to justify its extra cost.</p><p>The raw-feature models sit much closer to that attractive region. They are fast and have the best AP point estimates. The embedding workflows move far to the right because they add offline representation generation and wider downstream feature matrices. In this run, they do not move upward enough on AP to compensate.</p><p>The workflow times are:</p><ul><li><p>Raw XGBoost: 147.0 seconds.</p></li><li><p>Raw all-history XGBoost: 173.3 seconds.</p></li><li><p>Raw + TabPFN: 840.2 seconds, including 371.4 seconds of shared TabPFN embedding preparation.</p></li><li><p>Raw + TabICL: 1147.8 seconds, including 120.5 seconds of shared TabICL embedding preparation.</p></li></ul><p>One subtle point is that TabICL embedding preparation is faster than TabPFN embedding preparation in this run, but the full Raw + TabICL workflow is slower. The likely reason is the downstream XGBoost search over a wider feature matrix: TabICL contributes 512 embedding columns, while TabPFN contributes 192. This is an important workflow lesson. Embedding extraction time alone is not the whole cost. 
The downstream model also has to tune and fit on the expanded feature set.</p><p>The embedding matrix sizes also matter:</p><ul><li><p>TabPFN final training embeddings: 170,884 rows, 192 columns, about 125.2 MB.</p></li><li><p>TabPFN full-holdout embeddings: 56,962 rows, 192 columns, about 41.7 MB.</p></li><li><p>TabICL final training embeddings: 170,884 rows, 512 columns, about 333.8 MB.</p></li><li><p>TabICL full-holdout embeddings: 56,962 rows, 512 columns, about 111.3 MB.</p></li></ul><p>Both embedding paths fit on the Kaggle two-T4 GPU runtime used for the notebook. TabPFN used both CUDA devices and reached about 936.8 MB maximum allocated memory per device during extraction. TabICL used device 0 more heavily, reaching about 3657.3 MB maximum allocated memory and 4548.0 MB reserved memory after extraction.</p><p>For practitioners, the lesson is that representation quality should be judged together with representation cost. If the embedding point had moved clearly upward in the runtime plot, the extra cost might be easy to defend. Here the AP view does not justify the added cost by itself, so the main reason to investigate TabICL further is the high-recall operating-point behavior seen above.</p><h3>Calibration diagnostics</h3><p>Calibration remains diagnostic rather than a clear improvement. This section asks a different question from ranking: if the model gives a transaction a low or high fraud probability, should that score be interpreted as a calibrated probability?</p><p>The clean comparison is between each calibration-base model and its sigmoid-calibrated version. 
AP stays the same for those pairs because sigmoid calibration is monotonic: it changes the probability scale but does not change the ranking order.</p><p>The probability-quality picture is mixed:</p><ul><li><p>For raw XGBoost, sigmoid calibration keeps AP at 0.8047 but worsens Brier score, log loss, and ECE 10 relative to the raw calibration-base configuration.</p></li><li><p>For raw all-history XGBoost, sigmoid calibration keeps AP at 0.7997, improves Brier score, worsens log loss, and worsens ECE 10 relative to the raw all-history calibration-base configuration.</p></li><li><p>For Raw + TabPFN, sigmoid calibration keeps AP at 0.7918 but worsens Brier score, log loss, and ECE 10 relative to the TabPFN calibration-base configuration.</p></li><li><p>For Raw + TabICL, sigmoid calibration keeps AP at 0.7978 and improves Brier score and ECE 10 relative to the TabICL calibration-base configuration, but worsens log loss.</p></li><li><p>Among the uncalibrated configurations, Raw all-history XGBoost has the best ECE 10, while Raw + TabPFN has the best Brier score and log loss.</p></li></ul><p>That is why I do not treat calibration as a clear improvement in this notebook. If calibration had consistently improved Brier score, log loss, and ECE without harming the workflow, I would read it as useful probability cleanup. 
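</p><p>Both points, unchanged ranking and changed probability diagnostics, can be checked on toy data. The sketch below is a hedged illustration: the sigmoid parameters <code>a</code> and <code>b</code> are made-up values rather than fitted ones, the helper names are hypothetical, and I assume equal-width bins for ECE 10, which may differ from the notebook&#8217;s binning.</p><pre>
```python
import math


def sigmoid_calibrate(scores, a, b):
    """Platt-style map p = 1 / (1 + exp(-(a * s + b))); increasing when a is positive."""
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]


def ranking(values):
    """Row positions ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width bins on [0, 1] (assumed binning)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(n_bins - 1, int(p * n_bins))].append((p, y))
    error = 0.0
    for members in bins:
        if members:
            confidence = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            error += len(members) / len(labels) * abs(confidence - observed)
    return error


# Toy scores and labels (hypothetical); a and b are illustrative, not fitted.
raw = [0.02, 0.05, 0.10, 0.30, 0.55, 0.80, 0.90, 0.97]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
calibrated = sigmoid_calibrate(raw, a=6.0, b=-3.0)

print(ranking(raw) == ranking(calibrated))  # same order, so AP is unchanged
print(brier_score(raw, labels), brier_score(calibrated, labels))
print(ece(raw, labels), ece(calibrated, labels))
```
</pre><p>An increasing map cannot reorder rows, so ranking metrics such as AP stay fixed, while Brier score, log loss, and ECE can move in either direction depending on how well the fitted map matches the holdout.</p><p>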
Here it improves some diagnostics for TabICL but not enough to make a broad claim.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P1iU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P1iU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 424w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 848w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 1272w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P1iU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png" width="1368" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1368,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159870,&quot;alt&quot;:&quot;Full-holdout calibration curves.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196569214?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full-holdout calibration curves." title="Full-holdout calibration curves." srcset="https://substackcdn.com/image/fetch/$s_!P1iU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 424w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 848w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 1272w, https://substackcdn.com/image/fetch/$s_!P1iU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221c12dc-2ef2-421b-800c-e429eda49bd0_1368x911.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Full-holdout calibration curves.</figcaption></figure></div><p>The calibration figure is also harder to read than the precision-recall figure. Most scores and observed fraud rates are very close to zero because the event is so rare. Visually, that compresses the reliability curves near the origin. The diagonal reference line shows ideal calibration: predicted probability and observed fraud rate would match along that line.</p><p>The curves do not provide a clean visual story where one calibrated model clearly tracks the diagonal and the others clearly do not. For this reason, I treat the calibration plot as a warning to inspect probability quality, not as decisive evidence. 
The reliability-bin CSVs saved by the notebook are more useful than the figure when reviewing calibration in detail.</p><h3>Public-data constraints</h3><p>The leakage checks are included to keep the result in perspective. The notebook verifies the checks that are possible in this public dataset: the target is excluded from features, <code>Time</code> is used for chronological splitting, and <code>Time</code> is not used as a model feature in the default run.</p><p>The dataset is anonymized, so important production checks remain unavailable. I cannot test customer-level, card-level, merchant-level, or account-level leakage. I also cannot verify raw feature lineage or label availability timing. That means the results should be read as a workflow demonstration, not as a production fraud benchmark.</p><p>For a real fraud dataset, I would repeat this same workflow with entity-aware splits, delayed-label handling, feature timestamp checks, and drift monitoring before trusting the result.</p><h2>Known limitations</h2><p>The public-data constraints above are not the only caveats. There are also limits in this particular experiment design.</p><p>This is one public dataset, so I would not generalize the result to all fraud datasets, transaction workflows, or tabular foundation models.</p><p>The full holdout has only 75 fraud cases. The bootstrap intervals help, but they also show why point estimates should be interpreted cautiously.</p><p>The TabPFN and TabICL representation context is sampled from the earliest 20% source window rather than using every row in that window. The sampling keeps the rare fraud rows but caps the number of normal rows, so the context seen by the embedding models is smaller and more fraud-enriched than the full chronological source window. That could affect the learned row representations and may be one reason the embedding configurations do not improve AP here. 
This notebook does not isolate that factor, so I treat it as a hypothesis for follow-up rather than as an explanation proven by the run.</p><p>The TabICL embedding path uses model internals rather than a stable public embedding method comparable to TabPFN&#8217;s <code>get_embeddings</code>. That does not make the experiment invalid, but I would version-pin and review that code path.</p><p>I have not yet added interpretability methods such as SHAP, missing-data stress tests, categorical stress tests, drift-by-period analysis, or group-aware splitting. Those are important next steps for a broader testbench.</p><h2>Summary and conclusion</h2><p>This notebook tests a practical integration pattern:</p><ol><li><p>Keep XGBoost as the downstream fraud scorer.</p></li><li><p>Use TabPFN or TabICL as an offline row-embedding generator.</p></li><li><p>Append those embeddings to raw transaction features.</p></li><li><p>Evaluate the result with chronological splits, AP, alert counts, runtime, memory, calibration diagnostics, and leakage checks.</p></li></ol><p>The result does not show tabular foundation model embeddings outperforming the raw baselines across the main summary view. 
Raw all-history XGBoost has the best full-holdout point AP, and raw XGBoost is close while being much faster. Single-source TabPFN and TabICL embeddings do not beat the raw baselines by AP in this run. The bootstrap intervals overlap heavily, so I read the AP ranking cautiously.</p><p>The useful nuance is the high-recall operating point. At 90% recall, Raw + TabICL needs 490 alerts to recover 68 of 75 fraud cases, compared with 1,064 alerts for raw all-history XGBoost and 1,177 alerts for raw XGBoost. That makes TabICL worth investigating for high-recall review-queue settings, even though it is not the best AP/runtime configuration overall.</p><p>My interpretation is that tabular foundation model embeddings are best evaluated as workflow components, not only as standalone model scores. The important question is not only whether an embedding configuration has the highest AP. It is whether the embedding improves a business-relevant operating point enough to justify the added representation path.</p><p>For practitioners, the reusable lesson is the evaluation design. A tabular foundation model embedding experiment needs comparison against a strong classical baseline, with time-aware splitting, alert-budget metrics, calibration diagnostics, runtime accounting, memory accounting, and leakage checks.</p><p>For labs and researchers, this kind of notebook can be useful as a field-facing test. It does not replace formal benchmarks, but it can show how model capabilities appear when inserted into workflows that data teams already understand.</p><h2>Outlook</h2><p>With this dataset, I am reaching the point where the next useful experiments may need richer transaction context than the public file provides. 
I plan to keep extending the workflow in directions that matter for real data science teams, including representation-context ablations, interpretability, missing-data behavior, categorical features, time-derived feature policy, drift by time period, and group-aware splitting when entity IDs are available. For the embedding workflow specifically, I would like to test larger context samples, different normal-row sampling policies, repeated context draws, and context choices that preserve the source-window class balance more closely when model limits allow it.</p><p>However, these experiments take time, and I may change direction if I find a better experiment or a more useful problem to work on. If this line of testing is useful to you, comments are a good place to tell me whether you want to see these experiments carried through and what parts of the workflow you think would be most worth testing next.</p><p>My current goal is not to prove that one model family is always better. It is to build reusable examples that make benefits, costs, and caveats visible enough for both model builders and practitioners to reason about them.</p>]]></content:encoded></item><item><title><![CDATA[[P16] TabPFN and TabICL embeddings for fraud-detection workflows]]></title><description><![CDATA[Enhancing XGBoost with embeddings from TabPFN and TabICL]]></description><link>https://newsletter.dsaiengineering.com/p/p16-tabpfn-and-tabicl-embeddings-fraud-detection-workflows</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p16-tabpfn-and-tabicl-embeddings-fraud-detection-workflows</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Mon, 04 May 2026 21:06:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ruf_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This 
post continues my series on tabular foundation models. So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo's classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN's predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>, using TabPFN for causal inference in <a href="https://open.substack.com/pub/dsaiengineering/p/p13-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P13</a>, comparing TabPFN, TabICL, and supervised ML models in <a href="https://open.substack.com/pub/dsaiengineering/p/p14-tabular-foundation-models-comparing?utm_campaign=post-expanded-share&amp;utm_medium=web">P14</a>, and using TabPFN and TabICL directly for fraud detection in <a href="https://open.substack.com/pub/dsaiengineering/p/p15-tabpfn-and-tabicl-for-fraud-detection?r=535odk&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">P15</a>.</p><p>P15 used the public credit-card fraud dataset to compare TabPFN, TabICL, Logistic Regression, and XGBoost as direct fraud scorers. That is a useful first question, but it is not the most practical industry question. Many teams already have classical ML workflows in production, and tabular foundation models can be expensive at inference time. So the more relevant question is: can these models improve an existing workflow without replacing the production scorer?</p><p>This notebook is built around that question. TabPFN and TabICL are not used as direct fraud scorers. Instead, they are used as offline representation models. 
They see an earlier labelled context, generate row embeddings, and those embeddings are appended to the raw features. The downstream fraud scorer remains GPU-accelerated XGBoost. In other words, the notebook asks whether a practical classical fraud workflow becomes stronger when we add TabPFN and TabICL embeddings as additional features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ruf_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ruf_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 424w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 848w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 1272w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ruf_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png" width="1102" height="404" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:1102,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196470502?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ruf_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 424w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 848w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 1272w, https://substackcdn.com/image/fetch/$s_!ruf_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab8ba60-d352-46bd-978c-a8f6ffc3a290_1102x404.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>You can find the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/9011f9685c0273a01beec78abb8ed64e17643cd7/blog/20260504-tabpfn-tabicl-embeddings-fraud-det-workflows.assets/tabpfn-tabicl-fraud-detection-20260504.ipynb">in my GitHub repository</a>, and you can also <a href="https://www.kaggle.com/code/msaharan/tabpfn-tabicl-fraud-detection-20260504">clone it directly on Kaggle</a>. The notebook is meant to be run on Kaggle with GPU enabled. 
It installs cuDF for pandas acceleration, uses CUDA for TabPFN and TabICL, and uses a CuPy-backed XGBoost path so that the downstream scorer also runs on GPU.</p><h2>Conceptual background</h2><h3>The practical question</h3><p>In many tabular ML systems, especially in fraud, credit risk, churn, pricing, and transaction monitoring, the production model is often a classical supervised model. XGBoost, LightGBM, CatBoost, Random Forest, and Logistic Regression remain common because they are fast, stable, easy to monitor, and familiar to data teams.</p><p>Tabular foundation models are interesting because they can learn useful representations from tabular data. But for an industry team, replacing an existing production scorer is a high bar. A replacement model has to be better, fast enough, stable under drift, explainable enough for the decision process, compatible with existing feature stores and monitoring, and acceptable under licensing and infrastructure constraints.</p><p>The notebook uses a lower-friction integration pattern:</p><ol><li><p>Keep the production-style downstream model as XGBoost.</p></li><li><p>Use TabPFN and TabICL offline to generate row embeddings.</p></li><li><p>Append those embeddings to the raw transaction features.</p></li><li><p>Compare XGBoost trained on raw features against XGBoost trained on raw + embedding features.</p></li></ol><p>This is closer to how a team might pilot tabular foundation models without rebuilding its whole fraud system.</p><p>Before looking at the dataset and results, it is useful to make the integration point precise. The downstream model is still ordinary supervised ML. The tabular foundation model contribution is an extra representation step before XGBoost.</p><h3>Where tabular foundation model embeddings enter the supervised workflow</h3><p>For a supervised model such as XGBoost, the workflow is familiar. 
We start with raw features \(x_i\) for transaction \(i\), a binary label \(y_i \in \{0,1\}\), and train a task-specific model </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{p}_i = h_\\theta(x_i)&quot;,&quot;id&quot;:&quot;BCDQOUALSI&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(\hat{p}_i\) is the predicted fraud score or fraud probability, and \(\theta\) represents the parameters learned from this dataset.</p><p>That part is still present in this notebook. XGBoost remains the downstream supervised model. The new part is the feature-generation step before XGBoost.</p><p>A tabular foundation model (TFM) is a pretrained model intended to work across many tabular tasks. In this notebook, TabPFN and TabICL are given an earlier labelled context </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;C = \\{(x_j, y_j)\\}_{j=1}^{m}&quot;,&quot;id&quot;:&quot;CWCJLDEKNC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(j\) indexes the context rows and \(m\) is the number of rows in the context set. An embedding is a dense numerical vector that represents how a model internally encodes a row. The TFM maps a later row \(x_i\) into an embedding vector </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_i = f_{\\text{TFM}}(x_i; C)&quot;,&quot;id&quot;:&quot;EHZJIBHIMY&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(f_{\text{TFM}}\) is the embedding function after conditioning on \(C\), and \(z_i\) is the resulting representation produced for row \(i\). The downstream XGBoost model is then trained on an augmented feature vector: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{x}_i = [x_i, z_i]&quot;,&quot;id&quot;:&quot;GMUDSDBNMZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where the brackets mean feature concatenation. 
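</p><p>As a minimal sketch of that concatenation step, assume the raw features and the embeddings are already available as row-aligned NumPy arrays. The function name, array names, and random placeholder values are illustrative assumptions, not code from the notebook. The widths follow the feature counts listed later in this post: 29 raw features and 221 - 29 = 192 TabPFN embedding columns.</p><pre><code class="language-python">
```python
import numpy as np


def augment_features(x_raw, z_emb):
    """Concatenate raw features and embeddings column-wise: x_tilde = [x, z]."""
    if x_raw.shape[0] != z_emb.shape[0]:
        raise ValueError("raw features and embeddings must have the same number of rows")
    return np.hstack([x_raw, z_emb])


# Placeholder data only: 5 rows, 29 raw features, 192 embedding columns.
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(5, 29)).astype(np.float32)
z_emb = rng.normal(size=(5, 192)).astype(np.float32)

x_tilde = augment_features(x_raw, z_emb)
print(x_tilde.shape)  # (5, 221)
```
</code></pre><p>The row-count check matters in practice because a silent misalignment between raw rows and embedding rows would attach each embedding to the wrong transaction.</p><p>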
So the similarity to ordinary supervised ML is that we still train and evaluate a supervised fraud model with labels, validation data, calibration data, and a future holdout. The difference is that the model receives additional representation features from a pretrained context-conditioned tabular model. That embedding-generation capability is the part that comes from TabPFN and TabICL.</p><h3>Dataset</h3><p>The notebook uses a public copy of the credit-card fraud dataset loaded from TensorFlow&#8217;s storage bucket. It contains:</p><ul><li><p>284,807 transactions;</p></li><li><p>492 fraud transactions;</p></li><li><p>284,315 non-fraud transactions;</p></li><li><p>a fraud rate of about 0.1727%;</p></li><li><p>anonymized PCA-style features <code>V1</code> to <code>V28</code>;</p></li><li><p><code>Amount</code>;</p></li><li><p><code>Time</code>;</p></li><li><p>the binary target <code>Class</code>.</p></li></ul><p>This is not a perfect production dataset. The features are anonymized, and there are no customer, card, merchant, device, or account identifiers. That limits what we can test. For example, a production fraud model should usually check entity-level leakage, delayed labels, future aggregate features, and drift by customer or merchant segment. This public dataset does not expose enough raw business context for that.</p><p>Even with those limitations, it is useful for this notebook because it has the basic shape of a fraud problem: severe class imbalance, a time column, and a binary rare-event target.</p><h3>Fraud detection is a ranking problem before it is a classification problem</h3><p>Because the target is so rare, metric choice matters before model choice. A model can look good under a broad metric and still be useless for an alert queue.</p><p>Accuracy is not a useful headline metric here. 
A model that predicts &#8220;not fraud&#8221; for every transaction would be about 99.8% accurate on this dataset (284,315 of 284,807 transactions are non-fraud), but it would catch no fraud.</p><p>The more useful deployment question is whether the model ranks fraud cases near the top of the risk queue. This is why the notebook focuses on precision, recall, Average Precision, and alert counts.</p><p>For binary labels: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Precision} = \\dfrac{TP}{TP + FP}&quot;,&quot;id&quot;:&quot;KTZPADIFQX&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Recall} = \\dfrac{TP}{TP + FN}&quot;,&quot;id&quot;:&quot;DGBVLTLIJB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(TP\) means true positives, \(FP\) means false positives, and \(FN\) means false negatives.</p><p>Precision asks: among the transactions we flag, what fraction are truly fraud?</p><p>Recall asks: among all fraud transactions, what fraction did we catch?</p><p>Average Precision summarizes the precision-recall curve. One common way to write it is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;AP = \\sum_{k=1}^{K} (R_k - R_{k-1}) P_k&quot;,&quot;id&quot;:&quot;EXKTYOYRQL&quot;}" data-component-name="LatexBlockToDOM"></div><p>where the steps are ordered by increasing recall, \(K\) is the number of evaluated threshold steps, \(P_k\) and \(R_k\) are precision and recall at step \(k\), and \(R_0 = 0\). In rare-event problems, Average Precision is usually more informative than ROC AUC because it focuses directly on the positive class and the quality of the alert queue.</p><p>Still, even Average Precision is not the end of the story. Fraud teams work with review capacity. 
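</p><p>Both views can be sketched in a few lines: Average Precision as the step-wise sum above, and the number of top-ranked alerts needed to reach a target recall. This is a minimal sketch, not the notebook's implementation. The function names and the toy labels and scores are illustrative assumptions, ties between scores are ignored for simplicity, and at least one positive label is assumed.</p><pre><code class="language-python">
```python
import math

import numpy as np


def average_precision(y_true, scores):
    """AP = sum_k (R_k - R_{k-1}) * P_k, evaluated at each positive in the ranking."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    hits = y_true[order]
    true_positives = np.cumsum(hits)
    precision_at_k = true_positives / np.arange(1, len(hits) + 1)
    # Recall rises by 1 / n_positives at every positive row and is flat elsewhere.
    return float(np.sum(precision_at_k * hits) / hits.sum())


def alerts_for_recall(y_true, scores, target_recall):
    """Smallest number of top-ranked alerts that reaches the target recall."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")
    true_positives = np.cumsum(y_true[order])
    # Number of fraud rows that must be caught, rounded up to a whole row.
    needed = max(1, math.ceil(target_recall * y_true.sum() - 1e-9))
    return int(np.searchsorted(true_positives, needed, side="left")) + 1


# Toy example: 3 fraud rows among 8 transactions, already sorted by score.
y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
ap_toy = average_precision(y_toy, s_toy)
alerts_toy = alerts_for_recall(y_toy, s_toy, target_recall=1.0)
print(round(ap_toy, 4), alerts_toy)  # 0.7222 6
```
</code></pre><p>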
A team may ask:</p><ul><li><p>How many alerts do we need to review to catch 80% of fraud cases?</p></li><li><p>How many alerts do we need to review to catch 90%?</p></li><li><p>At a fixed alert budget, how many fraud cases are found?</p></li><li><p>Does the model score behave like a calibrated probability, or only like a ranking score?</p></li></ul><p>The notebook therefore includes both model-quality metrics and operating-point tables.</p><h3>Chronological evaluation</h3><p>Good metrics are still not enough if the split does not match deployment. For fraud, the model is trained on past transactions and used on future transactions.</p><p>Random train-test splits can be misleading because the real use case is future prediction. So the notebook sorts the data by <code>Time</code> and uses chronological windows:</p><ul><li><p>Earliest 20%: TabPFN/TabICL representation context.</p></li><li><p>Next 40%: downstream XGBoost training window.</p></li><li><p>Next 10%: validation window for model selection.</p></li><li><p>Next 10%: calibration window.</p></li><li><p>Final 20%: full holdout.</p></li></ul><p>The representation-context window is important. TabPFN and TabICL need labelled rows to condition their embeddings. To avoid giving a row access to its own label, the TFM context is earlier than the downstream training, validation, calibration, and holdout rows.</p><p>The downstream model sees these feature sets:</p><ul><li><p>Raw: 29 features.</p></li><li><p>Raw + TabPFN embeddings: 221 features.</p></li><li><p>Raw + TabICL embeddings: 541 features.</p></li><li><p>Raw + TabPFN + TabICL embeddings: 733 features.</p></li></ul><p>The raw features exclude the target. In this default run, <code>Time</code> is used for splitting but not used as a model feature. That is a conservative default because <code>Time</code> can encode period-specific artifacts in public fraud datasets. 
A production team may use timestamp-derived features, but those features should be reviewed carefully.</p><h3>Why the training windows are sampled</h3><p>The full dataset has a very low fraud rate. If we train or condition a model on a small chronological slice without sampling, the positive class may be too sparse for useful learning or tuning. The notebook therefore uses fraud-enriched sampled windows for two parts of the workflow:</p><ul><li><p>the TFM representation context;</p></li><li><p>the fair raw-vs-embedding downstream training window.</p></li></ul><p>The sampling keeps all fraud rows in those windows and caps the number of normal rows. This is a case-control style design. It is common in rare-event modeling, but it must be interpreted correctly: the sampled training prevalence is not the deployment prevalence.</p><p>That is why validation, calibration, and the full holdout are kept at full prevalence. The final comparison is made on the future holdout with the real final-window base rate. The sampled training window also should not be confused with the final XGBoost fit size: after tuning, the uncalibrated fair raw-vs-embedding models are refit on the full downstream pre-holdout history for which embeddings were generated, excluding the earlier TFM context rows.</p><h3>Model choices</h3><p>With the representation, split, and sampling policy defined, the remaining question is what should score the transactions.</p><p>The notebook uses XGBoost as the production-style scorer. That choice is deliberate. XGBoost is a strong and widely used supervised model for tabular data, and it is a credible incumbent for fraud workflows.</p><p>The raw all-history incumbent is a separate XGBoost baseline that uses raw features only and can use every pre-holdout row, including the earlier representation-context rows. 
This is a stronger classical baseline than the fair raw-vs-embedding row because it asks whether using more ordinary historical data is already competitive with adding TFM embeddings.</p><p>TabPFN and TabICL are used as embedding generators rather than as final scorers. In this setup:</p><ul><li><p>TabPFN exposes a public embedding API through <code>get_embeddings</code>.</p></li><li><p>TabICL does not expose the same sklearn-level public embedding API, so the notebook extracts row representations through the fitted model&#8217;s internal representation-cache path and records the TabICL version.</p></li></ul><p>That TabICL detail is important. It means the TabICL embedding path is useful for exploration, but if someone uses it as benchmark infrastructure, the version should be pinned and reviewed.</p><p>The notebook also keeps Logistic Regression as an optional CPU benchmark, but it is not enabled in the default run. The main workflow uses GPU XGBoost because the industrial question is about a strong deployable scorer, and because sklearn Logistic Regression would make the notebook slower without answering the main question.</p><p>In implementation terms, the notebook does the following:</p><ol><li><p>Load the fraud CSV and validate that <code>Time</code> and <code>Class</code> are present.</p></li><li><p>Sort rows by <code>Time</code>.</p></li><li><p>Build the chronological context, training, validation, calibration, and holdout windows.</p></li><li><p>Extract TabPFN and TabICL embeddings from the earlier context into later windows.</p></li><li><p>Build raw and embedding-enhanced feature bundles.</p></li><li><p>Tune GPU XGBoost with chronological validation folds.</p></li><li><p>Refit the selected models, evaluate the full holdout, and inspect operating points, calibration, runtime, and leakage checks.</p></li></ol><h3>Hyperparameter tuning and calibration</h3><p>The notebook uses randomized hyperparameter search with chronological folds. 
Randomized search is used because it can explore a wider parameter space than a small manual grid without testing every possible combination. The folds preserve time order: each candidate is trained on earlier rows and validated on a later contiguous slice.</p><p>Because fraud is rare, the code also checks that each tuning fold has enough fraud rows in both the training and validation sides. If a fold has too few fraud cases, it is not a useful fold for model selection. In this run, the fair raw-vs-embedding tuning had only one valid chronological fold after those checks. That is a limitation, and I discuss it again in the outlook section.</p><p>The notebook also evaluates sigmoid-calibrated variants of XGBoost. Calibration is different from ranking. A model can rank fraud cases well but produce scores that should not be interpreted as probabilities.</p><p>Two probability-quality metrics are reported: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Brier score} = \\frac{1}{N}\\sum_{i=1}^{N}(\\hat{p}_i - y_i)^2&quot;,&quot;id&quot;:&quot;JNIWMIFNYS&quot;}" data-component-name="LatexBlockToDOM"></div><p>and </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Log loss} = -\\frac{1}{N}\\sum_{i=1}^{N}\\left[y_i\\log(\\hat{p}_i) + (1-y_i)\\log(1-\\hat{p}_i)\\right]&quot;,&quot;id&quot;:&quot;WZXYOOSLGO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(N\) is the number of evaluated rows, \(i\) indexes those rows, \(y_i\) is the true label, and \(\hat{p}_i\) is the model&#8217;s predicted fraud probability. Lower is better for both metrics. In a fraud workflow, calibration matters if the score is used as a probability for thresholds, policies, or downstream decisions. 
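</p><p>Both metrics are short enough to write out directly. The following pure-Python sketch mirrors the two formulas above. The function names, the clipping constant <code>eps</code>, and the toy labels and scores are illustrative assumptions rather than the notebook&#8217;s implementation. It also shows why calibration and ranking are different questions: dividing every score by 10 leaves the ranking unchanged but changes both metrics.</p><pre><code>import math


def brier_score(y_true, p_hat):
    """Mean squared gap between the predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def log_loss(y_true, p_hat, eps=1e-15):
    """Negative mean log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0); eps is an assumed constant
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def ranking(scores):
    """Row positions ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy data (made up): one fraud row among four transactions.
y_true = [0, 0, 0, 1]
p_hat = [0.10, 0.20, 0.05, 0.70]
shrunk = [p / 10 for p in p_hat]  # same ordering, different probabilities

same_ranking = ranking(p_hat) == ranking(shrunk)
print(same_ranking)
print(brier_score(y_true, p_hat), brier_score(y_true, shrunk))
print(log_loss(y_true, p_hat), log_loss(y_true, shrunk))
</code></pre><p>Lower is better for both metrics, so the shrunk scores are worse probabilities even though they would produce exactly the same alert queue.</p><p>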
If the score is used only for ranking a queue, calibration is still useful to inspect but not necessarily the primary objective.</p><h2>Hands-on demo</h2><p>The rest of the post reports the completed notebook run and focuses on the full holdout because that is the closest view to deployment.</p><h3>Run integrity</h3><p>The completed notebook ran without execution errors. The environment used Python 3.12.12, pandas 2.3.3, cuDF 26.2.1, CuPy 14.0.1, scikit-learn 1.6.1, XGBoost 3.2.0, torch 2.10.0+cu128, TabPFN 7.1.1 (pip version), and TabICL 2.1.1. The run used two CUDA devices.</p><p>The notebook also checks basic data quality and leakage conditions. The target is excluded from the features, the chronological split is ordered by <code>Time</code>, and the default feature set does not include <code>Time</code>. The duplicate diagnostics are more nuanced: exact duplicate rows exist, but exact duplicate groups do not cross time windows. Model-feature duplicate groups do cross time windows, but the detected cross-window groups did not contain fraud rows. I treat this as a review item rather than a fatal leakage finding.</p><h3>Splits</h3><p>The actual row counts after chronological splitting and sampling were:</p><ul><li><p>TFM embedding context, sampled: 4,157 rows, 157 fraud rows, 3.7768% fraud rate.</p></li><li><p>Classical training window, sampled: 4,203 rows, 203 fraud rows, 4.8299% fraud rate.</p></li><li><p>Validation window, full prevalence: 28,480 rows, 24 fraud rows, 0.0843% fraud rate.</p></li><li><p>Calibration window, full prevalence: 28,481 rows, 33 fraud rows, 0.1159% fraud rate.</p></li><li><p>Test holdout, sampled: 20,075 rows, 75 fraud rows, 0.3736% fraud rate.</p></li><li><p>Test holdout, full prevalence: 56,962 rows, 75 fraud rows, 0.1317% fraud rate.</p></li></ul><p>The full holdout is the deployment-facing view because it keeps the natural final-window fraud rate. 
The sampled holdout is useful for quick inspection, but the main conclusions should come from the full holdout.</p><p>For scale, the fair raw-vs-embedding tuning matrix has 32,683 rows, and the fair uncalibrated final-training matrix has 170,884 rows. The raw all-history incumbent&#8217;s uncalibrated final-training matrix has 227,845 rows because it can use the representation-context window as ordinary raw-feature history.</p><h3>Full-holdout model quality</h3><p>In the results below, <code>Workflow seconds</code> means the one-path time for the feature set: shared offline embedding preparation, when applicable, plus downstream XGBoost tuning, fitting, optional calibration, and prediction time.</p><p>On the full holdout, the main results, sorted by Average Precision, were:</p><ul><li><p>Raw + TabICL embeddings, no calibration: Average Precision 0.8128, workflow 236.6 seconds.</p></li><li><p>Raw all-history incumbent, no calibration: Average Precision 0.8097, workflow 160.4 seconds.</p></li><li><p>Raw + TabICL embeddings, sigmoid calibration: Average Precision 0.8046, workflow 249.0 seconds.</p></li><li><p>Raw + TabPFN + TabICL embeddings, no calibration: Average Precision 0.8029, workflow 640.4 seconds.</p></li><li><p>Raw, sigmoid calibration: Average Precision 0.7995, workflow 21.6 seconds.</p></li><li><p>Raw, no calibration: Average Precision 0.7850, workflow 21.0 seconds.</p></li><li><p>Raw + TabPFN embeddings, no calibration: Average Precision 0.7811, workflow 428.3 seconds.</p></li></ul><p>The main read is not that every embedding helps. 
The main read is more specific:</p><ul><li><p>TabICL embeddings improved XGBoost&#8217;s full-holdout AP over the fair raw baseline: 0.8128 versus 0.7850.</p></li><li><p>The raw all-history incumbent was also strong: 0.8097 AP.</p></li><li><p>TabPFN embeddings alone did not help in this run.</p></li><li><p>Combining TabPFN and TabICL was slower than using TabICL alone and did not improve the full-holdout AP.</p></li></ul><p>That makes the conclusion more practical than dramatic. The useful pattern in this run is not &#8220;add all foundation-model embeddings.&#8221; It is &#8220;TabICL embeddings were useful as additional features for XGBoost, but the raw all-history XGBoost incumbent remained very competitive.&#8221;</p><h3>Precision-recall curves</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BjAQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BjAQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 424w, https://substackcdn.com/image/fetch/$s_!BjAQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 848w, https://substackcdn.com/image/fetch/$s_!BjAQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 1272w, 
https://substackcdn.com/image/fetch/$s_!BjAQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BjAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png" width="790" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81038,&quot;alt&quot;:&quot;Full holdout precision-recall curves.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196470502?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full holdout precision-recall curves." title="Full holdout precision-recall curves." 
srcset="https://substackcdn.com/image/fetch/$s_!BjAQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 424w, https://substackcdn.com/image/fetch/$s_!BjAQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 848w, https://substackcdn.com/image/fetch/$s_!BjAQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 1272w, https://substackcdn.com/image/fetch/$s_!BjAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3322b5ce-04a5-4f31-ab0a-b0d541dc1b0a_790x590.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Full holdout precision-recall curves.</figcaption></figure></div><p>The precision-recall plot shows the top XGBoost variants on the full holdout. The dashed baseline is near zero because the fraud base rate is about 0.13% in the final window.</p><p>The curves are close, which matters. This is not a case where the embedding-enhanced model completely changes the problem. The best rows are all strong XGBoost variants. Still, the TabICL-enhanced curve is slightly better in AP than the raw all-history incumbent and the raw fair baseline.</p><p>The honest interpretation is this: TabICL embeddings improved the ranking signal in this run, but the improvement is incremental and should be judged against runtime, feature-generation complexity, and the strength of the existing raw-feature workflow.</p><h3>Runtime versus AP</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CRTD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CRTD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 424w, 
https://substackcdn.com/image/fetch/$s_!CRTD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 848w, https://substackcdn.com/image/fetch/$s_!CRTD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 1272w, https://substackcdn.com/image/fetch/$s_!CRTD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CRTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png" width="1010" height="475" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:1010,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82976,&quot;alt&quot;:&quot;Full holdout runtime versus AP.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196470502?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full holdout runtime versus AP." title="Full holdout runtime versus AP." 
srcset="https://substackcdn.com/image/fetch/$s_!CRTD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 424w, https://substackcdn.com/image/fetch/$s_!CRTD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 848w, https://substackcdn.com/image/fetch/$s_!CRTD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 1272w, https://substackcdn.com/image/fetch/$s_!CRTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78907b5f-91c0-44ea-9ae2-f9783ef24096_1010x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Full holdout runtime versus AP.</figcaption></figure></div><p>The runtime plot is one of the most useful figures in the notebook. It shows that the best AP is not the only thing that matters.</p><p>Raw XGBoost is fastest. Raw + TabICL takes longer because the TabICL embeddings have to be generated, but it improves AP. Raw + TabPFN + TabICL is much slower and does not improve AP over TabICL alone in this run. Raw + TabPFN alone is also not attractive here because it is slower than raw XGBoost and has a lower AP than the fair raw baseline in the full-holdout results (0.7811 versus 0.7850).</p><p>This is the tradeoff the notebook is meant to surface. If embeddings are produced offline in a batch-scored workflow, several minutes of feature generation may be acceptable. If the use case needs low-latency online scoring, the engineering burden is different.</p><h3>Alert-count view</h3><p>The operating-point results make the comparison more concrete. 
On the full holdout:</p><p>At 80% recall:</p><ul><li><p>Raw + TabICL embeddings: 91 alerts, 60 frauds found, 65.93% precision.</p></li><li><p>Raw all-history incumbent: 94 alerts, 60 frauds found, 63.83% precision.</p></li><li><p>Raw: 109 alerts, 60 frauds found, 55.05% precision.</p></li><li><p>Raw + TabPFN + TabICL embeddings: 116 alerts, 60 frauds found, 51.72% precision.</p></li></ul><p>At 90% recall:</p><ul><li><p>Raw all-history incumbent: 1,064 alerts, 68 frauds found, 6.39% precision.</p></li><li><p>Raw + TabICL embeddings: 1,087 alerts, 68 frauds found, 6.26% precision.</p></li><li><p>Raw + TabPFN + TabICL embeddings: 2,147 alerts, 68 frauds found, 3.17% precision.</p></li><li><p>Raw: 2,284 alerts, 68 frauds found, 2.98% precision.</p></li></ul><p>At 80% recall, Raw + TabICL embeddings required the fewest alerts among these rows. At 90% recall, the raw all-history incumbent was slightly better than Raw + TabICL embeddings, with 1,064 alerts versus 1,087. Both were much better than the fair raw baseline.</p><p>This is a useful result because it prevents a simplistic conclusion. Depending on the operating point, the best practical choice may be Raw + TabICL or the raw all-history incumbent. 
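</p><p>The alert-count view can be computed from nothing more than labels and scores. As a hedged sketch (the function names, the integer-percent argument, and the toy data are my own illustration, not the notebook&#8217;s code), the following walks down the ranked queue until the required number of fraud rows has been found. It assumes ties are broken by input order.</p><pre><code>def frauds_needed(total_fraud, recall_pct):
    """Smallest fraud count that reaches recall_pct percent (integer ceiling)."""
    return (total_fraud * recall_pct + 99) // 100


def alerts_for_recall(y_true, scores, recall_pct):
    """Alerts to review, in score order, until recall_pct percent of fraud is found."""
    needed = frauds_needed(sum(y_true), recall_pct)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    found = 0
    for alerts, i in enumerate(order, start=1):
        found += y_true[i]  # labels are 0/1
        if needed and found == needed:
            return {"alerts": alerts, "frauds_found": found, "precision": found / alerts}
    raise ValueError("target recall is not reachable with these labels")


# Toy queue (made up): 5 fraud rows among 10 transactions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
point = alerts_for_recall(y_true, scores, 80)
print(point)

# The same ceiling rule applied to the 75 fraud rows in the full holdout:
print(frauds_needed(75, 80), frauds_needed(75, 90))
</code></pre><p>The same arithmetic explains the tables above: with 75 fraud rows in the full holdout, 80% recall means 60 frauds and 90% recall means 68, and 60 frauds in 91 alerts is 65.93% precision.</p><p>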
The embedding path helps, but the strong incumbent baseline deserves respect.</p><h3>Calibration</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h5Xn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h5Xn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 424w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 848w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 1272w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h5Xn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png" width="689" height="590" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:689,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58169,&quot;alt&quot;:&quot;Full holdout calibration curves.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196470502?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Full holdout calibration curves." title="Full holdout calibration curves." srcset="https://substackcdn.com/image/fetch/$s_!h5Xn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 424w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 848w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 1272w, https://substackcdn.com/image/fetch/$s_!h5Xn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3ca8d5a-733c-4ae2-9fa8-cadf0245d0a2_689x590.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Full holdout calibration curves.</figcaption></figure></div><p>The calibration results are less decisive than the ranking results.</p><p>On the full holdout:</p><ul><li><p>Raw + TabICL embeddings: AP 0.8128, Brier score 0.0004, log loss 0.0023.</p></li><li><p>Raw all-history incumbent: AP 0.8097, Brier score 0.0004, log loss 0.0028.</p></li><li><p>Raw + TabICL embeddings, calibrated: AP 0.8046, Brier score 0.0004, log loss 0.0031.</p></li><li><p>Raw: AP 0.7850, Brier score 0.0004, log loss 0.0030.</p></li><li><p>Raw, calibrated: AP 0.7995, Brier score 0.0004, log loss 0.0032.</p></li></ul><p>The calibration curves are compressed near the origin because the positive class is extremely rare. 
They do not, by themselves, prove that calibration improved probability quality. In this run, calibration often lowered AP, and log loss did not consistently improve.</p><p>This does not mean calibration is useless. It means this notebook should treat calibration as a diagnostic, not as a guaranteed improvement. In a production fraud workflow, I would want more reliability diagnostics: score-bin tables, expected calibration error, calibration by segment, and careful separation between the model used before calibration and the same model after calibration.</p><h3>What I take from the demo</h3><p>The notebook supports a practical hypothesis: tabular foundation model embeddings can be useful as additional features for an existing classical fraud workflow. It also shows why the comparison has to include strong classical incumbents, operating-point metrics, runtime, calibration diagnostics, and leakage checks. This is the kind of evidence I want to build: not a leaderboard claim, but a reusable workflow that shows what improves, what does not, and what still needs review.</p><h2>Known shortcomings in this version</h2><p>The most important shortcomings I see are:</p><ol><li><p>The exact ranking should be treated cautiously. The current completed run makes <code>Raw + TabICL embeddings</code> the best full-holdout AP row, with the raw all-history incumbent close behind. A previous completed execution of the same workflow put <code>Raw + TabPFN + TabICL embeddings</code> ahead by about 0.001 AP. The stable lesson is that TabICL embeddings look useful and TabPFN adds non-trivial cost here; the unstable lesson would be claiming that one embedding combination is always best.</p></li><li><p>The fair raw-vs-embedding tuning is not yet as strong as I want. The notebook requests multiple chronological folds, but after enforcing minimum fraud counts, the fair raw-vs-embedding comparison has only one valid fold. The raw all-history incumbent has more valid chronological folds. 
This means the hyperparameter-selection protocol is not equally strong for all rows yet.</p></li><li><p>The all-history raw incumbent is a serious baseline. This is good, but it also raises the bar. A production team may prefer a strong raw-feature XGBoost model trained on all available pre-holdout history over an embedding-enhanced path if the embedding lift is small or unstable. Future versions should keep this incumbent and make the comparison even cleaner.</p></li><li><p>Calibration is not isolated cleanly enough yet. The calibrated rows use a calibration window, while the uncalibrated final models can train on more pre-holdout labels. That means the comparison mixes two effects: calibration and different training-window sizes. I want to add an uncalibrated calibration-base row trained on the same rows as the calibrated base model, so the calibration effect can be evaluated more directly.</p></li><li><p>The calibration plots are visually weak. Because fraud is extremely rare, the reliability curves are compressed near the origin. They are useful as a warning, but not strong enough as production-grade calibration evidence. Better diagnostics would include Brier score, log loss, expected calibration error, and reliability tables by score quantile.</p></li><li><p>The sampling design needs to remain explicit. The TFM context and downstream training windows are intentionally fraud-enriched, while validation, calibration, and the full holdout remain full-prevalence windows. That is a reasonable rare-event design, but it should always be stated clearly so the reader does not confuse training prevalence with deployment prevalence.</p></li><li><p>The public dataset limits the leakage analysis. The notebook checks chronological order, target exclusion, duplicate rows, and duplicate model-feature rows. But the dataset does not expose customer, account, merchant, device, or label-timing information. 
In production, those would be required for stronger entity-level leakage and delayed-label checks.</p></li><li><p>The TabICL embedding path needs version discipline. TabPFN exposes a public embedding path. TabICL row representations are extracted through model internals in this notebook. That is acceptable for exploration, but a benchmark or production testbench should pin the version and review that extraction path whenever TabICL changes.</p></li></ol><p>I include this section because I do not want the post to read like a finished benchmark. It is a working notebook that already gives useful signal, but the output review also shows exactly where the next engineering iterations should go.</p><h2>Summary and Conclusion</h2><p>This post asks a different question from direct fraud scoring: can TabPFN or TabICL improve an existing XGBoost fraud workflow through embeddings?</p><p>The answer from this run is cautious but useful:</p><ol><li><p>TabICL embeddings improved XGBoost ranking quality on the full holdout.</p></li><li><p>The raw all-history XGBoost incumbent remained very competitive.</p></li><li><p>TabPFN embeddings did not help in this particular run.</p></li><li><p>Combining TabPFN and TabICL embeddings was not worth the extra runtime here.</p></li><li><p>Operating-point metrics are essential because AP alone does not tell a fraud team how large the review queue will be.</p></li><li><p>Calibration should be evaluated separately from ranking and should not be assumed to improve the workflow automatically.</p></li></ol><p>The most practical conclusion is the following. If a team already has a strong XGBoost fraud workflow, TabICL embeddings are worth testing as offline representation features. But they should be tested against a strong raw-feature incumbent, under chronological validation, with alert-count metrics and runtime included.</p><p>That is a more modest conclusion than saying tabular foundation models replace classical ML. 
It is also, in my view, more useful.</p><h2>Outlook</h2><p>This work is still in progress. The notebook already gives a more practical view of TFM adoption in a fraud workflow, but the known-shortcomings section above is the clearest map for the next iteration.</p><p>The next step is not simply to add more models. The next step is to make the evaluation protocol stronger: better chronological tuning, cleaner calibration comparisons, clearer figures, and more datasets. After that, it will be easier to say when TFM embeddings are genuinely useful for an existing tabular ML workflow and when a strong classical incumbent is still the better engineering choice.</p><p>I will continue improving this version in that direction. Direct prediction is one path for tabular foundation models. Offline embeddings are another. Feature auditing, data-quality checks, uncertainty estimation, and cold-start modelling may be others. I am learning by building these notebooks step by step, and the goal is to turn that learning into workflows that are useful to both model developers and data teams trying to understand where these models fit.</p>]]></content:encoded></item><item><title><![CDATA[[P15] TabPFN and TabICL for fraud detection - 1]]></title><description><![CDATA[Moving on from toy examples to realistic workflows]]></description><link>https://newsletter.dsaiengineering.com/p/p15-tabpfn-and-tabicl-for-fraud-detection</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p15-tabpfn-and-tabicl-for-fraud-detection</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Fri, 01 May 2026 18:38:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!h3us!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post continues my series on tabular foundation models. 
So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN&#8217;s predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>, using TabPFN for causal inference in <a href="https://open.substack.com/pub/dsaiengineering/p/p13-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P13</a>, and comparing TabPFN, TabICL, and supervised ML models in <a href="https://open.substack.com/pub/dsaiengineering/p/p14-tabular-foundation-models-comparing?utm_campaign=post-expanded-share&amp;utm_medium=web">P14</a>.</p><p>Yesterday, I created a notebook that brought TabPFN and TabICL into the same supervised-learning examples. That was a useful first comparison, but the examples were still close to toy examples. I wanted to move on, as soon as possible, to examples that are closer to a realistic data science workflow.</p><p>Today I moved one step in that direction. I built a fraud-detection workflow using TabICL, TabPFN, Logistic Regression, and XGBoost. I am still learning these models by building and running examples, so I treat this notebook as a practical exploration rather than a definitive evaluation. Fraud detection is a useful first workflow because it is widely used. 
In addition, it is an interesting example from a technical perspective because the positive class is rare, plain accuracy is misleading, and the practical question is not only whether a model has a good score, but whether it produces a usable review queue under runtime, calibration, and deployment constraints.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h3us!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h3us!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 424w, https://substackcdn.com/image/fetch/$s_!h3us!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 848w, https://substackcdn.com/image/fetch/$s_!h3us!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 1272w, https://substackcdn.com/image/fetch/$s_!h3us!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h3us!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png" width="1068" height="784" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1068,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153093,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196147796?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h3us!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 424w, https://substackcdn.com/image/fetch/$s_!h3us!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 848w, https://substackcdn.com/image/fetch/$s_!h3us!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 1272w, https://substackcdn.com/image/fetch/$s_!h3us!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4c3ae10-0f87-494c-8995-bf11bf299943_1068x784.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>You can find the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/34a67d7e3165c6140de2448870520a55779fc997/blog/20260501-tabpfn-tabicl-fraud-detection-1.assets/tabpfn-tabicl-fraud-detection-20260501.ipynb">here</a> in my GitHub repository. As explained in the notebook, it can also be run with time as a model feature by changing only <code>USE_TIME_AS_FEATURE = True</code>; that version of the notebook is available <a href="https://github.com/msaharan/dsaiengineering/blob/34a67d7e3165c6140de2448870520a55779fc997/blog/20260501-tabpfn-tabicl-fraud-detection-1.assets/tabpfn-tabicl-fraud-detection-20260501_USE_TIME_AS_FEATURE_True.ipynb">here</a>. 
The <code>Time=True</code> notebook is a sensitivity check: timestamp-derived behavior can be useful in fraud workflows, but it can also encode period-specific artifacts that need careful production review.</p><p>Usually, I write these posts with the conceptual background first and the day&#8217;s work after it. Today is different because the notebook itself contains most of the conceptual background and the interpretation of the results. The following section therefore supplements the notebook, should the reader need it.</p><p>Today&#8217;s work is meant to be read directly from the notebook. In future posts, I plan to develop this notebook further and test it in various ways so that it reflects a realistic data science workflow.</p><h2>Conceptual Background</h2><h3>Tabular Foundation Models in This Notebook</h3><p>A tabular foundation model, or TFM, is a pretrained model intended to work across many tabular prediction tasks. 
In this post, the two TFMs are TabICLv2 and TabPFN.</p><p>For a practitioner familiar with supervised ML, the main difference is this: when I call <code>.fit()</code> on Logistic Regression or XGBoost, I am training a task-specific model from the current dataset. When I call <code>.fit()</code> on TabICL or TabPFN in this notebook, I am not training or updating their large pretrained weights; I am giving the model labelled rows as task context, and the model then uses that context to predict new rows.</p><p>This is why the context set matters for TFMs. The labelled rows given during <code>.fit()</code> are part of the task description the model uses at prediction time. Later in the notebook discussion, I distinguish between the context rows given to TabICL/TabPFN and the holdout rows used for evaluation.</p><h3>Fraud Detection Is a Rare-Event Ranking Problem</h3><p>The public credit-card fraud dataset used in the notebook has 284,807 transactions and 492 fraud cases. That is a fraud rate of about 0.17%.</p><p>Accuracy, the fraction of all predictions that are correct, is not a good primary metric for a dataset like this. A model that predicts &#8220;not fraud&#8221; for every transaction would be about 99.8% accurate, but it would not help a fraud team find a single fraud case.</p><p>A more relevant question is whether the model can put fraud cases near the top of a ranked list of risky transactions. For that, the useful concepts are precision, recall, and Average Precision.</p><p>Precision answers: among the transactions the model flags, what fraction are actually fraud?</p><p>Recall answers: among all fraud transactions, what fraction did the model catch?</p><p>Average Precision, or AP, summarizes the precision-recall curve across many possible thresholds. Higher AP is better. AP is more informative than accuracy here because it focuses on the rare positive class.</p><p>I also report ROC AUC. 
ROC AUC is the probability that the model ranks a randomly chosen fraud transaction above a randomly chosen non-fraud transaction. Higher ROC AUC is better. However, I do not use it as the main result because, with so few fraud cases, ROC AUC can look good even when the top of the review queue still contains many false alerts.</p><h3>Fraud Teams Deploy Alert Queues, Not Average Precision</h3><p>Average Precision is useful for comparing rankers, but fraud teams usually do not deploy &#8220;Average Precision.&#8221; They deploy review queues.</p><p>For example, a team might ask:</p><ul><li><p>If we review the top 100 transactions, how many frauds do we catch?</p></li><li><p>If we can review the top 0.5% of transactions, what are the precision and recall?</p></li><li><p>If we want to catch 80% of fraud cases, how many alerts must we investigate?</p></li><li><p>If we want to catch 90% of fraud cases, does the review queue become too large?</p></li></ul><p>Here, the phrase &#8220;alerts needed to reach 80% recall&#8221; means this: sort all transactions by model score from highest risk to lowest risk, then count how many transactions must be reviewed before 80% of the known fraud cases have appeared in that review list. Fewer alerts are better at a fixed recall target. Higher precision is also better because it means fewer false alerts in the review queue.</p><p>This is why the notebook includes alert-budget outputs. The goal is not only to ask which model has the highest AP, but also to inspect whether similar AP values lead to different operational outcomes.</p><h3>Time-Aware Validation Matters</h3><p>In ordinary tabular ML examples, we often use random train-test splits or random cross-validation. That is often fine for introductory demos, but it is less realistic for fraud detection. Fraud systems care about future transactions. 
Therefore, model selection should be based on earlier data and evaluation should happen on later data.</p><p>The notebook sorts the data by <code>Time</code> and uses four windows:</p><ul><li><p>earliest 60% of transactions: training period;</p></li><li><p>next 10% of transactions: validation period for classical model selection;</p></li><li><p>next 10% of transactions: calibration period for probability calibration;</p></li><li><p>final 20% of transactions: future holdout.</p></li></ul><p>This is still a simplified public-data workflow, but it is closer to a real fraud setting than random cross-validation. This is why the notebook has train, validation, calibration, and final holdout windows rather than one random train-test split.</p><h3>Calibration Is Different From Ranking</h3><p>A model can rank fraud cases well while producing poorly calibrated probabilities. This distinction matters.</p><p>If a model score is used only to sort transactions into a review queue, ranking quality is the main issue. If a score is interpreted as &#8220;this transaction has a 12% fraud probability,&#8221; then probability calibration matters.</p><p>A calibrated model is a model whose predicted probabilities match observed frequencies reasonably well. For example, among transactions that receive a predicted fraud probability near 10%, roughly 10% should actually be fraud if the model is well calibrated.</p><p>In the notebook, the calibrated Logistic Regression and calibrated XGBoost rows are not entirely new model families. They are the selected classical models plus a post-hoc sigmoid calibration step. The sigmoid calibration learns a mapping from the model&#8217;s scores to probabilities using the separate calibration window.</p><p>The notebook uses two calibration-related metrics:</p><ul><li><p>Brier score: the mean squared error between predicted probabilities and the true 0/1 labels. 
Lower is better.</p></li><li><p>Log loss: a probability-sensitive loss that strongly penalizes confident wrong predictions. Lower is better.</p></li></ul><p>Because the TFM context is sampled differently from the full holdout, I treat probability calibration as something to inspect rather than assume.</p><h3>Why Compare Against Tuned Classical Baselines?</h3><p>If I compared TabICL and TabPFN against untuned classical models, the comparison would be weak. In normal data science workflows, a team would tune Logistic Regression or XGBoost, select hyperparameters on validation data, refit the selected model, and check calibration if probabilities are used for decisions.</p><p>So the notebook uses:</p><ul><li><p>Logistic Regression as a simple classical baseline.</p></li><li><p>XGBoost as a widely used boosted-tree baseline.</p></li><li><p>Calibrated variants of both classical models using a separate calibration window.</p></li><li><p>TabICLv2 and TabPFN as foundation-model comparators.</p></li></ul><p>I intentionally leave out Random Forest and CatBoost today so that the post does not drift toward a broad benchmark. For this workflow test, I use fewer models and spend more space on validation design, full-holdout evaluation, alert-budget analysis, calibration diagnostics, runtime notes, and leakage checks.</p><h2>Closing Thoughts</h2><p>This is all for today. I will pick it up from here next week. 
In the meantime, give the notebook a spin in Kaggle and let me know what you think.</p>]]></content:encoded></item><item><title><![CDATA[[P14] Tabular foundation models - comparing TabPFN, TabICL and supervised ML models (getting started)]]></title><description><![CDATA[Getting started with TabICLv2.]]></description><link>https://newsletter.dsaiengineering.com/p/p14-tabular-foundation-models-comparing</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p14-tabular-foundation-models-comparing</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Thu, 30 Apr 2026 18:23:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gnuc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post continues my series on tabular foundation models. So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a 
href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN&#8217;s predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>, and using TabPFN for causal inference in <a href="https://open.substack.com/pub/dsaiengineering/p/p13-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P13</a>.</p><p>For a new reader, the minimum background is this: TabPFN is a pretrained tabular foundation 
model. Unlike XGBoost or Random Forest, its ordinary <code>.fit()</code> call does not update model weights to learn a fresh model from scratch. Instead, <code>.fit()</code> prepares the labelled rows as context for the current task, and TabPFN uses that context to predict new rows. That is why I have described TabPFN as a context-conditioned predictor throughout this series.</p><p>In previous posts, I completed the examples given in TabPFN&#8217;s hands-on demo notebook, except for unsupervised learning, which I left out intentionally. Today, I moved on to TabICLv2, a different tabular foundation model. I chose TabICLv2 because it&#8217;s almost as good as TabPFN and is free and open source, whereas TabPFN requires a commercial license.</p><p>TabICLv2&#8217;s GitHub repository contains examples that demonstrate how to use it, but there&#8217;s no comprehensive Jupyter notebook like TabPFN&#8217;s, and the examples felt basic to me. To test TabICLv2 and compare it with TabPFN, I first needed a test bench. In this post, I am sharing the Jupyter notebook I created for this purpose. It contains the TabPFN-related code from my previous posts and the TabICL-related code from the TabICL GitHub repository. You can find the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/b3976b9d53bcfefc870fb355f9390aae9621290a/blog/20260430-understanding-tfm-tabpfn-tabicl-supml.assets/tfm-tabpfn-tabicl-supml-20260430.ipynb">here</a> in my GitHub repo.</p>
<h2>Using the Notebook</h2><p>To use the notebook, download it from my GitHub repo and run it on Kaggle. Once you open Kaggle, you can create a notebook as follows,</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X6Kr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X6Kr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 424w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 848w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 1272w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!X6Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png" width="384" height="378" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8c97673-afd9-426e-9306-dab043379bdb_384x378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:378,&quot;width&quot;:384,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20783,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X6Kr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 424w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 848w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 1272w, https://substackcdn.com/image/fetch/$s_!X6Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c97673-afd9-426e-9306-dab043379bdb_384x378.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>and import my notebook.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4uT_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!4uT_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 424w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 848w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 1272w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4uT_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png" width="1456" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4uT_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 424w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 848w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 1272w, https://substackcdn.com/image/fetch/$s_!4uT_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26320773-bf99-4dd5-b99f-99930718cbb6_1853x751.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Make sure to use the GPU because both TabPFN and TabICL are extremely slow on the CPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LsR_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LsR_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 424w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 848w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 1272w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LsR_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png" width="416" height="520" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:520,&quot;width&quot;:416,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43718,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LsR_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 424w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 848w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 1272w, https://substackcdn.com/image/fetch/$s_!LsR_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1d0980a-6d94-468c-87fd-d863d2a70294_416x520.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once the setup is ready, you will need a Prior Labs API token to download the TabPFN weights and, optionally, a Hugging Face access token to download TabICL. 
You can specify them in Kaggle secrets, and they will be imported automatically when the notebook runs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SzD4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SzD4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 424w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 848w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 1272w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SzD4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png" width="1456" height="472" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:167929,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SzD4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 424w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 848w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 1272w, https://substackcdn.com/image/fetch/$s_!SzD4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c7fc96-93a9-47f6-baa4-4a168f04b06f_1796x582.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The secrets are always available on Kaggle (privately, only to you). 
So once you set them, you can always import any notebook I share on GitHub and use it immediately with your secrets.</p><h2>Contents</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Z9J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Z9J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 424w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 848w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 1272w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Z9J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png" width="1200" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Z9J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 424w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 848w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 1272w, https://substackcdn.com/image/fetch/$s_!8Z9J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F563c21ec-682f-47a8-b4e3-fbf781c6d96c_1200x771.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The first four sections (0-3) of the notebook contain the code from my previous posts. I felt that the examples in the TabICL repository were not as good as the ones I covered in previous posts. Since I had already made the classification, regression, and model interpretability (SHAP and embeddings) examples work there, I decided to reuse them here and add TabICL instead of inventing new examples.</p><p>The examples in sections 4 and 5 are taken from the TabICL repository. For now, I only added them to the notebook and made them work. I will look into the details in future posts.</p><h3>Classification and Regression</h3><p>The following figures compare all the models on classification and regression tasks. 
I verified that the performance of TabPFN and the classical ML models is consistent with previous blog posts. TabICL shows an improvement over TabPFN, but the difference is not statistically significant. Please keep in mind that this and the other observations below are preliminary; I may take a closer look in future posts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gnuc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gnuc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gnuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png" width="990" height="590" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33788,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gnuc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!gnuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F754e5cd7-2883-4f6d-883d-a3de55400dff_990x590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QSXN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QSXN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 424w, 
https://substackcdn.com/image/fetch/$s_!QSXN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 848w, https://substackcdn.com/image/fetch/$s_!QSXN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 1272w, https://substackcdn.com/image/fetch/$s_!QSXN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QSXN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png" width="1389" height="617" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1389,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42042,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!QSXN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 424w, https://substackcdn.com/image/fetch/$s_!QSXN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 848w, https://substackcdn.com/image/fetch/$s_!QSXN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 1272w, https://substackcdn.com/image/fetch/$s_!QSXN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F158529d1-a674-4dae-b9e0-e8fc5ea3f8b6_1389x617.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h3>Model Interpretability: SHAP and Embeddings</h3><p>For now, my only comment on the SHAP plot is that it uses a different format than in previous blog posts to enable a side-by-side comparison. I could look into it in future posts, but SHAP is not a priority right now because I am more interested in developing a holistic picture of tabular foundation models than in digging deep into one topic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pwrC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pwrC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 424w, https://substackcdn.com/image/fetch/$s_!pwrC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 848w, https://substackcdn.com/image/fetch/$s_!pwrC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 1272w, 
https://substackcdn.com/image/fetch/$s_!pwrC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pwrC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png" width="890" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:890,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46173,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pwrC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 424w, https://substackcdn.com/image/fetch/$s_!pwrC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 848w, 
https://substackcdn.com/image/fetch/$s_!pwrC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 1272w, https://substackcdn.com/image/fetch/$s_!pwrC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062466b2-38a5-43a8-9c0e-9c89139bad36_890x590.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The embeddings section highlighted an important point. 
While TabPFN exposes embeddings through <code>TabPFNEmbedding</code>, TabICLv2 does not currently expose the same public embedding API. However, it can cache row representations when <code>kv_cache="repr"</code> is used, and that was used to produce the comparison shown below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lQ1I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lQ1I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 424w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 848w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 1272w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lQ1I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png" width="1211" height="511" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:511,&quot;width&quot;:1211,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106369,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lQ1I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 424w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 848w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 1272w, https://substackcdn.com/image/fetch/$s_!lQ1I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a115ca-037e-48df-8cdd-18bf43ef033b_1211x511.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Probability Surfaces</h3><p>The following figure shows how each classifier distributes class probability across a noisy two-dimensional input space. The point is not to ask which model has the best score, but how each model behaves between observed training points. 
For now, I won&#8217;t go into the details; I will leave them for future posts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C_To!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C_To!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 424w, https://substackcdn.com/image/fetch/$s_!C_To!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 848w, https://substackcdn.com/image/fetch/$s_!C_To!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 1272w, https://substackcdn.com/image/fetch/$s_!C_To!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C_To!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png" width="908" height="711" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:711,&quot;width&quot;:908,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:217823,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C_To!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 424w, https://substackcdn.com/image/fetch/$s_!C_To!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 848w, https://substackcdn.com/image/fetch/$s_!C_To!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 1272w, https://substackcdn.com/image/fetch/$s_!C_To!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07d883b-1cba-4a77-a1d6-aa6e219e2408_908x711.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Quantile Regression</h3><p>The following figure shows the comparison of predictive intervals on a regression problem. 
Again, I will discuss the details in a future post.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!18Ny!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!18Ny!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 424w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 848w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 1272w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!18Ny!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png" width="1011" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:1011,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:93332,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/196028707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!18Ny!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 424w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 848w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 1272w, https://substackcdn.com/image/fetch/$s_!18Ny!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a584656-722d-421f-b1b9-28e4f1c09bcc_1011x411.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Closing Thoughts</h2><p>With this post, this series has moved on from using only TabPFN to comparing TabPFN and TabICL. Today, I only shared the code and output. I will go deeper in comparing TabPFN and TabICL in future posts. In the meantime, I encourage you to play with the notebook. 
If you do, let me know in the comments what you think of these models and the modifications you made.</p>]]></content:encoded></item><item><title><![CDATA[[P13] Understanding tabular foundation models: causal inference with TabPFN]]></title><description><![CDATA[This post continues my series on tabular foundation models.]]></description><link>https://newsletter.dsaiengineering.com/p/p13-understanding-tabular-foundation</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p13-understanding-tabular-foundation</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Wed, 29 Apr 2026 13:35:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6_By!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Introduction</h2><p>This post continues my series on tabular foundation models. 
So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, TabPFN&#8217;s predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>, and time series forecasting with TabPFN in <a href="https://open.substack.com/pub/dsaiengineering/p/p12-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P12</a>.</p><p>For a new reader, the minimum background is this: TabPFN is a pretrained tabular foundation model. Unlike XGBoost or Random Forest, its ordinary <code>.fit()</code> call does not update model weights to learn a fresh model from scratch. Instead, <code>.fit()</code> prepares the labelled rows as context for the current task, and TabPFN uses that context to predict new rows. That is why I have described TabPFN as a context-conditioned predictor throughout this series.</p><p>Today I cover the causal inference section of the official <a href="https://colab.research.google.com/github/PriorLabs/TabPFN/blob/main/examples/notebooks/TabPFN_Demo_Local.ipynb">TabPFN hands-on demo notebook</a>. You can find my local version of the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/1cda532a192c14129c7567e4a6fb789956d80292/blog/20260429-understanding-tfm-causal-inference-tabpfn.assets/tabpfn-hands-on-demo-msaharan-20260429.ipynb">here</a>.</p><p>The interesting question is different from the previous posts. In classification and regression, the goal was to predict an observed target. In causal inference, the goal is to estimate what would change if we intervened. 
For example: if two customers have similar covariates, but one receives a treatment and the other does not, how much of the outcome difference should be attributed to the treatment rather than to pre-existing differences between the customers?</p><p>A practical caveat is important here. Using TabPFN does not, by itself, make a causal claim valid. It is still a predictive model. In the notebook, TabPFN is used as a base learner inside causal estimators from <code>econml</code>. The causal identification comes from the assumptions and estimator design. TabPFN contributes flexible outcome and propensity modeling.</p><p>The notebook places this example in a broader research direction: choosing base models for CATE estimators with AutoML (<a href="https://openreview.net/forum?id=QbOoz74GNO">Vanderschueren et al., 2025</a>), using TabPFN as an out-of-the-box base learner in CATE pipelines (<a href="https://arxiv.org/pdf/2505.20003">Zhang et al., 2025</a>), and pretraining PFN-style models directly for causal effect estimation (<a href="https://arxiv.org/pdf/2506.06039">Robertson et al.</a>). I will not review those papers here. 
I use them as signposts for why this notebook section belongs in the TabPFN ecosystem.</p><p>The post is organized as follows:</p><ol><li><p>Introduction: why causal inference is different from ordinary prediction.</p></li><li><p>Conceptual background: potential outcomes, CATE, confounding, propensity models, outcome models, and PEHE.</p></li><li><p>Hands-on demo: generating a synthetic CATE dataset, fitting CATE estimators with TabPFN, and reading the evaluation.</p></li><li><p>Summary and conclusion: what this example shows, what it does not show, and what to keep in mind.</p></li></ol><h2>2. Conceptual Background</h2><p>Before going to the hands-on demo, I want to set up the concepts that make the example meaningful. 
This section does five things:</p><ol><li><p>It explains why causal inference is not the same as prediction.</p></li><li><p>It defines potential outcomes and the conditional average treatment effect.</p></li><li><p>It explains the observed-confounder setting used in the notebook.</p></li><li><p>It shows where outcome models and propensity models enter the workflow.</p></li><li><p>It defines the PEHE metric used to evaluate CATE estimates in the synthetic demo.</p></li></ol><h3>2.1 Working Vocabulary</h3><p>The key terms for this post are:</p><ul><li><p>Treatment: the action or exposure whose effect we want to estimate. In the notebook, this is the binary variable \(T \in \{0,1\}\).</p></li><li><p>Outcome: the response variable affected by treatment. In the notebook, this is \(Y\).</p></li><li><p>Covariates: observed features measured before treatment. In the notebook, these are the columns in \(X\).</p></li><li><p>Potential outcomes: \(Y(1)\) and \(Y(0)\), the outcomes that would be observed under treatment and no treatment.</p></li><li><p>Individual treatment effect: \(\tau_i = Y_i(1) - Y_i(0)\), the effect for one individual unit.</p></li><li><p>CATE: the conditional average treatment effect, \(\tau(x) = \mathbb{E}[Y(1) - Y(0) | X=x]\). Here, \(\mathbb{E}\) means expectation, or the average value under the relevant probability distribution.</p></li><li><p>Propensity score: \(e(x) = \mathbb{P}(T=1 | X=x)\), the probability of receiving treatment given covariates. 
Here, \(\mathbb{P}\) means probability.</p></li><li><p>Outcome model: a model for \(\mathbb{E}[Y | T=t, X=x]\), usually estimated separately or jointly for \(t=0\) and \(t=1\).</p></li><li><p>Confounder: a variable that affects both treatment assignment and outcome.</p></li><li><p>Unconfoundedness: the assumption that, after conditioning on observed covariates \(X\), treatment assignment is independent of the potential outcomes.</p></li></ul><p>I use \(i\) to index one row or unit, \(x\) for a specific covariate value, \(X\) for the covariate random variable or matrix depending on context, \(T\) for treatment, and \(Y\) for the observed outcome. This is consistent with the earlier posts, where lowercase symbols denote concrete values and uppercase symbols denote random variables or matrices.</p><p>In ordinary supervised learning, we usually learn a prediction rule: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}(x) \\approx \\mathbb{E}[Y | X=x].&quot;,&quot;id&quot;:&quot;GXYTJAZACU&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is useful, but it is associational. It answers: given features \(x\), what outcome do we expect?</p><p>Causal inference asks a different question:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}[Y | do(T=1), X=x]\n\n-\n\n\\mathbb{E}[Y | do(T=0), X=x].&quot;,&quot;id&quot;:&quot;JJFGNRVBYQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notation \(do(T=1)\) means that treatment is set by intervention, not merely observed. 
This distinction matters because people who receive a treatment may differ systematically from people who do not receive it.</p><h3>2.2 Potential Outcomes and CATE</h3><p>For each unit \(i\), imagine two possible outcomes:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Y_i(1) = \\text{outcome for unit } i \\text{ if treated},&quot;,&quot;id&quot;:&quot;XJPDVZEIGH&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Y_i(0) = \\text{outcome for unit } i \\text{ if not treated}.&quot;,&quot;id&quot;:&quot;NMOQCCBTBP&quot;}" data-component-name="LatexBlockToDOM"></div><p>The individual treatment effect is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau_i = Y_i(1) - Y_i(0).&quot;,&quot;id&quot;:&quot;RPIEIDAGGK&quot;}" data-component-name="LatexBlockToDOM"></div><p>The fundamental problem of causal inference is that, for any real unit, we can observe only one of these two outcomes; the other remains counterfactual. If \(T_i=1\), we observe \(Y_i(1)\) and not \(Y_i(0)\). If \(T_i=0\), we observe \(Y_i(0)\) and not \(Y_i(1)\). The observed outcome is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Y_i = T_iY_i(1) + (1-T_i)Y_i(0).&quot;,&quot;id&quot;:&quot;SCAFJGHWNR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since individual effects are not directly observed in ordinary data, many causal workflows estimate average effects. The target quantity in the notebook is the conditional average treatment effect:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau(x)\n\n= \\mathbb{E}[Y(1) - Y(0) | X=x].&quot;,&quot;id&quot;:&quot;OCAHQMVSEP&quot;}" data-component-name="LatexBlockToDOM"></div><p>This tells us how the treatment effect changes across the covariate space. 
In other words, CATE estimation is about treatment-effect heterogeneity.</p><p>This is different from the average treatment effect, which averages over the population:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{ATE} = \\mathbb{E}[Y(1) - Y(0)].&quot;,&quot;id&quot;:&quot;EFJPOJCLGL&quot;}" data-component-name="LatexBlockToDOM"></div><p>The two are linked by averaging CATE over the covariate distribution: \(\text{ATE} = \mathbb{E}[\tau(X)]\). If the treatment effect is the same for everyone, CATE and ATE coincide: \(\tau(x) = \text{ATE}\) for every \(x\). If the effect varies by \(X\), CATE is more informative because it tells us where the treatment works more or less strongly.</p><p>The same object can be written using the intervention notation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau(x)\n\n= \\mathbb{E}[Y | do(T=1), X=x]\n\n- \\mathbb{E}[Y | do(T=0), X=x].&quot;,&quot;id&quot;:&quot;QPBOLLTBXT&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is the equation shown in the notebook&#8217;s causal inference section. The point is not only to predict \(Y\). 
The point is to compare two counterfactual worlds for the same covariate profile \(x\).</p><h3>2.3 Confounding and Identification</h3><p>The notebook uses a simple observed-confounder graph:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6_By!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6_By!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 424w, https://substackcdn.com/image/fetch/$s_!6_By!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 848w, https://substackcdn.com/image/fetch/$s_!6_By!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 1272w, https://substackcdn.com/image/fetch/$s_!6_By!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6_By!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png" width="660" height="499" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:660,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24403,&quot;alt&quot;:&quot;Causal graph.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195868118?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Causal graph." title="Causal graph." srcset="https://substackcdn.com/image/fetch/$s_!6_By!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 424w, https://substackcdn.com/image/fetch/$s_!6_By!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 848w, https://substackcdn.com/image/fetch/$s_!6_By!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 1272w, https://substackcdn.com/image/fetch/$s_!6_By!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4544c372-1e0b-4b26-ac4a-95db4c941b62_660x499.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Causal graph.</figcaption></figure></div><p>The graph has three arrows:</p><ul><li><p>\(X \rightarrow T\): covariates affect treatment assignment.</p></li><li><p>\(X \rightarrow Y\): covariates affect the outcome.</p></li><li><p>\(T \rightarrow Y\): treatment affects the outcome.</p></li></ul><p>This is a confounded observational setting because \(X\) affects both \(T\) and \(Y\). If we ignore \(X\), the observed association between \(T\) and \(Y\) mixes at least two effects:</p><ol><li><p>The causal effect of treatment on outcome.</p></li><li><p>The pre-existing difference in covariates between treated and untreated units.</p></li></ol><p>The notebook is constructed so that all confounding is observed in \(X\). 
This is the setting where the unconfoundedness assumption can hold:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(Y(0), Y(1)) \\perp\\!\\!\\!\\perp T | X.&quot;,&quot;id&quot;:&quot;ZYUDPKNMAF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\perp\!\!\!\perp\) means statistical independence, and the vertical bar \(|\) means &#8220;conditional on.&#8221; So this expression says that, after conditioning on \(X\), treatment assignment behaves as if it were independent of the two potential outcomes.</p><p>In practice, this is a strong assumption. It cannot be guaranteed by using TabPFN, XGBoost, Random Forest, or any other predictive model. It is a causal assumption about the data-generating process. In randomized experiments, randomization can help justify it. In observational data, it depends on whether the relevant confounders were measured and adjusted for.</p><p>The consistency assumption is also needed. It says that the observed outcome equals the potential outcome under the treatment actually received:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Y = Y(T).&quot;,&quot;id&quot;:&quot;IVBYNVKHDU&quot;}" data-component-name="LatexBlockToDOM"></div><p>So, if \(T=1\), the observed outcome is \(Y(1)\), and if \(T=0\), the observed outcome is \(Y(0)\).</p><p>There is another important assumption called overlap or positivity: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;0 < \\mathbb{P}(T=1 | X=x) < 1.&quot;,&quot;id&quot;:&quot;NDRNRJDWLW&quot;}" data-component-name="LatexBlockToDOM"></div><p>This means that for the covariate profiles we care about, there must be some treated and some untreated examples. 
If everyone with a certain covariate profile always receives treatment, then we do not have empirical support for the untreated counterfactual in that region.</p><p>For a binary treatment, under consistency, unconfoundedness, and overlap, the CATE can be identified from observed data: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau(x)\n\n= \\mathbb{E}[Y | T=1, X=x]\n\n- \\mathbb{E}[Y | T=0, X=x].&quot;,&quot;id&quot;:&quot;YGUYQCJISQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each assumption is used once in this step: unconfoundedness gives \(\mathbb{E}[Y(t) | X=x] = \mathbb{E}[Y(t) | T=t, X=x]\), consistency replaces \(Y(t)\) with the observed \(Y\) among units with \(T=t\), and overlap ensures that both conditional expectations are defined at \(x\).</p><p>This is where predictive modeling enters the workflow. The right-hand side contains only conditional expectations of observed quantities, so we need to estimate conditional outcome functions from data.</p><h3>2.4 Outcome Models, Propensity Models, and TabPFN</h3><p>Many CATE estimators use one or both of the following nuisance models: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;m_t(x) = \\mathbb{E}[Y | T=t, X=x],&quot;,&quot;id&quot;:&quot;LKEFMCESMV&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;e(x) = \\mathbb{P}(T=1 | X=x).&quot;,&quot;id&quot;:&quot;GVCMWTEZYO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(m_t(x)\) is an outcome model and \(e(x)\) is a propensity model. They are called nuisance models because they are not always the final object of interest, but the causal estimator uses them to estimate the treatment effect.</p><p>This is exactly where TabPFN can be useful. 
TabPFN is a pretrained tabular model that can serve as:</p><ul><li><p>a regression model for the outcome \(Y\);</p></li><li><p>a classification model for the treatment \(T\);</p></li><li><p>a flexible base learner inside a causal estimator.</p></li></ul><p>The notebook uses TabPFN in two ways:</p><ol><li><p>As the overall model in an S-learner.</p></li><li><p>As the outcome and treatment models inside <code>CausalForestDML</code>.</p></li></ol><p>The S-learner fits one model for the outcome: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{m}(x,t) \\approx \\mathbb{E}[Y | X=x, T=t].&quot;,&quot;id&quot;:&quot;DHEIPKZPYF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then it estimates the CATE by changing only the treatment input: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{\\tau}(x) = \\hat{m}(x,1) - \\hat{m}(x,0).&quot;,&quot;id&quot;:&quot;CCXJYJTTAK&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is simple and intuitive: the same fitted outcome model is asked two counterfactual prediction questions.</p><p>The <code>CausalForestDML</code> approach uses a different strategy. At a high level, double machine learning first estimates nuisance components such as the baseline outcome model and treatment model, then uses residualized variation to estimate treatment effects. A simplified version of the residual idea is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{Y} = Y - \\hat{g}(X),&quot;,&quot;id&quot;:&quot;OAXGGKNIMN&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tilde{T} = T - \\hat{e}(X).&quot;,&quot;id&quot;:&quot;RUPDDIJPZV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(g(x) = \mathbb{E}[Y|X=x]\) is the baseline outcome function (the outcome regressed on covariates alone, averaging over treatment), \(\hat{g}(X)\) is its estimate, and \(\hat{e}(X)\) is the estimated treatment propensity. 
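</p><p>As a minimal sketch of this residual idea (not taken from the notebook), the toy example below assumes a hand-made data-generating process with a constant treatment effect of 2.0 and uses scikit-learn linear and logistic regression as stand-in nuisance learners. Its last step is a single residual-on-residual ratio, which covers only the constant-effect special case. It is not what <code>CausalForestDML</code> does internally, where a forest lets the effect vary with \(X\).</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy setup (an assumption of this sketch, not the notebook's data).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))

# Confounded binary treatment: X[:, 0] shifts the treatment probability.
p_treat = 1.0 / (1.0 + np.exp(-X[:, 0]))
T = rng.binomial(1, p_treat)

# Outcome with a constant true effect of 2.0; X[:, 0] also drives Y.
Y = X[:, 0] + 2.0 * T + rng.normal(scale=0.1, size=n)

# Cross-fitted nuisance estimates: g_hat(X) for E[Y|X], e_hat(X) for P(T=1|X).
g_hat = cross_val_predict(LinearRegression(), X, Y, cv=5)
e_hat = cross_val_predict(
    LogisticRegression(), X, T, cv=5, method="predict_proba"
)[:, 1]

# Residualize outcome and treatment.
Y_res = Y - g_hat
T_res = T - e_hat

# Constant-effect final stage: regress Y residuals on T residuals.
theta_hat = np.sum(T_res * Y_res) / np.sum(T_res ** 2)

# Naive difference in means ignores X and is biased upward in this setup.
naive_diff = Y[T == 1].mean() - Y[T == 0].mean()

print(f"residualized estimate: {theta_hat:.3f}")
print(f"naive difference:      {naive_diff:.3f}")</code></pre></div><p>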
The final stage then uses the part of \(Y\) and \(T\) not already explained by \(X\) to estimate heterogeneous treatment effects. The causal forest part allows the treatment effect to vary over the feature space.</p><p>The important point for this post is: TabPFN is not replacing causal identification. It is replacing or supporting the predictive sub-models inside the causal pipeline.</p><p>This split is useful for practitioners who already know supervised ML. The causal workflow is mostly familiar model-fitting machinery, but used for a different target:</p><ul><li><p>Standard supervised ML part: fit flexible models for outcomes or treatment assignment from tabular data.</p></li><li><p>Causal inference part: decide whether the graph and assumptions justify interpreting adjusted differences as causal effects.</p></li><li><p>TabPFN-specific part: use a pretrained, context-conditioned tabular model as a strong default base learner without training a fresh task-specific model from scratch.</p></li></ul><p>So the new capability is not &#8220;causality from TabPFN.&#8221; The practical capability is that a tabular foundation model can be plugged into existing causal estimators as a low-tuning base learner for the predictive components.</p><h3>2.5 PEHE</h3><p>Since the notebook uses synthetic data, the true CATE \(\tau(x)\) is known. That means we can evaluate the estimated CATE directly. The metric used is PEHE, which stands for precision in estimation of heterogeneous effects.</p><p>For test points \(x_1,\ldots,x_n\), PEHE is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{PEHE}\n\n= \\sqrt{\n\n\\frac{1}{n}\n\n\\sum_{i=1}^{n}\n\n(\\hat{\\tau}(x_i) - \\tau(x_i))^2\n\n}.&quot;,&quot;id&quot;:&quot;DZKKAACKBN&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is simply the root mean squared error between the estimated treatment effect and the true treatment effect. 
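</p><p>As a small worked example with hand-made numbers (not from the notebook), the formula can be evaluated directly in NumPy. The helper name <code>pehe</code> is my own placeholder for this sketch.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np


def pehe(tau_true, tau_pred):
    """Root mean squared error between true and estimated CATE values."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_pred = np.asarray(tau_pred, dtype=float)
    return float(np.sqrt(np.mean((tau_pred - tau_true) ** 2)))


# Three illustrative test points (made-up values).
tau_true = np.array([0.1, 0.3, 0.5])
tau_pred = np.array([0.1, 0.2, 0.7])

# Squared errors are 0, 0.01, and 0.04, so PEHE = sqrt(0.05 / 3), about 0.129.
pehe_value = pehe(tau_true, tau_pred)
print(round(pehe_value, 4))</code></pre></div><p>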
Lower PEHE is better.</p><p>In real observational data, we usually do not know \(\tau(x)\), so PEHE cannot be computed directly. That is why synthetic examples are useful for learning: they let us test whether the estimator can recover a known causal effect under a controlled data-generating process.</p><h2>3. Hands-on Demo</h2><p>The conceptual background prepared us for the notebook example. The causal section of the notebook does four things:</p><ol><li><p>It creates a synthetic observed-confounder setting.</p></li><li><p>It generates potential outcomes and a true CATE value for each row.</p></li><li><p>It fits three CATE estimators.</p></li><li><p>It compares the estimated CATE values with the known true values using PEHE.</p></li></ol><p>The full notebook contains the install commands and imports. Below, I show the parts that matter for understanding the workflow.</p><h3>3.1 Generating the Causal Graph</h3><p>The notebook first draws the graph:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import networkx as nx

edges = [["X", "Y"], ["T", "Y"], ["X", "T"]]
graph = nx.DiGraph(edges)
nx.draw(graph, with_labels=True, node_size=2000)</code></pre></div><p>This graph makes the identification setup explicit. \(X\) affects treatment assignment, \(X\) also affects the outcome, and treatment affects the outcome. So a naive comparison of treated and untreated outcomes would be confounded.</p><p>The reason the example is still manageable is that \(X\) is observed. The causal estimator can adjust for \(X\).</p><h3>3.2 Creating the Synthetic CATE Dataset</h3><p>The notebook creates a dataset with 500 rows and 5 covariates:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np

num_samples, num_features, train_test_split = 500, 5, 75

noise_scale = 0.01
exogenous_scale = 0.1
heterogeneity_scale = 2

X = np.random.normal(loc=0, scale=exogenous_scale, size=(num_samples, num_features)).astype(np.float32)
prop_eps = np.random.normal(loc=0, scale=noise_scale, size=(num_samples)).astype(np.float32)
Y_eps = np.random.normal(loc=0, scale=noise_scale, size=(num_samples)).astype(np.float32)

w_X_Y = np.random.uniform(size=num_features)
w_X_T = np.random.uniform(size=num_features)
w_X_T_effect = np.random.uniform(size=num_features)

T_0 = np.zeros(shape=num_samples).astype(np.float32)
T_1 = np.ones(shape=num_samples).astype(np.float32)

base_X_Y = np.sum(w_X_Y * X, axis=1)
base_X_T = np.sum(w_X_T * X, axis=1)
heterogeneity_term = np.sum(w_X_T_effect * X, axis=1)

propensity = np.sin(base_X_T) + prop_eps
T = (propensity &gt; propensity.mean()).astype(np.float32)

Y_0 = np.cos(base_X_Y) + np.sin(0.3 * T_0) + Y_eps
Y_1 = np.cos(base_X_Y) + np.sin(0.3 * T_1 + heterogeneity_scale * heterogeneity_term) + Y_eps

Y = Y_0 * (1 - T) + Y_1 * T
tau = Y_1 - Y_0</code></pre></div><p>Here, <code>train_test_split</code> is the notebook's variable name for the number of training rows. It is not the <code>train_test_split</code> function from <code>sklearn</code>.</p><p>The treatment assignment depends on \(X\) through <code>base_X_T</code>, which is why \(X \rightarrow T\) appears in the graph. The outcome depends on \(X\) through <code>base_X_Y</code>, which is why \(X \rightarrow Y\) appears. The treated and untreated potential outcomes differ through the treatment term, which is why \(T \rightarrow Y\) appears.</p><p>This code is useful because it makes the hidden causal quantities visible. In real data, we would observe \(X\), \(T\), and \(Y\), but not both \(Y_0\) and \(Y_1\). In this synthetic dataset, we know both potential outcomes, so we can compute: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau = Y_1 - Y_0.&quot;,&quot;id&quot;:&quot;FSTIYCEOOB&quot;}" data-component-name="LatexBlockToDOM"></div><p>The notebook stores this row-level ground truth treatment effect as <code>tau</code>. In this synthetic construction, <code>tau</code> is the target used to evaluate how well the estimators recover the heterogeneous treatment effect.</p><p>The following plot shows the true treatment-effect heterogeneity against two features, with points colored by observed treatment assignment:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({**{f"X{i}": X[:, i] for i in range(num_features)}, "T": T, "Y": Y, "Y_0": Y_0, "Y_1": Y_1, "tau": tau})
fig, axes = plt.subplots(ncols=2, figsize=(6, 3))
sns.scatterplot(data=df, x="X0", y="tau", hue="T", ax=axes[0], edgecolor="black", palette="Set1")
sns.scatterplot(data=df, x="X1", y="tau", hue="T", ax=axes[1], edgecolor="black", palette="Set1")
fig.tight_layout()</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Umk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Umk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 424w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 848w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 1272w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Umk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png" width="590" height="290" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103040,&quot;alt&quot;:&quot;Synthetic CATE heterogeneity.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195868118?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Synthetic CATE heterogeneity." title="Synthetic CATE heterogeneity." srcset="https://substackcdn.com/image/fetch/$s_!5Umk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 424w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 848w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 1272w, https://substackcdn.com/image/fetch/$s_!5Umk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f940c0-02df-496e-8a4a-61fe42ed5c1a_590x290.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Synthetic CATE heterogeneity.</figcaption></figure></div><p>The treatment effect is not constant. It changes with the covariates. This is what makes the task a CATE estimation task rather than only an average treatment effect estimation task.</p><p>The notebook then creates a train/test split:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">perm = np.random.permutation(num_samples)
train_idx, test_idx = perm[:train_test_split], perm[train_test_split:]
X_train, X_test = X[train_idx], X[test_idx]
Y_train, Y_test = Y[train_idx], Y[test_idx]
T_train, T_test = T[train_idx], T[test_idx]
tau_test = tau[test_idx]</code></pre></div><p>In this run, the training set has only 75 rows and the test set has 425 rows. That makes the example a useful small-data setting for TabPFN.</p><h3>3.3 Fitting CATE Estimators with TabPFN</h3><p>The notebook compares three estimators:</p><ul><li><p><code>s_tabpfn</code>: an S-learner using <code>TabPFNRegressor</code> as the overall outcome model.</p></li><li><p><code>cfdml_tabpfn</code>: <code>CausalForestDML</code> using <code>TabPFNRegressor</code> for the outcome model and <code>TabPFNClassifier</code> for the treatment model.</p></li><li><p><code>cfdml_default</code>: <code>CausalForestDML</code> with its default nuisance models.</p></li></ul><p>The imports are:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from econml.metalearners import SLearner
from econml.dml import CausalForestDML
from tabpfn import TabPFNClassifier, TabPFNRegressor</code></pre></div><p>The S-learner is fitted as:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">s_tabpfn = SLearner(overall_model=TabPFNRegressor())

s_tabpfn.fit(Y=Y_train, X=X_train, T=T_train)
s_tabpfn_cate = s_tabpfn.effect(X=X_test)</code></pre></div><p>Conceptually, this estimates a single outcome function \(\hat{m}(x,t)\), then evaluates the difference: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{\\tau}(x) = \\hat{m}(x,1) - \\hat{m}(x,0).&quot;,&quot;id&quot;:&quot;FXTKDSVKPH&quot;}" data-component-name="LatexBlockToDOM"></div><p>The TabPFN-backed causal forest is fitted as:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">cfdml_tabpfn = CausalForestDML(model_y=TabPFNRegressor(), model_t=TabPFNClassifier(), discrete_treatment=True)

cfdml_tabpfn.fit(Y=Y_train, X=X_train, T=T_train)
cfdml_tabpfn_cate = cfdml_tabpfn.effect(X=X_test)</code></pre></div><p>Here, TabPFN is used for the nuisance models. The final heterogeneous treatment-effect estimation is handled by <code>CausalForestDML</code>.</p><p>The default comparison is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">cfdml_default = CausalForestDML(discrete_treatment=True)

cfdml_default.fit(Y=Y_train, X=X_train, T=T_train)
cfdml_default_cate = cfdml_default.effect(X=X_test)</code></pre></div><p>This comparison is useful because it separates two questions:</p><ol><li><p>Does TabPFN work well as a direct base learner in a simple meta-learner?</p></li><li><p>Does TabPFN improve the nuisance-modeling stage inside a more structured causal estimator?</p></li></ol><h3>3.4 Evaluating CATE Recovery</h3><p>Since the synthetic data contains the true test-set treatment effect <code>tau_test</code>, the notebook computes PEHE:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from sklearn.metrics import root_mean_squared_error

s_tabpfn_pehe = root_mean_squared_error(tau_test, s_tabpfn_cate)
cfdml_tabpfn_pehe = root_mean_squared_error(tau_test, cfdml_tabpfn_cate)
cfdml_default_pehe = root_mean_squared_error(tau_test, cfdml_default_cate)

print(f"s_tabpfn PEHE: {s_tabpfn_pehe}")
print(f"cfdml_tabpfn PEHE: {cfdml_tabpfn_pehe}")
print(f"cfdml_default PEHE: {cfdml_default_pehe}")</code></pre></div><p>In my notebook run, the output was:</p><ul><li><p><code>s_tabpfn</code> PEHE: 0.0715.</p></li><li><p><code>cfdml_tabpfn</code> PEHE: 0.1956.</p></li><li><p><code>cfdml_default</code> PEHE: 0.2803.</p></li></ul><p>Lower PEHE means better recovery of the true heterogeneous treatment effect. In this run, the S-learner with TabPFN performs best.</p><p>The final plot compares the true CATE on the x-axis with each model&#8217;s predicted CATE on the y-axis:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">df_test = pd.DataFrame({"T": T_test, "Y": Y_test, "tau": tau_test, **{f"X{i}": X_test[:, i] for i in range(num_features)}})
df_test["s_tabpfn_pred"], df_test["cfdml_tabpfn_pred"], df_test["cfdml_default_pred"] = s_tabpfn_cate, cfdml_tabpfn_cate, cfdml_default_cate

fig, axes = plt.subplots(ncols=3, figsize=(9, 3))
sns.scatterplot(data=df_test, x="tau", y="s_tabpfn_pred", hue="T", ax=axes[0], edgecolor="black", palette="Set1")
sns.scatterplot(data=df_test, x="tau", y="cfdml_tabpfn_pred", hue="T", ax=axes[1], edgecolor="black", palette="Set1")
sns.scatterplot(data=df_test, x="tau", y="cfdml_default_pred", hue="T", ax=axes[2], edgecolor="black", palette="Set1")
fig.tight_layout()</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TMaj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TMaj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 424w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 848w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 1272w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TMaj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png" width="889" height="290" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e50602f-b357-4ef4-8b0d-223077804427_889x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:889,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:137968,&quot;alt&quot;:&quot;Predicted vs true CATE.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195868118?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Predicted vs true CATE." title="Predicted vs true CATE." srcset="https://substackcdn.com/image/fetch/$s_!TMaj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 424w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 848w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 1272w, https://substackcdn.com/image/fetch/$s_!TMaj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e50602f-b357-4ef4-8b0d-223077804427_889x290.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Predicted vs true CATE.</figcaption></figure></div><p>The left panel shows that <code>s_tabpfn_pred</code> closely tracks the true \(\tau\). The middle and right panels show more compressed and noisier estimates for the two causal forest variants in this particular run.</p><p>This result should be read as a controlled notebook example, not as a general benchmark. The notebook uses a small synthetic dataset, no fixed random seed in the shown code, and one data-generating process. 
What it shows is narrower but still useful: in this setting, TabPFN can be a strong base learner for recovering a smooth heterogeneous treatment effect from a small context dataset.</p><p>For a stronger evaluation template, I would add two diagnostics:</p><ol><li><p>Add a diagonal \(y=x\) reference line to the predicted-vs-true CATE plots.</p></li><li><p>Repeat the experiment over many random seeds and report mean PEHE with uncertainty intervals.</p></li></ol><p>That would make the comparison more stable and easier to read.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.dsaiengineering.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading DSAIEngineering! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>4. Summary and Conclusion</h2><p>In this post, I used the causal inference section of the official TabPFN hands-on demo to understand how TabPFN can be used inside CATE estimation workflows.</p><p>The conceptual section separated prediction from causal inference. Ordinary supervised learning estimates associations such as \(\mathbb{E}[Y|X=x]\). Causal inference asks intervention questions such as \(\mathbb{E}[Y|do(T=1),X=x] - \mathbb{E}[Y|do(T=0),X=x]\). 
This distinction is the main reason we needed potential outcomes, confounding, unconfoundedness, overlap, outcome models, and propensity models before reading the code.</p><p>The hands-on demo then created a synthetic observed-confounder setting with \(X \rightarrow T\), \(X \rightarrow Y\), and \(T \rightarrow Y\). Because the data were synthetic, the notebook had access to both potential outcomes \(Y_0\) and \(Y_1\), so it could compute the true CATE \(\tau = Y_1 - Y_0\). That made PEHE available as a direct evaluation metric.</p><p>The notebook compared three estimators: an S-learner using TabPFN, a <code>CausalForestDML</code> estimator using TabPFN for nuisance models, and a default <code>CausalForestDML</code> estimator. In my run, the S-learner with TabPFN had the lowest PEHE and the clearest predicted-vs-true CATE pattern.</p><p>The main takeaway is not that TabPFN solves causal inference by itself. It does not. The causal assumptions still matter. The estimator still matters. The graph still matters. What TabPFN may offer is a strong tabular base learner for the predictive subproblems inside causal estimators, especially in small-to-medium tabular settings where ordinary model selection and tuning can become tedious.</p><p>This also connects back to the pre-training discussion from P6. TabPFN&#8217;s synthetic pretraining already uses structural causal models as part of the prior over tabular tasks. The current demo uses TabPFN as a base model inside standard CATE estimators. The next step in the field is even more direct: pretraining PFN-style models specifically for causal effect estimation, such as the Do-PFN direction mentioned in the notebook.</p><p>With this post, I have covered another remaining section of the TabPFN hands-on demo. 
In the upcoming posts, I will continue exploring the parts of the TabPFN ecosystem that are most relevant for practical tabular data workflows: when TabPFN should be used directly, when it should be used as a component inside a larger method, and when classical tabular ML remains the simpler choice.</p>]]></content:encoded></item><item><title><![CDATA[[P12] Understanding tabular foundation models: time series forecasting with TabPFN]]></title><description><![CDATA[This post continues my series on tabular foundation models.]]></description><link>https://newsletter.dsaiengineering.com/p/p12-understanding-tabular-foundation</link><guid isPermaLink="false">https://newsletter.dsaiengineering.com/p/p12-understanding-tabular-foundation</guid><dc:creator><![CDATA[Mohit Saharan]]></dc:creator><pubDate>Tue, 28 Apr 2026 13:20:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BiQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Introduction</h2><p>This post continues my series on tabular foundation models. 
So far, I have covered the basic vocabulary of tabular foundation models in <a href="https://www.linkedin.com/posts/msaharan_20260415-tabular-foundation-models-1pdf-activity-7450221503234621441-QYwS?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P3</a>, the posterior predictive distribution in <a href="https://www.linkedin.com/posts/msaharan_20260416-understanding-tfms-ppdpdf-activity-7450580114225938432-9UYN?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P4</a>, the architecture in <a href="https://www.linkedin.com/posts/msaharan_20260417-understanding-tfm-architecture-tabpfnpdf-activity-7450946343922999318-6Lw_?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P5</a>, pre-training in <a href="https://www.linkedin.com/posts/msaharan_20260420-understanding-tfms-pretraining-synthetic-datapdf-activity-7452030755720888320-INN6?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P6</a>, the TabPFN repository in <a href="https://www.linkedin.com/posts/msaharan_20260421-understanding-tfm-tabpfn-repopdf-activity-7452397229723623425-DVO3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P7</a>, the hands-on demo&#8217;s classification and regression examples in <a href="https://www.linkedin.com/posts/msaharan_20260422-understanding-tfms-tabpfn-handson-demopdf-activity-7452807834171387904-s5Ah?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P8</a>, TabPFN Client in <a href="https://www.linkedin.com/posts/msaharan_20260423-understanding-tfm-trying-tabpfn-clientpdf-activity-7453126821384073216-2bqA?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P9</a>, TabPFN embeddings in <a 
href="https://www.linkedin.com/posts/msaharan_tabpfn-tabularfoundationmodels-machinelearning-activity-7453455329779941376-ymp3?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC8005UBr31urJ8gF7KXefP2-G8r_HNvI2g">P10</a>, and TabPFN&#8217;s predictive behavior in <a href="https://open.substack.com/pub/dsaiengineering/p/p11-understanding-tabular-foundation?utm_campaign=post-expanded-share&amp;utm_medium=web">P11</a>.</p><p>For a new reader, the minimum background is this: TabPFN is a pretrained tabular foundation model. Unlike XGBoost or Random Forest, its ordinary <code>.fit()</code> call does not update model weights to learn a new model from scratch for the current dataset. Instead, <code>.fit()</code> prepares the labelled rows as context, and TabPFN uses that context to predict new rows. This is why I have repeatedly described TabPFN as a context-conditioned predictor rather than just another sklearn-like estimator.</p><p>Today I cover the time series forecasting section of the official <a href="https://colab.research.google.com/github/PriorLabs/TabPFN/blob/main/examples/notebooks/TabPFN_Demo_Local.ipynb">TabPFN hands-on demo notebook</a>. You can find my local version of the notebook <a href="https://github.com/msaharan/dsaiengineering/blob/b1aa38c94d824bef493a6dbaa2eaeb38c04ed4ef/blog/20260428-understanding-tfm-time-series-forecasting-tabpfn.assets/tabpfn-hands-on-demo-msaharan-20260428.ipynb">here</a>.</p><p>In P3, I left time series forecasting as &#8220;Later.&#8221; This post fills that gap. The interesting question is: if TabPFN is a tabular foundation model, how can it forecast a sequence? At first, this is not obvious because ordinary tabular models usually treat rows as examples in a table, while time series forecasting depends on the order of observations, seasonality, and future horizons. The answer is not that TabPFN suddenly becomes an ARIMA model, a recurrent neural network, or a native time-series transformer. 
The main idea in TabPFN-TS is to convert forecasting into a tabular regression problem.</p><p>This is where TabPFN-TS comes in. The notebook cites the work of Hoo et al., whose current arXiv paper is listed as <a href="https://arxiv.org/abs/2501.02945">From Tables to Time: Extending TabPFN-v2 to Time Series Forecasting</a>. That work is the reference behind the TabPFN-TS workflow used in the notebook. The <a href="https://github.com/PriorLabs/tabpfn-time-series">TabPFN-TS repository</a> summarizes the workflow as:</p><ol><li><p>Transform a time series into a table.</p></li><li><p>Extract temporal features and add them to the table.</p></li><li><p>Perform regression on the table using TabPFNv2.</p></li><li><p>Use the regression output as the time series forecast.</p></li></ol><p>The post is organized as follows:</p><ol><li><p>Introduction: why time series forecasting belongs in this TabPFN series.</p></li><li><p>Conceptual background: the forecasting vocabulary and the tabular-regression formulation.</p></li><li><p>Hands-on demo: loading the Chronos data, adding features, predicting, and reading the forecast plot.</p></li><li><p>Summary and conclusion: what this example shows and how to evaluate forecasts in practice.</p></li></ol><h2>2. Conceptual Background</h2><p>Before going to the hands-on demo, I want to set up the concepts that make the example meaningful. This section does five things:</p><ol><li><p>It defines the basic vocabulary of time series forecasting.</p></li><li><p>It explains how a sequence can be represented as a supervised tabular regression problem.</p></li><li><p>It separates what is standard supervised ML from what is specific to TabPFN-TS.</p></li><li><p>It explains why temporal features matter.</p></li><li><p>It connects point forecasts and quantile forecasts back to the predictive-distribution language from earlier posts.</p></li></ol><h3>2.1 Working Vocabulary</h3><p>The key terms for this post are:</p><ul><li><p>Time series: observations indexed by time, such as monthly tourism demand, hourly electricity load, daily sales, or sensor readings.</p></li><li><p>Forecast horizon: the future window we want to predict. 
In the notebook, <code>prediction_length = 24</code>, so the model predicts 24 future monthly values.</p></li><li><p>History/context window: the observed part of the time series that is available before the forecast starts.</p></li><li><p>Point forecast: a single predicted value for each future timestamp.</p></li><li><p>Probabilistic forecast: a forecast that describes uncertainty, often through quantiles.</p></li><li><p>Quantile forecast: a prediction for a chosen quantile level, such as the 0.1 or 0.9 quantile.</p></li><li><p>Covariates/features: extra columns known at prediction time, such as calendar features, holidays, weather, promotions, or a running time index.</p></li><li><p>Zero-shot forecasting: applying a pretrained model to a new forecasting problem without training a task-specific forecasting model from scratch.</p></li></ul><p>In ordinary supervised regression, we usually have a table: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;X =\n\\begin{bmatrix}\nx_1^\\top \\\\\nx_2^\\top \\\\\n\\vdots \\\\\nx_n^\\top\n\\end{bmatrix},\n\\quad\ny =\n\\begin{bmatrix}\ny_1 \\\\\ny_2 \\\\\n\\vdots \\\\\ny_n\n\\end{bmatrix}.&quot;,&quot;id&quot;:&quot;SRDRTWDSUD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each row \(x_i\) contains features, and \(y_i\) is the target value. A regressor learns or uses a mapping from rows to targets.</p><p>Here, \(X\) is the feature matrix, \(y\) is the target vector, \(n\) is the number of rows, and \(x_i^\top\) means that the feature vector for row \(i\) is written as a row vector. The superscript \(\top\) denotes transpose.</p><p>In time series forecasting, the data initially looks different. 
For one item, we observe: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_1, y_2, \\ldots, y_T,&quot;,&quot;id&quot;:&quot;ZVAOKUFAFZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and want to predict: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{T+1}, y_{T+2}, \\ldots, y_{T+H}&quot;,&quot;id&quot;:&quot;NYMWRGRVBJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(T\) is the last observed time index, and \(H\) is the forecast horizon.</p><p>The key move in TabPFN-TS is to make the second problem look like the first problem.</p><h3>2.2 Forecasting as Tabular Regression</h3><p>Suppose we have multiple time series indexed by item \(i\). For item \(i\), let \(y_{i,t}\) be the observed value at time \(t\), and let \(T_i\) be the last observed time index available before forecasting starts. The forecasting task is to estimate future values: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_{i,T_i+h},\n\\quad h = 1, 2, \\ldots, H.&quot;,&quot;id&quot;:&quot;THKRNZZEUU&quot;}" data-component-name="LatexBlockToDOM"></div><p>In the single-series notation above, the last observed index was \(T\). With multiple series, I write this as \(T_i\) because each item \(i\) may have its own last observed timestamp. Here, \(H\) is the forecast horizon, and \(h\) is the number of steps ahead from the end of the observed history for item \(i\). To use a tabular model, we build a feature vector for each item-time pair: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;x_{i,t} = g(i, t, \\text{calendar}(t), \\text{seasonal}(t), \\text{known covariates}_{i,t}).&quot;,&quot;id&quot;:&quot;CMDSEKCVSK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(g(\cdot)\) is the feature-construction function. It turns time information and any known covariates into ordinary tabular columns. 
The training table contains rows where the target is known: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}_\\text{train}\n\n= \\{(x_{i,t}, y_{i,t}) : i \\in \\mathcal{I},\\ t \\leq T_i\\}.&quot;,&quot;id&quot;:&quot;PTAXDPGMVD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{I}\) is the set of item IDs included in the forecasting task. The future table contains rows where the target is unknown: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{D}_\\text{future}\n\n= \\{x_{i,T_i+h} : i \\in \\mathcal{I},\\ h = 1, 2, \\ldots, H\\}.&quot;,&quot;id&quot;:&quot;MRXPFASEKJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Now the forecasting problem has become a tabular regression problem: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{y}_{i,T_i+h}\n\n= f(x_{i,T_i+h}; \\mathcal{D}_\\text{train}).\n\n&quot;,&quot;id&quot;:&quot;SGOXKUMMPH&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(f\) is a prediction function, and \(\hat{y}_{i,T_i+h}\) is the predicted value for item \(i\) at forecast step \(h\).</p><p>For a classical supervised model, \(f\) would usually be a model fitted specifically to the current training table. For example, a supervised regressor might choose: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{f}\n\n= \\arg\\min_{f \\in \\mathcal{F}}\n\n\\sum_{(x,y)\\in \\mathcal{D}_\\text{train}}\n\n\\ell(y, f(x)).\n\n&quot;,&quot;id&quot;:&quot;QJVEJFQQJL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathcal{F}\) is the model class, \(\ell\) is the regression loss, and \(\hat{f}\) is the fitted task-specific model. 
Random Forest, XGBoost, LightGBM, and CatBoost all differ in how they define and optimize \(\mathcal{F}\), but in this workflow they are still learning a fresh model from the current transformed table.</p><p>For TabPFN, the meaning is different. TabPFN is already pretrained, and the current training rows become context. Conceptually, the prediction is closer to: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p(y_{i,T_i+h} | x_{i,T_i+h}, \\mathcal{D}_\\text{train}).&quot;,&quot;id&quot;:&quot;YUCJMEZYWM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(p(\cdot)\) denotes a predictive distribution over the future target value.</p><p>This is the same posterior-predictive-distribution viewpoint I discussed in P4 and reused in P11. The difference is that the row \(x_{i,T_i+h}\) now represents a future timestamp, not a generic tabular row.</p><p>This framing also explains why the time-series package can support point forecasts and probabilistic forecasts. If TabPFN produces a predictive distribution for the target at a future row, then the output can be summarized as a mean, median, or quantiles. This is a useful conceptual lens, not a guarantee that the output is perfectly calibrated for every dataset.</p><h3>2.3 Standard Supervised ML vs What Is New Here</h3><p>The conversion from a time series to a tabular regression problem is not unique to TabPFN. A practitioner could build the same kind of table and fit XGBoost, LightGBM, Random Forest, CatBoost, or a linear model on the generated rows. In that sense, the feature-engineering idea is a standard supervised-ML move.</p><p>What is new in the TabPFN-TS workflow is the model used after the transformation. Instead of training and tuning a new forecasting model from scratch, TabPFN-TS uses a pretrained tabular foundation model as the regression engine. 
The training rows act as context, the future rows act as queries, and the model returns point and quantile predictions through the time-series wrapper.</p><p>So the split is:</p><ul><li><p>Standard supervised ML part: turn timestamps into rows, create temporal features, define known-target training rows and unknown-target future rows.</p></li><li><p>TabPFN-specific part: use a pretrained, context-conditioned tabular model instead of fitting a task-specific model from scratch.</p></li><li><p>TabPFN-TS convenience: return both point forecasts and quantile forecasts through one forecasting interface.</p></li></ul><p>There is an important caveat. TabPFN-TS relies on temporal featurization; TabPFN is not modeling sequence order natively in the same way as a dedicated sequence model. The sequence structure becomes available to the model through columns such as running index, calendar features, seasonal features, and known covariates.</p><h3>2.4 Why Temporal Features Matter</h3><p>If we only create rows without useful time-derived features, a tabular model has no direct way to know that January 1980 and January 1981 are related, or that December and January are adjacent months. This is why temporal feature engineering is central to the TabPFN-TS workflow.</p><p>The notebook uses three feature groups: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">selected_features = [
    RunningIndexFeature(),
    CalendarFeature(),
    AutoSeasonalFeature(),
]</code></pre></div><p>The running index gives each timestamp an ordered numeric position within each item. If item \(i\) has \(n_i\) observed rows, the running index over the observed history is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;0, 1, 2, \\ldots, n_i - 1.&quot;,&quot;id&quot;:&quot;UIBMAQRVFQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This \(n_i\) counts rows in the observed history; it is separate from \(T_i\), which denotes the last observed time index used in the forecasting equations. The running index helps the model see trend-like behavior. Calendar features encode timestamp information such as year, month, day of week, and similar components. Seasonal features encode repeated patterns. A standard way to encode cyclic seasonality is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sin \\left(\\frac{2\\pi t}{P}\\right),\n\\quad\n\\cos \\left(\\frac{2\\pi t}{P}\\right),&quot;,&quot;id&quot;:&quot;UVPKZZEWEC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where \(P\) is the period. For monthly data with annual seasonality, \(P=12\). Using both sine and cosine is useful because it represents the cycle on a circle. This avoids treating the end of a period and the beginning of the next period as far apart.</p><p>In the notebook output, the transformed table contains columns such as: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">target, running_index, year, second_of_minute_sin, second_of_minute_cos, ...,
sin_#0, cos_#0, sin_#1, cos_#1, sin_#2, cos_#2</code></pre></div><p>The target column is known in the training rows and missing in the future rows. The time-derived features are known for both training and future rows. That is exactly what forecasting needs: at prediction time, we do not know the future target, but we do know the future timestamps.</p><h3>2.5 Point Forecasts, Quantiles, and Coverage</h3><p>For item \(i\) and forecast step \(h\), the future row is \(x_{i,T_i+h}\), and the random future target is \(Y_{i,T_i+h}\). A point forecast gives one value: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{y}_{i,T_i+h}.&quot;,&quot;id&quot;:&quot;XLSSBDOEIK&quot;}" data-component-name="LatexBlockToDOM"></div><p>A probabilistic forecast gives more information. The conditional cumulative distribution function is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;F_{i,h}(y)\n\n= \\mathbb{P}(Y_{i,T_i+h} \\leq y | x_{i,T_i+h}, \\mathcal{D}_\\text{train}).&quot;,&quot;id&quot;:&quot;MYMDPMTCWE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathbb{P}\) denotes probability, and \(F_{i,h}(y)\) is the probability that the future target is less than or equal to the candidate value \(y\), given the future row and the training context.</p><p>The \(\alpha\)-quantile is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q_\\alpha(i,h)\n\n= \\inf\\{y : F_{i,h}(y) \\geq \\alpha\\}.&quot;,&quot;id&quot;:&quot;FIKRAFQOQQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\alpha\) is a quantile level between 0 and 1, and \(\inf\) means the infimum: the smallest value, or limiting lower bound, where the cumulative probability reaches at least \(\alpha\).</p><p>For example, the interval: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;[Q_{0.1}(i,h), Q_{0.9}(i,h)]&quot;,&quot;id&quot;:&quot;QPERBLFFMH&quot;}" 
data-component-name="LatexBlockToDOM"></div><p>is an 80% central prediction interval. In the demo, TabPFN-TS returns the point forecast and quantile columns from <code>0.1</code> to <code>0.9</code>. </p><p>As in yesterday&#8217;s post, quantile intervals should not be treated as automatic guarantees. They need to be checked on held-out data. Let the held-out future points be indexed by \((i_j,h_j)\) for \(j=1,\ldots,m\), where \(m\) is the number of held-out item-horizon pairs being evaluated. The empirical 80% coverage is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{m}\\sum_{j=1}^{m}\n\n\\mathbf{1}\\{y_{i_j,T_{i_j}+h_j} \\in [Q_{0.1}(i_j,h_j), Q_{0.9}(i_j,h_j)]\\}.&quot;,&quot;id&quot;:&quot;YFBXICLHCK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, \(\mathbf{1}\{\cdot\}\) is the indicator function: it equals 1 when the condition is true and 0 otherwise.</p><p>For context, when quantile models are trained directly, a common loss is the pinball loss. For quantile level \(\alpha\), true value \(y\), and quantile prediction \(q\), it is: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_\\alpha(y,q)\n\n= (\\alpha - \\mathbf{1}\\{y < q\\})(y-q).&quot;,&quot;id&quot;:&quot;OYVHMXJXNY&quot;}" data-component-name="LatexBlockToDOM"></div><p>This loss penalizes under-prediction and over-prediction asymmetrically, which is exactly what is needed for quantile estimation.</p><p>The coverage calculation answers a different question from the pinball loss. If the empirical coverage value is close to 0.8, the interval is roughly calibrated on that held-out sample. If it is much lower, the forecast intervals are overconfident. If it is much higher, the intervals may be too wide to be useful.</p><h2>3. 
Hands-on Demo</h2><p>The conceptual background gave us the main objects: a time-indexed sequence, a transformed tabular representation, a future table with unknown targets, and point/quantile forecasts. Now I use the notebook to walk through the time-series example.</p><p>The mental model for the demo is:</p><ul><li><p>Training rows: past timestamps with known target values.</p></li><li><p>Future rows: future timestamps with <code>target = NaN</code>.</p></li><li><p>Features: running index, calendar features, seasonal features, and any known covariate columns.</p></li><li><p>Output: point forecast plus quantile columns for each future row.</p></li></ul><p>The full notebook contains the setup code and imports. Below, I show the parts that matter for understanding the workflow.</p><h3>3.1 Loading the Time Series Data</h3><p>The demo uses a dataset from the <a href="https://huggingface.co/datasets/autogluon/chronos_datasets">Chronos datasets collection</a> on Hugging Face. To keep the example small, it uses only two time series from <code>monash_tourism_monthly</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">dataset_metadata = {
    "monash_tourism_monthly": {"prediction_length": 24},
    "m4_hourly": {"prediction_length": 48},
}

dataset_choice = "monash_tourism_monthly"
num_time_series_subset = 2</code></pre></div><p>The notebook then loads the dataset, converts it into a <code>TimeSeriesDataFrame</code>, keeps only two item IDs, and creates a train/test split. The last 24 months are held out as the future window.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from datasets import load_dataset
from tabpfn_time_series import TimeSeriesDataFrame
from tabpfn_time_series.data_preparation import generate_test_X, to_gluonts_univariate

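# Look up the forecast horizon for the chosen dataset, then load it from the Hub.
prediction_length = dataset_metadata[dataset_choice]["prediction_length"]
dataset = load_dataset("autogluon/chronos_datasets", dataset_choice)

# Wrap the series in a TimeSeriesDataFrame indexed by (item_id, timestamp).
tsdf = TimeSeriesDataFrame(to_gluonts_univariate(dataset["train"]))
# Keep only the first `num_time_series_subset` items so the demo stays small.
tsdf = tsdf[
    tsdf.index.get_level_values("item_id").isin(tsdf.item_ids[:num_time_series_subset])
]

# Hold out the last `prediction_length` steps of each series for evaluation.
train_tsdf, test_tsdf_ground_truth = tsdf.train_test_split(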
    prediction_length=prediction_length
)
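
# Optional check added for this write-up (not in the notebook). It assumes the
# frame is a pandas DataFrame with an "item_id" index level, as used above.
def rows_per_item(frame):
    # Number of timestamps each item contributes to the frame.
    return frame.groupby(level="item_id").size()

# rows_per_item(train_tsdf) should be `prediction_length` smaller than rows_per_item(tsdf).
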
test_tsdf = generate_test_X(train_tsdf, prediction_length)</code></pre></div><p>The first important object is <code>train_tsdf</code>: the observed history. The second is <code>test_tsdf_ground_truth</code>: the future values that we hide from the model but keep for evaluation. The third is <code>test_tsdf</code>: the future table that contains the timestamps where predictions are needed. The function <code>generate_test_X</code> creates those future timestamp rows for the forecast horizon, with unknown targets.</p><p>The following plot shows the two tourism series and the train/test split.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-LSs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-LSs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png" width="990" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110516,&quot;alt&quot;:&quot;Time series train/test split.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195744234?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time series train/test split." title="Time series train/test split." 
srcset="https://substackcdn.com/image/fetch/$s_!-LSs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-LSs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979538ef-ed18-4269-8805-9fb49d8ceb46_990x590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Time series train/test split.</figcaption></figure></div><p>Both series show strong yearly seasonality. The vertical dashed red line marks the point where the training history ends and the held-out future window starts. Since the forecast horizon is 24 months, the model is asked to forecast two full seasonal cycles.</p><h3>3.2 Adding Time Features</h3><p>The next step is the most important conceptual step in the demo. The raw time series is transformed into a tabular regression problem by adding features.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series import FeatureTransformer
from tabpfn_time_series.features import (
    AutoSeasonalFeature,
    CalendarFeature,
    RunningIndexFeature,
)

selected_features = [
    RunningIndexFeature(),
    CalendarFeature(),
    AutoSeasonalFeature(),
]

feature_transformer = FeatureTransformer(selected_features)
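
# Illustration only, not the library's implementation: a sine/cosine pair such as
# sin_#0 / cos_#0 can be reproduced from the running index. The 12-month period and
# the name seasonal_pair are assumptions, chosen to match the values in the table below.
import math

def seasonal_pair(running_index, period=12):
    """Return (sin, cos) of the seasonal phase, rounded like the printed table."""
    angle = 2 * math.pi * running_index / period
    return round(math.sin(angle), 4), round(math.cos(angle), 4)

# seasonal_pair(1) reproduces the 1979-02-28 row of the table (sin 0.5, cos 0.866).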

train_tsdf, test_tsdf = feature_transformer.transform(train_tsdf, test_tsdf)</code></pre></div><p>After this transformation, the training table has a known <code>target</code> column and many feature columns. The future table has the same feature columns, but the <code>target</code> column is missing:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">item_id  timestamp    target     running_index    year    ...    sin_#0    cos_#0
0        1979-01-31   1149.8700  0                1979    ...    0.0000    1.0000
0        1979-02-28   1053.8002  1                1979    ...    0.5000    0.8660
...
0        1992-08-31   NaN        163              1992    ...   -0.5000   -0.8660</code></pre></div><p>This is the point where forecasting becomes tabular. The rows with known targets form the context. The rows with unknown targets form the query set.</p><h3>3.3 Predicting with TabPFN-TS</h3><p>In my run, I used <code>local</code> mode, which runs TabPFN on my local GPU, instead of <code>client</code> mode, which uses GPUs hosted in Prior Labs&#8217; cloud:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series import TabPFNMode, TabPFNTimeSeriesPredictor

predictor = TabPFNTimeSeriesPredictor(
    tabpfn_mode=TabPFNMode.LOCAL,
)

pred = predictor.predict(train_tsdf, test_tsdf)</code></pre></div><p>The output <code>pred</code> is again indexed by <code>item_id</code> and <code>timestamp</code>. It contains a point forecast in the <code>target</code> column and quantile forecasts in columns such as <code>0.1</code>, <code>0.2</code>, ..., <code>0.9</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">                         target          0.1          0.2  ...          0.8          0.9
item_id timestamp
0       1992-08-31  6632.519531  6147.241211  6321.268066  ...  6938.606445  7118.754395
        1992-09-30  4159.460938  3881.989502  3977.088379  ...  4355.097656  4479.076172
        1992-10-31  3012.987549  2780.682861  2859.992432  ...  3172.838623  3264.242920</code></pre></div><p>This output format is useful because it gives both a central forecast and uncertainty bands without first setting up a separate conformal wrapper or separately trained quantile model.</p><h3>3.4 Visualizing the Forecast</h3><p>The notebook visualizes the history, the held-out future values, the TabPFN-TS point forecast, and the 0.1 to 0.9 quantile band.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from tabpfn_time_series.plot import plot_pred_and_actual_ts

plot_pred_and_actual_ts(
    train=train_tsdf,
    test=test_tsdf_ground_truth,
    pred=pred,
)</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BiQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BiQs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png" width="990" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:990,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142924,&quot;alt&quot;:&quot;Time series 
forecast.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.dsaiengineering.com/i/195744234?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time series forecast." title="Time series forecast." srcset="https://substackcdn.com/image/fetch/$s_!BiQs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 424w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 848w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1272w, https://substackcdn.com/image/fetch/$s_!BiQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81c4b905-3faa-4953-9c6e-174e4d3f7ba8_990x590.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Time series forecast.</figcaption></figure></div><p>The blue curve is the observed history. The purple curve is the held-out future, which is available only because this is a demo. The red curve is the TabPFN-TS forecast. The shaded red region is the 0.1 to 0.9 quantile interval.</p><p>The forecast captures the most obvious structure in both series: strong annual seasonality, sharp yearly peaks, and a recurring drop after the peak. This is exactly where the feature transformation matters. The model is not seeing a raw sequence alone; it is seeing a tabular representation that exposes time position and seasonal phase.</p><p>The forecast is not perfect. For example, the sharpness and height of some future peaks are difficult to match exactly. That is expected because monthly tourism demand is not deterministic. The useful question is not whether every point lands exactly on the future curve. 
The useful question is whether the model captures the seasonal structure, gives sensible point forecasts, and expresses uncertainty that is reasonable for the held-out window.</p><h3>3.5 Evaluating Forecasts in Practice</h3><p>The demo is useful as a first look, but a real forecasting workflow would need numerical evaluation. At minimum, a practitioner should compute point forecast errors and quantile coverage on the held-out window.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np

# Align actuals and predictions on their shared (item_id, timestamp) index.
eval_df = test_tsdf_ground_truth[["target"]].rename(
    columns={"target": "actual"}
).join(
    pred.rename(columns={"target": "forecast"})
)

# Quantile column labels may be floats or strings; pick whichever is present.
q10_col = 0.1 if 0.1 in eval_df.columns else "0.1"
q90_col = 0.9 if 0.9 in eval_df.columns else "0.9"

mae = (eval_df["actual"] - eval_df["forecast"]).abs().mean()
rmse = np.sqrt(((eval_df["actual"] - eval_df["forecast"]) ** 2).mean())
# Share of held-out actuals inside the 0.1-0.9 band (nominal coverage: 0.80).
coverage_80 = (
    (eval_df["actual"] &gt;= eval_df[q10_col])
    &amp; (eval_df["actual"] &lt;= eval_df[q90_col])
).mean()
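
# Hedged sketch, not part of the original notebook: pinball (quantile) loss scores a
# single quantile column directly; lower is better. The name pinball_loss is illustrative.
def pinball_loss(actual, quantile_forecast, q):
    """Mean pinball loss of the forecasts for quantile level q (0 to 1)."""
    losses = [
        max(q * (a - f), (q - 1) * (a - f))
        for a, f in zip(actual, quantile_forecast)
    ]
    return sum(losses) / len(losses)

# Usage with the columns selected above:
# pinball_loss(eval_df["actual"], eval_df[q10_col], 0.1)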

print(f"MAE: {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"80% interval coverage: {coverage_80:.3f}")</code></pre></div><p>For a more complete evaluation, it is also useful to break the errors down by item ID and forecast horizon. In forecasting, average error can hide important behavior. A model may be good for short horizons but weak for longer horizons, or good for one item but poor for another.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">eval_df = eval_df.reset_index()
# 1-based forecast step per item; assumes rows are ordered by timestamp within each item_id.
eval_df["horizon"] = eval_df.groupby("item_id").cumcount() + 1
eval_df["absolute_error"] = (eval_df["actual"] - eval_df["forecast"]).abs()
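
# Hedged addition, not in the original notebook: the same average per item_id, using an
# illustrative helper (mean_error_by is not a library function).
def mean_error_by(frame, key, error_col="absolute_error"):
    """Average the error column within each group of `key`."""
    return frame.groupby(key)[error_col].mean().rename(f"MAE by {key}")

# Usage with the frame built above:
# display(mean_error_by(eval_df, "item_id"))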

display(
    eval_df.groupby("horizon")["absolute_error"]
    .mean()
    .rename("MAE by horizon")
)</code></pre></div><h2>4. Summary and Conclusion</h2><p>In this post, I used the time series forecasting section of the official TabPFN hands-on demo to understand how TabPFN can be applied outside ordinary static tabular prediction.</p><p>The conceptual section made the key step explicit: TabPFN-TS frames univariate time series forecasting as tabular regression. This transformation is not unique to TabPFN; supervised ML models such as XGBoost, LightGBM, Random Forest, and CatBoost can also use time-derived tabular features. What changes with TabPFN-TS is the regression engine: a pretrained, context-conditioned tabular foundation model is used instead of fitting a new task-specific model from scratch.</p><p>Operationally, the observed history becomes rows with known targets. Future timestamps become rows with missing targets. Running-index, calendar, and seasonal features give the tabular model information about trend, time position, and cyclic structure.</p><p>The hands-on demo then showed this idea in code. 
We loaded two monthly tourism time series from the Chronos datasets collection, held out the last 24 months, added temporal features, predicted with <code>TabPFNTimeSeriesPredictor</code>, and visualized both point forecasts and 0.1 to 0.9 quantile intervals.</p><p>The main takeaway is that TabPFN is not being used as a native sequence model here. The bridge is feature engineering. Once time is represented as tabular features, TabPFNv2 can be used as a zero-shot tabular regressor for future timestamps.</p><p>This makes the workflow conceptually simple and practically interesting. It also creates clear evaluation questions: how accurate are the point forecasts, how well calibrated are the quantile intervals, and how does the model behave across different horizons, seasonalities, and item IDs?</p><p>With this post, I have covered another remaining section of the official TabPFN hands-on demo. In the upcoming posts, I will continue exploring the parts of the TabPFN ecosystem that can translate into useful workflows for real tabular and time-dependent data problems. As I continue this series, I welcome feedback and requests from readers: what did you find most useful in this post, and which aspects of tabular foundation models should I explore next?</p>]]></content:encoded></item></channel></rss>