A technical guide to TabICLv2 repeated feature grouping: why similar columns confuse encoders, how circular shifts add context, with NanoTabICL implementation.
Thank you! So there is one linear layer that is used to embed all groups at once? Why does the select keep batch and rows? Is batch not used to select rows?
Hi, Piotr. Thanks for asking. Yes, there is one shared linear layer, and it is applied with the same weights to every 3-value group across all batches, rows, and column positions. `batch` is not part of the feature grouping logic. It is just the number of independent tables/tasks processed in parallel in one forward pass. In `x[:, :, (idxs + offset) % n_cols]`, `batch` and `rows` are preserved because repeated feature grouping happens only along the column/feature axis. PyTorch applies `nn.Linear` to the last dimension only. So if the grouped tensor has shape `(batch, rows, cols, 3)`, then `self.x_embed(x)` applies the same `Linear(3 -> embed_dim)` independently to every `(batch, row, col)` group, producing: `(batch, rows, cols, embed_dim)`.
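To make the shapes concrete, here is a minimal sketch of the idea. The tensor sizes and the offsets `(0, 1, 2)` are assumptions chosen for illustration and are not necessarily the shifts used in TabICLv2 or NanoTabICL; `x_embed` stands in for `self.x_embed`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions): 2 independent tables, 5 rows, 4 columns.
batch, rows, n_cols, embed_dim = 2, 5, 4, 8
x = torch.randn(batch, rows, n_cols)

idxs = torch.arange(n_cols)
offsets = (0, 1, 2)  # hypothetical circular shifts, one per value in a group

# The two leading `:` keep batch and rows untouched; only the column axis
# is re-indexed, wrapping around via the modulo.
shifted = [x[:, :, (idxs + offset) % n_cols] for offset in offsets]

# Stack the shifted copies on a new last axis: (batch, rows, n_cols, 3).
grouped = torch.stack(shifted, dim=-1)

# One shared layer; nn.Linear acts on the last dimension only.
x_embed = nn.Linear(len(offsets), embed_dim)
out = x_embed(grouped)  # (batch, rows, n_cols, embed_dim)

# Embedding a single group on its own gives the same vector,
# which shows that the weights are shared across every position.
single = x_embed(grouped[1, 3, 2])
same_weights = torch.allclose(single, out[1, 3, 2], atol=1e-5)

print(grouped.shape, out.shape, same_weights)
```

With these offsets, the group for the last column of any row is built from columns 3, 0, and 1, so the wrap-around is visible in `grouped[..., -1, :]`.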