Adaptive Sampling Networks

Fri, 01 May 2026 00:00:00 +0000

Co-authored with Navneel Singhal.

Adaptive Sampling Networks began from a simple but, in my view, underexamined question: should decoding in language models be treated as a fixed heuristic, or as a learned operator over the model’s own uncertainty?

Most current decoding schemes choose a rule such as temperature scaling, top-k, top-p, typical sampling, min-p, epsilon, or eta sampling, and then apply that rule uniformly across all contexts. This is operationally convenient, but it is also structurally rigid. A next-token distribution with low entropy and a clear mode does not present the same decision problem as a flatter or more ambiguous distribution, yet standard decoding exposes both to the same global hyperparameters.
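To make the rigidity concrete, here is a minimal sketch of one fixed rule, top-p with a single global threshold, applied to two very different next-token distributions. The logits, the six-token vocabulary, and the helper names (`softmax`, `entropy`, `top_p_support`) are illustrative assumptions, not taken from any particular model or library.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top_p_support(probs, p=0.9):
    """Number of tokens kept by top-p (nucleus) filtering at threshold p."""
    cumulative = 0.0
    kept = 0
    for q in sorted(probs, reverse=True):
        cumulative += q
        kept += 1
        if cumulative >= p:
            break
    return kept

# Hypothetical logits: one context with a clear mode, one that is nearly flat.
peaked = softmax([8.0, 1.0, 0.5, 0.0, -0.5, -1.0])
flat = softmax([1.0, 0.9, 0.8, 0.7, 0.6, 0.5])

# The same global hyperparameter (p = 0.9) is applied to both contexts.
for name, dist in [("peaked", peaked), ("flat", flat)]:
    print(name, "entropy:", round(entropy(dist), 3),
          "tokens kept:", top_p_support(dist, p=0.9))
```

With these toy numbers, the same threshold keeps a single token in the peaked case and all six in the flat case. The rule itself never inspects how uncertain the distribution is; whatever context sensitivity it shows is a side effect of the cumulative cutoff rather than something tuned or learned per context.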

Sampling on Saurav Panigrahi