<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Sampling on Saurav Panigrahi</title><link>https://sauravpanigrahi.com/tags/sampling/</link><description>Recent content in Sampling on Saurav Panigrahi</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 01 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sauravpanigrahi.com/tags/sampling/feed.xml" rel="self" type="application/rss+xml"/><item><title>Adaptive Sampling Networks</title><link>https://sauravpanigrahi.com/work/adaptive-sampling-networks/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/work/adaptive-sampling-networks/</guid><description>&lt;p&gt;Co-authored with Navneel Singhal.&lt;/p&gt;
&lt;p&gt;Adaptive Sampling Networks began with a simple but, in my view, underexamined question: should decoding in language models be treated as a fixed heuristic, or as a learned operator over the model&amp;rsquo;s own uncertainty?&lt;/p&gt;
&lt;p&gt;Most current decoding setups choose a single rule, such as temperature scaling, top-k, top-p, typical sampling, min-p, epsilon sampling, or eta sampling, and then apply it uniformly across all contexts. This is operationally convenient, but it is also structurally rigid. A next-token distribution with low entropy and a clear mode does not present the same decision problem as a flatter or more ambiguous distribution, yet standard decoding applies the same global hyperparameters to both.&lt;/p&gt;</description></item></channel></rss>