<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reading on Saurav Panigrahi</title><link>https://sauravpanigrahi.com/reading/</link><description>Recent content in Reading on Saurav Panigrahi</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 01 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sauravpanigrahi.com/reading/feed.xml" rel="self" type="application/rss+xml"/><item><title>Emergent Misalignment</title><link>https://sauravpanigrahi.com/reading/emergent-misalignment/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/emergent-misalignment/</guid><description>&lt;p&gt;Selected references on emergent misalignment and broad behavioral shifts from narrow training signals.&lt;/p&gt;
&lt;h2 id="core"&gt;Core&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.emergent-misalignment.com/"&gt;Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs&lt;/a&gt;&lt;br&gt;
Introduces the central phenomenon: finetuning on a narrow harmful behavior can produce broader misaligned behavior outside the training domain.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.lesswrong.com/posts/gLDSqQm8pwNiq7qst/narrow-misalignment-is-hard-emergent-misalignment-is-easy"&gt;Narrow Misalignment is Hard, Emergent Misalignment is Easy&lt;/a&gt;&lt;br&gt;
Useful for thinking about why a broad misalignment direction may be a more stable and efficient solution than a narrow one.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Evaluation</title><link>https://sauravpanigrahi.com/reading/evaluation/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/evaluation/</guid><description>&lt;p&gt;Long-form references on benchmarks, measurement, and what evaluations actually test.&lt;/p&gt;
&lt;h2 id="biology-and-scientific-evaluation"&gt;Biology And Scientific Evaluation&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2407.10362"&gt;LAB-Bench&lt;/a&gt;&lt;br&gt;
Benchmark for language models doing biology research tasks. Useful because it evaluates research-relevant behavior rather than only static factual recall.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://fomo26.github.io/"&gt;FOMO26&lt;/a&gt;&lt;br&gt;
Foundation model challenge for brain MRI, useful as a clinical-domain evaluation reference.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="methodology"&gt;Methodology&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ogb.stanford.edu/"&gt;Open Graph Benchmark&lt;/a&gt;&lt;br&gt;
Standardized graph ML benchmark suite with datasets, loaders, and evaluators. Useful as a reference point for what benchmark infrastructure can look like.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="robotics-and-sim-to-real"&gt;Robotics And Sim-To-Real&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2504.13059"&gt;RoboTwin&lt;/a&gt;&lt;br&gt;
Dual-arm robot benchmark using generative digital twins for scalable task and data generation.&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>ML Systems</title><link>https://sauravpanigrahi.com/reading/ml-systems/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/ml-systems/</guid><description>&lt;p&gt;Long-form references on training, infrastructure, and implementation practice.&lt;/p&gt;
&lt;h2 id="training-systems"&gt;Training Systems&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://djdumpling.github.io/2026/01/31/frontier_training.html"&gt;Frontier Model Training Methodologies&lt;/a&gt;&lt;br&gt;
Survey of open frontier training recipes and implementation choices.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://jax-ml.github.io/scaling-book/"&gt;Scaling LLMs with JAX&lt;/a&gt;&lt;br&gt;
Book-length treatment of scaling LLM training and inference, with a focus on TPUs and JAX.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2603.03276"&gt;Beyond Language Modeling: An Exploration of Multimodal Pretraining&lt;/a&gt;&lt;br&gt;
From-scratch multimodal pretraining study with useful details on representation choices and scaling behavior.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="embeddings-and-retrieval"&gt;Embeddings And Retrieval&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.jxmo.io/p/how-to-train-the-best-embedding-model"&gt;How to Train the Best Embedding Model in the World&lt;/a&gt;&lt;br&gt;
Detailed engineering writeup on embedding model training, label noise, verification, and dataset scale.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gpu-programming"&gt;GPU Programming&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tgautam03.github.io/"&gt;CUDA Writeups by Tushar Gautam&lt;/a&gt;&lt;br&gt;
Implementation-forward notes on CUDA kernels and optimization.&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Programmable Biology</title><link>https://sauravpanigrahi.com/reading/programmable-biology/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/programmable-biology/</guid><description>&lt;p&gt;Long-form references on biological foundation models, structure prediction, and sequence modeling.&lt;/p&gt;
&lt;h2 id="genomic-foundation-models"&gt;Genomic Foundation Models&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.02.18.638918v1"&gt;Evo 2&lt;/a&gt;&lt;br&gt;
Long-context genomic foundation model for sequence modeling and design.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2306.15794"&gt;HyenaDNA&lt;/a&gt;&lt;br&gt;
Long-context genomic sequence models at single-nucleotide resolution.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="structure-prediction"&gt;Structure Prediction&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.nature.com/articles/s41586-021-03819-2"&gt;AlphaFold&lt;/a&gt;&lt;br&gt;
Foundational protein structure prediction paper.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.nature.com/articles/s41586-024-07487-w"&gt;AlphaFold 3&lt;/a&gt;&lt;br&gt;
Extends structure prediction toward biomolecular complexes and interactions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Research Engineering</title><link>https://sauravpanigrahi.com/reading/research-engineering/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/research-engineering/</guid><description>&lt;p&gt;Selected references on research taste, engineering judgment, and doing useful technical work.&lt;/p&gt;
&lt;h2 id="research-taste"&gt;Research Taste&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.cs.virginia.edu/~robins/YouAndYourResearch.html"&gt;You and Your Research&lt;/a&gt;&lt;br&gt;
Hamming&amp;rsquo;s classic essay on choosing important problems and organizing a life around serious work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="http://joschu.net/blog/opinionated-guide-ml-research.html"&gt;An Opinionated Guide to ML Research&lt;/a&gt;&lt;br&gt;
Practical advice on developing taste and becoming effective in machine learning research.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://michaelnielsen.org/blog/principles-of-effective-research/"&gt;Principles of Effective Research&lt;/a&gt;&lt;br&gt;
A useful frame for research as a skill that can be deliberately improved.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://patrickcollison.com/fast"&gt;Fast&lt;/a&gt;&lt;br&gt;
Examples of ambitious projects completed far faster than conventional expectations would suggest.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Tool Use And Agents</title><link>https://sauravpanigrahi.com/reading/tool-use-and-agents/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://sauravpanigrahi.com/reading/tool-use-and-agents/</guid><description>&lt;p&gt;Long-form references on tool use, agent environments, and reliability loops.&lt;/p&gt;
&lt;h2 id="agent-environments"&gt;Agent Environments&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://openai.com/index/harness-engineering/"&gt;Harness Engineering&lt;/a&gt;&lt;br&gt;
Useful framing around agents as systems shaped by environments, specs, feedback, and reliability loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://blog.cloudflare.com/code-mode/"&gt;Code Mode&lt;/a&gt;&lt;br&gt;
A concrete argument for exposing tools through code interfaces rather than forcing every step through chat-level tool calls.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://mksg.lu/blog/context-mode"&gt;Context Mode&lt;/a&gt;&lt;br&gt;
A useful pattern for keeping agent context manageable when tools produce large or noisy outputs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="training-time-semantics"&gt;Training-Time Semantics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2603.01209"&gt;Agents Learn Their Runtime&lt;/a&gt;&lt;br&gt;
Study of persistent versus reset Python interpreters in CodeAct-style training.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="engineering-practice"&gt;Engineering Practice&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kanyilmaz.me/2026/02/25/1000x-engineer.html"&gt;AI Gave Birth to the 100x Engineer&lt;/a&gt;&lt;br&gt;
Long case study on compounding agent workflows with test harnesses and supporting tools.&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>