Selected references on emergent misalignment and broad behavioral shifts from narrow training signals.

Core

Why I Keep Returning To This

The important question is not just whether a model can be made misaligned.

The deeper question is what models are predisposed to learn when we apply narrow optimization pressure.

That question ties together fine-tuning, tool use, evaluation, and deployment safety.