Writings

On AI alignment

A short structural read of alignment failure modes (~600 words)

Sycophancy, deceptive alignment, reward hacking, and harmful compliance read as four positions of one underlying collapse. Written for the LessWrong and Alignment Forum audiences, with a falsifiable prediction.

How Will a Sufficiently Powerful AI Decide Not to Harm Us? (~8,000 words)

The long-form exposition: the structural claim, the four failure modes, why a captured system cannot check its own capture, why the designer must see the feature in themselves before they can build it, and the main prediction alongside two weaker falsifiable alternatives.

Probe experiment brief: testing the reconciling-capacity hypothesis (~1,800 words)

The runnable specification for the first empirical test: models, benchmarks, phases, and success and failure criteria. Written for an ML graduate student or interpretability researcher who might run it.

On human flourishing

The Fusion Dynamics Framework (full book, PDF)

The complete exposition of the framework as applied to human flourishing, developed over 25 years. Thirteen chapters and eight appendices covering the structural model, the six generative configurations, the six collapsed ones, the three negative currents, and the practices for growing the reconciling capacity under ordinary life conditions.

Why Humans Fail to Flourish: A Structural Read (~800 words)

The companion to the alignment short version. Names the structural absence that Fusion Dynamics identifies as the common cause beneath anxiety, perfectionism, people-pleasing, and avoidance, and describes the practices that grow the reconciling capacity in the system that has to carry them.