Project Peak
Studies

Controlled comparisons

Each study holds everything fixed except one variable and shows how that variable moves the result. Contenders are ranked by the hardest step each one holds, and each study comes with a written analysis.

2 levels · trials per step (depth) · analysis

gpt-5-nano · sampling depth

Same model and test at two sampling depths. A worked example of how the score sharpens (its confidence interval tightens) when you run more trials per step, and of how the confidence tiers and promotion work.
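The tightening effect can be sketched in a few lines. This is a minimal illustration, not the study's own method: it assumes each trial is a pass/fail outcome and uses a Wilson score interval, which the study may or may not use. The function name `wilson_interval`, the 80% pass rate, and the two depths (5 and 50 trials) are all hypothetical values chosen for the example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%).

    Assumption: each trial is an independent pass/fail outcome.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Hypothetical numbers: the same observed pass rate at two sampling depths.
for trials in (5, 50):
    successes = round(0.8 * trials)
    lo, hi = wilson_interval(successes, trials)
    print(f"{trials:>3} trials per step: [{lo:.2f}, {hi:.2f}]  width={hi - lo:.2f}")
```

With the observed pass rate held constant, the interval at the deeper setting is narrower, which is the sense in which more trials per step sharpen the score.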

Read the study →