Experiments

The experiments control plane now replays sampled native traces against a candidate prompt version, compares the candidate's output and latency with what the original traces recorded, and auto-promotes the candidate when those checks pass.
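As a rough illustration of the replay step, the sketch below samples recorded traces and re-runs each one's input through a candidate, timing every call. It is a minimal sketch under stated assumptions, not the control plane's implementation: the `Trace` and `ReplayResult` shapes, the helpers `sample_traces` and `replay`, the uniform per-trace sampling rate, and modeling the candidate prompt version as a plain callable are all hypothetical.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    # Hypothetical shape of a recorded native trace.
    prompt_input: str
    output: str
    latency_ms: float


@dataclass
class ReplayResult:
    # Hypothetical: the original trace plus what the candidate produced for it.
    trace: Trace
    candidate_output: str
    candidate_latency_ms: float


def sample_traces(traces: List[Trace], rate: float, seed: int = 0) -> List[Trace]:
    """Keep each trace independently with probability `rate` (assumed uniform sampling).

    A fixed `seed` makes the sample reproducible across runs.
    """
    rng = random.Random(seed)
    return [t for t in traces if rng.random() < rate]


def replay(traces: List[Trace], candidate: Callable[[str], str]) -> List[ReplayResult]:
    """Re-run each trace's input through the candidate and record output and wall-clock latency."""
    results = []
    for t in traces:
        start = time.perf_counter()
        out = candidate(t.prompt_input)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append(ReplayResult(t, out, elapsed_ms))
    return results


# Usage: a stand-in "candidate" that upper-cases its input.
traces = [Trace("hi", "HI", 12.0), Trace("ok", "OK", 9.0)]
results = replay(sample_traces(traces, rate=1.0), candidate=str.upper)
```

Passing a seed keeps the sampled set stable, which makes a failed check easier to reproduce; a real system might instead stratify samples by route or traffic class.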

Now: sampled replay on native traces is live

Judging: automatic regression checks use output similarity and latency thresholds

Release: passing candidates are auto-promoted; failing ones are blocked from promotion
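The judging and release rules can be sketched as one gate that returns a promote-or-block decision. This is a minimal sketch, not the product's actual checks: the `Pair` shape, `judge`, the default thresholds (`min_similarity=0.9`, `max_latency_ratio=1.2`), comparing mean latencies, and using `difflib.SequenceMatcher` as the output-similarity metric are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean
from typing import List


@dataclass
class Pair:
    # Hypothetical: one replayed trace, recorded baseline vs. candidate.
    baseline_output: str
    candidate_output: str
    baseline_latency_ms: float
    candidate_latency_ms: float


def similarity(a: str, b: str) -> float:
    # Character-level ratio in [0, 1]; a stand-in for whatever similarity metric is really used.
    return SequenceMatcher(None, a, b).ratio()


def judge(pairs: List[Pair], min_similarity: float = 0.9, max_latency_ratio: float = 1.2) -> str:
    """Return "promote" if every output is similar enough and mean latency stays in budget, else "block"."""
    if not pairs:
        return "block"  # no replay evidence, so do not auto-promote
    similar = all(
        similarity(p.baseline_output, p.candidate_output) >= min_similarity for p in pairs
    )
    baseline_ms = mean(p.baseline_latency_ms for p in pairs)
    candidate_ms = mean(p.candidate_latency_ms for p in pairs)
    fast_enough = candidate_ms <= baseline_ms * max_latency_ratio
    return "promote" if similar and fast_enough else "block"


# Usage: identical output and a 10% slowdown passes under the assumed thresholds.
decision = judge([Pair("Paris", "Paris", 100.0, 110.0)])
```

Blocking on an empty sample is a conservative choice: with no replayed traces there is nothing to judge, so the gate defaults to keeping the current version.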