What it optimizes for
The queue favors runs that may add new information. A run is more useful when it shows behavior not already covered by nearby annotated examples, exposes a partial gap in the agent’s capability, or contains a failure that is hard to judge from the final output alone. The algorithm also reduces repeat work. Runs that look similar to already reviewed successes should not keep coming back unless they reveal a new pattern.
Priority labels
Sovara groups surfaced runs by why they deserve attention. The UI shows five labels:
- Repeated failure: the run matches a failure pattern already seen in related traces.
- Failure risk: the run is likely wrong, but it is not yet established as a repeated failure pattern.
- Novel behavior: the run is meaningfully different from reviewed examples.
- Partially covered: related examples exist, but the run still tests a gap.
- Covered: successful references already cover the behavior, so review is optional and lower priority.
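As a rough illustration of how the five labels might be handled in a script that consumes surfaced runs, the sketch below models them as an enum and moves Covered runs to the end of a review list. The names `PriorityLabel`, `is_review_optional`, and `order_for_review` are hypothetical and not part of any Sovara API. The only ordering rule it encodes is the one stated above, that Covered is optional and lower priority; it assumes no ranking among the other four labels.

```python
from enum import Enum


class PriorityLabel(Enum):
    """The five labels shown in the UI (hypothetical representation)."""
    REPEATED_FAILURE = "Repeated failure"
    FAILURE_RISK = "Failure risk"
    NOVEL_BEHAVIOR = "Novel behavior"
    PARTIALLY_COVERED = "Partially covered"
    COVERED = "Covered"


def is_review_optional(label: PriorityLabel) -> bool:
    # Only Covered runs are described as optional to review.
    return label is PriorityLabel.COVERED


def order_for_review(runs: list[tuple[str, PriorityLabel]]) -> list[tuple[str, PriorityLabel]]:
    # Assumption: runs arrive as (run_id, label) pairs.
    # sorted() is stable, so the incoming order is preserved within each group;
    # Covered runs simply move behind everything else.
    return sorted(runs, key=lambda run: is_review_optional(run[1]))


if __name__ == "__main__":
    surfaced = [
        ("run-a", PriorityLabel.COVERED),
        ("run-b", PriorityLabel.NOVEL_BEHAVIOR),
        ("run-c", PriorityLabel.REPEATED_FAILURE),
    ]
    for run_id, label in order_for_review(surfaced):
        print(run_id, label.value)
```

Using a stable sort on a single boolean key keeps the sketch from implying a ranking that the label descriptions do not state.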
Reviewer guidance
When Sovara surfaces a run, it includes a short explanation and links to the trace steps worth checking first. Start there, then open the full run when the case needs more context.
