Sovara’s recommendation algorithm keeps annotation focused. It does not try to replace the reviewer. It tries to decide which runs are worth a reviewer’s time.

What it optimizes for

The queue favors runs that may add new information. A run is more useful when it shows behavior not already covered by nearby annotated examples, exposes a partial gap in the agent’s capability, or contains a failure that is hard to judge from the final output alone. The algorithm also reduces repeat work. Runs that look similar to already reviewed successes should not keep coming back unless they reveal a new pattern.
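As a rough illustration of the "new information" idea, the sketch below ranks candidate runs by how far they sit from already annotated examples. It is not Sovara's actual algorithm, which this page does not specify. It assumes each run can be represented as a numeric vector, and the names `cosine`, `novelty`, `rank_queue`, and the `vec` field are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty(run_vec, annotated_vecs):
    """Hypothetical novelty score in [0, 1].

    1.0 means no annotated example is nearby; 0.0 means a near-duplicate
    of an annotated example already exists.
    """
    if not annotated_vecs:
        return 1.0
    nearest = max(cosine(run_vec, v) for v in annotated_vecs)
    return 1.0 - max(0.0, nearest)


def rank_queue(candidates, annotated_vecs):
    """Sort candidate runs so the most novel ones are reviewed first."""
    return sorted(
        candidates,
        key=lambda c: novelty(c["vec"], annotated_vecs),
        reverse=True,
    )


# Assumed toy data: one annotated example and two candidate runs.
annotated = [[1.0, 0.0]]
candidates = [
    {"id": "near-duplicate", "vec": [1.0, 0.0]},
    {"id": "new-pattern", "vec": [0.0, 1.0]},
]
for run in rank_queue(candidates, annotated):
    print(run["id"], round(novelty(run["vec"], annotated), 2))
```

In this toy setup the run that resembles an annotated example drops to the bottom of the queue, which mirrors the goal of not resurfacing runs that look like reviewed successes.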

Priority labels

Sovara groups surfaced runs by why they deserve attention. The UI shows five labels:
  • Repeated failure: the run matches a failure pattern already seen in related traces.
  • Failure risk: the run is likely wrong, but it is not yet established as a repeated failure pattern.
  • Novel behavior: the run is meaningfully different from reviewed examples.
  • Partially covered: related examples exist, but the run still tests a gap.
  • Covered: successful references already cover the behavior, so review is optional and lower priority.
The label is a starting point, not a verdict. The reviewer still decides whether the run should be marked as success or failure.
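One way to picture how the five labels relate is as a small decision rule over a few signals. The sketch below is only an assumption for illustration: the `Signals` fields, the thresholds, and the order of the checks are hypothetical and are not documented Sovara behavior. Only the five label names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    REPEATED_FAILURE = "Repeated failure"
    FAILURE_RISK = "Failure risk"
    NOVEL_BEHAVIOR = "Novel behavior"
    PARTIALLY_COVERED = "Partially covered"
    COVERED = "Covered"


@dataclass
class Signals:
    # All fields are hypothetical inputs, not documented Sovara fields.
    matches_known_failure: bool  # run matches a failure pattern seen in related traces
    failure_risk: float          # 0..1, estimated chance the run is wrong
    coverage: float              # 0..1, how well reviewed examples cover the behavior


def assign_label(s, risk_threshold=0.5, novel_below=0.3, covered_above=0.8):
    """Map signals to a priority label. Thresholds are illustrative assumptions."""
    if s.matches_known_failure:
        return Label.REPEATED_FAILURE
    if s.failure_risk >= risk_threshold:
        return Label.FAILURE_RISK
    if s.coverage < novel_below:
        return Label.NOVEL_BEHAVIOR
    if s.coverage < covered_above:
        return Label.PARTIALLY_COVERED
    return Label.COVERED


# Example: a low-risk run with moderate coverage.
print(assign_label(Signals(False, 0.1, 0.5)).value)
```

Whatever rule produces the label, the output is a hint for triage. The success or failure decision stays with the reviewer.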

Reviewer guidance

When Sovara surfaces a run, it includes a short explanation and links to the trace steps worth checking first. Start there, then open the full run when the case needs more context.

Figure: Inspect run button in the annotation queue

Click Inspect run, then open SovaraChat and start with:
What should I look at first?
SovaraChat can point you to the final answer, retrieved evidence, related failed behavior, and the steps cited by the recommendation. Use that as the starting point for the label and ground truth.

Good annotations make future recommendations better. They tell Sovara which behaviors are already covered, which failures matter, and which domain lessons should become priors.