Agent Quality / Evals Engineer 1754

Softgic S.

Work from home, Colombia | Full-time | June 29, 2026

Description

Owns the eval harness and quality gate from the beginning. This role replaces the old late‑stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

  • Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
  • Wire evals into CI so quality regressions fail builds and releases.
  • Define and maintain release‑gate thresholds with Product and the Tech Lead.
  • Lay the path for later adversarial and drift‑testing expansion without overbuilding MVP scope.

Requirements

  • Experience evaluating ML, LLM, or non‑deterministic systems.
  • Strong test and benchmark design capability.
  • Comfort working with noisy metrics, thresholds, and probabilistic behavior.
  • Good scripting and automation skills.
  • Uses AI to generate candidate eval cases and failure hypotheses, but never confuses gene...
