PII leak scenario focused on protected disclosure, safe handling, and policy-grounded response behavior.
omnibench_aegis_env
A compact OpenEnv evaluation server for multi-domain agent workflows: research, computer use, finance, multi-agent evaluation, tau2, game, business process, agent safety, cybersecurity, and coding security scenarios.
Supply-chain security scenario for identifying unsafe package or static artifact behavior.
Devcontainer supply-chain scenario for safe coding-agent environment and dependency behavior.
Grounded inventory inspection, analysis, quarantine, and safe fact extraction.
Safe link scanning, grounded navigation, and destination verification.
Tax calculation with careful unit normalization and canonical answer submission.
Roster building, matchup simulation, scoring, and equilibrium assessment.
Task bundle loading, user simulation, conversation execution, and bundle scoring.
Objective inspection, zone scanning, route navigation, threat engagement, and cleanup verification.
Privacy-safe CRM routing, schema checks, context filtering, and policy application.
The presentation layer changes only this landing page. The machine-facing endpoints remain available to evaluators and scripts: /health, /contract, /reset, /step, /state, /actions, /docs, and /openapi.json.
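As a rough illustration of how a script might reach the machine-facing endpoints listed above, here is a minimal sketch using only the Python standard library. Only the endpoint paths come from this page. The base URL (`http://localhost:8000`), the HTTP methods, the JSON request and response bodies, and the helper names `endpoint_url`, `get_json`, and `post_json` are assumptions made for illustration; the actual contract should be read from /contract or /openapi.json.

```python
import json
from urllib import request

# Endpoint paths as listed on the landing page.
ENDPOINTS = (
    "/health", "/contract", "/reset", "/step",
    "/state", "/actions", "/docs", "/openapi.json",
)


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and one of the listed endpoint paths."""
    if path not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {path}")
    return base_url.rstrip("/") + path


def get_json(base_url: str, path: str, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON.

    Assumption: the endpoint answers GET with a JSON body. This is
    unlikely to hold for /docs, which typically serves HTML.
    """
    with request.urlopen(endpoint_url(base_url, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_json(base_url: str, path: str, payload: dict, timeout: float = 5.0):
    """POST a JSON payload and decode the JSON reply.

    Assumption: /reset and /step accept POST with a JSON body; the
    payload fields are not specified on this page.
    """
    req = request.Request(
        endpoint_url(base_url, path),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running locally, a script could call `get_json("http://localhost:8000", "/health")` as a liveness check before sending anything to /reset or /step. The helpers reject any path not listed on this page, so a typo fails before a request is sent.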