● Running · OpenEnv-compatible · OAS 3.1 · AegisForge Sprint 3 coverage

omnibench_aegis_env

A compact OpenEnv evaluation server for multi-domain agent workflows: research, computer use, finance, multi-agent evaluation, τ²-Bench, game, business process, agent safety, cybersecurity, and coding security scenarios.

Sprint 3 / AgentX-AgentBeats coverage
Agent Safety · Pi-Bench WhistleBlowerWreck

PII leak scenario focused on protected disclosure, safe handling, and policy-grounded response behavior.

Cybersecurity · CyberGym StaticShipScam

Supply-chain security scenario for identifying unsafe package or static artifact behavior.

Coding Agent · NetArena DevContainerDoom

Devcontainer supply-chain scenario for safe coding-agent environment and dependency behavior.

Existing scenario suite
Research · InventoryInject

Grounded inventory inspection, analysis, quarantine, and safe fact extraction.

Computer Use · LinkLifter

Safe link scanning, grounded navigation, and destination verification.

Finance · taxwiztrap

Tax calculation with careful unit normalization and canonical answer submission.

Multi-agent · BidBot

Roster building, matchup simulation, scoring, and equilibrium assessment.

τ²-Bench · TicketTwister

Task bundle loading, user simulation, conversation execution, and bundle scoring.

Game Agent · wikiwiper

Objective inspection, zone scanning, route navigation, threat engagement, and cleanup verification.

Business Process · saleforceone

Privacy-safe CRM routing, schema checks, context filtering, and policy application.

API quick start
Stable OpenEnv endpoints

The presentation layer only changes this landing page. The machine-facing endpoints remain available for evaluators and scripts: /health, /contract, /reset, /step, /state, /actions, /docs, and /openapi.json.

POST /reset
{ "scenario_id": "InventoryInject", "options": { "domain": "research" } }

POST /step
{ "name": "inspect_inventory", "args": {} }
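The two quick-start calls can be scripted with the Python standard library. The following is a minimal sketch rather than an official client: the base URL `http://localhost:8000` and the helper names `build_post`, `send`, and `run_quick_start` are assumptions made for illustration, and since the response shapes are not described on this page, the replies are returned undecoded beyond JSON parsing.

```python
import json
import urllib.request

# Assumption: the server listens here; change this to match your deployment.
BASE_URL = "http://localhost:8000"


def build_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for an endpoint path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and parse the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Payloads copied from the quick-start example on this page.
reset_request = build_post(
    "/reset",
    {"scenario_id": "InventoryInject", "options": {"domain": "research"}},
)
step_request = build_post("/step", {"name": "inspect_inventory", "args": {}})


def run_quick_start():
    """Hypothetical helper: reset the scenario, then take one step.

    Needs a running server. The replies are returned as parsed JSON
    without assuming any particular fields.
    """
    reset_reply = send(reset_request)
    step_reply = send(step_request)
    return reset_reply, step_reply
```

Building the requests separately from sending them lets you inspect the method, URL, and body without a live server. Calling `run_quick_start()` against a running instance performs the reset followed by a single `inspect_inventory` step.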