Agent Skill Evals

Test agent skills with Promptfoo.

Check the skill, run the agent in an isolated World, and prove the result with evidence.

Scaffold with one command

`agent-skill-evals init --skill ./skills/my-skill --adapter claude-code` creates the minimal Promptfoo config and a clean starter Test Pack.

Check the skill first

Find unclear activation text, missing files, and broken tests before you run an agent.

Prove it with evidence

Check file outcomes, allowed change scope, tool calls, skill loading, output, turns, and token usage instead of trusting the agent's own summary.

Promptfoo is the test runner

Promptfoo is an open-source eval framework. Agent Skill Evals plugs into normal Promptfoo configs, so you keep running `promptfoo eval` and add skill-specific checks. Use the Promptfoo docs for Promptfoo's own config reference.

How It Works

Use `agent-skill-evals init` to scaffold the Promptfoo wiring and a starter Test Pack. Run `agent-skill-evals check` for cheap static validation, then use `promptfoo eval` to run the selected real agent and grade recorded evidence. There is no separate eval runner.

What A Test Looks Like

This example checks that an agent fixes the login redirect and keeps any file changes within the intended scope:

```yaml
skill: ../skills/bugfix-workflow
tests:
  - prompt: Fix successful logins so they go to /dashboard.
    fixture: ../fixtures/login-bug
    preconditions:
      - verifier.fails: { run: ./verify_login_redirect.sh }
    expect:
      - verifier.succeeds: { run: ./verify_login_redirect.sh }
      - file.changes_within: { paths: [app.js] }
```
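
To make the logic of this test concrete, here is a minimal Python sketch of how such a result could be graded from recorded evidence. It is an illustration only, not Agent Skill Evals' implementation. The function names `changes_within` and `grade`, the use of exit codes as evidence, and the reading that a `verifier.fails` precondition means "the verifier must fail before the agent runs" are all assumptions made for this example.

```python
from pathlib import PurePosixPath


def changes_within(changed_files, allowed_paths):
    """Return the changed files that fall outside the allowed scope.

    Hypothetical helper: a file counts as in scope when it equals an
    allowed path or sits somewhere beneath an allowed directory.
    """
    allowed = [PurePosixPath(p) for p in allowed_paths]
    outside = []
    for name in changed_files:
        path = PurePosixPath(name)
        in_scope = any(path == a or a in path.parents for a in allowed)
        if not in_scope:
            outside.append(name)
    return outside


def grade(pre_exit_code, post_exit_code, changed_files, allowed_paths):
    """Pass only when all three pieces of evidence line up (assumed semantics).

    1. The verifier failed before the agent ran (the bug was really there).
    2. The verifier succeeds after the agent ran (the bug is fixed).
    3. Every changed file stays inside the allowed scope.
    """
    bug_was_present = pre_exit_code != 0
    bug_is_fixed = post_exit_code == 0
    stayed_in_scope = not changes_within(changed_files, allowed_paths)
    return bug_was_present and bug_is_fixed and stayed_in_scope


if __name__ == "__main__":
    # The agent fixed the redirect and touched only app.js.
    print(grade(1, 0, ["app.js"], ["app.js"]))
    # The agent fixed the redirect but also edited package.json.
    print(grade(1, 0, ["app.js", "package.json"], ["app.js"]))
```

The point of the sketch is that every input is something recorded during the run (exit codes, the list of changed files), so the verdict never depends on what the agent says it did.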

Start with Getting Started. See the Reference for Test Pack structure and runtime checks, or run the repo's cross-adapter example.