Check the skill first
Find unclear activation text, missing files, broken tests, and unsafe edit rules before you run an agent.
Check the skill, run the agent on a copied sample project, and prove the result with evidence.
Agent Skill Evals is for teams that write reusable skills for agents.
Use it when a skill can edit files, run commands, call tools, or make changes you want to check before trusting it.
Promptfoo is the test runner
Promptfoo is an open-source eval framework. Agent Skill Evals plugs into normal Promptfoo configs, so you keep running promptfoo eval and add skill-specific checks. Use the Promptfoo docs for Promptfoo's own config reference.
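Agent Skill Evals has two jobs: check the skill and its tests before any agent runs, and prove with evidence what the agent actually did.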
That split exists because a bad skill test can make a bad skill look good, and an agent's final message is not proof that the right work happened.
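One way to gather evidence that does not depend on the agent's final message is to snapshot the project before and after the run and compare the two. The sketch below is illustrative only and is not how Agent Skill Evals implements its checks. The function names `snapshot`, `changedPaths`, and `changesOutsideScope` are hypothetical, and the scope is assumed to be a list of exact relative paths.

```javascript
// Hypothetical sketch: judge a run by file evidence, not by the agent's final message.
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Map each file under rootDir (as a relative path) to a SHA-256 hash of its contents.
function snapshot(rootDir, dir = rootDir, result = new Map()) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      snapshot(rootDir, fullPath, result);
    } else if (entry.isFile()) {
      const hash = crypto
        .createHash('sha256')
        .update(fs.readFileSync(fullPath))
        .digest('hex');
      result.set(path.relative(rootDir, fullPath), hash);
    }
  }
  return result;
}

// List paths that were created, modified, or deleted between two snapshots.
function changedPaths(before, after) {
  const changed = [];
  for (const [file, hash] of after) {
    if (before.get(file) !== hash) changed.push(file);
  }
  for (const file of before.keys()) {
    if (!after.has(file)) changed.push(file);
  }
  return changed.sort();
}

// Keep only the changes that fall outside the allowed scope (assumed: exact relative paths).
function changesOutsideScope(changed, scope) {
  const allowed = new Set(scope);
  return changed.filter((file) => !allowed.has(file));
}
```

Taking one snapshot of the copied sample project before the agent runs and one after, an empty result from `changesOutsideScope` means every change stayed within the allowed files, whatever the agent reported.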
The model is: check the skill, run the agent on a copied sample project, then prove the result with evidence.
Agent Skill Evals runs that loop through Promptfoo. There is no separate runner to learn.
This example checks that an agent creates a PowerPoint deck and only changes the allowed files:
preconditions:
  - verifier.fails:
      run: ./verify_brand_deck.cjs
should:
  - verifier.succeeds:
      run: ./verify_brand_deck.cjs
  - file.created:
      path: launch-deck.pptx
  - file.created:
      path: deck.js
should_not:
  - file.changes_outside_scope:
      scope:
        - deck.js
        - launch-deck.pptx

Start with Getting Started, then read Core Concepts.
Use Set Up Tests For An Existing Skill when you already have a skill and want an agent to set up the Promptfoo configs, agent tests, verifier scripts, and evidence checks for you.
Use Runtime Checks, Skill Loading, Metrics, Package Reference, and the Promptfoo docs as reference pages.