Agent Skill · DatoCMS

eval-triggers

Run the trigger evaluation pipeline — classify, analyze, and optionally compare against a baseline. Only run when explicitly asked — evals are expensive.

Provider: DatoCMS
Path in repo: .agents/skills/eval-triggers/SKILL.md

Skill body

IMPORTANT: This skill is expensive (makes many LLM API calls). Only run when the user explicitly asks for it. Never run proactively.

Before running, ask the user which eval source to run, unless they already specified it in $ARGUMENTS. The commands below pass combined and frontmatter as --source values.

Step 1 — Classify:

The runner writes to the canonical layout at evals/results/trigger/<skill>/<track>/<source>/results.json. You do not pass an output directory.
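As a rough illustration of that layout, the path for a single run can be assembled from its three components. The helper name `results_path` and the skill name `my-skill` are hypothetical; only the directory pattern comes from the text above.

```python
from pathlib import PurePosixPath

# Root of the canonical layout described above.
RESULTS_ROOT = PurePosixPath("evals/results/trigger")


def results_path(skill: str, track: str, source: str) -> PurePosixPath:
    """Return the location where the runner is described as writing results.json.

    Hypothetical helper for illustration; it is not part of the repo's scripts.
    """
    return RESULTS_ROOT / skill / track / source / "results.json"


# "my-skill" is a placeholder skill name.
print(results_path("my-skill", "claude", "combined"))
# prints evals/results/trigger/my-skill/claude/combined/results.json
```

Because the runner derives this path itself, a helper like this is only useful for locating results after a run, not for telling the runner where to write.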

For Claude Code:

python3 evals/scripts/run_trigger_eval.py --track claude --source combined

For Codex:

python3 evals/scripts/run_trigger_eval.py --track codex
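
The two invocations above differ only in their flags, so a wrapper could assemble them from a track and an optional source. This is a minimal sketch: `classify_command` is a hypothetical helper, and it treats `--source` as optional only because the Codex example omits it.

```python
import shlex
from typing import List, Optional


def classify_command(track: str, source: Optional[str] = None) -> List[str]:
    """Build the argv for the classify step, mirroring the commands shown above.

    Hypothetical helper; only the script path and the two flags come from the text.
    """
    cmd = ["python3", "evals/scripts/run_trigger_eval.py", "--track", track]
    if source is not None:
        # Append --source only when the caller supplies one.
        cmd += ["--source", source]
    return cmd


# Print the commands rather than running them, since evals are expensive.
print(shlex.join(classify_command("claude", "combined")))
print(shlex.join(classify_command("codex")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, but only once the user has explicitly asked for an eval run.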

Step 2 — Analyze:

python3 evals/scripts/analyze_trigger_results.py \
  --track claude --source frontmatter

This writes the cross-skill summary to evals/results/trigger/_summary/<track>/<source>/summary.{json,md}. Report the findings from that summary to the user.

Step 3 — Compare (optional):

If the user provides a baseline summary or you have one from a previous run:

python3 evals/scripts/compare_trigger_runs.py \
  --baseline <baseline-summary>.json \
  --candidate evals/results/trigger/_summary/<track>/<source>/summary.json \
  --output-markdown local/comparison.md

Summarize regressions and improvements. For ad-hoc baselines that should not be committed, store them under local/ (gitignored).
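
One way to keep an ad-hoc baseline is to copy the current summary into local/ before making changes, then pass the copy to --baseline on a later run. The sketch below assumes the summary path pattern given earlier; `save_local_baseline` and its parameters are hypothetical names, not part of the repo's scripts.

```python
import shutil
from pathlib import Path


def save_local_baseline(track: str, source: str, name: str,
                        repo_root: Path = Path(".")) -> Path:
    """Copy the current summary.json into local/ as <name>.json.

    Hypothetical helper: the summary location follows the documented layout,
    and local/ is the gitignored directory mentioned above.
    """
    summary = (repo_root / "evals" / "results" / "trigger" / "_summary"
               / track / source / "summary.json")
    if not summary.is_file():
        raise FileNotFoundError(f"No summary found at {summary}; run the analyze step first")

    dest_dir = repo_root / "local"
    dest_dir.mkdir(parents=True, exist_ok=True)  # create local/ if it is missing
    dest = dest_dir / f"{name}.json"
    shutil.copy2(summary, dest)  # copies contents and preserves timestamps
    return dest
```

For example, `save_local_baseline("claude", "combined", "baseline-before-edit")` would write local/baseline-before-edit.json, which can then be supplied as the --baseline argument.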

Skill frontmatter

disable-model-invocation: true
metadata:
  internal: true