
About
Diagnose and optimize Agent Skills (SKILL.md) with real session data and research-backed static analysis. Works with Claude Code, Codex, and any Agent Skills-compatible agent.
name: skill-optimizer
description: "Diagnose and optimize Agent Skills (SKILL.md) with real session data and research-backed static analysis. Works with Claude Code, Codex, and any Agent Skills-compatible agent."
risk: safe
source: hqhq1025/skill-optimizer (MIT)
date_added: "2026-04-11"
When to Use This Skill
- Use when skills are not triggering as expected or seem broken
- Use when you want to audit and improve your skill library's quality
- Use when you want to understand which skills are underperforming or wasting context tokens
Rules
- Read-only: never modify skill files. Only output a report.
- All 8 dimensions: do not skip any. If data is insufficient, report "N/A — insufficient session data" rather than omitting.
- Quantify: "you had 12 research tasks last week but the skill never triggered" beats "you often do research".
- Suggest, don't prescribe: give specific wording suggestions for description improvements, but frame as suggestions.
- Show evidence: for undertrigger claims, quote the actual user message that should have triggered the skill.
- Evidence-based suggestions: when suggesting description rewrites, cite the specific research finding that motivates the change (e.g., "front-load trigger keywords — MCP study shows 3.6x selection rate improvement").
Overview
Analyze skills using historical session data plus static quality checks, then output a diagnostic report with P0/P1/P2 prioritized fixes. Each skill is scored on a 5-point composite scale across 8 dimensions.
CSO (Claude/Agent Search Optimization) = writing skill descriptions so agents select the right skill at the right time. This skill checks for CSO violations.
Usage
- /optimize-skill → scan all skills
- /optimize-skill my-skill → single skill
- /optimize-skill skill-a skill-b → multiple specified skills
Data Sources
Auto-detect the current agent platform and scan the corresponding paths:
| Source | Claude Code | Codex | Shared |
|--------|------------|-------|--------|
| Session transcripts | ~/.claude/projects/**/*.jsonl | ~/.codex/sessions/**/*.jsonl | — |
| Skill files | ~/.claude/skills/*/SKILL.md | ~/.codex/skills/*/SKILL.md | ~/.agents/skills/*/SKILL.md |
Platform detection: Check which directories exist. Scan all available sources — a user may have both Claude Code and Codex installed.
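The directory check above can be sketched as follows. This is a minimal illustration, not the skill's actual script: the directory names come from the Data Sources table, while `SKILL_DIRS` and `detect_platforms` are hypothetical names chosen for the example.

```python
from pathlib import Path

# Relative skill directories taken from the Data Sources table.
# The labels and this mapping are illustrative, not part of any agent's API.
SKILL_DIRS = {
    "claude": ".claude/skills",
    "codex": ".codex/skills",
    "shared": ".agents/skills",
}


def detect_platforms(home: Path) -> list[str]:
    """Return the labels whose skill directory exists under `home`."""
    return [label for label, rel in SKILL_DIRS.items() if (home / rel).is_dir()]


if __name__ == "__main__":
    # Scan every source that exists; a user may have several installed.
    print(detect_platforms(Path.home()))
```

Returning every match, rather than stopping at the first, keeps the scan consistent with the rule that all available sources are analyzed.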
Workflow
Identify target skills
↓
Collect session data (python3 scripts scan JSONL transcripts)
↓
Run 8 analysis dimensions
↓
Compute composite scores
↓
Output report with P0/P1/P2
Step 1: Identify Target Skills
Scan skill directories in order: ~/.claude/skills/, ~/.codex/skills/, ~/.agents/skills/. Deduplicate by skill name (same name in multiple locations = same skill). For each, read SKILL.md and extract:
- name, description (from YAML frontmatter)
- trigger keywords (from description field)
- defined workflow steps (Step 1/2/3... or ### sections under Workflow)
- word count
If the user specified skill names, filter to only those.
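One possible shape for the extraction and deduplication in this step is sketched below. It assumes the frontmatter is delimited by `---` lines and contains only single-line `key: value` entries, and it counts words in the body only; both are simplifying assumptions, and `parse_skill` and `dedupe_by_name` are hypothetical helper names.

```python
import re


def parse_skill(text: str) -> dict:
    """Extract name, description, and body word count from SKILL.md text.

    Simplification: only single-line `key: value` frontmatter entries are
    handled; multi-line YAML values would need a real YAML parser.
    """
    meta = {}
    body = text
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n?(.*)$", text, re.DOTALL)
    if match:
        front, body = match.groups()
        for line in front.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip().strip("\"'")
    return {
        "name": meta.get("name", ""),
        "description": meta.get("description", ""),
        "word_count": len(body.split()),  # assumption: body only, not frontmatter
    }


def dedupe_by_name(skills: list[dict]) -> list[dict]:
    """Keep the first occurrence of each skill name, so scan order decides."""
    seen = {}
    for skill in skills:
        seen.setdefault(skill["name"], skill)
    return list(seen.values())
```

Because the first occurrence wins, scanning directories in the stated order means a skill found under ~/.claude/skills/ takes precedence over a same-named copy elsewhere.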
Step 2: Collect Session Data
Use python3 scripts via Bash to scan session JSONL files. Extract:
Claude Code sessions (~/.claude/projects/**/*.jsonl):
- Skill tool_use calls (which skills were invoked)
- User messages (full text)
- Assistant messages after skill invocation (for workflow tracking)
- User messages after skill invocation (for reaction analysis)
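A rough sketch of counting Skill calls from transcript lines follows. The record shape is an assumption made for illustration and is not confirmed by this document: each JSONL line is taken to be an object whose `message.content` is a list of blocks, with a skill call appearing as a block of type `tool_use`, name `Skill`, and the skill name under `input["skill"]`. Adjust the field names to whatever the transcripts actually contain.

```python
import json
from collections import Counter


def iter_records(lines):
    """Yield parsed JSON objects, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def count_skill_calls(lines) -> Counter:
    """Count Skill tool_use blocks per skill name.

    Assumed shape (hypothetical, verify against real transcripts):
    record["message"]["content"] is a list of blocks, and a skill call is
    a block with type "tool_use", name "Skill", and input["skill"].
    """
    counts = Counter()
    for record in iter_records(lines):
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and block.get("name") == "Skill":
                payload = block.get("input")
                name = payload.get("skill") if isinstance(payload, dict) else None
                counts[name or "<unknown>"] += 1
    return counts
```

Skipping malformed lines instead of raising keeps one truncated transcript from aborting the whole scan.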
Codex sessions (~/.codex/sessions/**/*.jsonl):
- session_meta events → extract base_instructions for skill loading evidence
- response_item events → assistant outputs (workflow tracking)
- event_msg events → tool execution and skill-related events
- User messages from turn_context events (for reaction analysis)
Note: Codex injects skills via context rather than explicit Skill tool calls. Skill loading (present in base_instructions) does NOT equal active invocation. To detect actual use, search for skill-specific workflow markers (step headers, output formats) in response_item content within that session. A skill is "invoked" only if the agent produced output following the skill's defined workflow.
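The loaded-versus-invoked distinction can be sketched as a marker check. The function names and the `min_hits` threshold of 2 are assumptions made for this example; the note above only requires that the output follow the skill's workflow, without fixing a threshold.

```python
def count_marker_hits(markers: list[str], outputs: list[str]) -> int:
    """Count distinct workflow markers found in any assistant output."""
    joined = "\n".join(outputs).lower()
    unique = {m.lower() for m in markers if m.strip()}
    return sum(1 for marker in unique if marker in joined)


def was_invoked(loaded: bool, markers: list[str], outputs: list[str],
                min_hits: int = 2) -> bool:
    """Treat a skill as invoked only when it was loaded AND the output
    shows enough workflow markers. `min_hits` is an illustrative threshold."""
    return bool(loaded) and count_marker_hits(markers, outputs) >= min_hits
```

For example, with markers taken from a skill's step headers, a session that merely carries the skill in base_instructions but produces none of those headers is counted as not invoked.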
Aggregated:
- Per-skill: invocation count, trigger keyword match count
- Per-skill: user reaction sentiment after invocation
- Per-skill: workflow step completion markers
Step 3: Run 8 Analysis Dimensions
You MUST run ALL 8 dimensions. The baseline behavior without this skill is to skip dimensions 4.2, 4.3, 4.5b, and 4.8. These are the most valuable dimensions — do not skip them.
4.1 Trigger Rate
Count how many times each skill was actually invoked vs how many times its trigger keywords appeared in user messages.
Claude Code: count Skill tool_use calls in transcripts.
Codex: count sessions where the agent produced output following the skill's workflow markers (not merely loaded in context).
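The comparison can be sketched as a simple ratio. This example counts user messages that contain at least one trigger keyword (rather than every keyword occurrence), matches whole words case-insensitively, and returns None when no message matched; all three choices, and the function names, are assumptions for illustration.

```python
import re


def keyword_matches(keywords: list[str], messages: list[str]) -> int:
    """Count user messages containing at least one trigger keyword."""
    terms = [re.escape(k) for k in keywords if k.strip()]
    if not terms:
        return 0
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    return sum(1 for message in messages if pattern.search(message))


def trigger_rate(invocations: int, matches: int):
    """Invocations per keyword-matching message, or None with no matches."""
    return invocations / matches if matches else None
```

A rate near zero with many matching messages points to undertriggering, while a None result means the session data contains no evidence either way.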
Diagnose:
- Never triggered → skill may be useless or trigger words wrong