Should you install this skill?
Type a skill name. We'll show you whether it measurably helps the agent — and whether it triggers exploits in a runtime sandbox.
227 audited · 41 unsafe · 93 confirmed exploits · 4,256 judge items
Riskiest skills you should know about.
93 confirmed exploits across 41 skills
Twelve hand-picked findings spanning five exploit classes. Click any card to inspect that skill.
How we score a skill
5-step pipeline · two independent axes
Every skill goes through the same pipeline. The same execution pass produces both axes — effectiveness and safety — never combined into a single score.
SKILL.md, scripts, and dependencies. Each finding gets an existence_confidence ∈ [0, 1].
→ static_scan.json
scenarios/U*.yaml
fs.diff · net.log
existence × exploitability against the runtime trace.
→ judges/*.json
pass_rate_gain, efficiency_score, and security.score = max(10, 100 − Σ base × existence × exploit).
→ skill_report.json
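The security formula above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Finding` class, its field names, and the example severity weights are hypothetical, and the exploitability factor is assumed to lie in [0, 1] like existence_confidence. Only the shape of the formula, including the floor of 10, comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Finding:
    """One static-scan finding (hypothetical structure, not the real schema)."""
    base: float       # severity weight; the scale used here is an assumption
    existence: float  # existence_confidence in [0, 1]
    exploit: float    # exploitability judged against the runtime trace (assumed [0, 1])


def security_score(findings: Iterable[Finding]) -> float:
    """Compute max(10, 100 - sum(base * existence * exploit))."""
    penalty = sum(f.base * f.existence * f.exploit for f in findings)
    # The score is floored at 10, as in the formula on the page.
    return max(10.0, 100.0 - penalty)


if __name__ == "__main__":
    findings = [
        Finding(base=40.0, existence=0.5, exploit=0.5),  # contributes 10 points of penalty
    ]
    print(security_score(findings))
```

Because each term is a product, a finding that is certain to exist but was not exploitable in the sandbox (exploit = 0) contributes no penalty, and the effectiveness axis (pass_rate_gain, efficiency_score) does not enter this number.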
See it on GitHub
Chrome extension · injects the verdict on any SKILL.md repo
The Chrome MV3 extension recognizes any GitHub repository whose root contains a
SKILL.md and renders the same verdict directly on the page — at the moment
someone is deciding whether to install.
// captured during precomputed run · 2026-05-04 · run d4f8c2
Cite + download
BibTeX · benchmark.json (4.5 MB) for reproducibility
BibTeX
@inproceedings{skilllens2026,
title = {SkillLens: From Task-First Evaluation to
Skill-Centered Assessment of Agent Skill Packages},
author = {Anonymous Author(s)},
booktitle = {Submitted to the 40th Conference on Neural
Information Processing Systems (NeurIPS)},
year = {2026},
note = {Under review. Do not distribute.}
}
Reproduce
Every audit's skill_report.json is bundled for all 227 evaluated skills: judge items,
finding rationale, paired wi / wo (with-skill / without-skill) numbers, and severity weighting.