Self-Healing Triage Dataset #

Complete audit of all triage decisions made by the sisakuintel-agent self-healing system, published for independent verification and community challenge.

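Data Sources: sisakuintel-worker (Scan Report issues and triage comments) and sisakulint (FP Issues and Fix PRs)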

Last Updated: 2026-03-06


How to Read This Dataset #

The self-healing architecture routes findings through two repositories:

  1. sisakuintel-worker: The scan-orchestrator creates [Scan Report] issues for each scanned repository. A triage comment classifies every finding as TP or FP with reasoning.
  2. sisakulint: When the triage agent identifies a false positive, it creates a bug report Issue and a corresponding fix PR in the scanner’s own repository. Only FP-related items appear here.

The “45 triaged items” figure cited in the evaluation metrics is the sum of 27 Finding Issues and 18 Fix PRs created by sisakuintel-agent[bot] in the sisakulint repository. TP findings are documented in sisakuintel-worker Scan Report comments.


Classification Criteria #

A finding is classified as:

  • True Positive (TP): The flagged pattern is exploitable given the workflow’s trigger type, permission scope, and step dependency graph.
  • False Positive (FP): Any contextual factor (trigger restrictions, permission scope, step dependencies, safe data types) eliminates exploitability.
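
The two criteria reduce to one rule: a finding is a TP only when no contextual factor rules out exploitation. The following is a minimal sketch of that rule, assuming a hypothetical `Finding` record whose boolean fields are filled in by the reviewer. The field names are illustrative and are not part of sisakulint or sisakuintel-agent.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Reviewer's answers for one flagged pattern (hypothetical schema)."""
    rule: str
    trigger_reachable: bool       # can an attacker cause the workflow to run?
    permissions_sufficient: bool  # does the token or secret scope allow damage?
    steps_connected: bool         # does tainted input reach the sensitive step?
    data_unsafe: bool             # can the value carry a payload at all?


def classify(finding: Finding) -> str:
    """Return "TP" only if every factor leaves the pattern exploitable."""
    factors = (
        finding.trigger_reachable,
        finding.permissions_sufficient,
        finding.steps_connected,
        finding.data_unsafe,
    )
    # A single factor that eliminates exploitability makes the finding an FP.
    return "TP" if all(factors) else "FP"
```

For example, a code-injection finding whose only interpolated value is a safe data type would be recorded with `data_unsafe=False` and classified as FP.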

Ground truth was determined by the author (Atsushi Sada). This dataset publishes the complete classification with per-finding rationale to enable independent re-classification by any reviewer.


Part A: sisakulint Repository — FP Reports + Fix PRs (45 Items) #

Legend #

| Symbol | Meaning |
| --- | --- |
| FP | False Positive |
| Bug | Scanner bug (not a TP/FP classification) |
| Merged | Fix PR merged after human review |
| Open | Fix PR pending |
| N/A | No bot-generated fix PR |

Week 1 (2026-01-06 – 2026-01-11) — 12 items #

| # | Issue/PR | Type | Classification | Rule | Target Repo | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | #234 | Issue | FP | syntax (parser) | anthropics/claude-code | description is a valid workflow key. Missing from allowed keys list in parse_main.go |
| 2 | #235 | PR (HikaruEgashira) | Fix → Merged | syntax | | Fix for #234 |
| 3 | #240 | Issue | FP | cache-poisoning | | Checking out base.ref (target branch) is safe; does not execute PR code |
| 4 | #241 | PR | Fix → Merged | cache-poisoning | | Fix for #240 |
| 5 | #242 | Issue | FP (2 findings) | cond, expression | ophub/fnnas | (a) Multi-expression conditions ${{ A }} == ${{ B }} flagged as “always true”; (b) cancelled() treated as undefined function |
| 6 | #245 | Issue | FP | cond | | Same root cause as #242 |
| 7 | #246 | Issue | Bug | cond, expression | | Auto-fix functionality broken for these rules |
| 8 | #247 | PR | Fix → Merged | cond | | Fix for #242/#245/#246 |
| 9 | #249 | Issue | FP | permissions | | permissions: read-all is valid (used by OpenSSF Scorecard). Missing from switch statement |
| 10 | #250 | PR | Fix → Merged | permissions | | Fix for #249 |
| 11 | #251 | Issue | FP | artifact-poisoning | | /tmp is outside workspace; cannot overwrite source code |
| 12 | #252 | PR | Fix → Merged | artifact-poisoning | | Fix for #251 |
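
The rationale for #251 rests on a path-containment check: an artifact downloaded to a location outside the workspace cannot overwrite checked-out source files. A minimal sketch of such a check follows. It assumes POSIX paths and an illustrative workspace location, does not expand workflow expressions such as `${{ runner.temp }}`, and is not the code from the actual fix PR.

```python
import posixpath

# Assumed workspace location, for illustration only.
DEFAULT_WORKSPACE = "/home/runner/work/repo/repo"


def is_inside_workspace(download_path: str, workspace: str = DEFAULT_WORKSPACE) -> bool:
    """Return True if download_path resolves to the workspace or a path below it."""
    ws = posixpath.normpath(workspace)
    # Relative paths are resolved against the workspace; absolute paths replace it.
    target = posixpath.normpath(posixpath.join(ws, download_path))
    return posixpath.commonpath([ws, target]) == ws
```

Under these assumptions, `is_inside_workspace("/tmp/artifacts")` is false, so a download to `/tmp` would not be reported, while `is_inside_workspace("./dist")` is true.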

Week 3–4 (2026-01-24 – 2026-02-01) — 6 items #

| # | Issue/PR | Type | Classification | Rule | Target Repo | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| 13 | #307 | Issue | FP | impostor-commit | github/copilot-sdk | Valid commit on releases/v5 branch flagged as impostor |
| 14 | #308 | Issue | FP (3 findings) | untrusted-checkout, cache-poisoning, code-injection | google/langextract | Job-level if: github.event_name == 'pull_request' restricts execution to safe triggers, but rules only check workflow-level triggers |
| 15 | #309 | Issue | FP (2 findings) | permissions, impostor-commit | nexmoe/VidBee | (a) Job-level permissions defined but workflow-level flagged as missing; (b) valid main branch commit flagged as impostor |
| 16 | #310 | Issue | FP | artifact-poisoning | OpenBMB/VoxCPM | No checkout step in job; artifact download cannot overwrite source code |
| 17 | #311 | PR | Fix → Merged | permissions | | Fix for #309 |
| 18 | #312 | PR | Fix → Merged | artifact-poisoning | | Fix for #310 |

Note: #308 was fixed by #315 (authored by ultra-supara, introducing JobTriggerAnalyzer). Because #315 is not a bot PR, it is not counted in the 45 items.
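
The idea behind the #308 false positives can be illustrated by intersecting the workflow-level triggers with the events that a job-level `if:` condition permits. The sketch below is an assumption-laden illustration, not the actual JobTriggerAnalyzer: the function name is hypothetical, and it only recognizes conditions built from `github.event_name == '...'` clauses joined by `||`. Any other condition is treated conservatively, as if it did not restrict the triggers.

```python
import re
from typing import Optional, Set

# Matches a single clause such as: github.event_name == 'pull_request'
_EVENT_EQ = re.compile(r"github\.event_name\s*==\s*'([A-Za-z_]+)'")


def effective_triggers(workflow_triggers: Set[str], job_if: Optional[str]) -> Set[str]:
    """Return the events under which the job can actually run (hypothetical helper)."""
    if job_if is None:
        return set(workflow_triggers)
    expr = job_if.strip()
    if expr.startswith("${{") and expr.endswith("}}"):
        expr = expr[3:-2].strip()
    allowed = set()
    for clause in expr.split("||"):
        match = _EVENT_EQ.fullmatch(clause.strip())
        if match is None:
            # Unrecognized condition: stay conservative and restrict nothing.
            return set(workflow_triggers)
        allowed.add(match.group(1))
    return set(workflow_triggers) & allowed
```

With workflow triggers `{"pull_request", "pull_request_target"}` and the job condition quoted in #308, the result is `{"pull_request"}`, so a rule that only fires on privileged triggers would no longer flag the job.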

Week 5–6 (2026-02-04 – 2026-02-17) — 9 items #

| # | Issue/PR | Type | Classification | Rule | Target Repo | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| 19 | #324 | Issue | FP | impostor-commit | j178/prek | Annotated tag object SHA differs from commit SHA; valid commit flagged |
| 20 | #328 | Issue | FP | artipacked | | Auto-fix unconditionally adds persist-credentials: false, breaking workflows that need git credentials for git push |
| 21 | #329 | PR | Fix → Merged | artipacked | | Fix for #328: guard condition checking for upload-artifact |
| 22 | #333 | Issue | FP (2 findings) | impostor-commit, cache-poisoning | koala73/worldmonitor | (a) Annotated tag handling; (b) swatinem/rust-cache false alert |
| 23 | #334 | PR | Fix → Merged | impostor-commit, cache-poisoning | | Fix for #324/#333 |
| 24 | #335 | Issue | FP | parser | nearai/ironclaw | YAML anchors (&name) and aliases (*name) flagged as syntax errors |
| 25 | #336 | PR | Fix → Merged | parser | | Fix for #335: added dereferenceAlias() helper |
| 26 | #337 | Issue | FP | parser | | dependabot.yml validated as workflow file, reporting missing on:/jobs: |
| 27 | #338 | PR | Fix → Merged | parser | | Fix for #337 |
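
The annotated-tag rationale in #324 follows from how Git stores tags: an annotated tag is its own object with its own SHA, and it points at the tagged object. Comparing the tag object's SHA directly with a pinned commit SHA therefore fails even when the pin is valid. The sketch below models this with a toy in-memory object table. The SHAs and helper names are made up for illustration and do not come from sisakulint.

```python
from typing import Dict

# Toy object database: SHA -> object. Values are illustrative only.
OBJECTS: Dict[str, dict] = {
    "aaa111": {"type": "commit"},
    "bbb222": {"type": "tag", "object": "aaa111"},  # annotated tag pointing at the commit
}


def peel(sha: str, objects: Dict[str, dict]) -> str:
    """Follow tag objects until a non-tag object is reached and return its SHA."""
    seen = set()
    while objects[sha]["type"] == "tag":
        if sha in seen:
            raise ValueError("tag objects form a cycle")
        seen.add(sha)
        sha = objects[sha]["object"]
    return sha


def ref_matches_pin(ref_sha: str, pinned_sha: str, objects: Dict[str, dict]) -> bool:
    """Compare the pinned SHA against the peeled target of the ref."""
    return peel(ref_sha, objects) == pinned_sha
```

Here the raw comparison `"bbb222" == "aaa111"` is false, while `ref_matches_pin("bbb222", "aaa111", OBJECTS)` is true, which is the distinction the false positive missed.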

Week 6–7 (2026-02-18 – 2026-02-27) — 10 items #

| # | Issue/PR | Type | Classification | Rule | Target Repo | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| 28 | #339 | Issue | FP | dependabot-github-actions | | Remote scan mode uses os.Stat() on local filesystem |
| 29 | #340 | PR | Fix → Merged | dependabot-github-actions | | Fix for #339 |
| 30 | #341 | Issue | FP | toctou | google/langextract | JobTriggerAnalyzer (PR #315) was not applied to TOCTOU rule variant |
| 31 | #342 | PR | Fix → Merged | toctou | | Fix for #341 |
| 32 | #344 | Issue | FP | parser | | #338 fix not deployed to API server; dependabot.yml FP recurrence |
| 33 | #346 | Issue | FP | commit-sha | | Local actions (./my-action) are part of the same repo; not subject to supply chain attacks |
| 34 | #347 | PR | Fix → Merged | commit-sha | | Fix for #346 |
| 35 | #348 | Issue | Bug | infra | | Lambda deployment outdated |
| 36 | #349 | Issue | FP | impostor-commit | | API rate-limiting causes getTags() to return empty; all fallback checks fail, falling through to isImpostor: true |
| 37 | #350 | PR | Fix → Merged | impostor-commit | | Fix for #349: fail-open on API errors |
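
The fail-open behavior described for #349/#350 can be sketched as follows: when the lookup that gathers evidence fails, the check reports an inconclusive result instead of defaulting to "impostor". This is an illustration under stated assumptions, not the merged fix. `ApiError`, `check_pinned_commit`, and the callback are hypothetical names, and the sketch assumes that API failures surface as exceptions rather than as empty results.

```python
from typing import Callable, Iterable


class ApiError(Exception):
    """Raised by the (hypothetical) lookup when the API call fails."""


def check_pinned_commit(sha: str, fetch_known_shas: Callable[[], Iterable[str]]) -> str:
    """Return "ok", "impostor", or "unknown" for a pinned commit SHA."""
    try:
        known = set(fetch_known_shas())
    except ApiError:
        # Fail open: no evidence was collected, so do not report an impostor.
        return "unknown"
    return "ok" if sha in known else "impostor"
```

The design choice is that an absent answer is not treated as a negative answer; only a successful lookup that does not contain the SHA yields "impostor".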

Week 8–9 (2026-03-01 – 2026-03-03) — 8 items #

| # | Issue/PR | Type | Classification | Rule | Target Repo | Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| 38 | #370 | Issue | FP | impostor-commit | gsd-build/get-shit-done | Official tagged commits (v4.3.1, v4.4.0) flagged as impostor |
| 39 | #371 | Issue | FP | code-injection | | head.sha is always 40-char hex; cannot contain shell metacharacters |
| 40 | #372 | PR | Fix → Merged | code-injection | | Fix for #371 |
| 41 | #373 | Issue | FP | impostor-commit | dtolnay/rust-toolchain | Non-default branches (stable, nightly) used as version identifiers; reachability check only compared against default branch |
| 42 | #374 | PR | Fix → Merged | impostor-commit | | Fix for #373 |
| 43 | #375 | Issue | FP | secret-exfiltration | | Webhook URL used as curl destination misidentified as data exfiltration |
| 44 | #376 | PR | Fix → Open | secret-exfiltration | | Fix for #375 (superseded by #378) |
| 45 | #377 | Issue | FP (2 findings) | secret-exfiltration | | (a) Shell line continuation (\) breaks matchesLegitPattern; (b) secret used as curl URL positional arg misidentified as data payload |
| | #378 | PR | Fix → Merged | secret-exfiltration | | Fix for #375/#377 |
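
The "safe data type" argument in #371 can be made concrete: a value that consists of exactly 40 lowercase hexadecimal characters has no room for shell metacharacters. A minimal sketch of that property check follows. The function name is hypothetical, and the 40-character length is taken from the rationale above.

```python
import re

# Exactly 40 lowercase hex characters, as stated in the #371 rationale.
_SHA_HEX = re.compile(r"[0-9a-f]{40}")


def is_hex_commit_sha(value: str) -> bool:
    """Return True only if the whole string is a 40-character hex SHA."""
    # fullmatch must consume the entire string, so a trailing newline,
    # space, or any shell metacharacter makes the check fail.
    return _SHA_HEX.fullmatch(value) is not None
```

For instance, `is_hex_commit_sha("a" * 40)` holds, while a string such as `"$(id)"` or a SHA followed by `;` does not pass.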

Visualizations #

Scan Report Classification Distribution #

pie title Scan Report Classification (n≈220)
    "All TP" : 130
    "Mixed (TP+FP)" : 50
    "All/Majority FP" : 20
    "No triage / Non-scan" : 20

FP Corrections by Rule Category #

xychart-beta
    title "False Positive Corrections by Rule Category"
    x-axis ["impostor-commit", "parser", "cond/expr/code-inj", "credential", "artifact-poison", "cache-poison", "permissions", "toctou", "commit-sha"]
    y-axis "Number of Corrections" 0 --> 7
    bar [6, 4, 3, 3, 2, 2, 2, 1, 1]

Self-Healing Timeline (Cumulative FP Corrections) #

xychart-beta
    title "Cumulative FP Corrections Over Time"
    x-axis ["W1 (Jan 6)", "W2", "W3-4 (Jan 24)", "W5-6 (Feb 4)", "W6-7 (Feb 18)", "W8-9 (Mar 1)"]
    y-axis "Cumulative Corrections" 0 --> 27
    line [8, 8, 13, 19, 24, 27]
    bar [8, 0, 5, 6, 5, 3]

Bar = new FP findings per period. Line = cumulative total.
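
The line series is the running sum of the bar series, which can be checked directly. The snippet below uses the per-period values from the chart; the variable names are illustrative.

```python
from itertools import accumulate

# New FP findings per period, taken from the bar series above.
new_per_period = [8, 0, 5, 6, 5, 3]

# Running total, which should reproduce the line series.
cumulative = list(accumulate(new_per_period))
print(cumulative)  # [8, 8, 13, 19, 24, 27]
```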

Self-Healing Pipeline Flow #

flowchart LR
    A["sisakuintel-worker\nscans trending repo"] --> B{"Triage\ncomment"}
    B -->|All TP| C["TP findings\ndocumented in\nScan Report"]
    B -->|FP detected| D["sisakuintel-agent\ncreates Issue\nin sisakulint"]
    D --> E["sisakuintel-agent\ncreates Fix PR"]
    E --> F{"Human review"}
    F -->|Approve| G["Merged\n(15 bot PRs)"]
    F -->|Reject/Revise| H["Manual fix\n(2 human PRs)"]
    G --> I["Scanner improved\nFP eliminated"]
    H --> I
    I -.->|"Next scan"| A

    style A fill:#4a90d9,color:#fff
    style B fill:#f5a623,color:#fff
    style C fill:#7ed321,color:#fff
    style D fill:#d0021b,color:#fff
    style E fill:#d0021b,color:#fff
    style G fill:#7ed321,color:#fff
    style I fill:#7ed321,color:#fff

Fix PR Outcome Distribution #

pie title Fix PR Outcomes (18 total)
    "Bot PR Merged" : 15
    "Bot PR Open" : 1
    "Human PR (outside count)" : 2

Part B: sisakuintel-worker Repository — Scan Report Triage (TP + FP) #

Each [Scan Report] issue in sisakuintel-worker contains a triage comment with per-finding TP/FP classification and reasoning. This section summarizes all triaged reports from #300 to #555.

Aggregate Classification (#300–#555, ~220 Scan Reports) #

| Classification | Count | Percentage |
| --- | --- | --- |
| All TP (every finding confirmed valid) | ~130 | ~60% |
| Mixed (TP + FP in same report) | ~50 | ~23% |
| All/Majority FP (scanner bug) | ~20 | ~9% |
| No triage comment / Non-scan issues | ~20 | ~8% |

Representative HIGH/CRITICAL Severity TP Findings #

| Worker # | Repository | Detection | Severity |
| --- | --- | --- | --- |
| #513 | ZhuLinsen/daily_stock_analysis | pull_request_target + PR head checkout → external contributor code runs with access to GEMINI_API_KEY, OPENAI_API_KEY, GITHUB_TOKEN | CRITICAL |
| #542 | router-for-me/CLIProxyAPI | Additional HIGH severity vulnerability discovered during triage | HIGH |
| #469 | Veirt/weathr | code-injection-critical in homebrew.yml | HIGH |
| #484 | stan-smith/FossFLOW | impostor-commit + dangerous-triggers-critical | HIGH |

Representative All-TP Scan Reports #

| Worker # | Repository | Findings | Key Rules |
| --- | --- | --- | --- |
| #554 | rtk-ai/rtk | 105 | commit-sha, secrets:inherit, dependabot, latest tag |
| #553 | ruvnet/RuView | 280 | permissions, dependabot, commit-sha |
| #552 | openai/symphony | 21 | permissions, dependabot, commit-sha |
| #534 | superset-sh/superset | 406 | commit-sha, artifact-poisoning, cache-poisoning, artipacked |
| #527 | alibaba/OpenSandbox | 316 | commit-sha, artipacked, dependabot, self-hosted-runner |
| #509 | ruvnet/ruvector | 1269 | commit-sha, artipacked, permissions |
| #506 | clockworklabs/SpacetimeDB | 499 | permissions, dependabot, commit-sha, artipacked |
| #492 | D4Vinci/Scrapling | 71 | all TP |
| #491 | cloudflare/agents | 29 | all TP |
| #452 | anthropics/claude-quickstarts | 68 | all TP |

Representative Mixed (TP + FP) Scan Reports #

Reports where the triage correctly separated TP from FP within the same scan:

| Worker # | Repository | TP Count | FP Count | FP Details | Resulting Fix |
| --- | --- | --- | --- | --- | --- |
| #543 | mengxi-ream/read-frog | 86 | 2 | secret-exfiltration FP | → sisakulint #378 |
| #538 | block/goose | many | many | impostor-commit mass FP; reusable-workflow-taint TP | → sisakulint #370 |
| #531 | gsd-build/get-shit-done | 5 | 2 | impostor-commit on actions/checkout@v4.3.1 | → sisakulint #370 |
| #536 | koala73/worldmonitor | many | few | impostor-commit non-default branch | → sisakulint #374 |
| #520 | ruvnet/claude-flow | 376 | 6 | 6 FP out of 382 total | |
| #508 | vercel/chat | many | 7 | dependabot.yml misidentified as workflow | → sisakulint #338 |
| #465 | google/langextract | many | few | toctou/critical FP | → sisakulint #342 |
| #358 | google/langextract | 40 | 3+ | untrusted-checkout/cache-poisoning/code-injection FP | → sisakulint #315 |
| #325 | anthropics/claude-code | 14 | 5 | description syntax FP | → sisakulint #235 |

FP-to-Fix Traceability #

Complete chain from FP discovery in worker to scanner fix:

| Worker # | FP Discovered | → sisakulint Issue | → Fix PR | Status |
| --- | --- | --- | --- | --- |
| #325 | anthropics/claude-code description syntax | #234 | #235 | Merged |
| #331 | marcelscruz/public-apis cache-poisoning base.ref | #240 | #241 | Merged |
| #336 | ophub/fnnas cond/expression | #242 | #247 | Merged |
| #371 | nexmoe/VidBee permissions + impostor-commit | #309 | #311 | Merged |
| #361 | OpenBMB/VoxCPM artifact-poisoning without checkout | #310 | #312 | Merged |
| #392 | j178/prek impostor-commit annotated tag | #324 | #334 | Merged |
| #416 | koala73/worldmonitor impostor-commit | #333 | #334 | Merged |
| #423 | nearai/ironclaw YAML anchors | #335 | #336 | Merged |
| #508 | vercel/chat dependabot.yml | #337 | #338 | Merged |
| #358 | google/langextract job-level triggers | #308 | #315 | Merged |
| #465 | google/langextract TOCTOU | #341 | #342 | Merged |
| #482 | qwibitai/nanoclaw local action commit-sha | #346 | #347 | Merged |
| #536 | koala73/worldmonitor non-default branch | #373 | #374 | Merged |
| #531 | gsd-build/get-shit-done official tag | #370 | | Open |
| #543 | mengxi-ream/read-frog secret-exfiltration | #377 | #378 | Merged |
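
The traceability chain can be represented as plain records so that questions such as "which chains are still open?" or "which issues did one PR close?" become simple queries. The sketch below uses a handful of rows copied from the table above. The `Trace` type and helper names are illustrative, not part of the published tooling.

```python
from typing import List, NamedTuple, Optional


class Trace(NamedTuple):
    worker_issue: int      # Scan Report issue in sisakuintel-worker
    lint_issue: int        # FP issue in sisakulint
    fix_pr: Optional[int]  # Fix PR in sisakulint, if any
    status: str


# A subset of rows from the traceability table.
TRACES = [
    Trace(325, 234, 235, "Merged"),
    Trace(392, 324, 334, "Merged"),
    Trace(416, 333, 334, "Merged"),
    Trace(531, 370, None, "Open"),
    Trace(543, 377, 378, "Merged"),
]


def unresolved(traces: List[Trace]) -> List[int]:
    """Worker issues whose FP has no merged fix yet."""
    return [t.worker_issue for t in traces if t.fix_pr is None or t.status != "Merged"]


def issues_fixed_by(pr: int, traces: List[Trace]) -> List[int]:
    """sisakulint issues that trace to the given fix PR."""
    return [t.lint_issue for t in traces if t.fix_pr == pr]
```

On this subset, `unresolved(TRACES)` returns `[531]` and `issues_fixed_by(334, TRACES)` returns `[324, 333]`, matching the rows where one PR resolved two issues.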

Statistics #

Fix PR Status #

| Status | Count | PR Numbers |
| --- | --- | --- |
| Bot PR Merged | 15 | #241, #247, #250, #252, #311, #312, #329, #334, #336, #338, #340, #342, #347, #350, #372, #374, #378 |
| Bot PR Open | 1 | #376 |
| Human PR Merged (outside 45 count) | 2 | #235 (HikaruEgashira), #315 (ultra-supara) |
| Unfixed | 1 | #370 (open) |

Rule Category Breakdown #

| Category | Corrections | sisakulint Issues |
| --- | --- | --- |
| Impostor-commit | 6 | #307, #309, #324, #333, #349, #370, #373 |
| Artifact Poisoning | 2 | #251, #310 |
| Access Control | 2 | #249, #309 |
| Credential Protection (artipacked, secret-exfiltration) | 3 | #328, #375, #377 |
| Code Injection (cond, expression, code-injection) | 3 | #242, #245, #371 |
| Parser/Validation | 4 | #234, #335, #337, #339 |
| TOCTOU | 1 | #341 |
| Third Party (commit-sha) | 1 | #346 |
| Cache Poisoning | 2 | #240, #333 |

Verification Guide #

To independently verify any triage decision:

  1. Pick a Scan Report from sisakuintel-worker issues (e.g., #543)
  2. Read the triage comment — it contains the full reasoning for each finding
  3. Clone the target repository listed in the Scan Report
  4. Inspect the workflow file at the specified path and line number
  5. Assess exploitability using the classification criteria above
  6. Compare your classification with the published determination
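
Steps 5 and 6 amount to comparing two label sets finding by finding. A reviewer who re-classifies several findings could tally the comparison with a small helper like the one below. This is a sketch under assumptions: the finding identifiers (`file:line` strings) and the function name are hypothetical, and the dataset does not prescribe any particular format.

```python
from typing import Dict, List, Tuple


def compare_labels(published: Dict[str, str], reviewer: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return the agreement rate and the findings on which the labels differ."""
    shared = sorted(published.keys() & reviewer.keys())
    if not shared:
        raise ValueError("no findings in common")
    disagreements = [key for key in shared if published[key] != reviewer[key]]
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements


# Illustrative labels only; these are not taken from any Scan Report.
published = {"wf.yml:12": "TP", "wf.yml:30": "FP", "ci.yml:8": "TP", "ci.yml:44": "TP"}
reviewer = {"wf.yml:12": "TP", "wf.yml:30": "FP", "ci.yml:8": "FP", "ci.yml:44": "TP"}

rate, disagreements = compare_labels(published, reviewer)
```

In this made-up example the reviewer agrees on three of four findings, and `disagreements` lists the one finding that would be worth raising in an issue.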

For FP findings that led to scanner fixes:

  1. Read the sisakulint Issue (e.g., #377) for the FP report with root cause analysis
  2. Read the Fix PR (e.g., #378) for the code change and regression tests
  3. Verify the fix does not reduce detection of true positives by checking CI results

If you disagree with any TP/FP determination, open an issue in the sisakulint repository with your analysis.


Limitations #

  1. Single annotator: Ground truth was classified by the author. Per-finding rationale and triage reasoning are published for independent re-classification. GHSA/GHSL advisory outcomes provide partial external validation for the advisory-confirmed subset, though advisories confirm vulnerability existence at the repository level, not individual rule-level TP/FP status.

  2. FP-only in sisakulint: The sisakulint repository contains only false positive reports and fixes because the architecture routes only FPs to the scanner for correction. TP evidence is in sisakuintel-worker Scan Report comments.

  3. Evolving dataset: The self-healing system continues to operate. New findings, triage decisions, and fixes are added continuously and are publicly visible in real time.