Methodology

Transparent Benchmarks

Every claim we make is backed by reproducible benchmarks. Our methodology is transparent, and our test suites are continuously expanded as we discover new patterns.

CVE BENCHMARK

Real-World Detection Performance

We test TraceMint against real CVE-affected repositories, not synthetic test cases. This benchmark measures our ability to find known vulnerabilities in production code.

500 CVE Repositories Scanned
80.4% Strict Recall (Type + File)
25+ Vulnerability Classes Covered

πŸ“ˆ CVE Benchmark v4 Definition

Total Corpus: 500 OSS CVE repositories across major web languages
Ground Truth: Verified vulnerable file + vulnerability type per repo
Mode: Blind detection (no vulnerability hints provided)
Strict Hit: Correct vulnerability type AND correct file identified
Result: 80.4% strict recall on verified ground-truth repos
Validation: JSON output, file from analyzed set, evidence required
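
To make the scoring rule concrete, here is a minimal sketch of a strict-hit check in Python. The field names (vuln_type, file, evidence) are illustrative assumptions, not TraceMint's actual output schema:

    # Minimal strict-hit check. Field names are illustrative assumptions,
    # not TraceMint's actual schema.
    import json

    def is_strict_hit(output: str, truth: dict, analyzed: set[str]) -> bool:
        try:
            finding = json.loads(output)    # validation: output must be JSON
        except json.JSONDecodeError:
            return False
        return (
            finding.get("vuln_type") == truth["vuln_type"]  # correct type, AND
            and finding.get("file") == truth["file"]        # correct file
            and finding["file"] in analyzed                 # file from analyzed set
            and bool(finding.get("evidence"))               # evidence required
        )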

Detection Rate by Vulnerability Type

Deserialization: 100%
Path Traversal: 92%
LDAP Injection: 89%
XSS: 87%
Command Injection: 83%
XXE: 82%
SQL Injection: 80%
SSRF: 78%
SSTI: 75%
Auth Bypass: 70%
IDOR: 65%
Open Redirect: 62%
METHODOLOGY

How We Test

Our testing methodology is designed to reflect real-world performance, not cherry-picked results.

πŸ“ˆ

CVE Repository Testing

We clone real CVE-affected repositories and run full scans. A "hit" means we find the correct vulnerability type in the correct file β€” verified against advisory data.

Test Corpus: 500 CVE repositories

🔄 Multi-Language Coverage

Our benchmark spans Python, JavaScript, Go, Java, PHP, Ruby, and more. Each language has dedicated taint engines and framework adapters.

Languages: 30+ supported
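
In rough terms, per-language dispatch might look like the sketch below. The extension map and engine names are hypothetical, not TraceMint internals:

    import os

    # Hypothetical extension map; the real engine covers 30+ languages.
    EXT_TO_LANG = {".py": "python", ".js": "javascript", ".go": "go",
                   ".java": "java", ".php": "php", ".rb": "ruby"}

    def taint_engine_for(path: str) -> str:
        """Pick the dedicated per-language taint engine for a source file."""
        lang = EXT_TO_LANG.get(os.path.splitext(path)[1])
        if lang is None:
            raise ValueError(f"no taint engine for {path}")
        return f"{lang}-taint-engine"  # framework adapters attach per language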

🐳 Docker PoC Verification

When a repo ships a docker-compose file or Dockerfile, we automatically spin up a lab environment and execute the PoC to confirm exploitability.

Verification: Automated E2E
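
A minimal sketch of what that automation could look like, assuming the repo ships a docker-compose.yml and a poc.sh exploit script (both hypothetical names):

    import subprocess

    def verify_poc(repo_dir: str) -> bool:
        """Spin up the lab, run the PoC end to end, always tear down."""
        try:
            subprocess.run(["docker", "compose", "up", "-d", "--build"],
                           cwd=repo_dir, check=True)
            poc = subprocess.run(["bash", "poc.sh"], cwd=repo_dir)
            return poc.returncode == 0  # exit 0 => exploit confirmed
        except subprocess.CalledProcessError:
            return False                # lab failed to start
        finally:
            subprocess.run(["docker", "compose", "down", "-v"], cwd=repo_dir)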

🎯 Verdict-Based Scoring

Instead of arbitrary FP percentages, we use VERIFIED / PROOF-BACKED / NEEDS_REVIEW verdicts based on proof-obligation completion.

Approach: Proof-first
FALSE POSITIVE MANAGEMENT

Signal Over Noise

High recall means nothing if every finding is a false positive. Our proof-obligation pipeline ensures that reported findings carry verifiable evidence.

<0.5 FP per 1K LOC
5-Stage FP Reduction Pipeline
Evidence Required for Every Finding
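
Conceptually, a finding is reported only if it survives every stage. The sketch below composes three of the checks described on this page (chain required, guard suppression, evidence required); it is illustrative, not the actual five-stage pipeline:

    from typing import Callable

    Finding = dict
    Stage = Callable[[Finding], bool]  # True = keep, False = suppress

    STAGES: list[Stage] = [
        lambda f: bool(f.get("taint_chain")),  # source->sink chain present
        lambda f: not f.get("sanitized"),      # no guard/sanitizer on the path
        lambda f: bool(f.get("evidence")),     # verifiable evidence attached
    ]

    def survives(finding: Finding) -> bool:
        """A finding is reported only if every stage keeps it."""
        return all(stage(finding) for stage in STAGES)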
πŸ›‘οΈ

Guard & Sanitizer Verification

Before reporting a finding, the engine checks whether authorization guards, sanitizers, or input validators are present on the data flow path β€” and suppresses if protected.
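
For illustration, a simplified version of that suppression check, with a hypothetical set of sanitizer names and a flow path represented as a list of call steps:

    # Hypothetical sanitizer names; the real engine resolves these
    # per language and framework.
    KNOWN_SANITIZERS = {"escape_html", "parameterize_query", "validate_path"}

    def is_protected(flow_path: list[dict]) -> bool:
        """True if any step on the source->sink path applies a guard,
        sanitizer, or input validator, so the finding is suppressed."""
        return any(
            step.get("callee") in KNOWN_SANITIZERS
            or step.get("kind") == "authorization_guard"
            for step in flow_path
        )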

πŸ”—

Source→Sink Chain Required

No pattern-match-only reports. Every finding must have a verified taint chain from user-controlled source to a dangerous sink, with all intermediate steps traced.
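
The evidence attached to each finding might look like the following sketch (field names are illustrative, not TraceMint's actual schema):

    from dataclasses import dataclass, field

    @dataclass
    class TaintStep:
        file: str
        line: int
        expr: str  # the tainted expression at this hop

    @dataclass
    class TaintChain:
        source: TaintStep  # user-controlled input
        sink: TaintStep    # dangerous operation
        hops: list[TaintStep] = field(default_factory=list)  # intermediate steps

        def trace(self) -> list[TaintStep]:
            """Full source -> hops -> sink path; every step is traced."""
            return [self.source, *self.hops, self.sink]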

πŸ“‹

Verdict Classification

Findings are classified as VERIFIED, PROOF-BACKED, or NEEDS_REVIEW based on how many proof obligations are satisfied β€” so you know exactly what to triage first.
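
A minimal sketch of that classification, with illustrative thresholds (TraceMint's actual cutoffs may differ):

    def verdict(satisfied: int, total: int) -> str:
        """Classify a finding by its proof-obligation completion."""
        if satisfied == total:
            return "VERIFIED"            # every obligation satisfied
        if satisfied / total >= 0.5:     # illustrative cutoff, not the real one
            return "PROOF-BACKED"
        return "NEEDS_REVIEW"            # lowest confidence: manual triage first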

METRICS

What We Measure

We track multiple metrics to ensure a balanced view of scanner performance.

Recall: % of real vulnerabilities detected out of all known vulnerabilities. Current: 80.4% strict
Precision: % of reported findings that are true vulnerabilities. Target: >85%
FP Rate: False positives per 1,000 lines of code analyzed. Target: <0.5
Coverage: % of vulnerability classes with active detection rules. Current: 25+ categories
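
As a worked example of how these compose (all numbers except the published 80.4% recall are illustrative):

    # Worked example; only the 80.4% recall figure is a published number.
    detected, known = 402, 500          # 402 / 500 = 0.804 strict recall
    true_pos, reported = 90, 100        # precision example
    false_pos, loc = 4, 10_000          # FP-rate example

    recall = detected / known           # 0.804  (80.4% strict)
    precision = true_pos / reported     # 0.90   (target: > 0.85)
    fp_rate = false_pos / (loc / 1000)  # 0.4 FPs per 1K LOC (target: < 0.5)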

Ready to see these results in your codebase?

Start scanning with TraceMint today. See the difference semantic analysis makes.