By Garrett Kohlrusch | GK Data LLC
Triage queues aren’t backed up because of a shortage of bugs. They’re backed up because of a flood of reports that are technically coherent, professionally formatted, and completely unsubstantiated.
That’s the AI problem in bug bounty right now. The issue isn’t that hunters are using it; it’s that some hunters are using it as a replacement for proof.
What’s Actually Happening
LLMs are exceptionally good at producing text that sounds authoritative. You can describe a behavior, drop in some context, and get back a four-paragraph vulnerability writeup with impact analysis, a CVSS score estimate, and remediation recommendations. It reads like it was written by someone who found something real.
The problem: the finding often isn’t real. The LLM filled in the impact because you asked it to, not because the impact exists. Triage teams know this pattern. They’ve read hundreds of these reports. They can spot the structure immediately — and once they do, the credibility hit extends beyond that single submission.
Programs receiving this volume aren’t just frustrated. Some are quietly closing, reducing scope, or moving to invite-only. The signal-to-noise ratio has gotten bad enough that maintaining a public program is becoming difficult to justify.
The Second Problem Nobody Talks About
Report quality is the visible problem. There’s a less visible one worth naming directly.
Hunters are pasting raw recon data into cloud AI models. Live endpoints, internal hostnames, session tokens, parameter structures, API keys found in source. For public programs, this is careless. For private programs with NDAs or rules of engagement, it could be a material violation — and “I was using it to write a better report” is not a defense that will hold.
If you’re working on a private program and your AI workflow involves a third-party cloud model, you need to think carefully about what you’re feeding it. The information you paste doesn’t stay in your session.
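If part of your workflow does route anything through a hosted model, strip the obvious secrets before the prompt ever leaves your machine. Here is a minimal sketch; the patterns are illustrative placeholders, not a complete list, and they would need to match whatever token, cookie, and hostname formats your targets actually use.

```python
import re

# Hypothetical patterns -- extend these for the token and key formats you actually encounter.
REDACTIONS = [
    (re.compile(r"(?i)authorization:\s*bearer\s+\S+"), "Authorization: Bearer [REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key|secret|token)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
    (re.compile(r"(?i)cookie:\s*.+"), "Cookie: [REDACTED]"),
    (re.compile(r"\b[\w.-]+\.internal\b"), "[INTERNAL-HOST]"),
]

def scrub(text: str) -> str:
    """Replace obvious secrets before any prompt leaves your machine."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    raw = "GET /v1/users HTTP/1.1\nHost: api.example.internal\nAuthorization: Bearer eyJhbGciOi..."
    print(scrub(raw))
```

Even with a scrub pass, the safer default on a private program is still not sending the data out at all.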
How to Use It Without Being Part of the Problem
None of this means AI is off the table. It means the sequence matters.
Prove impact first. Write second.
AI belongs at the end of your workflow, not the beginning. It’s a communication tool, not a discovery tool. Once you have a confirmed, reproducible finding with real evidence — screenshots, request/response pairs, proof of data access, demonstrated behavioral change — AI can help you structure that into a clean report. That’s a legitimate use. The problem is using it in the opposite direction: generating a writeup and hoping the evidence materializes.
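As a concrete illustration of that ordering, here is one way “AI at the end” can look: the prompt carries only evidence you have already confirmed, and it explicitly tells the model not to add anything beyond that evidence. The finding shown is invented for illustration, and the field layout is one possible structure, not a prescribed template.

```python
# Hypothetical evidence record -- every field is something you have already verified yourself.
finding = {
    "title": "IDOR on /api/v2/invoices/{id} exposes other tenants' invoices",
    "endpoint": "GET /api/v2/invoices/{id}",
    "steps": [
        "Authenticate as user A (tenant 1).",
        "Request invoice ID 20443, which belongs to tenant 2.",
        "Response returns tenant 2's invoice PDF with a 200.",
    ],
    "evidence": "request/response pair attached as idor-invoice.txt; screenshot attached.",
}

# The prompt constrains the model to formatting. It is told not to add impact,
# severity, or remediation beyond what the evidence already shows.
prompt = (
    "Rewrite the following confirmed finding as a clear bug bounty report. "
    "Do not add any impact, affected data, or severity claims that are not "
    "explicitly present in the notes below. If something is missing, leave it out.\n\n"
    f"Title: {finding['title']}\n"
    f"Endpoint: {finding['endpoint']}\n"
    "Reproduction steps:\n"
    + "\n".join(f"{i}. {step}" for i, step in enumerate(finding["steps"], 1))
    + f"\nEvidence: {finding['evidence']}\n"
)

print(prompt)
```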
Use it for code review and hypothesis generation, not claims.
Pasting a JavaScript file and asking an AI to flag interesting patterns is reasonable. Using it to identify code paths worth investigating is reasonable. Asking it to write an impact statement for a behavior you haven’t yet proven is not.
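A sketch of the hypothesis-generation version, assuming a local Ollama instance on its default port (the model name is a placeholder; any local runner with a similar API works). The output is a list of places to go look, not findings.

```python
import json
import sys
import urllib.request

# Assumes a local Ollama instance on the default port; any local runner works the same way.
OLLAMA_URL = "http://localhost:11434/api/generate"

def flag_interesting(js_path: str, model: str = "llama3.1") -> str:
    source = open(js_path, encoding="utf-8", errors="replace").read()
    prompt = (
        "List code paths in this JavaScript worth manually investigating for "
        "security issues (auth checks, dynamic sinks, hidden endpoints). "
        "These are hypotheses to verify, not findings -- do not claim impact.\n\n"
        + source
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(flag_interesting(sys.argv[1]))
```

Running it locally also keeps the JavaScript, and anything embedded in it, off third-party infrastructure, which matters for the next point.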
Keep private program data out of cloud models.
If you’re doing anything under NDA or on a private program, either use a local model or write the report without AI assistance. The friction is worth it.
If the AI-assisted version reads like it understands the finding better than you do, stop.
That gap is the problem. The report should be a reflection of what you know, not what the model inferred.
The hunters earning consistent payouts on serious programs aren’t the ones generating the most reports. They’re the ones triage teams have learned to trust. That trust is built submission by submission, and it’s a lot easier to lose than to rebuild.
AI can make good hunters faster. It can make careless hunters look temporarily credible. Triage teams know the difference — and they remember which one you are.
Garrett Kohlrusch is a security researcher based in Minneapolis, MN. GK Data LLC provides web application testing and security consulting for businesses.
[email protected] | gkdata.io