How AI Candidates Use ChatGPT to Cheat in Interviews (And How to Catch Them)
By the time you finish this article, at least one candidate somewhere is finishing a remote AI/ML interview with ChatGPT or Claude open in a second tab, an audio earpiece feeding answers, or a more senior collaborator off-camera doing the actual work.
This is now the single biggest fraud risk in AI/ML hiring. The teams winning at AI hiring in 2026 are the teams that have actively built defenses for it. The rest are hiring people who cannot do what their interview said they could.
How big is the problem, really?
The honest answer is that nobody knows the exact rate, because most teams using legacy interview tools have no way to measure it. The data points we do have are concerning:
- Multiple AI hiring leads we've spoken to report that 60–80% of remote AI/ML candidates show at least one integrity signal when measured against behavioral baselines.
- In high-volume AI/ML pipelines we've observed, around half of senior candidates use external assistance on at least one assessment question. Most of those candidates would not pass a clean evaluation of the same material.
- The biggest jump in measured cheating happened in Q3 2024, when LLM output became good enough to pass for a strong senior candidate's answer with no obvious tells. Interview tools designed before that point are now systematically undercounting.
The question to ask is not “is cheating happening?” but “what fraction of our shortlist is AI-augmented, and how do we know?”
The five patterns we see most
Cheating in AI/ML interviews falls into a handful of recognizable patterns. Each leaves different fingerprints. The integrity defense looks different for each.
1. The paste-and-edit
Candidate asks ChatGPT or Claude the question, gets an answer, pastes it into the response field, and lightly rewords it to sound more like their own writing. Most common pattern. Easiest to catch.
The tells: Paste events recorded. Typing-rhythm signature suddenly matches a copy + light-edit pattern instead of free composition. Word-frequency distribution of the answer matches GPT-family output rather than human writing.
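The first tell can be approximated from the input event log alone. The following is a minimal sketch in Python, assuming a hypothetical `InputEvent` record captured by the assessment page; the field names and the 0.6 threshold are illustrative assumptions, not the schema or cutoff of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """One edit to the response field (hypothetical schema)."""
    kind: str    # "key" for a typed character, "paste" for a paste event
    chars: int   # number of characters this event added
    t: float     # seconds since the question was shown


def paste_ratio(events):
    """Fraction of all inserted characters that arrived via paste events."""
    total = sum(e.chars for e in events)
    pasted = sum(e.chars for e in events if e.kind == "paste")
    return pasted / total if total else 0.0


def looks_like_paste_and_edit(events, ratio_threshold=0.6):
    """True when most of the answer was pasted and only lightly typed over.

    The threshold is an assumed starting point and would need tuning
    against real sessions.
    """
    return paste_ratio(events) >= ratio_threshold
```

A high ratio is only one weak signal. A candidate who drafts in a local editor and pastes their own text would score the same, so the result should be read alongside the other tells.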
2. The second-monitor read
Candidate keeps an LLM open on a second screen or phone. They read the answer off-screen and type it out themselves, leaving no paste event. Harder to catch by paste detection alone.
The tells: Eye-tracking pattern (when video is on) shows repeated off-screen glances during answer composition. Typing speed and rhythm are unusually consistent — too consistent for free composition. Answer structure follows a generic LLM template.
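One way to put a number on "too consistent" is the coefficient of variation of the gaps between keystrokes. The sketch below is a rough illustration in Python; the function names and the 0.25 cutoff are assumptions for the example, and a real deployment would calibrate against each candidate's own baseline.

```python
import statistics


def interkey_cv(timestamps):
    """Coefficient of variation (std / mean) of the gaps between keystrokes.

    `timestamps` is an ascending list of key-press times in seconds.
    Returns None when there are too few keystrokes to measure.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


def rhythm_is_too_even(timestamps, cv_threshold=0.25):
    """True when typing gaps vary less than the assumed threshold."""
    cv = interkey_cv(timestamps)
    return cv is not None and cv < cv_threshold
```

Free composition tends to mix fast bursts with long thinking pauses, which pushes the ratio up. Transcribing from another screen tends to produce steadier gaps, which pulls it down.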
3. The audio earpiece
Candidate has someone — or some tool — feeding them answers through an earpiece. Common in high-stakes senior interviews where a junior engineer poses as a senior. The candidate seems to "think out loud" but the substance is being supplied externally.
The tells: Speech cadence shows micro-pauses immediately before substantive technical claims, as if the candidate is listening before speaking. Voice-stress signature inconsistent with the level of confidence the candidate is projecting. Specific facts and details land too cleanly to be from memory.
4. The stand-in interviewee
A completely different person takes the interview on the candidate's behalf. Sometimes a more senior engineer hired specifically to pass interviews ("interview farms" are a real industry). Sometimes a friend. Sometimes the candidate has multiple stand-ins for different stages.
The tells: Face-verification mismatch against the candidate's ID or LinkedIn photo. Voice signature inconsistent across interview stages. Working hours / time zone do not match the candidate's stated location. Skill level radically inconsistent between the assessment stage and a later unscheduled check.
5. The agentic browser
Newer pattern. Candidate uses an AI browser agent (Comet, browser-use, etc.) to read the question, generate an answer, and even type the answer directly into the response field. No paste event because the agent is "typing." Hardest to catch with paste detection.
The tells: Typing rhythm has a robotic regularity that human composition does not produce. Mouse movement patterns show no idle behavior — perfect direct paths, no pauses, no off-target clicks. Some browser agents leave detectable user-agent or DOM-event signatures.
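The "perfect direct paths" tell can be sketched as a straightness ratio: the straight-line distance between the start and end of a mouse movement divided by the distance the pointer actually travelled. This is a minimal Python illustration; the function names, the point format, and the 0.98 threshold are assumptions, not a description of how any specific product measures it.

```python
import math


def path_straightness(points):
    """Ratio of direct distance to travelled distance for one mouse movement.

    `points` is a list of (x, y) samples. A value of 1.0 means a perfectly
    straight line; wandering, human-like paths score lower.
    """
    if len(points) < 2:
        return None
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return None
    return math.dist(points[0], points[-1]) / travelled


def direct_path_fraction(paths, straight_threshold=0.98):
    """Share of movements in a session that are nearly perfectly straight."""
    scores = [s for s in (path_straightness(p) for p in paths) if s is not None]
    if not scores:
        return None
    return sum(s >= straight_threshold for s in scores) / len(scores)
```

A session in which almost every movement scores near 1.0 is worth a closer look. A handful of straight movements on their own means little, since short human movements are often close to straight too.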
What does not work (anymore)
A lot of the integrity playbook from 2020–2023 silently failed once frontier LLMs got good. If your AI/ML hiring loop relies on any of these, you are essentially undefended:
- Plagiarism detection on answers. LLMs generate fresh text every time. There is nothing to plagiarize against. Plagiarism tools detect duplication, not generation.
- “Did you use AI?” honor questions. A non-trivial fraction of candidates will say no. Some say no because they genuinely don't think paste-and-edit counts.
- Camera-on requirement, by itself. A camera does not see a phone on the desk, a second monitor, or an earpiece. It does not see the agentic browser running in another tab.
- Timed coding tests with anti-paste JavaScript. Trivially defeated. The candidate types from a second screen or has the agent type for them.
- Live Zoom interviews as the primary integrity guarantee. Live Zoom is where the audio-earpiece and stand-in patterns are most effective. It is the worst environment for integrity in 2026, not the best.
- Brand-name proctoring services from the 2020 era. Most were designed for university test-taking, not adversarial professional hiring. A candidate chasing a well-paid AI/ML role has more incentive and more sophistication than a college student with a textbook.
What actually works
No single signal is reliable on its own. The teams that catch AI-assisted cheating consistently are using a stack of weak signals that triangulate. The big four:
Behavioral telemetry across the whole assessment
Paste events, tab switches, typing rhythm, mouse-movement patterns, idle time, focus changes, and answer-composition timing. None of these on their own is conclusive. All of them together describe a candidate's actual behavior, which is hard to fake consistently across a 30-minute assessment.
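The triangulation idea can be illustrated with a simple agreement rule: flag a session only when several independent signals fire together. The Python sketch below assumes each signal has already been reduced to a true/false value; the signal names and the `min_agreeing` default of 3 are hypothetical choices for the example.

```python
def triangulate(signals, min_agreeing=3):
    """Flag a session only when enough weak signals agree.

    `signals` maps a signal name to True (fired) or False (did not fire).
    No single signal can trigger a flag as long as min_agreeing > 1.
    """
    fired = sorted(name for name, hit in signals.items() if hit)
    return {"flag": len(fired) >= min_agreeing, "fired": fired}


# Example session with hypothetical signal names.
session = {
    "paste_heavy": True,
    "tab_switches_high": True,
    "typing_too_even": False,
    "mouse_paths_direct": False,
    "long_idle_before_answer": True,
}
result = triangulate(session)
print(result["flag"], result["fired"])
```

Returning the list of fired signals alongside the flag keeps the decision reviewable, so a human can see why a session was flagged rather than trusting a bare score.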
Adaptive follow-up that probes for specifics
A generic LLM answer is fluent but unspecific. “You mentioned implementing RAG with a vector store. Describe the failure mode you saw when the query distribution drifted from the training data.” A candidate who actually shipped a RAG system answers from experience. A candidate pasting LLM output stalls or hallucinates specifics that do not check out. Adaptive follow-ups are the single highest-signal integrity check, because they collapse the space of plausible LLM responses.
Voice and face verification
A voice signature and a face match against the candidate's ID and public profile. Not bulletproof — deepfakes exist — but raises the cost of the stand-in pattern enough that most candidates won't bother. Combined with a brief live-conversation segment, this catches the majority of stand-in attempts.
Cross-question consistency analysis
A real candidate's level is internally consistent. They are good at the things they are good at and weaker on the things they are weaker on. A candidate using outside help shows suspicious flatness — every answer is uniformly strong, which is statistically unusual. A candidate whose senior collaborator drops off halfway through the assessment shows a sudden quality cliff. Both patterns are detectable.
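Both patterns can be sketched from per-question scores alone. The Python example below assumes each answer has already been graded on a 0 to 10 scale; the function names and the thresholds are illustrative assumptions rather than calibrated values.

```python
import statistics


def is_suspiciously_flat(scores, min_mean=8.0, max_spread=0.5):
    """True when every answer is strong and the spread is unusually small."""
    if len(scores) < 2:
        return False
    return (statistics.mean(scores) >= min_mean
            and statistics.pstdev(scores) <= max_spread)


def largest_cliff(scores):
    """Find the split point with the biggest drop in average quality.

    Returns (index, drop), where `index` is the first question of the weaker
    segment, or None when there are too few answers. At least two answers
    are required on each side of the split.
    """
    best = None
    for i in range(2, len(scores) - 1):
        drop = statistics.mean(scores[:i]) - statistics.mean(scores[i:])
        if best is None or drop > best[1]:
            best = (i, drop)
    return best
```

For example, `largest_cliff([9, 9, 8, 9, 4, 3, 4, 3])` places the split at index 4, where the average falls from 8.75 to 3.5. Whether a drop of that size is suspicious depends on question difficulty, so harder late questions should be accounted for before treating a cliff as a signal.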
What this means for your AI/ML hiring loop
The integrity layer is now table stakes for AI/ML hiring specifically. Other engineering hiring can sometimes get by with weaker integrity defenses because the downstream consequences are smaller. AI/ML hires are more expensive, more concentrated, and harder to course-correct after the fact, so a single cheating-enabled mis-hire costs more than in most other engineering disciplines.
Practical guidance for the next quarter:
- Assume some fraction of every shortlist is AI-augmented unless you have a behavioral telemetry layer telling you otherwise. Plan for it.
- Add adaptive follow-up to every senior AI/ML assessment. The single highest-leverage defense.
- Stop relying on live Zoom interviews as your primary integrity guarantee. They are now the easiest stage to spoof.
- Use one tool that runs the integrity stack consistently across every candidate. Inconsistent measurement is worse than no measurement — it creates a false sense of security on the candidates you happened to measure carefully.
Building this in-house is hard. Buying it is straightforward.
LayersRank ships the full integrity stack — behavioral telemetry, adaptive follow-up, voice/face verification, cross-question consistency — in the base product. It is not a premium add-on. See how Integrity Detection works or read the dedicated AI & ML hiring playbook.
Run a pilot on your hardest AI/ML role
Pick the AI or ML role where you have the strongest suspicion that your current shortlist is AI-augmented. Run LayersRank in parallel. See what the integrity stack flags.