AI-generated fakes aren't one problem. They're seven. Each type of fake requires a different kind of detective. No single AI can fight them all — so we built a team.
In Greek mythology, the Hydra was a serpent with many heads. Cut one off, and two more grew back. The only way to defeat it was to cauterize each neck before a new head could sprout. You needed a different strategy for each head.
AI-generated imagery works the same way. People talk about "AI fakes" as if they were a single thing. They aren't. There are at least seven fundamentally different kinds of visual deception, and each one fools detectors in a different way.
Most AI detection tools use a single model. They ask one question: "Is this image AI-generated?" They get back a number between 0 and 100. And then they make a judgment based on that one number. That's like sending one detective to investigate seven completely different types of crime. A homicide investigator doesn't know how to trace financial fraud. A cybercrime expert doesn't dust for fingerprints.
The AI fake is not a single beast. It is a hydra. And every head requires a specialist who knows exactly where to look.
Sources: Comprehensive Benchmark Study (2025), Community Forensics (2024), Real-World Detection Benchmarks
Each head of the hydra represents a fundamentally different type of visual deception. They exploit different weaknesses, leave different traces, and require different specialists to detect.
The entire image is generated from nothing. A text prompt becomes a photograph that never happened. A person who never existed poses in a place that was never built. Every pixel is synthetic.
The Kamchatka snowstorm videos. A real weather event happened, but hundreds of AI-generated drone shots appeared showing buildings and situations that didn't exist. Newsrooms in Panama, Mexico, and Poland published them as real footage.
Detection approach: Visual DNA analysis, spectral frequency checks, perturbation stability tests. These are the easiest fakes to catch — when the entire image is synthetic, every pixel carries the generator's fingerprint.
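To make that concrete, here is a minimal sketch of the whole-image approach. It is not the model described here: it uses a torchvision ResNet-50 as a stand-in for the self-supervised backbone, and the classifier head is untrained, so its output is meaningless until you fit it on labeled real and synthetic images.

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

# Stand-in feature extractor; the real system uses a self-supervised ViT.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()        # keep the 2048-d embedding ("visual DNA")
backbone.eval()

head = nn.Linear(2048, 1)          # hypothetical head; train on labeled data

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),    # production systems keep full resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def ai_probability(path: str) -> float:
    """Score one image; higher = more likely synthetic (once head is trained)."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.sigmoid(head(backbone(x))).item()
```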
The body is real. The scene is real. But the face has been replaced. This is the classic deepfake: take a real video or photo and swap in someone else's face. The background passes every authenticity test because it is authentic. Only the face is synthetic.
Modern face swaps match lighting, skin texture, and even blinking patterns. The manipulated area is often less than 5% of the total image. A whole-image scanner sees 95% real content and says "probably authentic."
Detection approach: Manipulation localization models that compare every patch against every other patch. The swapped face has different pixel statistics from the surrounding skin, even when it looks perfect to the human eye.
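A toy version of that patch-vs-patch comparison, assuming only numpy and scipy: summarize each patch by the variance of its high-frequency residual, then flag patches that sit far from the image's own distribution. Real localization models learn far richer patch embeddings, but the logic is the same.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def patch_outliers(gray: np.ndarray, patch: int = 32, z_thresh: float = 3.0):
    """gray: 2-D float array in [0, 1]. Returns a boolean per-patch map."""
    residual = gray - gaussian_filter(gray, sigma=2)   # strip scene content
    h, w = residual.shape
    gh, gw = h // patch, w // patch
    stats = np.empty((gh, gw))
    for i in range(gh):
        for j in range(gw):
            block = residual[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            stats[i, j] = block.var()                  # one statistic per patch
    z = (stats - stats.mean()) / (stats.std() + 1e-8)  # compare all vs. all
    return np.abs(z) > z_thresh                        # True = suspicious patch
```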
A real person is placed in a fake scene, or a fake person is placed in a real scene. The Eiffel Tower composite that broke our detector: a real software engineer pasted himself in front of a real landmark. The person was real. The tower was real. The combination was fake. Our tool said "authentic."
Composites contain 100% real elements. Neither the person nor the background triggers AI generation detectors, because neither was AI-generated. The forgery exists only at the seam between them — a boundary that modern tools can make nearly invisible.
Detection approach: Noise consistency analysis. Every camera sensor leaves a unique noise pattern across the entire image. When two sources are combined, the noise pattern breaks at the boundary. Specialists trained in digital noise can see the seam.
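In code, the simplest possible noise-consistency map looks like this. It is a sketch that assumes the sensor noise survives as the residual of a median filter; real tools use learned noise models, but the seam shows up the same way.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def noise_map(gray: np.ndarray, win: int = 16) -> np.ndarray:
    """gray: 2-D float array in [0, 1]. Returns a local noise-variance map."""
    residual = gray - median_filter(gray, size=3)      # approximate sensor noise
    local_mean = uniform_filter(residual, size=win)
    local_var = uniform_filter(residual**2, size=win) - local_mean**2
    # A single-camera photo is roughly flat; splices show as sharp level jumps.
    return np.clip(local_var, 0, None)
```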
The photo is real. The scene is real. But one small element has been changed. The text on a protest banner was altered. A license plate was swapped. A uniform badge was removed. A product label was edited. These are the fakes that matter most in journalism — and they're nearly invisible to whole-image AI detectors.
The manipulated area might be 50 by 50 pixels in a 4000 by 3000 pixel image. That's 0.02% of the total image. A whole-image AI score averages this away to nothing. You need a specialist that examines the top 1% most suspicious pixels — not the average.
Detection approach: Hotspot analysis. Instead of averaging the entire image, specialized models find the single most suspicious region and zoom in. A tiny text edit that scores 0.2% on a whole-image scan can score 85% on a hotspot analysis.
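The arithmetic is easy to demonstrate. Given any per-pixel suspicion map from a localization model, compare the whole-image mean with the mean of the top 1% most suspicious pixels (the function names here are ours, for illustration):

```python
import numpy as np

def mean_and_hotspot(suspicion: np.ndarray) -> tuple[float, float]:
    """suspicion: per-pixel scores in [0, 1]."""
    flat = np.sort(suspicion.ravel())
    k = max(1, flat.size // 100)             # top 1% of pixels
    return float(flat.mean()), float(flat[-k:].mean())

# A 50x50 edit in a 4000x3000 image: mean ~0.0002 vs hotspot ~0.02,
# two orders of magnitude louder, before any model improvements.
m = np.zeros((3000, 4000))
m[100:150, 100:150] = 1.0
print(mean_and_hotspot(m))
```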
Something was there, and now it isn't. AI inpainting tools can remove a person from a crowd, delete a weapon from a hand, or erase evidence from a crime scene. The removed area is filled in with AI-generated content that seamlessly matches the surroundings. The rest of the image remains untouched and authentic.
In conflict journalism, this is the most dangerous head of the Hydra. A protest photo with a key person removed. A satellite image with a military vehicle erased. Evidence that something happened, altered to show that it didn't. The image was never "AI-generated" — it was AI-erased.
Detection approach: The inpainted region has different statistical properties from the camera-captured surroundings. Non-semantic artifact detectors — which ignore what the image shows and focus on how pixels behave — can spot the boundary between real and generated content.
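What "ignoring what the image shows" means in practice: run the image through a fixed high-pass filter so the scene disappears and only pixel-level behavior remains. A minimal sketch, in the spirit of the SRM filters used in forensics:

```python
import numpy as np
from scipy.ndimage import convolve

# Second-derivative kernel: flat regions and smooth gradients vanish,
# leaving only how each pixel deviates from its neighbors.
HIGH_PASS = np.array([[-1,  2, -1],
                      [ 2, -4,  2],
                      [-1,  2, -1]], dtype=float) / 4.0

def artifact_view(gray: np.ndarray) -> np.ndarray:
    """Semantics-free residual; inpainting boundaries live in this domain."""
    return convolve(gray, HIGH_PASS, mode="reflect")
```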
The photo is real, but it's been "enhanced" by AI in ways that change its meaning. An AI upscaler that makes a blurry surveillance image sharp might also invent facial features that weren't in the original. An AI tool that "improves" an old photo might add details that change the historical record. The line between enhancement and fabrication is invisible.
Is it still a "real" photo if AI sharpened the faces, filled in missing pixels, adjusted the colors, and "enhanced" the resolution? At what point does enhancement become fabrication? There is no clear boundary — which is exactly what makes this head so dangerous.
Detection approach: Frequency-domain analysis. AI enhancement tools leave spectral fingerprints that differ from natural camera optics. A specialist that converts the image into its frequency spectrum — like turning a song into sheet music — can see the artificial harmonics that AI processing introduces.
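Here is what "turning the image into sheet music" looks like as a sketch, assuming only numpy: the azimuthally averaged power spectrum. Camera optics produce a smooth fall-off from low to high frequencies; upscalers and generators often leave bumps or plateaus in the profile.

```python
import numpy as np

def radial_spectrum(gray: np.ndarray, bins: int = 100) -> np.ndarray:
    """gray: 2-D float array. Returns mean log-power per radial frequency bin."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    power = np.log1p(np.abs(f) ** 2)
    h, w = gray.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)
    idx = (r / r.max() * (bins - 1)).astype(int)       # radius -> bin index
    sums = np.bincount(idx.ravel(), weights=power.ravel(), minlength=bins)
    counts = np.bincount(idx.ravel(), minlength=bins)
    return sums / np.maximum(counts, 1)  # plot this; look for high-end bumps
```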
The image might be entirely real. But it has been stripped of all metadata, re-compressed, screenshotted, and shared across platforms until no forensic trace remains. It arrives in a journalist's inbox as a bare JPEG with no EXIF data, no camera information, no GPS coordinates, no editing history. It could be authentic. It could be AI. There is no digital trail left to follow.
Research shows that detection accuracy drops 45-50% when tested on real-world images versus laboratory conditions. The images journalists actually receive have been through WhatsApp compression, Twitter resizing, screenshot cropping, and multiple rounds of JPEG re-encoding. Every step destroys the subtle forensic signals that detectors rely on.
Detection approach: This is where training-free methods and perturbation-based detection shine. Instead of looking for learned patterns (which compression destroys), these specialists ask: "Does this image behave like a real photograph when I slightly disturb it?" Real photos are geometrically stable. AI images are not — even after compression.
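A minimal version of the stability probe, assuming torch and torchvision. We use a generic pretrained ResNet-50 for features, and the threshold you compare against must be calibrated on known-real photos:

```python
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def instability(x: torch.Tensor, sigma: float = 0.01, trials: int = 8) -> float:
    """x: normalized image batch (1, 3, H, W). Higher = less stable = more suspect."""
    base = F.normalize(backbone(x), dim=1)
    drift = 0.0
    for _ in range(trials):
        noisy = F.normalize(backbone(x + sigma * torch.randn_like(x)), dim=1)
        drift += 1.0 - F.cosine_similarity(base, noisy).item()
    return drift / trials   # real photos barely move; generated images drift
```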
If each head of the hydra requires a different specialist, the answer is obvious: build a team. Not one AI detector. A squad of them. Each trained differently, each looking at different evidence, each with a unique blind spot that the others cover.
That's exactly what we did. We consulted researchers from five continents, through video calls, chat threads, and in-person conversations: computer scientists who build these detectors and journalists who use them in the field. We asked one question: if you could pick a team of specialists to fight every head of the hydra, who would you choose? Their answers became the roster below.
Uses a powerful self-supervised vision model to extract the "visual DNA" of an image. Trained to see the subtle difference between photographs taken by cameras and images rendered by AI. Works at full resolution — no detail lost.
Converts the image into its frequency spectrum — turning the visual into mathematical harmonics. AI generators leave spectral fingerprints that are invisible to the eye but unmistakable in the frequency domain. Particularly strong against older GAN-based generators.
Trained on 2.7 million images from over 4,800 different AI generators. While most detectors have seen a handful of generators, this one has seen thousands — including obscure open-source models that most tools have never encountered. Tiny, efficient, and remarkably accurate.
Specifically trained to catch images from modern diffusion generators — the technology behind the most realistic AI images today. Uses contrastive learning: trained on millions of images that were reconstructed through diffusion processes, learning the subtle difference between "original" and "passed through AI."
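The reconstruction intuition can be sketched with off-the-shelf tools: push the image a short distance through a diffusion model and measure how much survives. Images that came out of a diffusion model tend to survive the round trip better than camera photos. This stand-in uses diffusers img2img at low strength; the published methods use proper DDIM inversion, so treat this as an illustration only.

```python
import numpy as np
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def roundtrip_error(path: str) -> float:
    """Mean squared error between an image and its diffusion round trip."""
    img = Image.open(path).convert("RGB").resize((512, 512))
    rec = pipe(prompt="", image=img, strength=0.1).images[0]  # short round trip
    a = np.asarray(img, dtype=np.float32) / 255.0
    b = np.asarray(rec, dtype=np.float32) / 255.0
    return float(np.mean((a - b) ** 2))  # lower error hints at diffusion origin
```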
Uses a foundation model that understands both images and language. It checks whether the image has visual fingerprints that match known AI generation methods, using a trainable importance weighting system to focus on the features that matter most.
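A sketch of that idea with the open CLIP weights from Hugging Face. The per-feature importance vector and the linear head here are untrained placeholders; in the real model they are learned:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

importance = torch.nn.Parameter(torch.ones(512))  # learnable per-feature weights
head = torch.nn.Linear(512, 1)                    # hypothetical classifier head

def score(path: str) -> float:
    """Placeholder score; fit `importance` and `head` on labeled data first."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)   # (1, 512) embedding
    return torch.sigmoid(head(feats * importance)).item()
```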
This specialist was never trained to detect AI images. Instead, it exploits a fundamental property of real photographs: they are stable when slightly disturbed, while AI images are not. It adds tiny noise to the image and measures how much the deep features change. Real photos barely flinch. AI images shatter.
Examines the digital noise pattern across the entire image. Every camera sensor leaves a unique noise signature. When part of an image is spliced, edited, or AI-generated, the noise pattern breaks. This specialist produces a heatmap: blue for untouched areas, red for manipulated regions.
Divides the image into a grid and compares every patch against every other patch. Edited regions have different pixel statistics — even when they look identical to the human eye. Features a "hotspot" detector that zeroes in on the top 1% most suspicious pixels, catching tiny edits that whole-image scanners miss entirely.
Most detectors look at what an image shows. This specialist deliberately breaks the image's semantics and looks only at manipulation traces. It doesn't care if the edited text "makes sense" — it only cares whether the pixels at the edit boundary behave differently from their neighbors. Uses 80% less computational power than traditional approaches.
Runs two parallel investigations simultaneously: a macro branch that captures object-level tampering cues, and a micro branch that detects fine-grained boundary artifacts. Then dynamically weighs their findings pixel by pixel. Catches both large background replacements and tiny detail edits in the same pass.
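The pixel-by-pixel weighing can be illustrated with a simple confidence gate. This is our own toy formula, not the model's learned gate: at each pixel, trust whichever branch is further from "undecided."

```python
import numpy as np

def fuse(macro: np.ndarray, micro: np.ndarray) -> np.ndarray:
    """macro, micro: same-shape heatmaps in [0, 1]. Returns a fused heatmap."""
    conf_macro = np.abs(macro - 0.5)              # distance from "undecided"
    conf_micro = np.abs(micro - 0.5)
    gate = conf_macro / (conf_macro + conf_micro + 1e-8)
    return gate * macro + (1 - gate) * micro      # per-pixel dynamic weighting
```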
An independent commercial detection service runs its own proprietary analysis. This provides a second opinion from entirely outside our own system — like getting a second doctor's opinion from a different hospital.
A multimodal AI examines the image the way a human would: looking at faces, hands, text, reflections, shadows, perspective, and context. It provides a narrative explanation — not just a number, but a reason. "The shadows point in different directions" or "the text on the sign contains gibberish characters."
Having ten specialists is useless if they all shout different answers. The hardest engineering problem isn't running the models — it's deciding what to do when they disagree.
We built a verdict system with clear rules of authority. Think of it as a courtroom: some evidence carries more weight than others, and the jury has to reach a verdict even when witnesses disagree.
Verdicts follow a strict chain of command. Higher-priority evidence overrides everything below it.
If the image matches our database of verified fakes, everything else is irrelevant. Case closed.
Some AI generators embed invisible watermarks in their output. If we detect one, we know exactly which tool created it.
The commercial scanner acts as the primary forensic authority. Its scores are well-calibrated and tested across millions of images.
When the authority signal is uncertain, the team votes. If five or more specialists agree, their collective judgment overrides the uncertainty. No single model can veto the team.
If three or more localization specialists flag the same region as manipulated, the image is marked as "Real but Edited" — even if the whole-image scanners say it's authentic.
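Condensed into code, the chain of command looks something like this. The field names and thresholds are illustrative, but the priority order is exactly the one described above.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    known_fake_match: bool            # hash matched a verified-fake database
    watermark_generator: str | None   # name of the generator, if watermarked
    authority_score: float | None     # commercial scanner score, if available
    ai_votes: int                     # specialists voting "AI-generated"
    edit_region_flags: int            # localizers flagging the same region

def verdict(e: Evidence) -> str:
    if e.known_fake_match:                    # 1. database match: case closed
        return "Confirmed fake"
    if e.watermark_generator is not None:     # 2. watermark names the tool
        return f"AI-generated ({e.watermark_generator})"
    if e.authority_score is not None and e.authority_score >= 0.9:
        return "Likely AI-generated"          # 3. calibrated authority signal
    if e.ai_votes >= 5:                       # 4. team majority, no single veto
        return "Likely AI-generated (ensemble vote)"
    if e.edit_region_flags >= 3:              # 5. localization consensus
        return "Real but Edited"
    return "No hard evidence of AI generation or manipulation"
```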
No single specialist is infallible. But when independent brains — each using a completely different technique, trained on different data, looking at different evidence — reach the same conclusion, that's not a guess. That's a verdict.
Research shows that ensemble methods achieve up to 78% mean accuracy compared to 37-72% for single models. Diverse models capture complementary patterns, providing robustness to distribution shift. The whole is genuinely greater than the sum of its parts.
A journalist at a major wire service put it bluntly: "When I see ten models and one of them flags the image at 45%, I panic. But you're telling me that model is answering a completely different question? Then why is it in the same list?"
She was right. We were mixing two fundamentally different investigations. Five specialists were answering "Was this entire image made by AI?" Four others were answering "Was part of this real photo edited?" Showing them together with sequential numbers made every result look like a contradiction.
It's like mixing a blood test and an X-ray in the same report, numbering them 1 through 9, and expecting the patient to understand why test #7 shows a broken bone while tests #1-6 say the blood is fine.
So we consulted a larger group: computer scientists who build these detectors, journalists who use them on deadline, and designers who specialize in making complex information understandable, with the tool's creator moderating. Through calls, chats, and in-person sessions, the group argued about labels.
The panel's first unanimous decision: separate the models into two clearly labeled groups, each answering its own question.
Five models check whether the entire image was created by an AI tool. They use different techniques — visual DNA, frequency analysis, generator fingerprints, diffusion traces, perturbation stability — to answer the same binary question.
Four models look for signs that parts of a real photograph were altered — retouching, object removal, text replacement, or background swaps. They produce heatmaps showing exactly where the edits are.
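As a data structure, the regrouped report is almost trivial, which was the point. Model names and scores below are placeholders; what matters is that every result hangs off the question it actually answers.

```python
report = {
    "Was this entire image made by AI?": {
        "visual_dna": 0.03, "frequency_spectrum": 0.05,
        "generator_fingerprints": 0.02, "diffusion_traces": 0.04,
        "perturbation_stability": 0.06,
    },
    "Was part of this real photo edited?": {
        "noise_consistency": 0.81, "patch_comparison": 0.77,
        "non_semantic_artifacts": 0.74, "macro_micro_fusion": 0.32,
    },
}

for question, squad in report.items():
    flagged = {name: s for name, s in squad.items() if s >= 0.5}
    print(f"{question} -> {'YES' if flagged else 'no'} {flagged}")
```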
A Scandinavian fact-checker summed it up: "5 tools checked if this was made by AI — they all say no. That's something my grandmother understands."
The panel's second major decision: every verdict needs a plain-language summary before the technical details. Not "model concordance indicates high probability of synthetic origin." Instead:
"We checked this image with 9 different detection tools and none found signs of AI generation or editing. The image appears to be an authentic photograph. No further technical verification needed, but always check the context and source."
"Multiple detection tools independently identified patterns typical of AI-generated images. This image was very likely created by an AI tool. Do not publish without disclosure."
"Our analysis found some concerning signals, but we can't be certain. This could be a heavily edited photo, a screenshot, or an image processed in unusual ways. Compare with other versions of this image."
"The detection tools found no hard evidence of AI or manipulation, but our visual analysis raised questions about the image. Seek a second opinion — share with a colleague or use another verification tool."
An open-source intelligence investigator added the crucial element: every summary should end with what to do next, not just what was found. Verification is not a destination — it's a process.
The most important design insight came from a localization researcher. She asked: "What happens when Squad A says 'authentic' but Squad B says 'edited'?"
The old system treated this as a contradiction — mixed signals, unclear result, show an orange "uncertain" badge. The panel realized it's not a contradiction at all. It's the most coherent signal you can get:
"The AI detection models confirm this was taken by a camera, not generated by AI. But our editing detectors found regions where the photo may have been altered. This is a common pattern: the image started as a real photograph, but someone may have changed parts of it."
This is exactly Head #4 (the Surgical Edit) in action. The photo is real. The scene is real. But someone changed the text on a banner, removed a person from the crowd, or swapped a license plate. Five AI detectors correctly say "not AI." Four editing detectors correctly say "something was changed." Both are right. The design just needs to explain that they're answering different questions.
Not all specialists carry equal weight. Some were trained on broader data. Some have been peer-reviewed more rigorously. The system weighs their votes accordingly — but the user never sees the numbers.
The panel voted unanimously: showing "1.5x weight" next to a model card creates more questions than it answers. Users would think the system is gaming the result. A former commercial detection engineer put it simply: "The weights affect the internal verdict logic, not the interface. Transparency matters, but implementation details aren't transparency — they're noise."
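Internally, the weighting itself is unremarkable. A minimal sketch, with illustrative weights (the real values come from benchmark performance and peer review, and never reach the interface):

```python
WEIGHTS = {
    "visual_dna": 1.5, "frequency_spectrum": 1.0, "generator_fingerprints": 1.2,
    "diffusion_traces": 1.3, "perturbation_stability": 1.0,
}

def weighted_vote(flags: dict[str, bool], threshold: float = 0.5) -> bool:
    """flags: model name -> voted 'AI-generated'. True if weighted majority agrees."""
    total = sum(WEIGHTS.values())
    score = sum(WEIGHTS[name] for name, hit in flags.items() if hit)
    return score / total >= threshold
```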
There is a number that should keep every fact-checker awake at night: detection accuracy drops from roughly 79% on 2020-era generators to around 38% on 2024-era generators. Every year, the generators get better. Every year, the detectors fall further behind.
This is the hydra in its truest form. Cut off one head — learn to detect one generator — and two new generators grow back. The only strategy that works is the one Hercules figured out: don't just cut. Cauterize. Build detectors that don't rely on knowing the specific generator, but on fundamental properties of what makes a photograph a photograph.
That's why the perturbation tester is our insurance policy. It wasn't trained on any AI generator. It doesn't know what Midjourney or Stable Diffusion looks like. It only knows one thing: real photographs are stable when you poke them. AI images are not. When the next generation of AI arrives — and it will — the trained detectors will struggle. But that fundamental behavior of real photographs doesn't change.
We're not trying to win the arms race. We're trying to make sure that when a journalist needs to know if a photo is real, they get the best answer currently possible — from ten independent specialists who each bring something the others don't.
Today, the hydra has seven heads. Tomorrow it might have twelve. But the principle doesn't change: you fight a monster with many heads by building a team with many eyes.
Upload an image and watch the specialists work. You'll see each one report independently — and then the verdict system weigh their collective evidence. For more background, read why AI detection fails on the fakes that matter most.