Detection Accuracy

Last updated: March 2026 · Based on internal evaluation across 7,700+ tracked real-world analyses

ImageWhisperer runs multiple detection models in parallel and cross-validates their outputs against each other. No single model determines the verdict. This page documents what each model does well, where it struggles, and our overall system accuracy.

1. How We Test

Our benchmarks are based on two sources:

- Internal evaluation across 7,700+ tracked real-world user uploads, not curated test sets.
- Peer-reviewed academic benchmarks, which supply the industry-average comparison figures cited throughout this page.

We report accuracy honestly, including categories where detection is weak.

2. Overall System Performance

The numbers below compare ImageWhisperer to the research-reported industry average for single-model detectors tested on real-world images (not lab conditions).

AI-Generated Detection: ImageWhisperer 94% vs. industry avg. 71%
(avg. across 16 detectors on real-world images ¹)

False Positive Rate: ImageWhisperer 5% vs. industry avg. 18%
(lower is better · real photos misclassified as AI ¹)

Manipulation Detection: ImageWhisperer 81% vs. industry avg. 52%
(splicing, compositing, face-swap ²)

¹ Averaged from peer-reviewed benchmarks: Dogoulis et al. (2023) on 16 detectors across 2.6M images; Corvi et al. (2023) cross-generator evaluation. Industry averages reflect single-model performance on out-of-distribution, real-world conditions (social media compression, unseen generators).
² Based on Guillaro et al. (2023) manipulation detection survey; Wu et al. (2022) IML benchmark. Most detectors tested on spliced/copy-move/inpainted images without cross-validation.
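The two headline metrics above can be made concrete with a short sketch. This is an illustrative computation, not ImageWhisperer's evaluation code: `y_true` marks images that truly are AI-generated, `y_pred` marks images the detector flagged, and both function names are hypothetical.

```python
# Illustrative sketch: how accuracy and false positive rate are
# computed from per-image labels. 1 = AI-generated / flagged as AI,
# 0 = real photo / passed as real.

def accuracy(y_true, y_pred):
    """Fraction of images classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_positive_rate(y_true, y_pred):
    """Fraction of real photos (y_true == 0) wrongly flagged as AI."""
    real = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    return sum(p for _, p in real) / len(real)

# 10 images: 6 AI-generated, 4 real photos
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]  # one miss, one false alarm

print(accuracy(y_true, y_pred))             # 0.8
print(false_positive_rate(y_true, y_pred))  # 0.25
```

Note that the two metrics move independently: a detector can score high accuracy on a mostly-AI test set while still flagging an unacceptable share of real photos, which is why both are reported separately above.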

3. Per-Model Benchmarks

B-Free

Primary AI-generation detector. Trained on diffusion model outputs (Midjourney, DALL-E, Stable Diffusion).

AI Detection Accuracy: 93% · False Positive Rate: 4% · Inference Time: 0.3s

Strong on: Midjourney, DALL-E 3, Stable Diffusion · Weaker on: Flux (use Flux Probe), illustrations/artwork

SPAI (Splicing & AI Detection)

Specializes in detecting image splicing, compositing, and copy-move manipulations.

Manipulation Detection: 87% · False Positive Rate: 10% · Inference Time: 0.4s

Strong on: spliced composites, background replacement · Weaker on: heavily compressed JPEGs, professional retouching

TruFor

Forgery localization model. Produces heatmaps showing manipulated regions.

Localization Accuracy: 84% · False Positive Rate: 7% · Inference Time: 0.5s

Strong on: region manipulation, inpainting · Weaker on: uniform AI-generated images
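A localization model like TruFor outputs a per-pixel heatmap rather than a single score. The sketch below shows one generic way such a heatmap can be reduced to a flagged region; the thresholding logic and `flag_region` name are illustrative, not TruFor's actual post-processing.

```python
# Illustrative sketch: reducing a forgery-localization heatmap
# (per-pixel manipulation scores in [0, 1]) to a bounding box around
# the suspicious region. The model produces the heatmap; this
# thresholding step is a generic post-processing example.

def flag_region(heatmap, threshold=0.5):
    """Return (row0, col0, row1, col1) bounding the pixels whose score
    meets the threshold, or None if no pixel is suspicious."""
    hot = [(r, c) for r, row in enumerate(heatmap)
                  for c, v in enumerate(row) if v >= threshold]
    if not hot:
        return None
    rows = [r for r, _ in hot]
    cols = [c for _, c in hot]
    return (min(rows), min(cols), max(rows), max(cols))

heatmap = [
    [0.1, 0.1, 0.2, 0.1],
    [0.1, 0.8, 0.9, 0.1],
    [0.1, 0.7, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
print(flag_region(heatmap))  # (1, 1, 2, 2)
```

This also illustrates why uniform AI-generated images are a weak spot for localization models: when the entire image is synthetic, there is no boundary between authentic and manipulated regions for the heatmap to isolate.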

IML-ViT

Vision Transformer for image manipulation localization at pixel level.

Localization Accuracy: 82% · False Positive Rate: 6% · Inference Time: 0.6s

Flux Probe

Specialized DINOv2 linear probe trained specifically for Flux-generated images, which evade general detectors.

Flux Detection: 93% · False Positive Rate: <5% · Inference Time: 0.2s

Strong on: Flux · Weaker on: non-Flux AI generators
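A "linear probe" is conceptually simple: a single logistic layer trained on embeddings from a frozen backbone (DINOv2 here). The sketch below shows the scoring step only, in plain Python; the embedding extractor, the learned weights, and the 4-dimensional vectors are all illustrative stand-ins for the real pipeline's high-dimensional features.

```python
import math

# Illustrative sketch of a linear probe: one logistic layer over frozen
# backbone embeddings. Real DINOv2 embeddings have hundreds of
# dimensions; the 4-dim vectors and weights below are stand-ins.

def probe_score(embedding, weights, bias):
    """Probability that the image is Flux-generated."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

weights = [0.9, -0.4, 1.2, 0.3]   # learned offline; illustrative values
bias = -0.5

flux_like = [1.0, 0.2, 1.1, 0.5]  # embedding of a Flux-style image
real_like = [0.1, 0.9, 0.0, 0.2]  # embedding of a real photo

print(probe_score(flux_like, weights, bias) > 0.5)  # True
print(probe_score(real_like, weights, bias) > 0.5)  # False
```

Because the backbone stays frozen and only the small linear head is trained, a probe like this is cheap to retrain when a generator such as Flux updates, which is one reason specialized probes can track fast-moving generators better than monolithic detectors.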

External AI Detection API

Third-party AI detection service used as an authority signal for cross-validation.

AI Detection: 90% · False Positive Rate: 4% · Inference Time: 0.8s

Additional Models

Sparse-ViT, Mesorch, ClipDet, CommFor, HiFi-Net++, and PerspectiveFields provide supporting votes in the multi-model ensemble. Individual accuracy varies (70–87%) but their combined signal strengthens verdict confidence.

4. Performance by Image Category

ImageWhisperer vs. research-reported industry averages per category.

Category                  ImageWhisperer   Industry Avg.   Delta   Notes
Midjourney v5/v6          96%              82%             +14     Strongest detection category
DALL-E 3                  94%              79%             +15     Reliable detection
Stable Diffusion XL       92%              76%             +16     Good across most subjects
Flux                      93%              21%             +72     Flux Probe + ensemble vs. general detectors
Face swaps / deepfakes    85%              62%             +23     B-Free + SPAI + HiFi-Net++ cross-validated
Spliced composites        80%              48%             +32     SPAI + TruFor + IML-ViT + PerspectiveFields
Background replacement    77%              40%             +37     Hardest manipulation category
Screenshots               N/A              N/A             N/A     Flagged as "Further Research Needed"
Illustrations / artwork   Limited          High FP         N/A     Guards suppress false AI flags on artwork

Industry averages sourced from Dogoulis et al. (2023), Corvi et al. (2023), and Guillaro et al. (2023). Averages reflect single-model performance in cross-generator, real-world conditions. Flux average from Feb 2026 benchmark of 16 detection methods across 2.6M images.

5. How We Compare

Why the gap? Most detectors are a single model returning a single score. ImageWhisperer runs 10+ models in parallel, cross-validates their outputs, and requires corroboration before any verdict. That ensemble approach is why our real-world accuracy stays 20–40 percentage points above the single-model industry average.
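The corroboration rule described above can be sketched in a few lines. This is an illustrative model of the idea, not ImageWhisperer's actual code: the thresholds, model names, and `verdict` function are all hypothetical.

```python
# Illustrative sketch of a corroboration rule: a verdict is only
# issued when at least two independent models agree, so no single
# model can decide the outcome alone. Thresholds and model names
# are made up for the example.

def verdict(scores, flag_at=0.8, min_agreeing=2):
    """scores: {model_name: probability the image is AI/manipulated}."""
    flagged = [m for m, s in scores.items() if s >= flag_at]
    if len(flagged) >= min_agreeing:
        return "likely-ai", flagged
    return "no-verdict", flagged

# One model fires alone: no verdict (this is what keeps false
# positives down when a single detector misfires).
print(verdict({"b-free": 0.95, "spai": 0.30, "trufor": 0.40}))
# Two models corroborate: verdict issued.
print(verdict({"b-free": 0.95, "spai": 0.88, "trufor": 0.40}))
```

The trade-off is deliberate: requiring agreement sacrifices a little single-model sensitivity in exchange for a much lower false positive rate, since uncorrelated misfires rarely coincide across independent models.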

The Flux gap

Flux-generated images are the hardest for the industry to detect. A February 2026 academic benchmark tested 16 detection methods across 2.6 million images and found an average accuracy of just 21% on Flux Dev. ImageWhisperer's dedicated Flux Probe, a purpose-built DINOv2 linear probe trained specifically on Flux outputs, achieves 93% on this category: a +72 percentage point advantage.

Lab numbers vs. real-world performance

Many tools report 95–99% accuracy in controlled settings, but independent studies consistently show steep drops in real-world conditions. Platform re-encoding (Instagram, WhatsApp, Twitter compression), screenshots, and generators not in the training data all degrade performance. Our numbers are based on 7,700+ tracked real-world user uploads, not curated test sets — they reflect what you'll actually experience.

False positive rates matter

A detector that flags 18% of real photos as AI-generated (the industry average) creates alert fatigue and erodes editorial trust. ImageWhisperer's corroboration requirement, under which no single model can set the verdict alone, keeps our false positive rate at 5%, roughly a quarter of the single-model industry average.

What sets ImageWhisperer apart. We combine forensic AI detection with investigative tools — fact-checking, reverse image search, EXIF analysis, location verification, and full narrative explanations — in a single analysis. Most detection tools return a score; we explain why.

6. Known Limitations

We believe transparency about limitations builds more trust than inflated accuracy claims.

7. How We Improve

Questions about our methodology? Found an image we got wrong? Let us know — every report makes the system better.

8. Related Documents