Forensic Analysis of JPEG Ghosts: Detecting Multiple Compressions
Three months ago, a client sent us a "passport photo" for verification. Clean EXIF, proper dimensions, looked legitimate. But something was wrong. When we ran a JPEG ghost analysis, we discovered the image had been compressed at least four times at different quality levels. The face had been swapped.
JPEG ghosts are one of the oldest, and still one of the most reliable, techniques for detecting image manipulation. If you've never used them, they are a powerful forensic tool worth adding to your kit. Let me show you how they work.
What Are JPEG Ghosts?
Every time you save an image as JPEG, the compression algorithm applies a lossy transformation. The image is divided into 8x8 pixel blocks, each transformed via Discrete Cosine Transform (DCT) and quantized based on a quality factor (1-100 in most editors).
Here's the key insight: for a given encoder, each quality level maps to its own quantization matrix. When you re-compress a JPEG at a different quality, the new quantization steps conflict with the old ones, and that mismatch leaves detectable artifacts: the ghosts.
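To make the block pipeline concrete, here is a minimal sketch of one 8x8 block going through DCT, quantization, and back. It assumes a single uniform step for every coefficient instead of a full quantization matrix, and it skips the rounding of pixel values to integers; `dct_matrix` and `quantize_block` are illustrative names, not part of any library.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG applies per block)."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

D = dct_matrix()

def quantize_block(block, step):
    """Forward DCT, round coefficients to multiples of `step`, inverse DCT."""
    coeffs = D @ (block - 128.0) @ D.T          # level shift, then 2D DCT
    quantized = np.round(coeffs / step) * step  # the lossy part
    return D.T @ quantized @ D + 128.0          # back to pixel values

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

once = quantize_block(block, step=10)        # first save
again_same = quantize_block(once, step=10)   # re-save with the same step
again_other = quantize_block(once, step=7)   # re-save with a different step

print("max change, same step:     ", np.abs(once - again_same).max())
print("max change, different step:", np.abs(once - again_other).max())
```

Re-quantizing with the same step leaves the block essentially unchanged, while a different step moves the coefficients onto a new grid. That asymmetry is what ghost analysis exploits.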
The Math (Simplified)
- First compression (Q=90): DCT coefficients in each 8x8 block are rounded to multiples of the steps in the Q=90 quantization matrix.
- Editing: You paste a region from another image or modify pixels.
- Second compression (Q=75): The entire image is re-quantized to the Q=75 matrix.
Result: The edited region carries a different compression history than the rest of the image. When you recompress at the original quality (Q=90), the untouched areas tend to change less than the edited region, because their coefficients were already rounded to that grid once. That localized dip in the difference image is the "ghost".
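The effect is easiest to see on a one-dimensional toy signal. The sketch below follows the setting in Farid's paper, where the pasted region was saved earlier with a coarser step than the final save. The step sizes (17 and 6), the Gaussian coefficients, and the helper names are all assumptions chosen for illustration; real JPEGs use a different step per frequency.

```python
import numpy as np

def quantize(c, step):
    """Round values to the nearest multiple of `step`."""
    return np.round(c / step) * step

rng = np.random.default_rng(1)
coeffs = rng.normal(0, 40, size=10_000)   # stand-in for DCT coefficients

first_step = 17   # hypothetical step of the pasted region's earlier save
final_step = 6    # hypothetical step of the final save of the whole image

background = quantize(coeffs, final_step)                     # saved once
pasted = quantize(quantize(coeffs, first_step), final_step)   # saved twice

def error_curve(c, steps):
    """Mean squared change when `c` is re-quantized at each candidate step."""
    return np.array([np.mean((c - quantize(c, s)) ** 2) for s in steps])

steps = np.arange(1, 31)
bg_err = error_curve(background, steps)
pasted_err = error_curve(pasted, steps)

for s in (first_step - 1, first_step, first_step + 1):
    i = s - 1  # steps start at 1
    print(f"step {s}: background {bg_err[i]:.2f}, pasted {pasted_err[i]:.2f}")
```

Both curves drop to zero at the final step, but only the pasted region shows a second dip at its earlier step. Scanning candidate qualities for that extra dip, region by region, is the idea behind the ghost map.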
Practical Example: Detecting a Face Swap
Let's work through a real example.
Step 1: Extract the Quality Factor History
from PIL import Image
import numpy as np
from jpegio import read

def estimate_jpeg_quality(image_path):
    """
    Estimate the last JPEG compression quality factor.
    Uses quantization table analysis.
    """
    jpeg_struct = read(image_path)
    # Extract luminance quantization table; cast to a signed type so the
    # subtraction below cannot wrap around
    quant_table = np.asarray(jpeg_struct.quant_tables[0], dtype=np.int64)
    # Compare against standard JPEG quality tables
    # (This is a simplified version; production code uses the full IJG tables)
    standard_qualities = {
        50: np.array([[16, 11, 10, 16, 24, 40, 51, 61],
                      [12, 12, 14, 19, 26, 58, 60, 55],
                      # ... full 8x8 matrix ...
                      ]),
        75: np.array([[8, 6, 5, 8, 12, 20, 26, 31],
                      [6, 6, 7, 10, 13, 29, 30, 28],
                      # ...
                      ]),
        90: np.array([[3, 2, 2, 3, 5, 8, 10, 12],
                      [2, 2, 3, 4, 5, 12, 12, 11],
                      # ...
                      ]),
    }
    best_match_quality = None
    min_error = float('inf')
    for quality, std_table in standard_qualities.items():
        # The reference tables above are abbreviated, so compare only the
        # rows that are actually listed
        rows = std_table.shape[0]
        error = np.sum(np.abs(quant_table[:rows] - std_table))
        if error < min_error:
            min_error = error
            best_match_quality = quality
    return best_match_quality, quant_table

# Usage
quality, qtable = estimate_jpeg_quality("suspect_passport.jpg")
print(f"Estimated quality factor: {quality}")
print(f"Quantization table:\n{qtable}")
For our passport photo, this returned Q=75.
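Instead of hard-coding a few reference tables, the table for any quality can be derived from the baseline luminance table in Annex K of the JPEG standard using the scaling rule that libjpeg (the IJG encoder) applies. The sketch below assumes the file was written by a libjpeg-style encoder; cameras and some editors ship their own tables, in which case the closest match is only a rough indication. `ijg_luma_table` and `closest_ijg_quality` are illustrative names.

```python
import numpy as np

# Baseline luminance quantization table from Annex K of the JPEG standard.
# libjpeg uses it unchanged at quality 50.
BASE_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def ijg_luma_table(quality):
    """Luminance table a libjpeg-style encoder uses for `quality` (1-100)."""
    quality = max(1, min(100, int(quality)))
    # libjpeg's quality-to-scale mapping (integer arithmetic)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    table = (BASE_LUMA * scale + 50) // 100
    return np.clip(table, 1, 255)  # baseline JPEG keeps steps in 1..255

def closest_ijg_quality(quant_table):
    """Return (quality, total absolute error) of the best-matching IJG table."""
    qt = np.asarray(quant_table, dtype=np.int64)
    errors = {q: int(np.abs(qt - ijg_luma_table(q)).sum()) for q in range(1, 101)}
    best = min(errors, key=errors.get)
    return best, errors[best]

print(ijg_luma_table(75)[0])                    # first row of the Q=75 table
print(closest_ijg_quality(ijg_luma_table(90)))  # round trip on a known table
```

A non-zero error for the best match is itself informative: it suggests the table did not come from a stock libjpeg encoder.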
Step 2: Recompress at Different Qualities and Compute Difference
The ghost detection algorithm:
- Load the suspect image
- Recompress it at various quality levels (60, 70, 75, 80, 85, 90, 95)
- For each recompression, compute the pixel-wise difference from the original
- The quality level with the lowest difference reveals the original compression quality
import io

def detect_ghost(image_path, qualities=(60, 70, 75, 80, 85, 90, 95)):
    """
    Detect JPEG ghosts by recompressing at different qualities.
    Returns:
        ghost_map (np.array): Heatmap showing likely edited regions
        primary_quality (int): Most likely original quality
    """
    original = Image.open(image_path).convert('RGB')
    original_array = np.array(original, dtype=np.float32)
    differences = {}
    for q in qualities:
        # Recompress at quality q in memory (no temp files to clean up)
        buffer = io.BytesIO()
        original.save(buffer, 'JPEG', quality=q)
        buffer.seek(0)
        recompressed_array = np.array(Image.open(buffer), dtype=np.float32)
        # Absolute difference, averaged over color channels to get a 2D map
        differences[q] = np.abs(original_array - recompressed_array).mean(axis=2)
    # Find quality with minimum overall difference (likely original).
    # Caveat: differences also shrink as q approaches 100, so a dip relative
    # to neighboring qualities is more telling than the global minimum.
    mean_diffs = {q: np.mean(diff) for q, diff in differences.items()}
    primary_quality = min(mean_diffs, key=mean_diffs.get)
    print(f"Primary quality: {primary_quality}")
    for q, md in sorted(mean_diffs.items()):
        print(f"  Q={q}: mean diff = {md:.3f}")
    # Ghost map: Difference from primary quality
    ghost_map = differences[primary_quality]
    return ghost_map, primary_quality

# Run detection
ghost_map, primary_q = detect_ghost("suspect_passport.jpg")
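Raw per-pixel differences are noisy, and their overall level changes with quality, which makes maps at different qualities hard to compare. One common refinement, loosely following Farid's paper, is to average each difference map over a small window and then rescale every location across the tested qualities. The sketch below assumes you keep the full `differences` dictionary built inside `detect_ghost` (the function above returns only one map); `box_average`, `normalized_ghost_stack`, and the window size are illustrative choices.

```python
import numpy as np

def box_average(img, window):
    """Mean over every window x window patch (valid region), via an integral image."""
    ii = np.pad(np.cumsum(np.cumsum(img, axis=0), axis=1), ((1, 0), (1, 0)))
    total = (ii[window:, window:] - ii[:-window, window:]
             - ii[window:, :-window] + ii[:-window, :-window])
    return total / (window * window)

def normalized_ghost_stack(differences, window=16):
    """
    differences: dict mapping quality -> difference map (2D, or HxWxC).
    Returns the sorted qualities and a stack of maps where each location is
    rescaled to [0, 1] across qualities.
    """
    qualities = sorted(differences)
    layers = []
    for q in qualities:
        d = np.asarray(differences[q], dtype=np.float64)
        if d.ndim == 3:            # collapse color channels if present
            d = d.mean(axis=2)
        layers.append(box_average(d, window))
    stack = np.stack(layers)
    lo = stack.min(axis=0, keepdims=True)
    hi = stack.max(axis=0, keepdims=True)
    return qualities, (stack - lo) / np.maximum(hi - lo, 1e-12)

# Toy input: a central patch that barely changes at quality 50
toy = {50: np.full((32, 32), 4.0), 70: np.full((32, 32), 2.0), 90: np.full((32, 32), 1.0)}
toy[50][8:24, 8:24] = 0.0
qs, maps = normalized_ghost_stack(toy, window=8)
print(qs, maps.shape)
```

In the normalized stack, a region that shows up dark at one quality while the rest of the image does not is a candidate ghost at that quality.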
Step 3: Visualize the Ghost
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

def visualize_ghost(ghost_map, threshold=5.0):
    """
    Highlight regions that were likely edited.
    Threshold: Pixel difference threshold for highlighting.
    """
    # Collapse color channels so the map renders as a single 2D heatmap
    if ghost_map.ndim == 3:
        ghost_map = ghost_map.mean(axis=2)
    # Apply Gaussian blur to reduce noise
    smoothed = gaussian_filter(ghost_map, sigma=2)
    # Threshold to create mask
    edited_mask = smoothed > threshold
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 6))
    # Raw ghost map
    axes[0].imshow(ghost_map, cmap='hot')
    axes[0].set_title('JPEG Ghost Map (raw)')
    axes[0].axis('off')
    # Thresholded mask
    axes[1].imshow(edited_mask, cmap='Reds', alpha=0.7)
    axes[1].set_title(f'Likely Edited Regions (threshold={threshold})')
    axes[1].axis('off')
    plt.tight_layout()
    plt.savefig('ghost_analysis.png', dpi=150)
    plt.show()

visualize_ghost(ghost_map, threshold=5.0)
Result: The face region in our passport photo lit up like a Christmas tree. Clear evidence of manipulation.
Advanced Technique: Double Quantization Detection
Sometimes, you don't have access to the "original" quality. You just know the image feels wrong. In this case, we use double quantization detection.
The Histogram Method
When an image is double-compressed, the histogram of DCT coefficients shows periodic peaks and valleys.
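Why? The first quantization rounds coefficients to multiples of Q1's quantization steps. The second quantization rounds again to multiples of Q2's steps. Because the two grids rarely line up, some histogram bins collect extra coefficients while others are starved, and the pattern repeats at a regular interval. When the second step is smaller than the first, the starved bins can be completely empty, which gives the histogram its "staircase" look.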
from jpegio import read
from scipy.signal import find_peaks

# First ten AC positions in zigzag order, as (row, col) inside each 8x8 block
LOW_FREQ_POSITIONS = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2),
                      (0, 3), (1, 2), (2, 1), (3, 0), (4, 0)]

def detect_double_quantization(image_path):
    """
    Detect double JPEG compression via DCT coefficient histogram.
    """
    jpeg_struct = read(image_path)
    dct_coeffs = jpeg_struct.coef_arrays[0]  # Luminance channel
    # View the coefficient plane as a grid of 8x8 blocks
    h, w = dct_coeffs.shape
    blocks = dct_coeffs.reshape(h // 8, 8, w // 8, 8)
    # Focus on low-frequency AC coefficients (positions 1-10)
    # High-frequency coeffs are mostly zeros, not informative
    coeffs_flat = np.concatenate(
        [blocks[:, r, :, c].ravel() for r, c in LOW_FREQ_POSITIONS])
    ac_coeffs = coeffs_flat[(coeffs_flat > -50) & (coeffs_flat < 50)]
    hist, bins = np.histogram(ac_coeffs, bins=100, range=(-50, 50))
    # Detect periodicity in histogram
    peaks, _ = find_peaks(hist, height=np.mean(hist) * 1.5)
    # If we find regular peaks, likely double compressed
    if len(peaks) > 5:
        peak_intervals = np.diff(peaks)
        mean_interval = np.mean(peak_intervals)
        std_interval = np.std(peak_intervals)
        # Regular spacing suggests double quantization
        if std_interval / mean_interval < 0.3:
            return True, float(mean_interval)
    return False, None

is_double, interval = detect_double_quantization("suspect_passport.jpg")
if is_double:
    print(f"⚠️ Double compression detected (period: {interval:.1f})")
else:
    print("✓ No clear evidence of double compression")
For our passport photo: Double compression detected (period: 4.2).
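To see where the periodic pattern comes from, it helps to reproduce it on synthetic data. This is a minimal sketch, assuming Laplacian-distributed coefficients and two hypothetical step sizes (5 for the first save, 2 for the second); the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# AC coefficients of natural images are roughly Laplacian around zero
coeffs = rng.laplace(0, 20, size=200_000)

def quantized_levels(c, step):
    """Integer levels a JPEG file would store for coefficients `c`."""
    return np.round(c / step).astype(int)

def level_histogram(levels, max_level=20):
    """Counts for each integer level from -max_level to +max_level."""
    edges = np.arange(-max_level - 0.5, max_level + 1.5, 1.0)
    hist, _ = np.histogram(levels, bins=edges)
    return hist

first_step, second_step = 5, 2   # hypothetical: coarse save, then finer save

single = quantized_levels(coeffs, second_step)
double = quantized_levels(quantized_levels(coeffs, first_step) * first_step,
                          second_step)

single_hist = level_histogram(single)
double_hist = level_histogram(double)

print("empty bins, single compression:", int((single_hist == 0).sum()))
print("empty bins, double compression:", int((double_hist == 0).sum()))
```

With a single compression every level in the range is populated. After the coarse-then-fine sequence, only levels reachable from multiples of the first step survive, so the histogram shows regularly spaced gaps.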
Real-World Case Study: Insurance Fraud
In 2024, we were hired by an insurance company to verify a claim photo showing a "totaled" vehicle. The claimant submitted a high-res JPEG showing the car crushed against a tree.
Our analysis:
- EXIF data: iPhone 13 Pro, Q=92 (Apple's default)
- Ghost analysis: Most of the image was indeed Q=92
- But: The damage region (crushed hood) showed ghosting consistent with Q=75
Conclusion: The damage was photographed separately (possibly from a junkyard) and composited onto the original car photo.
Outcome: Claim denied. Saved the insurance company $47,000.
Limitations and Countermeasures
JPEG ghost analysis isn't foolproof:
Limitation 1: Same-Quality Editing
If the editor uses the exact same quality factor as the original, ghosts won't appear.
Countermeasure:
- Analyze at multiple quality levels
- Look for inconsistencies in quantization errors (advanced technique)
Limitation 2: PNG Intermediate Saves
If someone edits a JPEG but saves intermediates as PNG (lossless), then exports back to JPEG at the same quality, detection becomes harder.
Countermeasure:
- Look for other forensic tells (ELA, noise patterns)
- PNG editing often introduces slight colorspace shifts (RGB vs YCbCr)
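As one example of a noise-pattern check, the sketch below estimates local noise strength per block and flags blocks that deviate strongly from the rest of the image. It is a rough illustration, not a calibrated detector: the 3x3 mean filter as a denoiser, the block size, and the threshold `k` are all assumptions, and `block_noise_map` and `outlier_blocks` are illustrative names.

```python
import numpy as np

def block_noise_map(gray, block=16):
    """Standard deviation of the high-pass residual in each block."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # 3x3 local mean built from shifted views of an edge-padded copy
    padded = np.pad(gray, 1, mode="edge")
    local_mean = sum(padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)) / 9.0
    residual = gray - local_mean   # what the smoothing removed, mostly noise
    hb, wb = h // block, w // block
    blocks = residual[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return blocks.std(axis=(1, 3))

def outlier_blocks(noise_map, k=3.0):
    """Flag blocks whose noise level is far from the median (in MAD units)."""
    med = np.median(noise_map)
    mad = np.median(np.abs(noise_map - med)) + 1e-9
    return np.abs(noise_map - med) / mad > k

# Toy image: uniform sensor-like noise, with one unusually clean 16x16 patch
rng = np.random.default_rng(3)
img = 128 + rng.normal(0, 5, size=(64, 64))
img[16:32, 16:32] = 128 + rng.normal(0, 0.5, size=(16, 16))

noise_map = block_noise_map(img, block=16)
flags = outlier_blocks(noise_map)
print(np.round(noise_map, 2))
print(flags)
```

On real photos, textured areas also raise the residual, so flagged blocks are a prompt for closer inspection and not a verdict on their own.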
Limitation 3: AI-Based Inpainting
Modern AI tools can inpaint regions seamlessly, generating pixels that match the surrounding JPEG artifacts.
Countermeasure:
- Combine with AI detection methods (see our article "Hidden Tells in Diffusion Models")
- Semantic inconsistencies (e.g., lighting, perspective)
Production Implementation
Here's an end-to-end JPEG ghost detector that ties together the functions defined above:
from scipy.ndimage import gaussian_filter

class JPEGGhostDetector:
    """
    Production-ready JPEG ghost detector.
    """
    def __init__(self, quality_range=(50, 100, 5)):
        # range() excludes the stop value, so this covers 50, 55, ..., 95
        self.qualities = list(range(*quality_range))

    def analyze(self, image_path):
        """
        Full analysis pipeline.
        Returns dict with results.
        """
        # Step 1: Estimate primary quality
        primary_q, quant_table = estimate_jpeg_quality(image_path)
        # Step 2: Ghost detection
        ghost_map, detected_q = detect_ghost(image_path, self.qualities)
        if ghost_map.ndim == 3:
            # Collapse color channels so thresholds apply to a 2D map
            ghost_map = ghost_map.mean(axis=2)
        # Step 3: Double quantization check
        is_double, period = detect_double_quantization(image_path)
        # Step 4: Compute suspicious region percentage
        smoothed_ghost = gaussian_filter(ghost_map, sigma=2)
        suspicious_pixels = np.sum(smoothed_ghost > 5.0)
        total_pixels = ghost_map.size
        suspicious_pct = (suspicious_pixels / total_pixels) * 100
        return {
            'primary_quality': primary_q,
            'detected_quality': detected_q,
            'double_compressed': is_double,
            'dq_period': period,
            'suspicious_region_pct': suspicious_pct,
            'likely_manipulated': is_double or suspicious_pct > 10.0,
            'ghost_map': ghost_map
        }

    def generate_report(self, image_path, output_path='report.html'):
        """
        Generate HTML report with visualizations.
        """
        results = self.analyze(image_path)
        # The image tag points at the file that visualize_ghost() writes
        html = f"""
<html>
<head><meta charset='utf-8'><title>JPEG Ghost Analysis Report</title></head>
<body style='font-family: Arial; max-width: 800px; margin: 0 auto;'>
<h1>JPEG Ghost Analysis Report</h1>
<h2>Summary</h2>
<ul>
<li>Primary Quality: {results['primary_quality']}</li>
<li>Detected Quality: {results['detected_quality']}</li>
<li>Double Compressed: {'Yes' if results['double_compressed'] else 'No'}</li>
<li>Suspicious Region: {results['suspicious_region_pct']:.1f}%</li>
</ul>
<h2>Verdict</h2>
<p style='font-size: 18px; color: {'red' if results['likely_manipulated'] else 'green'}'>
{'⚠️ LIKELY MANIPULATED' if results['likely_manipulated'] else '✓ Likely Authentic'}
</p>
<h2>Ghost Map</h2>
<img src='ghost_analysis.png' style='max-width: 100%;'>
</body>
</html>
"""
        # Save ghost map visualization (written to ghost_analysis.png)
        visualize_ghost(results['ghost_map'], threshold=5.0)
        # UTF-8 so the ⚠️ and ✓ characters are written correctly everywhere
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(html)
        print(f"Report saved to {output_path}")

# Usage
detector = JPEGGhostDetector()
detector.generate_report("suspect_image.jpg", output_path="analysis_report.html")
Integration with Broader Forensics
JPEG ghosts are most powerful when combined with other techniques:
| Technique | What It Detects | Combine With |
|---|---|---|
| JPEG Ghosts | Multiple compressions, editing | ELA (error level analysis) |
| ELA | Different compression levels in regions | JPEG Ghosts (confirms editing) |
| Noise Analysis | Inconsistent sensor noise | JPEG Ghosts (if noise is uniform but JPEG isn't → fake noise) |
| Metadata | Camera info, editing software | All of the above |
Conclusion
JPEG ghosts remain a cornerstone of image forensics more than 15 years after they were first described. They're computationally cheap, conceptually elegant, and remarkably effective against casual manipulation.
Key Takeaways:
1. Every JPEG recompression leaves a trace
2. Ghost maps highlight regions whose compression history differs from the rest of the image
3. Double quantization detection works even without a reference quality
4. Always combine with other forensic methods for court-admissible evidence
Next time you see a suspicious image, run a ghost analysis. You might be surprised what surfaces.
Further Reading:
- Farid, H. (2009). "Exposing Digital Forgeries From JPEG Ghosts." IEEE Transactions on Information Forensics and Security.
- Lin, Z., et al. (2009). "Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis."
Tools:
- jpegio Python library: https://github.com/dwgoon/jpegio
- Our open-source detector: github.com/forensics-tools/jpeg-ghost-detector (fictional)
Update (Jan 30, 2026): Added section on AI inpainting countermeasures based on Midjourney v7 testing.