Biometric authentication was sold as the answer to password fatigue — and for years, it delivered. Face recognition, fingerprint scanning, voice verification: each replaced something forgeable with something physically unique. The underlying assumption was that biological identity was harder to fake than a string of characters.
That assumption is now wrong. Not theoretically, not in edge cases — practically wrong, at consumer-grade cost, with publicly available tools. A face-swap deepfake convincing enough to defeat commercial liveness detection can be generated in real time on a laptop. A voice clone indistinguishable from the target requires less than ten seconds of audio scraped from a public source.
The dangerous part is the gap between organisational security posture and actual attack capability. Most organisations believe their biometric systems are secure. Most of them are wrong.
⚠ Current Threat Reality
Consumer deepfake tools capable of defeating passive liveness detection are freely available. Voice cloning from ten seconds of audio requires no technical expertise. The threat is actively exploited today in financial fraud, account takeover and corporate espionage.
The organisations most at risk are not those that ignored biometrics — they are those that implemented it, decided the problem was solved, and stopped paying attention.
The deepfake threat to biometrics is not monolithic — different attack types require different capabilities, target different systems and are defeated by different countermeasures.
Video injection (face-swap): a real-time deepfake face is injected into the video stream before it reaches the biometric verification endpoint. The camera API is intercepted and replaced with a synthetic feed, so the liveness check never sees real video.
Presentation replay: a recorded video or image of the target is replayed in front of the camera. Older liveness systems that rely solely on motion detection or eye-blink detection are defeated by looping high-quality footage.
Voice cloning: AI voice synthesis clones a target's voice from as little as 3–10 seconds of publicly available audio. The clone is used to spoof voice authentication systems, phone-banking verification or live call impersonation.
Synthetic identity (GAN ID): a fully synthetic face, generated by a GAN or diffusion model, is used to create a fraudulent identity document. The face has never existed; there is no source individual to trace. Document texture and metadata are also synthesised.
3D mask presentation: a high-fidelity 3D-printed or silicone mask of the target's face is presented to the camera. It is more expensive and resource-intensive than digital attacks, but it defeats depth-sensing and IR liveness checks that digital injection cannot.
◆ Real-Time Face-Swap Attack Pipeline
Key Pattern
The most effective modern attacks combine vectors. A face-swap injection paired with a voice clone defeats systems that evaluate audio and video independently but don't check their synchrony against physiological constraints.
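The synchrony check that defeats this combined attack can be sketched as a correlation between the visual and audio speech signals. This is a minimal illustration, assuming a per-frame mouth-openness series (e.g. derived from facial landmarks) and a frame-aligned audio loudness envelope are already available; production systems use learned audio-visual embeddings, not raw correlation, and any decision threshold would need calibration:

```python
import numpy as np

def av_sync_score(mouth_openness: np.ndarray, audio_envelope: np.ndarray) -> float:
    """Pearson correlation between per-frame mouth opening and the
    frame-aligned audio loudness envelope. Independently generated
    face and voice streams tend to decorrelate; a real speaker's
    do not. Illustrative sketch only."""
    m = mouth_openness - mouth_openness.mean()
    a = audio_envelope - audio_envelope.mean()
    denom = np.sqrt((m ** 2).sum() * (a ** 2).sum())
    return float((m * a).sum() / denom) if denom > 0 else 0.0
```

A real speaker produces a high score; a face-swap paired with an unrelated voice clone drifts toward zero.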
Biometric anti-spoofing research has produced a range of detection approaches, each targeting a different aspect of synthetic media's distinguishing characteristics. The challenge is that every detection signal has a corresponding adversarial bypass.
Passive texture analysis: examines single frames or short clips for texture artifacts, skin reflectance patterns and micro-expressions, without requiring active user participation.
✓ Defeats
Static photos
Basic replays
✗ Fails against
High-quality video replays
Real-time deepfake injection
3D masks
Active challenge-response: prompts the user to blink, turn, smile or speak a random phrase. It requires the deepfake to respond dynamically, which is harder but achievable with real-time generation.
✓ Defeats
Pre-recorded replays
Static images
Basic deepfakes
✗ Fails against
Real-time face-swap
Highly responsive synthetic generators
3D masks
Frequency-domain analysis: deepfake generators leave characteristic artifacts in the DCT/FFT frequency domain that are invisible to the human eye but detectable algorithmically. Compression removes some of these artifacts, so the highest-quality fakes are harder to catch.
✓ Defeats
GAN-generated faces
Most face-swap tools
Compressed video fakes
✗ Fails against
Adversarially trained generators
High-bitrate uncompressed fakes
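The frequency-domain signal can be sketched as a simple spectral-energy heuristic. This is an illustrative toy, not a production detector: the cutoff fraction is an assumption, and real systems use learned classifiers over DCT/FFT features rather than a single ratio:

```python
import numpy as np

def high_freq_energy_ratio(gray_face: np.ndarray, cutoff_frac: float = 0.25) -> float:
    """Fraction of spectral energy outside a low-frequency disc.

    GAN pipelines often leave a depressed or periodic high-frequency
    spectrum. The cutoff and any decision threshold here are
    illustrative assumptions, not calibrated values.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray_face.astype(np.float64)))
    power = np.abs(spectrum) ** 2
    h, w = power.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = cutoff_frac * min(h, w) / 2
    low_mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    total = power.sum()
    return float(power[~low_mask].sum() / total) if total > 0 else 0.0

# Noise-rich real texture vs. an unnaturally smooth synthetic-like patch:
rng = np.random.default_rng(0)
noisy = rng.standard_normal((64, 64))
smooth = np.ones((64, 64))
assert high_freq_energy_ratio(noisy) > high_freq_energy_ratio(smooth)
```

A face with unusually little high-frequency energy for its resolution would be flagged for further checks, mirroring the "unusually low high-frequency noise" signature mentioned in the mitigation framework below.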
The rPPG signal — hardest to fake
Remote photoplethysmography (rPPG) measures subtle changes in skin colour caused by blood flow — changes imperceptible to the human eye but detectable in video at the pixel level. They are present in real faces and absent in synthetic ones — because no current generative model correctly simulates sub-pixel haemodynamic variation.
rPPG is currently the hardest signal for adversarial deepfakes to defeat simultaneously with face fidelity — making it one of the most valuable signals in a multi-layer detection stack.
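A crude version of the rPPG check can be sketched as follows. Assumptions worth flagging: a fixed, pre-cropped face ROI, the green channel as the blood-volume proxy, and a 0.7–4.0 Hz plausible heart-rate band; real implementations use motion compensation, chrominance projections and learned filtering on top of this:

```python
import numpy as np

def rppg_pulse_strength(frames: np.ndarray, fps: float = 30.0) -> float:
    """Strength of a heart-rate-band peak in the mean green-channel
    trace of a face ROI.

    frames: (T, H, W, 3) RGB video of the face region. A real face
    carries a periodic blood-volume signal; a synthetic one typically
    does not. Band limits are illustrative, not calibrated.
    """
    green = frames[..., 1].reshape(frames.shape[0], -1).mean(axis=1)
    green = green - green.mean()                  # remove DC component
    spectrum = np.abs(np.fft.rfft(green)) ** 2
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)        # ~42-240 bpm
    total = spectrum[1:].sum()                    # ignore the DC bin
    return float(spectrum[band].sum() / total) if total > 0 else 0.0
```

A live face concentrates spectral energy in the heart-rate band; a deepfake's trace is either flat or dominated by out-of-band rendering noise.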
Why Single Signals Fail
Every single detection signal has been demonstrated to be bypassable under adversarial conditions. Only a combination of multiple uncorrelated signals — scored holistically — provides robust detection.
Detection signal effectiveness matrix
Each detection approach mapped against the five attack vectors, rated by effectiveness under realistic adversarial conditions.
◉ Detection Signal Effectiveness per Attack Vector
| Detection Signal | Face-swap | Replay | Voice clone | GAN ID | 3D mask |
|---|---|---|---|---|---|
| Texture anomaly (passive) | Mod. | Good | — | Mod. | Weak |
| Depth sensor / IR liveness | Weak | Good | — | Weak | Weak |
| Frequency-domain artifacts | Good | Mod. | — | Good | — |
| Physiological signals (rPPG) | Good | Strong | — | Mod. | Mod. |
| Temporal consistency check | Good | Good | — | Weak | Weak |
| Voice + face synchrony | Mod. | Good | Good | Weak | Weak |
| Voiceprint analysis | — | — | Good | — | — |
| Challenge-response (active) | Weak | Strong | Mod. | Weak | Weak |
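One practical use of the matrix is programmatic gap analysis of a deployed stack. A minimal sketch, with the ratings transcribed from the table above; the signal and vector identifiers are ad hoc names introduced for this example:

```python
# Ratings transcribed from the effectiveness matrix; None = not applicable.
MATRIX = {
    "texture_passive":    {"face_swap": "mod",  "replay": "good",   "voice_clone": None,   "gan_id": "mod",  "mask_3d": "weak"},
    "depth_ir":           {"face_swap": "weak", "replay": "good",   "voice_clone": None,   "gan_id": "weak", "mask_3d": "weak"},
    "freq_artifacts":     {"face_swap": "good", "replay": "mod",    "voice_clone": None,   "gan_id": "good", "mask_3d": None},
    "rppg":               {"face_swap": "good", "replay": "strong", "voice_clone": None,   "gan_id": "mod",  "mask_3d": "mod"},
    "temporal":           {"face_swap": "good", "replay": "good",   "voice_clone": None,   "gan_id": "weak", "mask_3d": "weak"},
    "av_sync":            {"face_swap": "mod",  "replay": "good",   "voice_clone": "good", "gan_id": "weak", "mask_3d": "weak"},
    "voiceprint":         {"face_swap": None,   "replay": None,     "voice_clone": "good", "gan_id": None,   "mask_3d": None},
    "challenge_response": {"face_swap": "weak", "replay": "strong", "voice_clone": "mod",  "gan_id": "weak", "mask_3d": "weak"},
}

def coverage_gaps(deployed: list) -> list:
    """Attack vectors with no 'good' or 'strong' rating in the stack."""
    vectors = {"face_swap", "replay", "voice_clone", "gan_id", "mask_3d"}
    covered = {v for s in deployed for v, r in MATRIX[s].items()
               if r in ("good", "strong")}
    return sorted(vectors - covered)

# A passive-only stack leaves most vectors uncovered:
print(coverage_gaps(["texture_passive", "depth_ir"]))
```

Running the query for a passive-only deployment reports every vector except replay as a gap, which is exactly the failure mode the next section's mitigation framework addresses.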
Every time a new detection technique is published, it becomes training signal for the next generation of generative models. Deepfake detectors trained on current-generation fakes are systematically defeated by the next generation, often within months.
The GAN-detector feedback loop
GAN-based face synthesis works by training a generator and a discriminator simultaneously. When detection research produces a new discriminator architecture, it effectively accelerates generator improvement. Diffusion models have added a new dimension — unlike GANs, they do not produce the same characteristic frequency artifacts, meaning detection systems trained on GAN outputs have systematically lower performance against diffusion-generated faces.
Deployment Gap
A system achieving 95% accuracy against contemporary deepfakes at deployment may fall to 60–70% within 18 months. Biometric anti-spoofing is not a deploy-and-forget system — it requires continuous retraining against current-generation synthetic media.
Voice cloning — the underappreciated risk
Voice cloning is arguably the more immediate practical risk — because voice authentication is widely deployed in contexts where the attacker has no camera, the session is unmonitored, and the target's voice is often publicly available. Modern zero-shot voice cloning has made the acquisition barrier negligible.
No single countermeasure is sufficient. This research produced a prioritised mitigation framework based on attack severity, deployment prevalence and implementation feasibility.
Remote photo / KYC injection
Deploy stream integrity verification at the OS/driver level, not just application layer. Validate that video metadata matches declared device capabilities.
Voice authentication spoofing
Add anti-spoofing challenge with randomised phoneme sequences. Layer voiceprint analysis with physiological markers (breathing rhythm, mouth articulation sync).
GAN synthetic identity fraud
Add frequency-domain analysis to document photo verification pipelines. Flag faces with unusually low high-frequency noise — a GAN signature.
Replay attacks on legacy systems
Replace passive liveness with active challenge-response. Randomise challenge sequence and timestamp-bind sessions to prevent replay of recorded responses.
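The randomised, timestamp-bound challenge can be sketched with an HMAC over the session ID, the chosen challenge and its issue time, so a recorded response to an earlier challenge cannot be replayed. The TTL, key handling and action list below are illustrative assumptions; production systems would use a managed key and stricter session state:

```python
import hmac, hashlib, secrets, time

SERVER_KEY = secrets.token_bytes(32)   # illustrative; use a managed key in practice
CHALLENGE_TTL = 30                     # seconds a challenge stays valid (assumption)
ACTIONS = ["blink twice", "turn left", "smile", "say the digits 4 9 2"]

def issue_challenge(session_id, now=None):
    """Pick a random action and bind it to the session and issue time."""
    now = time.time() if now is None else now
    action = secrets.choice(ACTIONS)
    msg = f"{session_id}|{action}|{int(now)}".encode()
    tag = hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()
    return action, int(now), tag

def verify_challenge(session_id, action, issued_at, tag, now=None):
    """Accept only an untampered challenge within its freshness window."""
    now = time.time() if now is None else now
    msg = f"{session_id}|{action}|{issued_at}".encode()
    expected = hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected) and (now - issued_at) <= CHALLENGE_TTL
```

An expired or session-mismatched challenge fails verification even if the attacker replays a perfectly recorded response.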
All vectors (defence-in-depth)
No single signal is sufficient. Layer at minimum: active liveness + frequency analysis + physiological micro-signal (rPPG). Combine scores with a risk-weighted model rather than individual thresholds.
The multi-signal principle
The single most important architectural decision is to never rely on one signal. Even if each individual signal has a 20% false-negative rate, three independent, uncorrelated signals combine to a 0.8% false-negative rate (0.2³ = 0.008), a 25x improvement.
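The arithmetic, under the crucial independence assumption (correlated signals give far less benefit):

```python
fn_single = 0.20                 # per-signal false-negative rate
fn_combined = fn_single ** 3     # all three signals must miss, assuming independence
assert round(fn_combined, 4) == 0.008
assert round(fn_single / fn_combined) == 25
```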
Recommended Signal Combination
The most robust combination currently available: (1) active challenge-response liveness + (2) frequency-domain artifact analysis + (3) rPPG physiological signal. These three measure behaviour, generation artifacts and physiology respectively — largely independent failure modes.
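Risk-weighted fusion of the three signals might look like the following sketch. The weights and threshold are placeholder values introduced for illustration; in practice they are learned or risk-calibrated per deployment:

```python
# Illustrative weights for the three recommended signals (assumptions).
WEIGHTS = {"liveness": 0.4, "freq": 0.3, "rppg": 0.3}
THRESHOLD = 0.6   # assumed decision threshold on the fused score

def fused_decision(scores):
    """Risk-weighted fusion: one holistic score instead of three
    independent pass/fail gates, so a single fooled signal cannot
    carry the decision on its own."""
    fused = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return fused >= THRESHOLD

# A deepfake that aces active liveness but fails rPPG is still rejected:
assert not fused_decision({"liveness": 0.95, "freq": 0.55, "rppg": 0.10})
```

This is the "scored holistically" behaviour described earlier: the physiological signal drags the fused score below threshold even when the behavioural check is fully defeated.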
This research produced a structured threat model and detection framework for organisations deploying biometric authentication in high-stakes contexts — financial services, identity verification, physical access control and remote work verification.
- Five attack vectors documented in detail — mechanism, tools, cost, target systems and detection difficulty, current as of 2025.
- Three detection layers evaluated against each vector — with reliability estimates under realistic adversarial conditions, not lab benchmarks.
- A signal effectiveness matrix mapping detection approach to attack type, enabling organisations to identify gaps in their current stack.
- A prioritised mitigation framework with five actions ordered by risk-adjusted priority and implementation effort.
- Documentation of the GAN→diffusion transition risk — the systematic detection capability loss affecting systems deployed on pre-2023 training data.
- A recommended three-signal detection architecture (active liveness + frequency analysis + rPPG) robust against current-generation attacks.
Key Takeaway
The organisations that will be most resilient are those that build layered, continuously retrained systems and treat biometric anti-spoofing as an ongoing operational discipline. The threat is adaptive. The defence must be too.