Speaker-invariant representation learning for spoofing detection via gradient reversal and a variational information bottleneck

Dao, Anh-Tuan; Matrouf, Driss; Rouvier, Mickael; Evans, Nicholas

ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal

Sophisticated generative speech technology can undermine the reliability of voice biometrics. While spoofing detection systems excel when assessed under in-domain conditions, generalisation to out-of-domain settings is often poor. In this paper, we show that such issues could be caused by speaker bias, where models learn individual voice traits rather than markers of manipulation or generation. We propose a teacher-student framework for speaker-invariant spoofing detection that disentangles identity without requiring speaker labels. We leverage a pre-trained speaker recognition teacher to guide a student model via a gradient reversal layer. To control the balance between suppressing cues related to voice identity and preserving those related to spoofing detection, we integrate a Variational Information Bottleneck. Evaluations across nine datasets show our model achieves a 25.7% relative reduction in the equal error rate (EER) compared to the MHFA baseline.
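The abstract names two standard building blocks, a gradient reversal layer and a variational information bottleneck, without giving the training objective. As a rough, dependency-free sketch of what these two pieces generally compute (not the authors' implementation), the Python below shows the identity-forward, negated-backward behaviour of gradient reversal and the closed-form KL term commonly used as a bottleneck penalty. The function names, the scaling factor `lam`, the weight `beta`, and the way the terms are combined in `total_loss` are all assumptions made for illustration.

```python
import math


def grl_forward(features):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(features)


def grl_backward(upstream_grad, lam):
    """Backward pass: negate the incoming gradient and scale it by lam.

    An encoder placed before this layer is therefore pushed to make the
    downstream (here: speaker-related) objective worse, not better.
    """
    return [-lam * g for g in upstream_grad]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    This is the usual closed-form regulariser in a variational
    information bottleneck with a diagonal Gaussian encoder.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def total_loss(spoof_loss, kl_term, beta):
    """Hypothetical combined objective (assumption, not from the paper):
    spoofing task loss plus a beta-weighted bottleneck penalty."""
    return spoof_loss + beta * kl_term
```

In this sketch, a larger `beta` compresses the representation more aggressively, while `lam` sets how strongly the reversed gradient discourages speaker-related information. How the paper actually sets or schedules these quantities is not stated in the abstract.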

Detail

Type: Conference

City: Lisbon

Date: 2026-06-23

Department: Digital Security

Eurecom Ref: 8799

© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in ODYSSEY 2026, Speaker and Language Recognition Workshop, 23-26 June 2026, Lisbon, Portugal and is available at :