The influence of non-timbral cues in voice anonymisation and evaluation

Bakari, Rayane; Le Blouch, Olivier; Evans, Nicholas; Gengembre, Nicolas; Panariello, Michele; Todisco, Massimiliano
SPSC 2025, 5th Symposium on Security and Privacy in Speech Communication, 16 August 2025, Delft, The Netherlands

Most approaches to voice anonymisation focus predominantly on the obfuscation of timbral attributes. Evaluation approaches that use traditional automatic speaker verification (ASV) systems such as ECAPA-TDNN can overestimate anonymisation performance, since these systems also focus on timbral cues. In this paper, we show that residual non-timbral attributes, e.g. those related to prosody, rhythm, style and accent, which also carry information about voice identity, can still be used to re-identify the speaker. When timbral cues are compromised, non-timbral cues can provide more reliable estimates of anonymisation performance. We also show that, when trained to focus on non-timbral attributes, a WavLM-based model outperforms the baseline ECAPA-TDNN model when operating on anonymised speech. Using the ECAPA-TDNN model, the equal error rate (EER) for the best 2024 VoicePrivacy Challenge baseline is overestimated by 32% relative. Ultimately, we hope to provide a fresh perspective, laying the foundation for more robust and comprehensive evaluations of voice anonymisation and highlighting the importance, for future anonymisation systems, of obfuscating non-timbral information.
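The 32% figure above is expressed in terms of the equal error rate (EER): the operating point at which the false acceptance rate and the false rejection rate of a speaker verification attacker are equal. In anonymisation evaluation a higher EER means speakers are re-identified less reliably, so an attacker that ignores non-timbral cues reports a higher EER and overstates privacy. The Python sketch below approximates the EER by sweeping every observed score as a decision threshold and computes a relative overestimation between two attackers. It is a minimal illustration, not the authors' evaluation code: the function names, the toy scores, and the definition of relative overestimation as (EER_weak - EER_strong) / EER_strong are assumptions.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Hypothetical helper: higher scores are assumed to mean "same speaker".
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: non-target (impostor) trials accepted at threshold t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target (genuine) trials rejected at threshold t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_overestimation(eer_weak, eer_strong):
    """Relative amount by which a weaker attacker's EER overstates privacy.

    Assumed definition: (EER_weak - EER_strong) / EER_strong.
    """
    return (eer_weak - eer_strong) / eer_strong


if __name__ == "__main__":
    # Toy scores for illustration only; they are not results from the paper.
    targets = [0.4, 0.6, 0.8, 0.9]
    nontargets = [0.1, 0.3, 0.5, 0.7]
    print(compute_eer(targets, nontargets))
```

For example, hypothetical EERs of 0.30 for a timbre-focused attacker and 0.20 for a stronger attacker would give a relative overestimation of 0.5, i.e. 50%. Established toolkits usually interpolate along the ROC or DET curve rather than picking the nearest observed threshold, which gives a smoother EER estimate on small trial lists.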


DOI: 10.21437/SPSC.2025-6
Type: Conference
City: Delft
Date: 2025-07-30
Department: Digital Security
Eurecom Ref: 8315
Copyright: © ISCA. Personal use of this material is permitted. The definitive version of this paper was published in SPSC 2025, 5th Symposium on Security and Privacy in Speech Communication, 16 August 2025, Delft, The Netherlands and is available at: http://dx.doi.org/10.21437/SPSC.2025-6

Permalink: https://www.eurecom.fr/publication/8315