
Blog Article

The Ethics of Developing Voice Biometrics

Ethical considerations must guide the development of artificial intelligence technologies like voice biometrics to ensure that disenfranchised populations are not negatively impacted.

Published August 29, 2024

By Nitin Verma, PhD
AI & Society Fellow

Nitin Verma, PhD (left), conducts an interview with Juana Catalina Becerra Sandoval at The New York Academy of Sciences’ office in lower Manhattan.
Photo by Nick Fetty/The New York Academy of Sciences.

Juana Catalina Becerra Sandoval, a PhD candidate in the Department of the History of Science at Harvard University and a research scientist in the Responsible and Inclusive Technologies initiative at IBM Research, presented as part of The New York Academy of Sciences’ (the Academy) Artificial Intelligence (AI) & Society Seminar series. The lecture – titled “What’s in a Voice? Biometric Fetishization and Speaker Recognition Technologies” – explored the ethical implications associated with the development and use of AI-based tools such as voice biometrics. After the presentation, Juana sat down with Nitin Verma, PhD, a member of the Academy’s 2023 cohort of the AI & Society Fellowship, to further discuss the promises and challenges society faces as AI continues to evolve.

*Some quotes have been edited for length and clarity*

Tell me about some of the big takeaways from your research so far on voice biometrics that you covered in your lecture.

I think some of the main takeaways from the history of the automation of speaker recognition are, first, really trying to understand the different motivations or incentives for investing in a particular technology and a particular technological future. In the case of voice biometrics, a lot of the interest is coming from different sectors like the financial sector or the security and surveillance sector. It’s important to keep those interests in mind and observe how they inform the way in which voice biometrics get developed or not.

The other thing that’s important is that even though we have a notion of technological progress, some of the underlying ideas and assumptions are very old. This includes ideas about the body, about what the human body is, and how humans have the ability to change, or not, their body and the way they speak. In the case of voice biometrics, these ideas date back to 19th-century eugenic science, and they continue informing research, even as we have new technologies. We need to not just look at this technology as new, but ask what are the ideas that remain, or that sustain over time, and in which context did those ideas originate.

So, in your opinion, what role does, or would, AI play in your historical accounting of voiceprint technology?

I think, in some way, this is the story of AI. So, it’s not a separate story. AI doesn’t come together in the abstract; it always comes along in relation to a particular application. A lot of the different algorithmic techniques we have today were developed in relation to voice biometrics. Really what AI entails is a shift in the logic of the ontology of voice, where you can have information surface from the data or emerge from statistical methods without needing a theory of what the voice is and how it relates to the body, identity, or illness. This is the kind of shift and transformation that artificial intelligence ushers in.

What would you think is the biggest concern regarding the use of AI in monitoring technologies such as voice biometrics?

Well, I think the concerns are several. I definitely think there’s already inscribed within the history of voice biometrics an interest in the over-policing and over-surveillance of Black and Latinx communities. There’s always that inherent risk that the technology will be deployed to over-police certain communities, and voice biometrics then enter into a larger infrastructure where people are already being policed and surveilled through video with computer vision or through other means.

In the security sector, I think my main concern is that there’s a presumption that the relationship between voice and identity is fixed and immutable, which can create problems for people who want to change their voice or whose voice changes in ways outside of their control, like from an injury or illness. There are numerous reasons why people might be left out of these systems, which is why we want to make sure we are creating infrastructures that are equitable.

Speaking to the other side of this same question, in your view, what would be some of the beneficial or ethical uses of this technology going forward?

Rather than starting from the point of ‘what do corporations or institutions need to make their job easier or more profitable?’, we should instead focus on ‘what are the kinds of tools and techniques that people want for themselves and for their lives?’, and ‘in what ways can we leverage the current state of the art towards those ends?’. I think it’s much more about the approach and the incentive.

There’s nothing inherent to technology that makes it cause irreparable harm or be inherently unethical. It’s more about: What is the particular ontology of voice? What’s the conception of voice that goes into the system? And towards whose ends is it being leveraged? I’m hopeful and optimistic about anything that is driven by people and people’s desires for a better life and a better future.

Your work brings together various threads of research or inquiry, such as criminology, the history of technology, inequality, and the history of biometric technology as such. What are some of the challenges and benefits that you’ve encountered on account of this multidisciplinary approach to studying the topic?

I was trained as a historian, and originally my idea was to be a professor. But once I started working at IBM Research on the Responsible and Inclusive Tech team, I think I got much closer to the people who very materially and very concretely wanted to make technology better, or, more specifically, to improve the infrastructures and the cultures in which technology is built.

That really pushed me to take a multidisciplinary approach and to think about things not just from a historical lens, but to be very rooted in the technical, as well as present-day politics and economic structures. I think of my own immigrant background. I’m from Colombia, and I naturally already had this desire to engage with humanities and social science scholarship that was critical of these aspects of society, but this may not be the same for everyone. I think the biggest challenge is effectively engaging different audiences.

In the lecture you described listening as a political process. Can you elaborate on that?

I’m really drawing on scholars in sound studies and voice studies. The Sonic Color Line, Race as Sound, and Black Linguistics, are three of the main theoretical foundations that I am in conversation with. The point they try to make is that when we attend to listening, rather than voice itself as a sort of thing that stands on its own, we can see and almost contextualize how different voices are understood, described, interpreted, classified, and so on.

The political in listening is what makes people have reactions to certain voices or interpret them in particular ways. Accents are a great example. Perceptions of who has an accent and what an accent sounds like are highly contextual. The politics of listening really emphasizes that contextuality and how we’ve come to associate things like being eloquent through particular ways of speaking or with how particular voices sound, and not others.

Is there anything else you’d like to add?

Well, I think something that strikes me about the story of voice biometrics and voiceprints is how little the public knows about what’s happening. A lot of decisions about these technologies are made in contexts that are not publicly shared. So, there’s a different degree of public awareness and discourse around the ethics of AI and voice. It’s very different from facial recognition, computer vision, or even toxic language.


Author

Nitin Verma, PhD
AI & Society Fellow
Nitin is a Postdoctoral Research Scholar in the area of AI & Society jointly at ASU's School for the Future of Innovation in Society (SFIS) and The New York Academy of Sciences. His research focuses on trust and belief-formation and the implications of generative AI for trust in public institutions and democratic processes. His overarching research interests include how information technologies and societies co-shape each other, the role of the photographic record in shaping history, and the deep connection between human curiosity and the continuing evolution of the scientific method.