On March 30th, OpenAI unveiled its latest research, "Voice Engine," on its official website. Given a 15-second audio sample and a text input, the technology can generate natural-sounding speech that closely resembles the original speaker's voice.
In the announcement, OpenAI outlined several early application scenarios for Voice Engine: helping children read with natural, emotionally rich voices; translating video and podcast content; improving essential services in remote communities; and helping patients with sudden or degenerative speech disorders regain their voices.
For these scenarios, OpenAI also presented case studies completed with a small group of "trusted" partners. Age of Learning, an education company, uses GPT-4 and Voice Engine for personalized communication with students; Livox, an AI-powered communication app, offers natural voices in multiple languages for people with disabilities; and HeyGen, previously known for videos such as "Taylor Swift Speaking Chinese," has also adopted Voice Engine.
OpenAI stated that development of Voice Engine began in late 2022 and that the technology already powers the preset voices in its text-to-speech API and ChatGPT's read-aloud feature. On the training data, Jeff Harris, a member of the Voice Engine product team, told media that the model was trained on "a combination of licensed data and publicly available data."
Although OpenAI has filed a trademark for "Voice Engine," it remains cautious about deploying the technology at scale. In February 2024, companies in the United States were reported to have used AI-generated presidential voices to influence voter turnout, an incident that is one major reason OpenAI has opted for only small-scale deployment of Voice Engine.
The announcement notes that, given the potential for misuse of synthetic voices, OpenAI wants to start a discussion on their responsible deployment and on how society can adapt to these new capabilities. Based on that discussion and the results of small-scale testing, OpenAI will decide whether to deploy the technology more broadly.
OpenAI has been preparing for AI safety issues for some time. In October 2023, it announced a "Preparedness" team to monitor and evaluate the capabilities and risks of cutting-edge models. Then, in December 2023, it published the "Preparedness Framework," a set of mechanisms built around tracking, evaluating, forecasting, and mitigating the catastrophic risks posed by its frontier models.
For Voice Engine specifically, OpenAI says it is exploring watermarking of synthetic audio, as well as other control measures, to prevent the technology from being used to imitate the voices of politicians and other public figures.
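OpenAI has not published how Voice Engine watermarking would work, so as a purely illustrative sketch, the toy example below shows one classic approach to audio watermarking: spread-spectrum embedding, where a pseudorandom carrier derived from a secret key is mixed into the audio at low amplitude and later detected by correlating against that same carrier. All function names and parameters here are hypothetical, not OpenAI's.

```python
import numpy as np

# Toy spread-spectrum audio watermark (illustrative only, not OpenAI's method).
# A key-seeded pseudorandom carrier is added at low amplitude; detection
# correlates the audio against the same carrier.

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> np.ndarray:
    """Mix a low-amplitude pseudorandom carrier derived from `key` into `audio`."""
    rng = np.random.default_rng(key)
    carrier = rng.standard_normal(audio.shape)
    return audio + strength * carrier

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 0.005) -> bool:
    """Correlate against the key's carrier; a high score indicates the watermark."""
    rng = np.random.default_rng(key)
    carrier = rng.standard_normal(audio.shape)
    score = float(np.dot(audio, carrier)) / len(audio)
    return score > threshold

# Demo on synthetic "speech": a plain sine tone.
t = np.linspace(0.0, 1.0, 16000)
clean = 0.1 * np.sin(2 * np.pi * 220.0 * t)
marked = embed_watermark(clean, key=42)

print(detect_watermark(marked, key=42))  # watermarked audio detected
print(detect_watermark(clean, key=42))   # clean audio not detected
```

A real deployment would need a watermark robust to compression, resampling, and re-recording, which this sketch is not; it only conveys the basic embed-then-correlate idea behind the watermarking direction OpenAI mentions.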