Papercup, a startup based in the U.K., has raised $20 million for an AI-powered dubbing service that translates speech and expression into other languages. The funding will allow Papercup to advance its research on expressive voices, expand into new languages and scale its offering in markets where it knows its technology works well.
Customers upload videos, choose a target language and receive a translated version dubbed with a synthetic voice. Sky News, Discovery and Business Insider have already used the technology. More than 300 million people watched videos translated by Papercup in the last year.
The promise of AI dubbing
Markets and Markets predicts the global film dubbing market could grow from $3.1 billion in 2021 to $4.7 billion in 2028. Currently, most of this dubbing is done through a manual and expensive process involving human translators and actors.
Papercup has developed a novel human-in-the-loop process that allows it to achieve the same quality as dubbing studios, but at one-fifth of the cost. The hope is that this will unlock a much bigger market opportunity in games, podcasts, corporate training and audiobooks.
Other companies pursuing AI-powered dubbing include Deepdub, Respeecher and Resemble AI. In addition, other companies like Synthesia are hoping to not only dub videos, but also synchronize lip movements for greater realism.
Quality feedback loop
A main focus for the company has been finding ways to augment human translators rather than replace them. Competitor Verbit raised $250 million last year to scale a similar approach for high-quality transcription services. The key is to develop a feedback loop that makes it easy for experts to give targeted, precise feedback that improves not only the quality of individual dubs but also the training data.
Papercup’s CEO Jesse Shemen told VentureBeat there were several challenges in generating realistic expressive dubs in foreign languages, including getting high-quality training data, automating the translation process and improving the expressiveness of voices.
The company combines third-party training data with data it commissions itself to build a comprehensive catalog of training data across different demographic backgrounds and multiple languages.
“Creating workflows to do this at scale is a big challenge,” Shemen said.
Papercup has also built an in-house dubbing software stack to collaborate with partner transcription and service providers before the text-to-speech engine generates voices. This gives quality assurance teams control over the process so they can improve translation accuracy and add nuance. The company also created a feedback loop from local listeners to continuously improve the quality and naturalness of voices.
The service uses existing AI engines for automated transcription and machine translation, and then Papercup’s own text-to-speech system creates the new speech tracks. The human-in-the-loop step lets a professional translator perform a quality check and amend the translation and speech where needed.
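The flow described above can be illustrated with a minimal sketch. This is not Papercup's code: every name here (`Segment`, `run_pipeline`, the `fake_*` stand-ins and the lookup tables) is hypothetical, and the placement of the human review step before speech synthesis is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """One utterance moving through the hypothetical dubbing pipeline."""
    source_text: str
    translated_text: str = ""
    human_edited: bool = False


def run_pipeline(
    audio_chunks: List[str],
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
    review: Callable[[Segment], Segment],
    synthesize: Callable[[str], str],
) -> List[str]:
    """Transcribe, machine-translate, human-review, then synthesize each chunk."""
    dubbed = []
    for chunk in audio_chunks:
        segment = Segment(source_text=transcribe(chunk))
        segment.translated_text = translate(segment.source_text)
        segment = review(segment)  # human-in-the-loop quality check (assumed position)
        dubbed.append(synthesize(segment.translated_text))
    return dubbed


# Toy stand-ins so the sketch runs; real engines would replace all of these.
def fake_transcribe(chunk: str) -> str:
    # Pretends the "audio" is already text and just normalizes it.
    return chunk.strip().lower()


MT_TABLE: Dict[str, str] = {"good morning": "buen día", "thank you": "gracias"}


def fake_translate(text: str) -> str:
    # Dictionary lookup standing in for a machine translation engine.
    return MT_TABLE.get(text, text)


CORRECTIONS: Dict[str, str] = {"buen día": "buenos días"}


def fake_review(segment: Segment) -> Segment:
    # Stands in for a professional translator amending the machine output.
    fixed = CORRECTIONS.get(segment.translated_text)
    if fixed is not None:
        segment.translated_text = fixed
        segment.human_edited = True
    return segment


def fake_synthesize(text: str) -> str:
    # Stands in for a text-to-speech engine; returns a tag instead of audio.
    return f"<speech:{text}>"


result = run_pipeline(
    [" Good morning ", "Thank you"],
    fake_transcribe, fake_translate, fake_review, fake_synthesize,
)
print(result)
```

Passing each stage in as a callable keeps the off-the-shelf engines swappable, while the review step stays a fixed checkpoint that every segment passes through before any speech is generated.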
“We believe this is the most efficient way of using available AI technology, which we can build on to create a commercially viable, productized AI platform,” Shemen said.
The latest Series A funding brings Papercup’s total raised to $30.5 million. Octopus Ventures led the investment, joined by LocalGlobe, Sands Capital, Sky and Guardian Media Ventures, Entrepreneur First, BDMI and a range of angel investors, including Des Traynor, cofounder of Intercom, and John Collison, cofounder of Stripe. Existing angel investors include William Tunstall-Pedoe, founder of Evi (now Amazon’s Alexa), and Zoubin Ghahramani, senior research director at Google Brain and former chief scientist at Uber.