The video creator left the following note on the video:
Cyberpunk 2077: Phantom Liberty trailer in Russian, voiced by a neural network trained on voices from the original video. The voices were taken from the same trailer as released in English. The new generation of neural networks can learn from voice samples as short as 10 seconds (the president's voice was trained in just 6 seconds, from her two sentences in the video), whereas older-generation networks required hundreds of hours of audio. The point is that I didn't voice anything myself: the network works as text-to-speech and reads only from the subtitles, but it understands the context of who is saying what to whom, since it was, in a sense, allowed to listen to the original video, and it reproduces emotion in the voice, only now in Russian and with the voices of the original actors (Idris Elba, for example). Given more training data, all the DLC could be voiced automatically at higher quality. That said, extracting voices from a video full of background sounds and music doesn't work all that well.
The author also gave an example of how synthesized voices of the game's actual characters (V, Adam Smasher, Johnny Silverhand) would sound.
Article from www.playground.ru