OpenAI and Google have reportedly used transcriptions of YouTube videos to train their AI models, potentially infringing creators’ copyrights. The New York Times, citing several sources familiar with the companies’ practices, has reported on the extensive measures OpenAI, Google, and Meta have taken to maximize the data fed to their AI models. The report follows YouTube CEO Neal Mohan’s recent statement to Bloomberg Originals, in which he expressed concern that OpenAI’s alleged use of YouTube videos to train its new text-to-video generator, Sora, would violate the platform’s policies.
According to the NYT, OpenAI used its Whisper speech recognition tool to transcribe more than one million hours of YouTube videos, and the transcripts were then used to train GPT-4. The Information had previously reported that OpenAI used YouTube videos and podcasts to train the two AI systems, with OpenAI president Greg Brockman reportedly involved in the process. This raises questions about Google’s adherence to its own rules…
2024-04-06 11:35:31
Article from www.engadget.com