OpenAI: Copyrighted materials are essential for creating GenAI tools

In response to mounting legal efforts to rein in its data collection, OpenAI is arguing that creating advanced generative AI (genAI) tools is not feasible without using copyrighted content to train them.

In a report to the UK’s House of Lords Communications and Digital Select Committee, OpenAI said that training large language models (LLMs) such as GPT-4, the technology underlying ChatGPT, would be impossible without the use of copyrighted materials.

“Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI said in its submission.

GenAI applications such as ChatGPT or the image-generation tool Stable Diffusion are built using vast amounts of data collected from the internet, much of it protected by copyright law. That has led to increasing pushback from publishers and authors who say their work is being used without credit or compensation.

Concerns about copyrighted code

Developers have been using resources such as Google and StackOverflow for decades, said Daniel Li, CEO of Plus Docs, a company whose software uses genAI to design, create, and edit presentations. ChatGPT, he said, simply makes coding even easier.

“The important thing to realize, however, is that developers still need to understand their code. ChatGPT doesn’t change that requirement,” he said.

Li agreed that “companies need to be very careful they are not using code or other copyrighted text. This is already a major topic in software acquisitions for big tech companies, and it will only become more important.”

The statement by OpenAI comes as the company faces a raft of legal actions. Just last week, The New York Times filed a lawsuit against OpenAI and Microsoft, a significant investor in the company and a user of its tools in various Microsoft products. The suit alleges illegal use of New York Times content in the creation of OpenAI tools. OpenAI argued in return that copyright law does not prohibit the training of genAI models.

Last year, OpenAI faced a federal class action lawsuit in California accusing it of unlawfully using personal data for training purposes. That lawsuit, filed in the Northern District of California, cited 15 violations, including breaches of the Computer Fraud and Abuse Act, the Electronic Communications Privacy Act, and various state-level consumer rights statutes.

The central allegation of the California suit is that OpenAI “unlawfully acquired” the plaintiffs’ private data and used it without providing compensation.

According to the complaint, “OpenAI employed this misappropriated data to refine and advance [ChatGPT] through extensive language models and advanced language algorithms, enabling it to produce and understand language akin…

2024-01-09 12:00:04
Article from www.computerworld.com