The outcome of the New York Times’s groundbreaking lawsuit may determine the future of genAI

Here come the lawyers.

Last week, the New York Times sued Microsoft and OpenAI, in which Microsoft has invested billions of dollars and counting, for copyright infringement. The Times claims that Microsoft’s genAI-based Copilot and OpenAI’s ChatGPT, whose underlying technology powers Copilot, were trained on millions of articles without the Times’s permission.

It goes on to argue that those tools (and Microsoft’s search engine, Bing) “now compete with the news outlet as a source of reliable information.”

The Times isn’t seeking a specific amount of damages, at least not yet. Ultimately, though, it wants a lot: “billions of dollars in statutory and actual damages” for the “unlawful copying and use of The Times’s uniquely valuable works.”

Beyond that, the filing demands that Microsoft and OpenAI destroy both the datasets used to train the tools and the tools themselves.

This isn’t the first lawsuit claiming AI companies violated copyrights in building their chatbots, and it won’t be the last. But it is the Big Kahuna: the Times is among the best-known newspapers in the world and the gold standard in journalism. Its move could prove to be among the most influential lawsuits of the computer and internet age, perhaps the most influential of all.

That’s because the outcome could well determine the future of generative AI.

Who’s right here? Is the Times just grubbing for money, and using the lawsuit to negotiate a better rights deal with Microsoft and OpenAI for use of its articles? Or is it standing up for the rights of all copyright holders, no matter how small, against the onslaught of the AI titans?

What’s in the lawsuit?

To get a better understanding of what’s at stake, let’s first take a closer look at the underlying technology and the suit itself. GenAI chatbots like Copilot and ChatGPT are built on large language models (LLMs), which must be trained on tremendous amounts of data to be effective and useful. The more data, the better. Just as important is the quality of that data: the better the data, the better the genAI results.

Microsoft and OpenAI use content available on the internet to train their tools, regardless of whether that content is public domain information, open source data, or copyrighted material; it all gets ingested by the great, hungry maw of genAI. That means millions and millions of articles from the Times and myriad other publications are used for training.

Microsoft and OpenAI contend that the use of those articles and all other copyrighted material is covered by the fair use doctrine. Fair use is an exceedingly complicated and confusing legal concept, and there’s an unending stream of lawsuits over what is fair use and what isn’t. It’s open to wide interpretation.

That’s why the Times lawsuit is so important. It will determine whether all genAI tools, not just those owned by Microsoft and OpenAI, can continue to be trained on copyrighted material…

2024-01-13 09:00:04
Source: www.computerworld.com