Every time I publish a story, it gets stolen about 20 times. For instance, my last column on holiday layoffs was copied and pasted by numerous rip-off sites more than a dozen times in a single day. They do this to get readers’ views without compensating me.
Automated content scraping sites don’t make much money, but the process doesn’t cost them much either. OpenAI, on the other hand, made $1.3 billion in revenue in 2023 without paying me a dime.
OpenAI claims that “training AI models using publicly available internet materials is fair use” in response to the New York Times’ copyright lawsuit. However, the Times argues that millions of its articles are being used to train chatbots that compete with it. OpenAI and other generative AI (genAI) companies are making billions from the work of the paper’s writers and editors without paying for it.
OpenAI also claims that the Times can opt out of letting its stories be used in ChatGPT’s LLM, but how did ChatGPT plagiarize articles such as a Pulitzer-Prize-winning, five-part, 18-month investigation into predatory lending practices in New York City’s taxi industry?
OpenAI admits that memorization is a rare failure of the learning process, but one that is more common when particular content appears more than once in the training data, for example when pieces of it appear on many different public websites.
OpenAI admits that the taxi series rip-off appears to have emerged “from years-old articles that have proliferated on multiple third-party websites.”
OpenAI’s entire business model relies on hoovering up as much data as it can find, often including copyrighted material.
For more information, you can read the full article from www.computerworld.com.
2024-01-31 16:41:03