OpenAI responds to The New York Times' lawsuit: training is fair use and there is an opt-out, “regurgitation” is a rare bug, and NYT “manipulated” its models

written evidence (LLM0113)
Dan Milmo / The Guardian : ‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
Ryan Daws / AI News : Copyrighted data ‘impossible’ to avoid for AI training
Preston Gralla / Computerworld : New York Times' blockbuster suit could decide the fate of genAI
Thomas Claburn / The Register : 'Impossible to train today's leading AI models without using copyrighted materials'
Jak Connor / TweakTown : OpenAI calls out The New York Times, saying its ‘not telling the full story’
Kevin Okemwa / Windows Central : OpenAI admits it's ‘impossible’ to create ChatGPT-like tools without using copyright material, amid court battles over intellectual property theft allegations
Wesley Yin-Poole / IGN : Amid an Increasing Number of Lawsuits, OpenAI Says It's ‘Impossible’ to Train ChatGPT Without Copyrighted Material
Andrew Fenton / Cointelegraph : Scam AI ‘kidnappings’, $20K robot chef, Ackman's AI plagiarism war: AI Eye
James Morales / CCN.com : OpenAI Claims New York Times Wanted Partnership and ‘Tricked’ GPT in Copyright Feud
Steve Ranger / ITPro : Copyright spats show generative AI training has become a major legal minefield
Carl Franzen / VentureBeat : OpenAI responds publicly to NY Times copyright lawsuit: ‘without merit’
Julia Shapero / The Hill : New York Times-ChatGPT lawsuit poses new legal threats to artificial intelligence
Emilia David / The Verge : OpenAI claims The New York Times tricked ChatGPT into copying its articles
Matthias Bastian / The Decoder : OpenAI says it's “impossible” to train state-of-the-art models without copyrighted data
Cade Metz / New York Times : OpenAI Says New York Times Lawsuit Against It Is ‘Without Merit’
Hasan Chowdhury / Business Insider : OpenAI owes its copyright woes to Silicon Valley's famous mantra: ‘Move fast and break things’
Gintaras Radauskas / Cybernews.com : OpenAI responds to New York Times copyright lawsuit, sees manipulation
Luke Jones / WinBuzzer : OpenAI Hits Back at Accusations of Unauthorized Data Use By The New York Times
Devesh Beri / MSPoweruser : OpenAI defends fair use in response to New York Times lawsuit over AI-generated content
Liam Dawe / GamingOnLinux : OpenAI say it would be ‘impossible’ to train AI without pinching copyrighted works
Sarvesh Mathi / MediaNama : OpenAI responds to The New York Times copyright lawsuit calling it meritless
Jenny Darmody / Silicon Republic : NYT AI lawsuit is ‘without merit’, says OpenAI
Mariella Moon / Engadget : OpenAI admits it's impossible to train generative AI without copyrighted materials
Glory Kaburu / Cryptopolitan : OpenAI Dismisses New York Times Lawsuit, Alleges Manipulation of AI Model
Shraddha Goled / TechCircle : NYT's lawsuit against OpenAI intensifies copyright debate
Mike Dalton / Bitcoin Insider : OpenAI claims New York Times was in partnership talks prior to lawsuit
Music Ally : OpenAI hits back in New York Times copyright lawsuit
Mehrotra A / Neowin : OpenAI dismisses NYT lawsuit, says the publication tricked ChatGPT into copying its articles
Jose Antonio Lanz / Decrypt : ‘Not Telling The Full Story’: OpenAI Challenges NYT's Copyright Lawsuit Claims
Roger Montti / Search Engine Journal : New York Times Lawsuit Based On Misuse Of ChatGPT
PYMNTS.com : OpenAI Says New York Times Lawsuit Surprising and Without Merit
Noor Al-Sibai / Futurism : OpenAI Pleads That It Can't Make Money Without Using Copyrighted Materials for Free
Chris Cooke / CMU : New York Times journalist slams OpenAI as “no different than any other thief” in latest copyright lawsuit
Clare Duffy / CNN : OpenAI claims copyright lawsuit from The New York Times is ‘without merit’
George Hammond / Financial Times : OpenAI says New York Times ‘manipulated’ ChatGPT in copyright feud
Hayden Field / CNBC : OpenAI responds to New York Times lawsuit, says ‘regurgitation’ of content is a ‘rare bug’
Winston Cho / The Hollywood Reporter : OpenAI Responds to New York Times Lawsuit, Claims Paper “Intentionally Manipulated” Prompts
Kyle Wiggers / TechCrunch : OpenAI claims New York Times copyright lawsuit is without merit
Ben Thompson / Stratechery : The New York Times' AI Opportunity

Threads:
Adam Lasnik / @thatadamguy : Two key points about this situation: 1) OpenAI offers an opt-out option for any publishers that don't want their content used in training. NY Times availed themselves of this option in Aug 2023 but still sued OpenAI afterwards. 2) One or more of the alleged examples of copying (particularly re Wirecutter) were, essentially, staged... and included the NY Times plying ChatGPT with literally paragraphs of copied text as a prompt. Not something ANY user would do, logically or practically.
Zoe Schiffer / @reporterzoe : OpenAI responds to the New York Times lawsuit, saying the claims are without merit. “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.”
Adam Lasnik / @thatadamguy : I've been increasingly frustrated with the NY Times for a variety of reasons over the years, but their clear bad-faith here (essentially falsifying evidence) makes me even more likely to cancel my longstanding subscription. There are valid debates to be had about AI laws and ethics but man the NY Times is a terrible no-good plaintiff.

Mastodon:
Jeff Jarvis / @jeffjarvis@mastodon.social : .@OpenAI responds to the NYTimes suit. I agree that training is fair use and will say something similar — about a right to learn — in my testimony in the Senate Wednesday. (I'll post my remarks tomorrow.) — https://openai.com/...
Karl Auerbach / @karlauerbach@sfba.social : @Techmeme I am not so ready to jump to the conclusion that “training” can be defended as “fair use” (nor “transformative use” either.) — Clearly, humans can read books in a library and learn - that's OK - it's not even considered as copying. …
Stephen Shankland / @stshank@mstdn.social : OpenAI response to the NYT lawsuit: fair use lets us ingest everything, we offer an opt-out to do the right thing, article regurgitation is rare, we don't need the NYT for training data, and the NYT lawsuit is “without merit.” — As with many tech issues, there's plenty of nuance here, so evaluate the hot takes judiciously. …
Martin Dougiamas / @martin@openedtech.social : I have a lot more thoughts about this whole topic but one central one is that I think most copyright law in general is the capitalist problem and just bunk. And NYT is in the same camp — In reality, EVERYONE is being influenced by everyone else all their life. What thoughts can you really “own”? …
Fifi Lamoura / @fifilamoura@eldritch.cafe : @Mediagazer This really distorts ideas of “fair use” since it's for commercial purposes.
Will Oremus / @willoremus@mastodon.social : OpenAI just published its response to the NYT copyright lawsuit: https://openai.com/... Which I wrote about in some depth here: https://www.washingtonpost.com/ ...

X:
Dan Froomkin / @froomkin : OpenAI alleges that NYT “intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.” lol. https://openai.com/...
Isaiah Poritz / @isaiahporitz : Wow, OpenAI comes out with a very strongly worded blogpost responding to the NYTimes copyright lawsuit. “The regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites.” https://openai.com/...
Tom Warren / @tomwarren : it wouldn't be impossible for OpenAI to create ChatGPT without copyrighted material. They'd just have to actually pay for it.
Nicholas Diakopoulos / @ndiakopoulos : Wow, here's @openAI's rebuttal: https://openai.com/... — I tried *really hard* for a month to get GPT models to regurgitate NYT content in August leading here: https://generative-ai-newsroom.com/ ... — it was extremely rare on regular articles.. case likely cherry picks as per #4 in rebuttal
Max Woolf / @minimaxir : OpenAI's blog post about the New York Times lawsuit reads less like a defense and more like a PR piece trying to convince normal people. Except that a) pro-AI people already know the legal issues of training models and b) anti-AI people don't care. https://openai.com/...
Gio Rogers / @giordanorogers : OpenAI responded to the New York Times. They admit they need to fix the “regurgitation” bug. And they mention a way for publishers to block their tools. But they claim the Times' examples are not typical and were selectively chosen. https://openai.com/...
@saurabhsachan : Every organization needs to build it's own models/derived/fine-tune models, if they are not working on it, today, by end of 2024 it will be like having no website in 2023 https://openai.com/...
Max Kannen / @maxkannen : I think @OpenAI is in the right here. Training has to stay fair use. It is more productive to talk about the AI output.
Jeremy Barr / @jeremymbarr : “The New York Times is not telling the full story,” OpenAI says in response to recent NYT lawsuit. “We support journalism, partner with news organizations, and believe The New York Times lawsuit is without merit.” https://openai.com/...
Alex Volkov / @altryne : The most interesting ☕ from the response, NYT is omitting some ... facts.. shocker. [image]
@themacrosift : Based on what I know about the story, it seems like the New York Times is trying to get a sweet deal from OpenAI to use their publication with their LLM's.
Gary Marcus / @garymarcus : Exactly what I thought they would say. At least with visual images, it's not going to fly. It's not a “rare” bug. (See Marcus and Southen in @ieespectrum for many examples, easily elicited.) But yes, “regurgitation” is a candidate for 2024 word of the year. (@Dictionarycom, take note.)
Aimee Maree / @aimee_maree : 🤣😜😂 yeah like anyone believes you and your consistent IP Theft ... may you be litigated out of existence
Jacques / @jacquesthibs : Pretty sure I side with OpenAI on this whole NYT thing, and below is basically was I expected when I first hear about it [image]
@nomadsvagabonds : This will be the case to watch for US copyright implications of copyrighted training data. My hope is Open AI doesn't settle and we can get clarity on fair use for training material. Otherwise the US/west will cede progress in the realm of AI.
Selmar Smit / @selmarsmit : That's not opt-out, that's a “do not steal more than you already did” option
Sai Prasanna / @sai_prasanna : Open source your weights + inference code or no fair use. Every time you use your weights for inference you are basically tapping into civilizational knowledge.
Nick / @nicky_bonez : Fair use for research, not for developing commercial products.
Rohan / @rohanposts : I'm so glad to see fair use is being argued for training.
Greg Brockman / @gdb : OpenAI + Journalism:
@techau : OpenAI pushes back, in relation to the NYTimes, Training (on public data) is ‘fair use’. Make no mistake, a lot of lawyers are about to become very rich battling this claim. I believe LLMs do a lot of good for the world, if companies like OpenAI were required to pay for free...
@obscure_jazz : Cool to think that “regurgitation” is nowadays a bug. I remember messing around with early LMs (like a few years ago) where regurgitation was practically the primary feature. AI progress really does feel like it's exponential
Kermit Da Phog / @quasarfortress : “We had explained to The New York Times that, like any single source, their content didn't meaningfully contribute to the training of our existing models and also wouldn't be sufficiently impactful for future training.” So true. The NYT think their data really adds a lot of value
Colin W.P. Lewis / @drcolinwplewis : Strong and what “seems” like a transparent response by #OpenAI to the NY Times suit.
@acidflask : Anyone who really believes #2 should read the actual research showing that regurgitation is very common in LLMs, and happens more often with more capacity, and I'm not aware of any research that fixes it. Two good papers to start: https://arxiv.org/... https://arxiv.org/... [image]
James Ball / @jamesrbuk : This is heresy in journalism-world, but: I think the NYT lawsuit is ill-conceived and not good for journalism, and that OpenAI have a good case in rebuttal. (1/2) https://openai.com/...
Andrew Ng / @andrewyng : I said some things poorly in my previous tweet, so let me elaborate/clarify. 1. I don't think it's okay for any company to regurgitate others' copyrighted content at scale without permission or a viable fair-use rationale. I should have said this more explicitly. And... I still... [image]
Dare Obasanjo / @carnage4life : OpenAI argues that it would be impossible to create ChatGPT without copyrighted material. The issue is framed as whether we prioritize copyright holder profits over technological progress but in reality it's actually $MSFT stock price versus $DIS & $NYT https://www.theguardian.com/ ...

Forums:
Hacker News : OpenAI and journalism
BeauHD / Slashdot : OpenAI Claims NYT Tricked ChatGPT Into Copying Its Articles

OpenAI 2024-01-09

