
Chronicles

The story behind the story


AI companies, running out of conventional training datasets from the web, may be forced to shift from big, all-purpose LLMs to smaller, more specialized models

… why human-sourced data can help prevent AI model collapse

Matthias Bastian / The Decoder : OpenAI co-founder says AI is reaching “peak data” as it hits the limits of the internet

Kylie Robison / The Verge : During his NeurIPS talk, Ilya Sutskever says “Pre-training as we know it will end”, as “we've achieved peak data and there'll be no more”

Bluesky:

James McDermott / @jmmcd : I refuse to read the article but I wonder what scenario people like this imagine, when they hear we're running out of data. Do they think it's used up, or deleted?

Nicolai B. Hansen / @nbhansen.dk : Honestly much more interested in small specialized models than big all purpose models. They so far have not really been that useful.

Casey Newton / @caseynewton : Seems to me that one answer to “we have run out of data to steal” could be to pay people to make stuff and use it with their consent

@photogenealogy : Using AI generated data to train AI!  —  😳 That can't be good  —  www.nature.com/articles/d41...

Scott McGrath / @smcgrath.phd : 🧪 AI faces a significant data bottleneck: by 2028, training data may equal the total stock of public online text.  —  Solutions like synthetic data, smaller models, and specialized datasets are key to future advances. 🩺💻 #MLSky

Katherine Stiles / @katherinestiles.org : The Internet is a vast ocean of human knowledge, but it isn't infinite. And artificial intelligence (AI) researchers have nearly sucked it dry.

Christian Frezza / @frezzalab : The sad realisation that someone will have to do experiments at some point... damn it  —  “The AI revolution is running out of data. What can researchers do?”  —  www.nature.com/articles/d41...

X:

Eric Topol / @erictopol : What happens when LLMs run out of data to ingest? https://www.nature.com/... @nature feature by @nicolakimjones

@bermaninstitute : The AI revolution is running out of data. What can researchers do? AI developers are rapidly picking the Internet clean to train large language models such as those behind ChatGPT. Here's how they are trying to get around the problem. https://www.nature.com/...

Steven Ashley / @steveashleyplus : At current growth rates, the AI industry runs out of readily accessed HQ data in four yrs... Workarounds incl using synthetic data and less-easily-accessed data. (N) https://www.nature.com/...

Nicola Jones / @nicolakimjones : “Compute is growing but the data is not growing... data is the fossil fuel of AI” - Ilya Sutskever #NeurIPS2024 Read my story in Nature about the data shortage: https://www.nature.com/...

Nature