burkov · TEXXR

Cursor is becoming obsolete because Claude Code just does everything faster and better, so they are trying to hype “generative AI” when it's already looking lame. If you ask an agentic coding system with a runtime feedback loop to build an app whose meaning is clear, like “web

2026-01-20 View on X

Simon Willison's Weblog

Cursor recently experimented with using hundreds of AI agents to build a web browser; they ran for close to a week, writing 1M+ lines of code across 1,000 files

Scaling long-running autonomous coding. Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of “autonomous” coding agents:

View original

A new open-weight Kimi K2 Thinking claims to be comparable to GPT-5 and Sonnet 4.5. It's a Mixture-of-Experts (MoE) model with a total of 1T parameters and 32B activated parameters for token generation. The context length is 256K, which makes it competitive for coding. Weights [image]

2025-11-07 View on X

CNBC

Chinese startup Moonshot releases Kimi K2 Thinking, an open-weight model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train

Chinese startup Moonshot on Thursday released its latest generative artificial intelligence model which claims to beat OpenAI's ChatGPT in …

View original

All execs are power-hungry sociopathic liars, and the higher the exec, the better liar they are.

2025-11-03 View on X

The Information

Court docs: in a deposition, Ilya Sutskever discussed conflicts at OpenAI that he sent to board members before Sam Altman's firing, his OpenAI exit, and more

Anthropic initially expressed “excitement” about a possible merger with OpenAI two years ago, after OpenAI's board fired CEO Sam Altman …

View original

altman is a chicken. When his company obviously abuses images of some Japanese manga studio, he doesn't see an issue. But when King, Inc. sends a lawyer, he suddenly apologizes and sets a filter. Hmm, what's different, sam?

2025-10-18 View on X

TechCrunch

OpenAI says it paused Sora's ability to generate videos resembling MLK Jr. at the request of his estate, after some users created “disrespectful depictions”

OpenAI responded, but is it enough? Mary Cunningham / CBS News : OpenAI blocks Sora 2 users from using MLK Jr.'s likeness after “disrespectful depictions” Katrina Morgan / WUSA : O...

View original

altman is a chicken. When his company obviously abuses images of some Japanese manga studio, he doesn't see an issue. But when King, Inc. sends a lawyer, he suddenly apologizes and sets a filter. Hmm, what's different, sam?

2025-10-18 View on X

The Hollywood Reporter

Inside the discussions between OpenAI and talent agencies about the video app Sora; some agents say studios have been too reluctant to challenge tech giants

OpenAI's CEO brazenly regurgitated major studios' characters to allow video app Sora 2 to spit out clips tailor-made for users.

View original

altman is a chicken. When his company obviously abuses images of some Japanese manga studio, he doesn't see an issue. But when King, Inc. sends a lawyer, he suddenly apologizes and sets a filter. Hmm, what's different, sam?

2025-10-17 View on X

TechCrunch

OpenAI says it paused Sora's ability to generate videos resembling MLK Jr. at the request of his estate, after some users created “disrespectful depictions”

OpenAI announced Thursday it paused the ability for users to generate videos resembling the late civil rights activist …

View original

It's quite sad to see Elon in this position. He has built the world's first commercially successful electric car company and the world's first commercially successful private space company, but with xAI, all he can do is throw more GPUs at the problem everyone else is solving

2025-07-10 View on X

Tom's Guide

xAI introduces Grok 4, trained on its Colossus supercomputer, with multimodal features, faster reasoning, Grok 4 Voice, Grok 4 Code, a new interface, and more

Deeper thinking and greater reasoning is promised — An hour after the live stream was supposed to start last night (July 9) …

View original

So they first said, “most of the models out there can only achieve a single-digit accuracy,” then they show that they reach 52%. I'm like, ok, cool. But then they show this. What are these “most of the models” they were talking about? GPT-2 and Llama 4? If you throw enough [image]

2025-07-10 View on X

@artificialanlys

Artificial Analysis benchmarks: Grok 4 is now the leading AI model, a first for xAI; Grok 4's per-token pricing is more expensive than Gemini 2.5 Pro's and o3's

xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model. We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis...

View original

It's quite sad to see Elon in this position. He has built the world's first commercially successful electric car company and the world's first commercially successful private space company, but with xAI, all he can do is throw more GPUs at the problem everyone else is solving

2025-07-10 View on X

Axios

Elon Musk addresses Grok's antisemitic replies, saying that “Grok was too compliant to user prompts” and “too eager to please and be manipulated, essentially”

Herb Scribner / Axios

View original

So they first said, “most of the models out there can only achieve a single-digit accuracy,” then they show that they reach 52%. I'm like, ok, cool. But then they show this. What are these “most of the models” they were talking about? GPT-2 and Llama 4? If you throw enough [image]

2025-07-10 View on X

Tom's Guide

xAI introduces Grok 4, trained on its Colossus supercomputer, with multimodal features, faster reasoning, Grok 4 Voice, Grok 4 Code, a new interface, and more

Deeper thinking and greater reasoning is promised — An hour after the live stream was supposed to start last night (July 9) …

View original

Apple did more for AI than anyone else: they proved through peer-reviewed publications that LLMs are just neural networks and, as such, have all the limitations of other neural networks trained in a supervised way, which I and a few other voices tried to convey, but the noise

2025-06-09 View on X

Marcus on AI

Apple researchers detail the limitations of top LLMs and large reasoning models, including on classic problems like the Tower of Hanoi, which AI solved in 1957

LLM “reasoning” is so cooked they turned my name into a verb — Quoth Josh Wolfe, well-respected venture capitalist at Lux Capital:

View original

“We've also heard claims that we trained on test sets — that's simply not true, and we would never do that.” No one said you trained on the test set. What they said is that you seem to have finetuned to benchmarks. It's especially obvious when you look at the Elo rating and then test the model right there and see how GPT-3.5-ish the output is...

2025-04-08 View on X

TechCrunch

Meta VP of Generative AI Ahmad Al-Dahle denies a rumor that the company trained Llama 4 Maverick and Scout on test sets, saying that Meta “would never do that”

but the EU doesn't get everything Pascale Davies / Euronews : From a political shift to a more powerful AI: Everything to know about Meta's Llama 4 models Jay Bonggolto / Android C...

View original

“We've also heard claims that we trained on test sets — that's simply not true, and we would never do that.” No one said you trained on the test set. What they said is that you seem to have finetuned to benchmarks. It's especially obvious when you look at the Elo rating and then test the model right there and see how GPT-3.5-ish the output is...

2025-04-08 View on X

The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.

View original

Below is the memo from @Shopify CEO to all the employees. If you are a CEO, avoid making the same mistake. Shopify has an executive leadership crisis. People at the top are mediocre at best. The fact that he used agents to prepare a talk is just one small indicator that he [image]

2025-04-08 View on X

CNBC

Memo: Shopify CEO Tobi Lütke says using AI is now a “fundamental expectation” and that teams asking for more resources must first show why AI can't do the job

Shopify CEO Tobi Lütke is changing his company's approach to hiring in the age of artificial intelligence.

View original

If today's disappointing release of Llama 4 tells us something, it's that even 30 trillion training tokens and 2 trillion parameters don't make your non-reasoning model better than smaller reasoning models. Model and data size scaling are over.

2025-04-06 View on X

Meta launches Llama 4 Maverick with 400B parameters and Scout with 109B parameters and a 10M context window, and previews Behemoth with 2T total parameters

Takeaways — We're sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.

View original

BREAKING🚨 So, I tested this new LLM-based system. It generated this 200-page report I didn't read and then this 150-page book I didn't read either, and then a 20-page travel plan I didn't verify. All I can say: it's very, very impressive! 🔥🚀 First, the number of pages it

2025-03-16 View on X

Bloomberg

The hype around AI agent Manus doesn't represent a second DeepSeek moment, but reveals that Chinese startups can compete with US companies building AI products

The viral AI agent from a Chinese startup isn't about research breakthroughs, it's about creating competitive consumer products.

View original

“o + 3 + mini + high”, OMG. It's like counting in French: 97 is quatre-vingt-dix-sept, so you must calculate in your head 4 x 20 + 10 + 7 to understand the number.

2025-02-01 View on X

TechCrunch

OpenAI launches o3-mini, its latest reasoning model that the company says is largely on par with o1 and o1-mini in capabilities, but runs faster and costs less

OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest in the company's o family of reasoning models.

View original

No, it's not open-source AI beating closed-source, as LeCun claimed today on LinkedIn. It's a resource-constrained but very focused team of creative people beating teams spoiled with resources with their leaders hyping and wasting resources on problems that they know cannot be

2025-01-26 View on X

MIT Technology Review

Rather than weakening China's AI capabilities, US sanctions appear to be driving startups like DeepSeek to innovate by prioritizing efficiency and collaboration

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model. — The model was developed by the Chinese AI startup DeepSeek …

View original

No, it's not open-source AI beating closed-source, as LeCun claimed today on LinkedIn. It's a resource-constrained but very focused team of creative people beating teams spoiled with resources with their leaders hyping and wasting resources on problems that they know cannot be

2025-01-26 View on X

VentureBeat

Yann LeCun says DeepSeek “profited from open research and open source” like Meta's Llama and is proof that open source models are surpassing proprietary ones

“Marc Andreessen, a co-inventor of the pioneering Mosaic web browser, co-founder of the Netscape browser company and current general partner at the famed Andreessen Horowitz (a16z)...

View original

No, it's not open-source AI beating closed-source, as LeCun claimed today on LinkedIn. It's a resource-constrained but very focused team of creative people beating teams spoiled with resources with their leaders hyping and wasting resources on problems that they know cannot be

2025-01-25 View on X

VentureBeat

Yann LeCun says DeepSeek “profited from open research and open source” like Meta's Llama and is proof that open source models are surpassing proprietary ones

If you hadn't heard, there's a new AI star in town: DeepSeek, the subsidiary of Hong Kong-based quantitative analysis …

View original