Samsung introduces the Tiny Recursion Model, a 7M-parameter model that can outperform LLMs 10,000x larger, like Gemini 2.5 Pro and o3-mini, on specific problems
The trend of AI researchers developing new, small open source generative models that outperform far larger …
OpenAI researchers build the SWE-Lancer benchmark and find that real-world freelance software engineering work remains challenging for frontier language models
Large language models (LLMs) may have changed software development, but enterprises will need to think twice …
xAI launches Grok-3 beta and Grok-3 mini, its latest AI models with reasoning, trained on 200K GPUs, or “10x” more compute than Grok-2, for X Premium+ users
Elon Musk's AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a [...
Three of Google's NotebookLM team members, lead Raiza Martin, a designer, and an engineer, are leaving to launch a startup focused on “a user-first AI product”
Three members of Google's NotebookLM team, including its team lead and designer, have announced they are leaving Google for a new stealth startup.
OpenAI launches canvas, a ChatGPT interface with a workspace for writing and coding projects, similar to Anthropic's Artifacts, in beta for Plus and Team users
A new way of working with ChatGPT to write and code. Kevin Raposo / KnowTechie: OpenAI rolls out Canvas: AI sidekick for coding and writing. Jorge A. Aguilar / How...
Jorge A. Aguilar / How-To Geek: ChatGPT Canvas Wants To Be Your Personal Writing Assistant. Harsh Shivam / Bus...
Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs' safety guardrails by priming them with dozens of harmful queries in a single prompt
How do you get an AI to answer a question it's not supposed to? There are many such “jailbreak” techniques …
Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors
Abraham Samma / @abesamma@toolsforthought.social: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (e...
Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they're exceptionally good at it.