2025-11-18
@artificialanlys
Artificial Analysis announces AA-Omniscience, a benchmark for knowledge and hallucination across 40+ topics; Claude 4.1 Opus takes first place in its key metric
@artificialanlys: X: @artificialanlys, @emollick, @scaling01, @teortaxestex, @zephyr_z9, @mweinbach, and @artificial...
2025-08-17
Simon Willison's Weblog
1 related
A new Artificial Analysis benchmark, focusing on OpenAI's gpt-oss-120b, shows how open-weight LLMs exhibit inconsistent performance across hosting providers
Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model - OpenAI's gpt-oss-120b - performs across different hosted providers.
2025-05-06
TechCrunch
3 related
Recraft, whose image model Recraft V3 beat OpenAI's DALL-E and Midjourney on the Artificial Analysis benchmark last year, raised a $30M Series B led by Accel
… they empower creativity, enable brand storytelling, and give designers precision and control. …
2025-04-11
TechCrunch
1 related
AI reasoning models cost more to benchmark, making it harder to independently verify claims; Artificial Analysis says evaluating OpenAI's o1 costs $2,767.05
AI labs like OpenAI claim that their so-called “reasoning” AI models, which can “think” through problems step by step …