VOICE ARCHIVE

@mlcommons
11 posts
2024-12-07
Announcing the release of AILuminate, a first-of-its-kind benchmark to measure the safety of LLMs. The AILuminate v1.0 benchmark offers a comprehensive set of safety grades for today's most prevalent #LLMs. https://mlcommons.org/... (1/4) [image]
2024-12-07 View on X

This is a major milestone in progress toward a global standard for AI safety. The benchmark was created by the @MLCommons AI Risk & Reliability working group of experts from @Stanford, @Columbia, and @TUeindhoven, civil society reps, (2/4)
2024-12-07 View on X

and industry experts from @Google, @intel, @nvidia, @Meta, @Microsoft, @Qualcomm, and others committed to a standardized approach to AI safety. (3/4)
2024-12-07 View on X
Wired

MLCommons, a nonprofit that helps companies measure their AI systems' performance, debuts the AILuminate benchmark featuring 12K+ prompts to assess LLMs' safety

MLCommons provides benchmarks that test the abilities of AI systems. It wants to measure the bad side of AI next.

2024-11-14
1/4 Announcing new @MLCommons @MLPerf Training v4.1 benchmark results: 155 performance results submitted from 17 organizations in this round. https://mlcommons.org/... [image]
2024-11-14 View on X

2/4 Gen AI benchmarks - GPT-3, Stable Diffusion, and Llama 2 70B LoRA fine-tuning - saw a 46% increase in submissions compared to the previous round.
2024-11-14 View on X
IEEE Spectrum

Nvidia B200 GPU and Google Trillium TPU debut on the MLPerf Training v4.1 benchmark charts; the B200 posted a doubling of performance on some tests vs. the H100

Samuel K. Moore / IEEE Spectrum:

2024-06-13
@MLCommons @MLPerf Training v4.0 benchmark results are out! This round of results includes 2 new benchmarks added to the suite and a first-time power submitter! See the results: https://mlcommons.org/...
2024-06-13 View on X
IEEE Spectrum

MLCommons shares results from its MLPerf 4.0 training benchmarks, which added Google's and Intel's AI accelerators; Nvidia H100 GPUs topped all nine benchmarks

For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt.

Over 205 @MLPerf Training v4.0 benchmark results are out! Congrats @ASUS_OFFICIAL @Dell @Fujitsu_Global @GigaComputing @Google @HPE @intel/@HabanaLabs @JuniperNetworks @Lenovo @nvidia @CoreWeave @Oracle @QuantaTechno @RedHat @Supermicro_SMCI @SMC_FutureAI @__tinygrad__.
2024-06-13 View on X

2024-03-28
@MLPerf Inference v4.0 results are out! This round includes two new benchmarks focused on gen AI: @Meta's Llama 2 70B model and @StableDiffusion XL. See the complete results and learn more: https://mlcommons.org/... #GenAI #LLM
2024-03-28 View on X
IEEE Spectrum

MLCommons shares the results from its MLPerf 4.0 inferencing benchmarks, which added Llama 2 70B and Stable Diffusion XL; PCs with Nvidia GPUs came out on top

no Blackwell submissions yet, sorry
Karl Freund / Forbes: Nvidia Sweeps AI Benchmarks While AMD Misses The Boat. Again.
Intel: Intel Gaudi 2 Remains Only Benchmarked Alternative ...

The @MLPerf Inference v4.0 benchmark suite includes our largest model to date, @Meta's Llama 2 70B large language model with more than 70 billion parameters. Learn more about the selection process, and performance metrics in the benchmark: https://mlcommons.org/... #GenAI
2024-03-28 View on X
Ars Technica

Anthropic's Claude 3 Opus surpassed OpenAI's GPT-4 on Chatbot Arena, a crowdsourced LLM leaderboard used by AI researchers; GPT-4 has been first since launch

Anthropic's Claude 3 is first to unseat GPT-4 since launch of Chatbot Arena in May '23. On Tuesday, Anthropic's Claude 3 …
