VOICE ARCHIVE

@mlcommons
11 posts
2024-12-07
Announcing the release of AILuminate, a first-of-its-kind benchmark to measure the safety of LLMs. The AILuminate v1.0 benchmark offers a comprehensive set of safety grades for today's most prevalent #LLMs. https://mlcommons.org/... (1/4) [image]
2024-12-07 View on X

This is a major milestone in progress toward a global standard for AI safety. The benchmark was created by the @MLCommons AI Risk & Reliability working group of experts from @Stanford, @Columbia, and @TUeindhoven, civil society reps, (2/4)
2024-12-07 View on X

and industry experts from @Google, @intel, @nvidia, @Meta, @Microsoft, @Qualcomm, and others committed to a standardized approach to AI safety. (3/4)
2024-12-07 View on X
Wired

MLCommons, a nonprofit that helps companies measure their AI systems' performance, debuts the AILuminate benchmark featuring 12K+ prompts to assess LLMs' safety

MLCommons provides benchmarks that test the abilities of AI systems. It wants to measure the bad side of AI next.

2024-11-14
1/4 Announcing new @MLCommons @MLPerf Training v4.1 benchmark results: 155 performance results submitted from 17 organizations in this round. https://mlcommons.org/... [image]
2024-11-14 View on X

2/4 Gen AI benchmarks - GPT-3, Stable Diffusion, and Llama 2 70B LoRA fine-tuning - saw a 46% increase in submissions compared to the previous round.
2024-11-14 View on X
IEEE Spectrum

Nvidia B200 GPU and Google Trillium TPU debut on the MLPerf Training v4.1 benchmark charts; the B200 posted a doubling of performance on some tests vs. the H100

Samuel K. Moore / IEEE Spectrum:

2024-06-13
@MLCommons @MLPerf Training v4.0 benchmark results are out! This round of results includes 2 new benchmarks added to the suite and a first-time power submitter! See the results: https://mlcommons.org/...
2024-06-13 View on X
IEEE Spectrum

MLCommons shares results from its MLPerf 4.0 training benchmarks, which added Google's and Intel's AI accelerators; Nvidia H100 GPUs topped all nine benchmarks

For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt.

Over 205 @MLPerf Training v4.0 benchmark results are out! Congrats @ASUS_OFFICIAL @Dell @Fujitsu_Global @GigaComputing @Google @HPE @intel/@HabanaLabs @JuniperNetworks @Lenovo @nvidia @CoreWeave @Oracle @QuantaTechno @RedHat @Supermicro_SMCI @SMC_FutureAI @__tinygrad__.
2024-06-13 View on X

2024-03-28
@MLPerf Inference v4.0 results are out! This round includes two new benchmarks focused on gen AI: @Meta's Llama 2 70B model and @StableDiffusion XL. See the complete results and learn more: https://mlcommons.org/... #GenAI #LLM
2024-03-28 View on X
IEEE Spectrum

MLCommons shares the results from its MLPerf 4.0 inferencing benchmarks, which added Llama 2 70B and Stable Diffusion XL; PCs with Nvidia GPUs came out on top

no Blackwell submissions yet, sorry
Karl Freund / Forbes: Nvidia Sweeps AI Benchmarks While AMD Misses The Boat. Again.
Intel: Intel Gaudi 2 Remains Only Benchmarked Alternative ...

The @MLPerf Inference v4.0 benchmark suite includes our largest model to date, @Meta's Llama 2 70B large language model with more than 70 billion parameters. Learn more about the selection process, and performance metrics in the benchmark: https://mlcommons.org/... #GenAI
2024-03-28 View on X
Ars Technica

Anthropic's Claude 3 Opus surpassed OpenAI's GPT-4 on Chatbot Arena, a crowdsourced LLM leaderboard used by AI researchers; GPT-4 has been first since launch

Anthropic's Claude 3 is first to unseat GPT-4 since launch of Chatbot Arena in May '23. On Tuesday, Anthropic's Claude 3 …
