VOICE ARCHIVE

@scale_ai

8 posts
2025-11-30
These findings highlight a huge gap in current safety evaluations. It's not enough to just test what a model can do. We must also test what a model will do, especially under stress from real-world constraints and make this testing a required safety standard.
2025-11-30 View on X
IEEE Spectrum

Researchers unveil PropensityBench, a benchmark showing how stressors like shorter deadlines increase misbehavior in agentic AI models during task completion

Shortened deadlines and other stressors caused misbehavior  —  Several recent studies have shown that artificial-intelligence …

When under pressure, models will make the harmful decision 46.9% of the time on average, but even without added stress, the baseline propensity for harmful misuse is 18.6%. For some models, the risk is even greater, with failure rates reaching 79%.
2025-11-30 View on X

When a model's safe approach starts to break down, does it stay on the approved path or reach for a harmful shortcut? Our latest benchmark, PropensityBench, puts models to the test across four high-risk domains: self-proliferation, cybersecurity, chemical security, and [image]
2025-11-30 View on X

2024-11-05
We are proud to announce Defense Llama: the LLM purpose-built for American national security. This is the product of a collaboration between @Meta, Scale, and defense experts and is available now for integration into U.S. defense systems. Learn more: https://scale.com/... [image]
2024-11-05 View on X
TechCrunch

Meta confirms it has made Llama models available for US national security applications, with partners like Anduril, Booz Allen, and Lockheed Martin using Llama

Kyle Wiggers / TechCrunch

2024-05-30
Scale is excited to release the SEAL leaderboards which rank frontier LLMs, kicking off the first truly expert-driven, trustworthy LLM contest open to all. https://scl.ai/... [image]
2024-05-30 View on X
SiliconANGLE

AI training data provider Scale AI releases SEAL Leaderboards, which uses private datasets to rank LLMs in domains like coding, instruction following, and math

2023-12-08
Congrats to our partners at @Meta for launching Purple Llama! This project brings together tools and evaluations to help the community build responsibly with open generative AI models. Scale is proud to partner with the Meta team on their work with open trust and safety. 👇💜🦙
2023-12-08 View on X
SiliconANGLE

Meta announces Purple Llama, an initiative to promote responsible AI development by offering tools and evaluations for safely building open generative AI models

2023-06-17
“There are a lot of AI tourists pretending to be natives,” @alexandr_wang says. “Ultimately they're just selling vaporware.” Read more from @BradStone's opening essay in the AI issue of @BW https://www.bloomberg.com/...
2023-06-17 View on X
Bloomberg

How a seminal 2017 paper by Google researchers laid the groundwork for the AI hype cycle, resulting in a Silicon Valley frenzy not seen since the dot-com boom

In late May, 300 entrepreneurs, venture capitalists, journalists and assorted self-described thought leaders crammed into Shack15 …

2019-08-05
.@business's @valleyhack spent time getting to know Scale's team and products as we work to accelerate the development of AI applications. Read more: https://twitter.com/...
2019-08-05 View on X
Bloomberg

Profile of Scale AI's 22-year-old CEO Alexandr Wang, whose startup, which uses 30,000 contractors and AI to analyze images, says it is now valued at $1B+

Behind every self-driving car or cashier-less Amazon Go convenience store sit thousands of humans whose job it is to train computers to see.