Entity: ai safety techniques
2 articles · stable

Articles: 2 mentions
Velocity: 0.0% growth rate
Acceleration: 0.000 velocity change
Sources: 1 publication

Coverage Timeline

2024-01-15 · TechCrunch · 13 related

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Abraham Samma (@abesamma@toolsforthought.social): Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (even if unsurprising)...

2024-01-14 · TechCrunch · 8 related

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Most humans learn the skill of deceiving other humans. So can AI models learn the same? The answer, it seems, is yes, and terrifyingly, they're exceptionally good at it.
