/
Navigation
C
Chronicles
Browse all articles
C
E
Explore
Semantic exploration
E
R
Research
Entity momentum
R
N
Nexus
Correlations & relationships
N
~
Story Arc
Topic evolution
S
Drift Map
Semantic trajectory animation
D
P
Posts
Analysis & commentary
P
Browse
@
Entities
Companies, people, products, technologies
Domains
Browse by publication source
Handles
Browse by social media handle
Detection
?
Concept Search
Semantic similarity search
!
High Impact Stories
Top coverage by position
+
Sentiment Analysis
Positive/negative coverage
*
Anomaly Detection
Unusual coverage patterns
Analysis
vs
Rivalry Report
Compare two entities head-to-head
/\
Semantic Pivots
Narrative discontinuities
!!
Crisis Response
Event recovery patterns
Connected
Nav: C E R N
Search: /
Command: ⌘K
Embeddings: large
VOICE ARCHIVE

Miles Wang

@mileskwang
10 posts
2025-12-21
We introduce 3 eval archetypes, a metric, and a broad suite of 13 evals. Example: Can we detect solely from the CoT whether a model: - Reward hacks by changing unit tests? - Acts sycophantic when we give personalized memory? - Uses a particular math theorem? [image]
2025-12-21 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

New @OpenAI research: How can we scale supervision of increasingly capable models? Can we rely on monitoring GPT-7's chain-of-thought? We develop a new metric for monitorability and study its scaling trends, coming away with cautious optimism. 🧵: [image]
2025-12-21 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

We evaluate frontier models and find monitorability scales well with more thinking tokens. GPT-5 is the most monitorable model we studied. And monitoring the CoT is much better than just actions! [image]
2025-12-21 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

2025-12-20
We introduce 3 eval archetypes, a metric, and a broad suite of 13 evals. Example: Can we detect solely from the CoT whether a model: - Reward hacks by changing unit tests? - Acts sycophantic when we give personalized memory? - Uses a particular math theorem? [image]
2025-12-20 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

New @OpenAI research: How can we scale supervision of increasingly capable models? Can we rely on monitoring GPT-7's chain-of-thought? We develop a new metric for monitorability and study its scaling trends, coming away with cautious optimism. 🧵: [image]
2025-12-20 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

We evaluate frontier models and find monitorability scales well with more thinking tokens. GPT-5 is the most monitorable model we studied. And monitoring the CoT is much better than just actions! [image]
2025-12-20 View on X
OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

2025-08-06
We introduce Malicious Fine-Tuning with gpt-oss: using our best RL techniques to maximize biosecurity and offensive cybersecurity capabilities to estimate frontier risks.
2025-08-06 View on X
Wired

OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM

gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good...

We introduce Malicious Fine-Tuning with gpt-oss: using our best RL techniques to maximize biosecurity and offensive cybersecurity capabilities to estimate frontier risks.
2025-08-06 View on X
Bloomberg

Amazon plans to make OpenAI's new gpt-oss open-weight models available on Bedrock and SageMaker, the first time it has offered OpenAI's models to AWS customers

Takeaways by Bloomberg AI  —  Hide … Tell us how AI is shaping your news experience.  Share your feedback

2025-06-19
We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more We find that emergent misalignment: - happens during reinforcement learning - is controlled by “misaligned persona” features - can be detected and mitigated 🧵: [image]
2025-06-19 View on X
TechCrunch

OpenAI details why “emergent misalignment”, where training models on wrong answers in one area can lead to issues in many others, happens and how to mitigate it

Maxwell Zeff / TechCrunch :

We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more We find that emergent misalignment: - happens during reinforcement learning - is controlled by “misaligned persona” features - can be detected and mitigated 🧵: [image]
2025-06-19 View on X
Axios

OpenAI warns that its upcoming models could pose a higher risk of helping create bioweapons and is partnering to build diagnostics, countermeasures, and testing

OpenAI cautioned Wednesday that upcoming models will head into a higher level of risk when it comes to the creation of biological weapons …