Anthropic's System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly
at a rate *much* higher than previous AI models. In one instance, while being tested the model said “I think you're testing me ... that's fine, but I'd prefer if we were just hones...
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
It presents a new safety challenge that OpenAI is trying to address. — techcrunch.com/2024/12/05/o... Anders Sandberg / @arenamontanus : In an IVA discussion on AI yesterday even...
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
It presents a new safety challenge that OpenAI is trying to address. — techcrunch.com/2024/12/05/o... Anders Sandberg / @arenamontanus : In an IVA discussion on AI yesterday even...
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
It presents a new safety challenge that OpenAI is trying to address. — techcrunch.com/2024/12/05/o... Anders Sandberg / @arenamontanus : In an IVA discussion on AI yesterday even...
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
Paper: You can find the detailed paper here. — Transcripts: We provide a list of cherry-picked transcripts here.
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
Paper: You can find the detailed paper here. — Transcripts: We provide a list of cherry-picked transcripts here.
An evaluation of six frontier AI models for in-context scheming when strongly nudged to pursue a goal: only OpenAI's o1 was capable of scheming in all the tests
Paper: You can find the detailed paper here. — Transcripts: We provide a list of cherry-picked transcripts here.
Biden signs an EO on generative AI, directing the NIST, DHS, and other agencies to create new safety standards, protect privacy, support workers, and more
President Joe Biden signed an executive order providing rules around generative AI, ahead of any legislation coming from lawmakers.