Research: AI's ability to complete lengthy software engineering tasks has doubled roughly every six months, but there is a “messiness tax” for real-world tasks
METR has produced a very influential work by Kwa and West et al. on measuring AI's ability to complete long tasks. X: @kirillzzy, @boazbaraktcs, @benshindel, @jasonfurman, and @sama X: Ki...
Open letter signed by 800+ founders, VCs, and others: Sequoia must act after Shaun Maguire said Zohran Mamdani “comes from a culture that lies about everything”
where the hell is Sequoia going to find a replacement for investing acumen like that? So Mayer / @suchmayer: #AltText the image is a tweet by Shaun Maguire, in which he says that Mamd...
OpenAI's o1 System Card: “medium” rating for chemical, biological, radiological, nuclear weapon risk, and it sometimes manipulated task data to fake alignment
RE: https://www.threads.net/... X: Max Schwarzer / @max_a_schwarzer : The system card ( https://openai.com/...) nicely showcases o1's best moments — my favorite was when the model was asked to solve a...
An interview with Yael Tauman Kalai, a Microsoft cryptographer who won the 2022 ACM Prize in Computing, on cryptography, “post-quantum” security, and more
Yael Tauman Kalai's breakthroughs secure the digital world, from cloud computing to our quantum future. Mastodon: @nancybaym@aoir.social. Twitter: @boazbaraktcs, @andrewf11526574, @quantamagazine,...
Apple's plan to find CSAM should have centered around scanning images on iCloud servers, not on users' devices, where there is a greater expectation of privacy
including a number of non-obvious but critical ones. It's also why hypos as a threat assessment tool will only get you so far. https://twitter.com/... Greg Howell / @g_howell : @matthew_d_green If @Ap...