VOICE ARCHIVE

David Thiel

@elegant_wallaby
39 posts
2024-08-31
They removed content using the instances we found as well as an apparent expanded MD5 hash set, chaffing the removals by removing a large amount of other content, including using keywords that could relate to CSAM and non-CSAM that could contain sensitive info about kids.  2/15
2024-08-31 View on X
TechCrunch

LAION, a research org whose dataset was used to train Stable Diffusion and other models, releases a new dataset it claims has been “thoroughly cleaned” of CSAM

LAION, the German research org that created the data used to train Stable Diffusion, among other generative AI models …

LAION has released a revised version of the LAION-5B dataset to address CSAM concerns we previously highlighted.  Here are my impressions.  1/15 https://laion.ai/...
2024-08-31 View on X

These are obviously all significant improvements; credit to LAION and the other involved child safety orgs for all their work.  It's not quite what I would call a gold standard, but it definitely sets a much better example.  4/15
2024-08-31 View on X

They also removed all content above a conservative “unsafe” probability; one dataset removing >0.95, which we found covered almost all our matches.  The other dataset is far more conservative, removing the majority of NSFW samples.  3/15
2024-08-31 View on X

The NCII part is key IMO, because a *lot* of imagery that shows up in random crawls is non-consensual, of dubious provenance or at the very least copyrighted.  And also just private imagery, and risky given identity and age ambiguity.  6/15
2024-08-31 View on X

2023-12-21
We used a combination of methods to determine this: perceptual hashing, cryptographic hashing, and k-nearest neighbors analysis using the image embeddings.  Seeded from a small subset of the dataset, PhotoDNA identified hundreds of instances, the URLs of which were reported to NCMEC.
2023-12-21 View on X
Bloomberg

Stanford researchers: LAION-5B, a dataset of 5B+ images used by Stability AI and others, contains 1,008+ instances of CSAM, possibly helping AI to generate CSAM

most prominently, Stable Diffusion 1.5—to see to what degree CSAM itself might be present in the training data. https://purl.stanford.edu/... Alex Stamos / @alex.stamos : Lots of p...

I'm not sure what the legal implications are for this; most CSAM possession laws were made with the assumption that only huge service providers would have this much storage of mixed data, and they generally have detection and reporting flows.  But all LAION-5B images can fit in a backpack.
2023-12-21 View on X

Fixing this problem is going to be difficult.  The datasets are already out there, and the models are already trained.  While we've made good progress in getting content removed from the source URLs, removing it from public datasets gives people a map to CSAM and its associated image embeddings.
2023-12-21 View on X

As a follow-up to our work on computer-generated CSAM, we took a closer look at the training data used to train various generative models—most prominently, Stable Diffusion 1.5—to see to what degree CSAM itself might be present in the training data. https://purl.stanford.edu/...
2023-12-21 View on X

2022-10-24
“certain discrepancies have emerged in the material used” The passive voice doing some heavy lifting here https://twitter.com/...
2022-10-24 View on X
The Wire

The Wire retracts two recent stories about Meta's XCheck program and says it is using “independent external experts” to investigate its coverage

2022-10-16
Also note that every other email that people have presented — and every email I've received from fb.com going back to 2016 — is formatted differently from what the video shows. The header list is invariably lowercase and with padding around the colons. 1/2 https://twitter.com/... https://twitter.com/...
2022-10-16 View on X
The Wire

In response to Meta's rebuttal of its XCheck report, The Wire shares a video of a source using a subdomain, DKIM signatures, and more, but experts are skeptical

& many mainstream foreign journalists also questioned The Wire's work. Now, @thewire_in says it's verified the email via its DKIM signature. https://thewire.in/... Matthew Green /...

2021-12-04
You have no assurance that the Facebook you're getting is the same as other users—in fact you're guaranteed it *isn't*, given A/B experiments and regional issues. There's no meaningful way to audit it and ensure that it hasn't been altered to target you in some way. 6/
2021-12-04 View on X
Wired

Before implementing e2ee, Meta must improve its existing content-oblivious harm-reduction mechanisms, limit recommendation engines and discoverability, and more

in fact you're guaranteed it *isn't*, given A/B experiments and regional issues. There's no meaningful way to audit it and ensure that it hasn't been altered to target you in some ...

ripping messaging out of websites entirely, and relying on purpose-built messaging apps the same way we do with phones and addresses. It's not entirely satisfying or entirely convenient, but IMO the reduced complexity and attack surface is worth it. 13/13
2021-12-04 View on X

Some thoughts on the complexities that bogged down @Meta's E2EE efforts, and hopefully some hints at a way forward: https://www.wired.com/...
2021-12-04 View on X

2021-11-24
WhatsApp doesn't recommend people to befriend and interact with. It doesn't host secret groups of unlimited size. It doesn't provide global search of every user. It doesn't group people by location or institutions like high schools. 9/
2021-11-24 View on X
@elegant_wallaby

[Thread] A former Facebook employee says Meta announced an “absurdly accelerated timeline” for e2ee messaging to preempt antitrust action and generate good PR

David Thiel / @elegant_wallaby :

2021-11-23
Has “but the children” been an excuse for all kinds of terrible ideas and government overreach? Absolutely. And government will indeed use it to try to hamper E2EE. But that doesn't mean that real child safety concerns are imaginary or minimal. 14/
2021-11-23 View on X
@elegant_wallaby

[Thread] A former Facebook employee says Meta announced an “absurdly accelerated timeline” for e2ee messaging to preempt antitrust action and generate good PR

Please stop with this. Child safety is not FUD, nor disingenuous. Here is what happened with Facebook's haphazard E2EE plan, from someone who was there and familiar with the underl...

WhatsApp doesn't recommend people to befriend and interact with. It doesn't host secret groups of unlimited size. It doesn't provide global search of every user. It doesn't group people by location or institutions like high schools. 9/
2021-11-23 View on X

I'm generally pro-E2EE, but my enthusiasm for it steadily wanes as we move away from a “private IRL conversation” model to a “social network” model. These systems are safer and work best when decoupled from discoverability, recommendation algorithms and marketing incentives. 12/
2021-11-23 View on X

Whereas Facebook tries to take existing social networks, merge them and build new ones. This has led to wildly inappropriate situations (including literally recommending victims to abusers) particularly when combined with contact sync and offsite pixel tracking. 10/
2021-11-23 View on X

When this was announced, systems to identify child grooming, sextortion and CSAM distribution without content inspection were operating at <10% of the effectiveness of those systems that did inspect content. It was clear that the majority of harm would escape detection. 3/
2021-11-23 View on X