Twitter generated child sexual abuse material via its bot.
-
"LLM doesn't need to be trained on such content to be able to generate them."
People say this, but how do you know it is true?
@futurebird @rep_movsd @GossiTheDog
One way to think of these models (note: this is useful but not entirely accurate and contains some important oversimplifications) is that they are modelling an n-dimensional space of possible images. The training defines a bunch of points in that space and they interpolate into the gaps. It’s possible that there are points in the space that come from the training data and contain adults in sexually explicit activities, and others that show children. Interpolating between them would give CSAM, assuming the latent space is set up that way.
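A rough Python sketch of that interpolation idea, under the stated simplifications. The decode() call is a hypothetical stand-in for whatever maps a latent point back to pixels; nothing here names a real model's API.

import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    # z_a, z_b: points in the model's n-dimensional latent space,
    # e.g. the encodings of two unrelated training images.
    return [(1 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, steps)]

# Hypothetical usage: decode() stands in for a diffusion or VAE
# decoder. Every intermediate point yields a plausible image that
# was never itself in the training set.
# images = [decode(z) for z in interpolate_latents(z_a, z_b)]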
-
@david_chisnall @rep_movsd @GossiTheDog
This has always been possible, it was just slow. I think the innovation of these systems is building what amounts to search indexes for the atomized training data by doing a huge amount of pre-processing "training" (starting to think that term is a little misleading) this allows this kind of result to be generated fast enough to make it a viable application.
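That "search index" framing can be made concrete with a toy sketch. To be clear, this is the analogy, not how these models actually store data; the point is that the expensive work happens up front, so each query is fast.

import numpy as np

# Toy version of the "index over atomized training data" analogy.
# Assume each training item was embedded as a vector offline --
# the expensive pre-processing "training" step.
rng = np.random.default_rng(0)
index = rng.normal(size=(10_000, 64))   # 10k items, 64-dim embeddings

def nearest(query, k=5):
    # Cheap at query time because the hard work already happened.
    scores = index @ query                # one matrix multiply
    return np.argsort(scores)[-k:][::-1]  # indices of the k best matches

query = rng.normal(size=64)
print(nearest(query))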
-
@david_chisnall @rep_movsd @GossiTheDog
This is what I've learned by working with the public libraries I could find, and reading about how these things work.
To really know if an image isn't in the training data (or something very close to it) we'd need to compare it to the training data and we *can't* do that.
The training data are secret.
All that (maybe stolen) information is a big "trade secret."
So, when we are told "this isn't like anything in the data" the source is "trust me bro"
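For what it's worth, the check being described is simple to state in code; it just can't be run without the secret data. A sketch only: embed() is a stand-in for any perceptual embedding (CLIP-style), and the threshold is arbitrary.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_training_match(gen_vec, train_vecs, threshold=0.95):
    # gen_vec: embedding of the generated image.
    # train_vecs: embeddings of every training image -- exactly
    # the data that is kept secret, which is why no outsider can
    # run this comparison.
    sims = [cosine(gen_vec, t) for t in train_vecs]
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None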
-
@david_chisnall @rep_movsd @GossiTheDog
It's that trust that I'm talking about here. The process makes sense to me. But I've also seen prompts that stump these things, and prompts that make them spit out images that are identical to existing images.
-
OK, you came at me with "Because that's how the math works" a moment ago, yet *you* may think these programs are doing things they can't.
'Intelligence working towards a reward' is a bad metaphor. (It's why some people see the apology and think it means something.)
They will say "exclude X from influencing your next response" or "tell me how you arrived at that result" and think that, because an LLM gives a coherent-sounding response, it is really doing what they ask.
It can't.
@futurebird @rep_movsd @GossiTheDog
An honest response would be kind of boring…
you: tell me how you arrived at that result
LLM: I did a lot of matrix multiplications
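That punchline is close to literal. A stripped-down sketch of a single attention block, omitting masks, normalization, multiple heads, and residual connections; it's essentially matrix multiplications plus a softmax.

import numpy as np

def tiny_transformer_block(x, Wq, Wk, Wv, Wo):
    # x: (sequence_length, d_model). Everything below is matmuls
    # and one softmax -- the "how I arrived at that result".
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return (weights @ v) @ Wo

d = 16
rng = np.random.default_rng(1)
x = rng.normal(size=(4, d))
out = tiny_transformer_block(x, *(rng.normal(size=(d, d)) for _ in range(4)))
print(out.shape)  # (4, 16)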