"Chat GPT told me that it *can't* alter its data set but it did say it could simulate what it would be like if it altered it's data set"
-
@tuban_muzuru @futurebird @marshray @trochee
Or maybe it's AI that's worthless?
Mathematics has a pretty sound understanding of how generative AI works, but 99% of people ignore it because it's telling them that even the men behind the curtain don't understand why the trick works.
@stuartyeates @tuban_muzuru @marshray @trochee
What isn't understood about how it works?
-
"I’m going around begging people to come up with a solid argument that our happiness does in fact bring something important and irreplaceable to the universe."
Can you expand on what you mean by this? It feels like a totally new topic.
@futurebird I wish I could.
But it would just make people angry, and I feel like I’ve done enough of that today already.
-
@joby @futurebird @CptSuperlative @emilymbender Yeah, there are a lot of little scenarios where it can actually be useful, and that's one of them. The best thing about that is it merely stimulates you to create on your own, and you can just keep starting over and retrying until you have a pretty good pre-defined case in your head to start from with the real person.
As long as one doesn't forget that things may go wildly differently IRL, it can help build up towards a better version of what might otherwise be a tough conversation.
@nazokiyoubinbou @joby @futurebird @CptSuperlative @emilymbender I use LLMs generally for two things: Things I know how to do, but don't have time (but can quickly check the accuracy), and things I don't know how to do, but can check the results. Fermented statistics is useful when you know and account for its limitations. It sucks if you don't because you often get a confident answer that is plausible, but wrong.
-
@futurebird @CptSuperlative @emilymbender To be clear on this, I'm one of the people actually using it -- though I'll be the first to admit that my uses aren't particularly vital or great. And I've seen a few other truly viable uses. I think my favorite was one where someone set it up to roleplay as the super of their facility so they could come up with arguments against anything the super might try to use to avoid fixing something, lol.
I just feel like it's always important to add that reminder "by the way, you can't 100% trust what it says" for anything where accuracy actually matters (such as summaries) because they work in such a way that people do legitimately forget this.
That's a good example where accuracy doesn't matter at all. It also reminded me of a use I heard about that seemed useful: having a bot pose as an easily confused, easily scammed elderly person to get scammers to waste their time. Oh, and for fighting wrongful health insurance claim denials.
I guess these all fall into the category of bullshitting bullshitters.
-
@futurebird He’s not wrong about that question though.
@paulywill @futurebird As far as I understand it, ChatGPT will take your request and use a diffusion model (DALL-E 3) to create an image. It will come up with a prompt, feed it to the other model, and then just show the result. In no sense does ChatGPT actually generate the image.
BTW, it seems other diffusion models have the same issue. Here are Stable Diffusion's and Flux's takes.
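If it helps to picture that hand-off, here is a minimal sketch assuming the OpenAI Python SDK; the model names, the prompt wording, and the explicit two-step structure are assumptions for illustration, not a claim about how OpenAI wires it internally.

from openai import OpenAI

client = OpenAI()

# Step 1: the chat model only writes *text*: a prompt describing the image.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Write a short prompt for an image model: "
                          "a wine glass filled to the brim."}],
)
image_prompt = chat.choices[0].message.content

# Step 2: a separate diffusion model turns that prompt into pixels.
image = client.images.generate(model="dall-e-3", prompt=image_prompt, n=1)
print(image.data[0].url)  # the chat model never touches the pixels

Whatever the real plumbing looks like, the division of labour is the point: the language model's contribution ends at the prompt.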
Maybe, but that still implies some kind of organization of concepts beyond just language or the shape of their output.
I don't see any reason why it should be impossible to design a program with concepts that could do something like reasoning ... you might even use an LLM to make the output more human-readable.
Though I guess this metaphor works in that, to the extent there is a "goal", it's to "make it pass" rather than to convey any idea or express anything.
@futurebird @dalias We *have* programs with concepts which do reasoning. The code for your favourite spaceship computer game clearly has the concepts of “spaceship”, “laser”, “planet” etc encoded in it, and all the necessary machinery to reason about what happens when a laser hits a spaceship while it’s orbiting a planet. We have real-world examples as well, reasoning about questions like “what happens if the tide is coming in and we close the Thames Barrier?”
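A toy version of that, with made-up names and numbers, just to show what "concepts encoded in the code" means: the rules are explicit and inspectable rather than statistical associations.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Spaceship:
    shields: int
    hull: int
    orbiting: Optional[str] = None   # name of a planet, if any

@dataclass
class Laser:
    damage: int

def resolve_hit(ship: Spaceship, laser: Laser) -> Spaceship:
    # Explicit rule: shields absorb damage first, the remainder comes off the hull.
    absorbed = min(ship.shields, laser.damage)
    ship.shields -= absorbed
    ship.hull -= laser.damage - absorbed
    return ship

print(resolve_hit(Spaceship(shields=30, hull=100, orbiting="Mars"), Laser(damage=50)))
# Spaceship(shields=0, hull=80, orbiting='Mars')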
-
@futurebird @dalias And yes, many places are attempting to use ChatGPT as a sort of input/output “surface” to those systems. But ultimately the folks operating the Thames Barrier don’t need the results presented in a chatty format. Similarly, I don’t like these “put in English to generate code” tools since they’ve just replaced writing a specification of what to do in a programming language with writing one in English, and English is a terrible programming language.
-
I have found one use case, although I wonder if it's cost-effective. Give an LLM a bunch of scientific papers and ask for a summary. It makes a kind of nice summary to help you decide what order to read the papers in.
It's also OK at low-stakes language translation.
I also tried to ask it for a vocabulary list for the papers. Some of it was good, but it had a lot of serious yet subtle and hard-to-catch errors.
It's kind of like a Gaussian blur for text.
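For what it's worth, that workflow is easy to sketch, assuming the OpenAI Python SDK; the function name, model choice, and prompt wording are all made up, and the reply is a suggestion to be checked, not a summary to be trusted.

from openai import OpenAI

client = OpenAI()

def suggest_reading_order(abstracts: dict[str, str]) -> str:
    """abstracts maps a short label (e.g. a filename) to that paper's abstract."""
    listing = "\n\n".join(f"[{label}]\n{text}" for label, text in abstracts.items())
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Here are abstracts of papers I plan to read in full. "
                              "Give a one-line gloss of each and suggest a reading order, "
                              "referring only to the labels provided.\n\n" + listing}],
    )
    return reply.choices[0].message.content

The stakes stay low because every paper gets read in full regardless; the model only influences the order.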
@futurebird @CptSuperlative @emilymbender Summarisation is the one I’d be most nervous about because creating a summary is hard: it requires understanding the content and knowing which parts are relevant in a given context, which is why LLMs tend to be awful at it. They don’t understand the content, which is how you get news summaries that get the subject and object the wrong way around in a murder. They don’t know what is important, which is how you get email summaries that contain a scam message and strip all of the markers that would make it obvious that the message is a scam.
If you’re going to read all of them and are just picking an order, that’s probably fine. The worst that a bad summary can do is make you read them in the wrong order and that’s not really a problem.
-
@paulywill @futurebird Also, you can ask ChatGPT for an SVG of a glass of wine filled to the brim and look at that, it's surprisingly good.
-
"Chat GPT told me that it *can't* alter its data set but it did say it could simulate what it would be like if it altered it's data set"
NO. It has no idea if it's telling the truth or not, and when it says "I can simulate what this would be like" that is just more generated text, not an actual simulation.
This guy is pretty sharp about philosophy, but people really really really do not *get* how this works.
"Chat GPT told me this is what it did"
No! It told you what you thought it should say if you asked it what it did!
@futurebird a fine example of why "oh, we'll have humans in the loop checking what our bullshit machine does" is fraud; even many smart people are easily fooled by these bullshit machines.
-
@david_chisnall @CptSuperlative @emilymbender
It can summarize scientific papers well in part because they have a clear style and even come with an abstract.
The words and phrases in the abstract of a paper reliably predict the content and main ideas of the paper.
Moreover, even if you remove the abstracts, it has lots of training data of papers with abstracts.
-
@david_chisnall @CptSuperlative @emilymbender
But if I ask it, say, to organize papers into groups based on which hypothesis about eusocial evolution they support? LOL. It tries to give me what I want; on the surface the result had me excited for a moment because it was worded like just what I wanted.
But some of the papers in the set didn't talk about eusocial evolution, yet they'd been placed in a camp. Some papers were in the wrong camp, and worst of all? It made up a paper that wasn't in the set.
-
@david_chisnall @CptSuperlative @emilymbender
I know people who like and defend LLMs have seen this kind of failure. A failure that exposes how the system works: through word, phrase, and phrase-group association. An associative model can't deal with sets, intersections, unions, hard classification, or logic.
But it can write a paragraph that sounds like what a person who has done such thinking might say. It's just that all the content is total nonsense.
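For contrast, here is a deliberately dumb, conventional version of that grouping task, with hypothetical paper titles and keyword rules. It misses all nuance, but because it only performs real set operations on the input, it cannot invent a paper or place one in a camp its text never mentions.

papers = {
    "Kin selection revisited":  "... inclusive fitness and kin selection in ants ...",
    "Group selection in wasps": "... multilevel selection explains worker sterility ...",
    "Ant foraging networks":    "... trail pheromones and foraging efficiency ...",
}

camps = {
    "kin selection":   ["kin selection", "inclusive fitness"],
    "group selection": ["group selection", "multilevel selection"],
}

grouped = {camp: set() for camp in camps}
unclassified = set()
for title, text in papers.items():
    hits = [camp for camp, terms in camps.items()
            if any(term in text.lower() for term in terms)]
    for camp in hits:
        grouped[camp].add(title)
    if not hits:
        unclassified.add(title)   # an honest "doesn't say", not a made-up answer

# Every title in the output came from the input set; nothing can be fabricated.
assert set().union(*grouped.values(), unclassified) <= set(papers)
print(grouped, unclassified)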
-
"English is a terrible programming language."
It is *objectively* the worst possible coding language.
-
@nazokiyoubinbou @CptSuperlative @emilymbender
Consider the whole genre of "We asked an AI what love was... and this is what it said!"
It's a bit like a magic 8 ball, but I think people are more realistic about the limitations of the 8 ball.
And maybe it's that gloss of perceived machine "objectivity" that makes me kind of angry at those making this error.
@futurebird @nazokiyoubinbou @CptSuperlative @emilymbender
ultimately I don't think fidelity and veracity in text generation matter all that much across (edit: many) contexts, hence a bullshit machine that is 80% reliable while reducing human input to 20% of the manual effort will be used everywhere the famous 80/20 rule faces no hard factuality constraint. So the users not having a more accurate mental model, or grokking what they are doing, is immaterial to how the output is being used...
-
@Jirikiha @nazokiyoubinbou @joby @CptSuperlative @emilymbender
"Confidently Wrong: The Brave New World of LLMs"
This would be a good title for a mini-essay for the faculty newsletter on this topic. I'm deeply unsettled when people use these systems in ways that imply a deep misunderstanding about their limitations and design. Can we please just understand what this system *is* doing and what it is not doing?
-
@Jirikiha @nazokiyoubinbou @joby @CptSuperlative @emilymbender
If I asked ChatGPT to "turn off the porch light" and it said "OK, I've turned off the light on your porch," I would know that it has not really done this. It has no way to access my porch light. I would realize that it is just giving a text answer that fits the context of the previous prompts.
So why do people think it makes sense to ask ChatGPT to explain how it produced a response?
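A minimal sketch of that gap, with an entirely imaginary smart-home hook: unless the model's output is wired to something that actually flips the relay, "OK, I've turned off the light" is just a plausible sentence.

def turn_off_porch_light() -> None:
    # Imaginary integration standing in for a real smart-home API.
    print("relay: porch light off")

def handle_reply(model_reply: str, light_tool_connected: bool) -> None:
    if light_tool_connected and "turned off the light" in model_reply.lower():
        turn_off_porch_light()    # an actual side effect in the world
    else:
        print(model_reply)        # otherwise it is only text on a screen

handle_reply("OK, I've turned off the light on your porch.", light_tool_connected=False)

The same gap applies to asking it how it produced a response: it has no access to its own weights or sampling process, so the explanation is also just generated text.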
-
@futurebird @david_chisnall @CptSuperlative @emilymbender
Another really cool trick is to ask the LLM for copies of papers that you know have never been digitised.
There are a ton of high-citation articles from the 80s that haven't been digitised, so the LLM has seen references to this thing but never the text...
-
@david_chisnall @futurebird @CptSuperlative @emilymbender btw, we are working on "edge AI" in a research project and have taken to reframing "summarization" of the source information as "wayfinding" through the information.
Offering fidelity estimates and affordances to navigate the source text from the condensed version should (we hope) inform users' mental model that they are using a stochastic machine that is merely there to help them work with large texts. Still an early working hypothesis.
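Purely a guess at what that might look like in data terms (the field names are invented, not the project's actual design): each condensed point carries a pointer back into the source plus a rough fidelity estimate, so the reader can always jump to the original.

from dataclasses import dataclass

@dataclass
class Waypoint:
    gloss: str                      # the condensed, model-generated phrasing
    source_span: tuple[int, int]    # (start, end) character offsets into the source text
    fidelity: float                 # rough estimate of faithfulness, 0.0 to 1.0

def show(doc: str, waypoints: list[Waypoint]) -> None:
    # Render each condensed point next to the passage it claims to condense,
    # so the reader checks the machine against the source, not the other way round.
    for w in waypoints:
        start, end = w.source_span
        print(f"{w.gloss!r} (fidelity ~{w.fidelity:.0%}) -> {doc[start:end]!r}")

doc = "Ants exhibit eusociality. Workers are sterile. Queens reproduce."
show(doc, [Waypoint("workers don't reproduce", (26, 46), 0.8)])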
-
@futurebird @david_chisnall @CptSuperlative @emilymbender
But the abstract is already a summary of the paper you can scan to tell if the paper will be useful to you, and you can (usually) trust that summary to be accurate to the content of the paper and concise enough to include the most relevant points. You can't assume the same of an LLM summary, so it's worse than an abstract search.
I can see the advantage of a syntax- and context-aware abstract search if LLMs were that, but they aren't.