"Chat GPT told me that it *can't* alter its data set but it did say it could simulate what it would be like if it altered it's data set"
-
@futurebird I kinda think the best mental model normies (non-mathematicians/logicians) can easily grasp for LLMs is a cheating partner. For every thing you say, it's going to gaslight you with the most plausible sounding response it can come up with. None of these responses necessarily have anything to do with reality. They're all just chosen based on the likelihood you'll accept them as sounding true.
Maybe, but that still implies some kind of organization of concepts beyond just language or the shape of their output.
I don't see any reason why it should be impossible to design a program with concepts, that could do something like reasoning ... you might even use an LLM to make the output more human readable.
Though I guess this metaphor works in that, to the extent there is a "goal," it's to "make it pass" rather than to convey any idea or express anything.
-
@futurebird @AT1ST The temptation to do that is great.
I try to recognize when I’m posting reflexively and not hit ‘Publish’, because it feels like those posts are largely not adding value.
-
@AT1ST @futurebird “It’s not actually doing X, it’s just generating text that’s indistinguishable from doing X.”
is classic human cope. The ability to produce text the way it “should look” is a thing that humans get accredited degrees and good-paying jobs for demonstrating.
Good enough is almost always better than perfect, and 80% is usually good enough.
@marshray @AT1ST @futurebird This is a little like claiming that MENACE actually understands how to play noughts and crosses because it behaves in a way that's indistinguishable from understanding how to play noughts and crosses: https://en.m.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_Crosses_Engine
Does the distinction matter during a game? No. Does that mean it doesn't matter at all? Absolutely not.
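For the curious, the whole MENACE trick fits in a few lines. A rough sketch of the idea (not the original matchbox layout, just the reinforcement scheme, with a random opponent standing in for a human): one pile of beads per board position, moves drawn in proportion to the beads, beads added after a win and removed after a loss. Nothing in it represents rows, forks, or blocking, yet after enough games it plays as though it understood.

# Rough sketch of the MENACE scheme: a Counter of "beads" per board position,
# weighted-random move choice, beads added on a win and removed on a loss.
# Nothing here encodes lines, blocking, or strategy.
import random
from collections import Counter

WINS = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(b):
    for i, j, k in WINS:
        if b[i] != "." and b[i] == b[j] == b[k]:
            return b[i]
    return None

boxes = {}  # board string -> Counter of beads per legal move

def menace_move(board):
    box = boxes.setdefault(board, Counter({i: 3 for i, c in enumerate(board) if c == "."}))
    moves = list(box)
    weights = [max(box[m], 0) for m in moves]
    if sum(weights) == 0:
        weights = [1] * len(moves)          # box ran dry; reseed uniformly
    return random.choices(moves, weights=weights)[0]

def play_game():
    board, history, player = "." * 9, [], "X"   # MENACE is X, the opponent plays randomly
    while True:
        if player == "X":
            move = menace_move(board)
            history.append((board, move))
        else:
            move = random.choice([i for i, c in enumerate(board) if c == "."])
        board = board[:move] + player + board[move + 1:]
        result = winner(board)
        if result or "." not in board:
            reward = {"X": 3, None: 1, "O": -1}[result]   # win / draw / loss
            for state, m in history:
                boxes[state][m] += reward                 # adjust the beads, nothing else
            return result
        player = "O" if player == "X" else "X"

print(Counter(play_game() for _ in range(20000)))
# Losses ("O") should become rare, and there is still no concept of a "row" anywhere.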
-
@trochee @futurebird OK, so you just “emitted a word sequence shaped sorta like the word sequences that come out when metacognition happens.”
And since it’s an argument we’ve all seen before, I can just dismiss your “word sequence” out-of-hand.
See how that works?
Do not try the Chinese Room gambit with me, for I was there when the deep magic was written
-
@futurebird @marshray @trochee
I do wish philosophy would get off its dead ass and provide us with a cogent vocabulary for what's going on in the machines.
The closest to a competent philosopher is WVO Quine and his school. Google "gavagai" and the indeterminacy of translation.
There are at least a dozen ridiculous attempts to Explain AI and they're all worthless. Our understanding of reality is always relative to a background theory
-
@futurebird @cxxvii Right. I make this differentiation because A. I want to be clear that no matter how much they advertise that this or that method is more accurate, it will always fail due to the underlying issue, and B. some people think the tech just isn't fully developed yet, when really its underlying mechanism can NEVER get past this without changing to something else entirely.
(Well, as a side note, many do actually make an effort to train in more accuracy, just, the fundamental issue always comes back to bite them. They legit are trying, it just can't work.)
Yeah that's the wild bit. Someone runs over and says "we built the plane it designed and it worked!"
That could also happen. But it doesn't change the fact that the design is a product of a process that has encoded nothing about what an airplane is, why you'd want to make one, etc.
If the designs work (as translation often works OK with LLMs) that shows that it's possible to get the output without the logical framework.
If that makes any sense.
-
It's the Chinese Room problem. Yeah, maybe it can generate text that sounds good, but it doesn't know what it's saying in any real sense. And it doesn't know true from false; those are irrelevant concepts to an LLM, since it's just using very large statistics to string words together in a way that's statistically plausible.
The machine is fed a shitpost about gluing the cheese on your pizza, and now that's an equally valid response for it to give as any real answer, because it has ingested that text.
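That "very large statistics" line is basically literal. A toy sketch with made-up training text (a real LLM learns a neural network over subword tokens, not raw word counts, but the "pick whatever is statistically plausible" point is the same):

# Toy version of "string words together in a way that's statistically plausible."
# Made-up training text; nothing in the procedure checks whether the output is true.
import random
from collections import Counter, defaultdict

training_text = (
    "the cheese slides off the pizza . "
    "glue keeps the cheese on the pizza . "
    "the cheese on the pizza tastes good ."
)

follows = defaultdict(Counter)          # word -> how often each next word appeared
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def continue_from(word, length=8):
    out = [word]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        nxt = random.choices(list(options), weights=list(options.values()))[0]
        out.append(nxt)
    return " ".join(out)

print(continue_from("glue"))
# One possible run: "glue keeps the cheese on the pizza tastes good".
# A statistically plausible word sequence; true/false never enters into it.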
-
"Chat GPT told me that it *can't* alter its data set but it did say it could simulate what it would be like if it altered it's data set"
NO. It has no idea whether it's telling the truth, and when it says "I can simulate what this would be like" that's just more generated text, not a report on anything it is actually doing.
This guy is pretty sharp about philosophy, but people really, really, really do not *get* how this works.
"Chat GPT told me this is what it did"
No! It told you what you'd expect it to say if you asked it what it did!
@futurebird@sauropods.win Okay, let me try this.
A ha! The advanced AI GNU Echo told me it cannot edit its source code! But wait, what about this?

$ echo "I cannot edit my source code."
I cannot edit my source code.
$ echo "I can edit my source code."
I can edit my source code.

GNU Echo told me it can edit its source code! How can those both be true?
-
I have found one use case. Although, I wonder if it's cost effective. Give an LLM a bunch of scientific papers and ask for a summary. It makes a kind of nice summary to help you decide what order to read the papers in.
It's also OK at low stakes language translation.
I also tried to ask it for a vocabulary list for the papers. Some of it was good, but it had a lot of serious but subtle and hard-to-catch errors.
It's kind of like a Gaussian blur for text.
@futurebird @CptSuperlative @emilymbender Summaries aren't reliable either.
There are indeed use-cases. But every single one of them comes with caveats. And, I mean, to be fair, most "quick" methods of doing anything come with caveats. It's just that people forget those caveats.
-
@nazokiyoubinbou @CptSuperlative @emilymbender
I don't want to be so dismissive that people who are finding uses for this tech won't pay attention to the important points about the limitations.
People *are* using this tech, some heavily, many probably in ways with pitfalls we won't see the worst results of until it's too late.
Saying "it's just fancy autocomplete" is basically true, but many people think "but autocomplete can't do --"
So, I really try to find ways to "get" this new tech.
-
@marshray @futurebird It's more prominently an issue with image-recognition machine learning AIs: when given a set of pictures and asked to identify whether a skin mole is cancer or not, is it actually identifying cancer in the picture, or just identifying whether there's a ruler in the picture?
In the same way, the LLM is not necessarily doing X, but identifying things that are markers of X, even though producing those markers doesn't actually require doing X.
It's important because it means it won't solve the issues that trip us up.
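The ruler failure is easy to reproduce in miniature. A hedged toy sketch (made-up data; plain logistic regression standing in for the image model): if "a ruler is visible" correlates with the label in the training set, the classifier scores well while leaning on the ruler instead of the mole.

# Toy version of the ruler problem. Two made-up features per photo:
#   x0 = "a ruler is visible"      (spurious, but correlated with the label in training)
#   x1 = "actual lesion signal"    (genuine but weak and noisy here)
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                                    # 1 = labelled malignant
ruler = (rng.random(n) < np.where(y == 1, 0.9, 0.1)).astype(float)
signal = 0.3 * y + rng.normal(0.0, 1.0, n)
X = np.column_stack([ruler, signal])

# Minimal logistic regression by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / n
    b -= 0.5 * float(np.mean(p - y))

print("learned weights [ruler, signal]:", np.round(w, 2))
# The ruler weight dominates. Accuracy looks fine while the correlation holds,
# and falls apart at a clinic that photographs every mole next to a ruler.
-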
@futurebird @CptSuperlative @emilymbender To be clear on this, I'm one of the people actually using it -- though I'll be the first to admit that my uses aren't particularly vital or great. And I've seen a few other truly viable uses. I think my favorite was one where someone set it up to roleplay as the super of their facility so they could come up with arguments against anything the super might try to use to avoid fixing something, lol.
I just feel like it's always important to add that reminder "by the way, you can't 100% trust what it says" for anything where accuracy actually matters (such as summaries) because they work in such a way that people do legitimately forget this.
-
@nazokiyoubinbou @CptSuperlative @emilymbender
If I don't have the experience of "finding it useful" I can't possibly communicate clearly what's *wrong* with asking an LLM "can you simulate what it would be like if you didn't have X in your data set" and just going with the response as if it could possibly be what you thought you asked for.
It's not going away.
And right now a lot of people give it more trust and respect than they do other people *because* it's a machine.
-
@nazokiyoubinbou @CptSuperlative @emilymbender
Consider the whole genre of "We asked an AI what love was... and this is what it said!"
It's a bit like a magic 8 ball, but I think people are more realistic about the limitations of the 8 ball.
And maybe it's that gloss of perceived machine "objectivity" that makes me kind of angry at those making this error.
-
@futurebird @CptSuperlative @emilymbender Porting code - small blocks at a time - is the only really useful use-case I've found so far. But at that, it's pretty great: requires a few tries with feedback until "we" get something that compiles and works, but I can port #Python to #Rust that way "easily", so long as I can check each little piece to make sure it compiles and produces the correct output along the way.
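That "check each little piece" step is easy to make mechanical. A sketch of one way to do it (the module, function, and binary names are made-up placeholders, not anything from the thread): run the original Python next to the compiled Rust port on the same inputs and diff the results.

# Sketch of the "check each little piece" loop for a Python -> Rust port.
# `mymodule.add_totals` and the binary path are hypothetical placeholders; the
# point is only: feed both versions the same inputs and compare the output.
import json
import subprocess

from mymodule import add_totals        # the original Python piece being ported (hypothetical)

test_cases = [
    {"items": [1, 2, 3]},
    {"items": []},
    {"items": [-5, 5]},
]

for case in test_cases:
    expected = add_totals(case)                            # ground truth from the Python original
    proc = subprocess.run(
        ["./target/release/add_totals"],                   # the LLM-written Rust port (hypothetical)
        input=json.dumps(case),
        capture_output=True, text=True, check=True,
    )
    got = json.loads(proc.stdout)
    print("OK  " if got == expected else "DIFF", case, "->", got, "vs", expected)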
-
@stevenaleach @CptSuperlative @emilymbender
So... more translation.
-
@neckspike @futurebird @AT1ST In all seriousness, don’t take this the wrong way but:
So what?
Why is that important?
What do you mean by “know what it is saying”? Do you know what you are saying, or are you just repeating arguments that you have read before?
But maybe you do have a meaningful distinction here.
If so, what question can we ask it, and how should we interpret the response, to tell the difference?
The pizza thing is not particularly interesting because it’s just a cultural literacy test. It’s common for humans new to an unfamiliar culture to be similarly pranked. And that was a particularly cheap AI.
-
@nazokiyoubinbou @futurebird @CptSuperlative @emilymbender I've used it for that sort of roleplay somewhat recently. I fed an LLM some emails from my nemesis, described their personality and the group dynamics around them (condo association governance is important but fuuuucckk), and asked it to help me brainstorm ways of engaging on our community listserv that wouldn't leave them too much room to be ... the way that they tend to be. It was actually super helpful for that.
-
If you ask an LLM "can you simulate what it would be like if X were not in your data set?" it may say "yes"
And then it may do something. But it will NOT be simulating what it would be like if X were not in the data set.
It's generating whatever text looks like a plausible answer to that question, using the same data set it always had, X included.
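To make that concrete with a deliberately tiny stand-in for a model (word-pair counts instead of a neural network; the principle is the same): the only way to get a model "without X" is to train one without X, and no prompt can perform that operation on the counts the model already has.

# Tiny stand-in for the point above: the "model" is just word-pair counts.
from collections import Counter

def train(text):
    counts = Counter()
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[(prev, nxt)] += 1
    return counts

data_with_x = "put cheese on the pizza . put glue on the pizza ."
data_without_x = "put cheese on the pizza ."        # X actually removed

model = train(data_with_x)

# "Can you simulate what you'd say if the glue post weren't in your data set?"
# There is no operation a *prompt* can perform on `model` that turns it into
# train(data_without_x). The learned counts are simply what they are:
print(model[("put", "glue")])                       # 1: X is baked in
print(train(data_without_x)[("put", "glue")])       # 0: the only route is retraining

# Any "sure, here's what that would be like" reply is produced by the model that
# still contains X, shaped to look like a plausible answer to the question.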