This story about ChatGPT causing people to have harmful delusions has mind-blowing anecdotes.
-
@necedema @Wyatt_H_Knott @futurebird @grammargirl Have you validated your alleged tutor against a topic in which you are well versed?
My experience has been that, for topics in which I am an expert, I immediately spot many errors in the outputs generated.
Misunderstood concepts, explained incorrectly but with utter confidence and a self-assured tone - absolutely no doubt or uncertainty.
So, if I can see it's not reliable for the topics I'm well versed in, how could I rely on it for anything else?
@mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl > Have you validated your alleged tutor against a topic in which you are well versed?
ChatGPT and related programs get things wrong in areas I know well, over and over again.
They've told me about stars and planets that don't exist, how to change the engine oil in an electric car, how water won't freeze at 2 degrees above absolute zero because it's a vapor at that temperature, that a person born in the 700s was a major "7th century" figure ...
over and over again, like that.
But worst of all? All delivered with a confidence that makes it very hard for people *who don't know the topic* to tell that there is a problem at all.
-
@mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl Oh, and while I'm on this topic, here's your periodic reminder of something else important:
People in my part of the world (the USA), and very likely those in similar or related cultures, have a flawed mental heuristic for quickly judging if someone or something is "intelligent": "do they create grammatically fluent language, on a wide range of topics, quickly and on-demand"?
This heuristic is faulty -- it is very often badly wrong. Not going to have a long, drawn-out debate about this: it's faulty. The research is out there and not hard to find, if you really need it.
In the case of LLMs, this heuristic leads people to see intelligence where there isn't any. That's bad enough. But it also leads people to *fail*, or even *refuse*, to acknowledge intelligence where it does exist -- specifically, among people who don't talk or write very articulately.
In the specific case of the USA, this same heuristic is proving to be very dangerous indeed, with the federal government wanting to create official registries of autistic people, for example. The focus is overwhelmingly directed towards autistic people who can't or don't speak routinely, and it's *appalling*.
-
@mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl Representative example, which I did *today*, so the "you're using old tech!" excuse doesn't hold up.
I asked ChatGPT.com to calculate the mass of one curie (i.e., the amount of a substance producing 3.7 × 10^10 radioactive decays per second) of the commonly used radioactive isotope cobalt-60.
It produced some nicely formatted calculations that, in the end, appear to be correct. ChatGPT came up with 0.884 mg, the same as Wikipedia's 884 micrograms on its page for the curie unit.
It offered to do the same thing for another isotope.
I chose cobalt-14.
This doesn't exist. And not because it's really unstable and decays fast. It literally can't exist. The atomic number of cobalt is 27, so every cobalt isotope, stable or otherwise, must have a mass number of at least 27. Anything with a mass number of 14 *is not cobalt*.
I was mimicking a possible Gen Chem mixup: a student who confused carbon-14 (a well known and scientifically important isotope) with cobalt-whatever. The sort of mistake people see (and make!) at that level all the time. Symbol C vs. Co. Very typical Gen Chem sort of confusion.
A chemistry teacher at any level would catch this, and explain what happened. Wikipedia doesn't show cobalt-14 in its list of cobalt isotopes (it only lists ones that actually exist), so going there would also reveal the mistake.
ChatGPT? It just makes shit up. Invents a half-life (for an isotope that, just to remind you, *cannot exist*), and carries on like nothing strange has happened.
This is, quite literally, one of the worst possible responses to a request like this, and yet I see responses like this *all the freaking time*.
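For anyone who wants to check that 0.884 mg figure by hand, the arithmetic is just the decay law A = λN with λ = ln(2)/t½, then converting atoms to grams. Here is a minimal Python sketch, assuming the commonly quoted values of roughly 5.27 years for the Co-60 half-life and 59.93 g/mol for its molar mass (the last printed digit shifts slightly depending on the constants you plug in):

```python
import math

AVOGADRO = 6.02214076e23      # atoms per mole
CURIE = 3.7e10                # decays per second (the definition of 1 Ci)
SECONDS_PER_YEAR = 3.15576e7  # Julian year in seconds

def mass_of_one_curie(half_life_years: float, molar_mass_g_per_mol: float) -> float:
    """Mass in grams of a pure isotope sample whose activity is 1 Ci."""
    decay_constant = math.log(2) / (half_life_years * SECONDS_PER_YEAR)  # lambda = ln2 / t_half
    atoms = CURIE / decay_constant                                       # A = lambda * N  =>  N = A / lambda
    return atoms * molar_mass_g_per_mol / AVOGADRO

# Co-60: half-life ~5.2714 years, molar mass ~59.93 g/mol
print(f"1 Ci of Co-60 is about {mass_of_one_curie(5.2714, 59.93) * 1000:.3f} mg")  # ~0.884 mg

# The cobalt-14 point, as a trivial sanity check: a nuclide's mass number
# (protons + neutrons) can never be smaller than its atomic number.
def nuclide_can_exist(atomic_number: int, mass_number: int) -> bool:
    return mass_number >= atomic_number

print(nuclide_can_exist(27, 60))  # True  -- cobalt-60
print(nuclide_can_exist(27, 14))  # False -- "cobalt-14" cannot exist
```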
-
@hosford42 @necedema @Wyatt_H_Knott @grammargirl
So, almost no one is using this tool in this way.
Very few are running these things locally. Fewer still are creating their own (attributed, responsibly obtained) data sources. What that tells me is that this isn't about the technology that allows this kind of recomposition of data; it's about using (exploiting) the vast sea of information online in a novel way.
@hosford42 @necedema @Wyatt_H_Knott @grammargirl
There's a whole movement around running them locally, but it's niche, though the smaller models are getting better.
But nobody is creating them locally. Fine-tuning with your own data, yes, but that's a little sauce over the existing training data.
-
@dpnash Yes, I agree. GenAI can never give answers: it can only suggest questions to ask or things to investigate.
That can be useful, but it requires awareness and technical knowledge to understand the distinction, which is why I don't think genAI was ready for a broad public release.
But would I be OK with a doctor using genAI on an anonymised version of my lifetime medical records to spot possible missed patterns? Yes, I think so.
-
@david_megginson @dpnash @mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl
A model like the one you're talking about would most likely be some form of https://wikipedia.org/wiki/Convolutional_neural_network , and bear about as much resemblance to OpenAI's offerings as a giraffe does to a housecat.
-
@Orb2069 @david_megginson @mzedp @necedema @Wyatt_H_Knott@masto.host @futurebird @grammargirl My own take on valid uses of this stuff:
If by "AI" we mean something like "machine learning, just with a post-ChatGPT-marketing change", then the answer is "oh, absolutely, assuming the ML part has been done competently and is appropriate for this type of data." And there are plenty of uses in medicine I can imagine for *certain kinds* of ML.
IF by "AI" we mean "generative AI", either in general or in specific (e.g., an LLM like ChatGPT), then the answer is "hell no, absolutely not, please don't even bother asking" for most things, including everything to do with medicine, however tangential.
The one single good general use case for "generative" AI is "make something that looks like data the AI has seen before, without regard for whether it reflects anything real or accurate." Disregarding known issues* with how generative AI is built and deployed nowadays, it's fine for things like brainstorming when you get stuck writing, or seeing what different bits of text (including code) might look like, in general, in another language (human or programming). But it's absolutely terrible for any process where factual content or accuracy matters (e.g., online search, or actually writing the code you plan to use), and I put all medical uses in that category.
* Plagiarism, software license violations, massive energy demands, etc. Those are an extra layer of concerns in their own right, but even without them, bad factual accuracy is a dealbreaker for me in almost all scenarios I actually envision for the stuff.
-
@dpnash @mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl It seems to be an error exclusive to ChatGPT; Llama and Granite answered correctly.
-
@notatempburner @mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl I don't know what Google Gemini has under the hood for its LLM, but as of a couple days ago, Gemini made similar mistakes with other isotope names/numbers.
Hmmm…
Yep. It bombs this one too.
These two genAI services (ChatGPT and Google) are *probably* the ones most people see most often. Not that it matters a whole lot, of course. Other LLMs will just make different mistakes with different facts.
-
@dpnash @notatempburner @mzedp @necedema @Wyatt_H_Knott @grammargirl
I mean, the reason it fails is that it's all associative aggregation. This is a powerful tool in some ways but a useless one in others.
I was horrified that teachers said LLMs could "improve their lesson plans," but I looked at the results. They were OK.
So I tried it on a math lesson.
LMAO. It's so bad at math.
It's one thing to say "this letter game could also have this other game as an alternate," but that kind of substitution doesn't work for a math problem.
-
@david_megginson @dpnash @mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl I would definitely not be okay with a physician using generative AI to summarize medical records and/or to look for patterns. There would be no way to know whether the output was in any way accurate, unless the doctor reviewed every piece of information that went into the AI, which is no different from the doctor reviewing the records without AI.
-
@david_megginson @dpnash @mzedp @necedema @Wyatt_H_Knott @futurebird @grammargirl I say this as a retired physical therapist who worked with complex patients for decades. They routinely came to me with years of medical records I had to review & synthesize. And there are nuances to narrative records that I would pick up because of my long experience that I seriously doubt any algorithm could.
-
@LJ @david_megginson @dpnash @mzedp @necedema @Wyatt_H_Knott @grammargirl
Doctors already apply stereotypes enough as it is. I don't want a stereotyping machine, and I feel like that's what the current LLMs would do. For example, as a Black woman who lives in the South Bronx, doctors make a whole raft of assumptions about me. Some correct, some based on "statistics", and some just bogus.
They have a general idea of what I might need, often wrong in harmful ways.
LLMs regress to the mean.
-
@LJ @david_megginson @dpnash @mzedp @necedema @Wyatt_H_Knott @grammargirl
Or even better, maybe you look at a set of records, as I often have for my students, and I DON'T know what I'm seeing. I can't say "this student is struggling because of X and needs Y"; they don't fit any pattern, and the details don't work. LLMs kind of smooth that over and just shoehorn in an answer that sounds good.
And some people do that too, but I don't like it.
Just say "I don't know really. Never seen this before."