I use speech to text in Apple's Notes app on my phone. Since the most recent OS update I've noticed a big jump in quality.
Wondering if the text is getting brushed up by an LLM?
For example: I narrated a paragraph about ants and it got the word "integument" correct. (I fully expected it to be wrong. "In Ted, good man")
This happens to be one of the few things I think LLMs do well -- but I'd also like to know about all the water I'm destroying.
@futurebird I don't know the details of that Apple product, but it probably uses a lot less power than you're imagining.
There's a diminishing returns problem with the intelligence of LLMs, and the companies chasing those diminishing returns are building massive data-centers.
But if you're not trying to build something that can pass as a sentient being, then LLMs don't have to be power-hungry.
There are small models that are actually quite useful for common tasks.
That's getting lost in this rush to try to build the world's smartest AI that can just do anything you ask it to.
-
Why isn't functional speech to text more of a big deal? Seems like a massive tech win that could change workflows all over the place.
-
@futurebird @canacar That's how I understand it.
Apple's AI push seems to have fizzled out a bit. They will certainly try to expand Siri with more features that require a data-center, but speech recognition should still be entirely on-device.
@tsturm @futurebird @canacar
I believe the Android speech recognition and translation functions are also local models, and the Firefox language translator is entirely local. There are lots of truly useful and significant tools coming from recent ML advances (one, AlphaFold, earned a Nobel Prize), but they're not LLM chatbots and don't get all this public recognition or money.
-
@jannem @tsturm @futurebird @canacar
None of those accomplishments were by generative AI, were they? All the crappy AI is generative. Translation and speech recognition are just mapping: no hallucinations.
-
I'm experiencing stress from wondering where exactly this tech is being injected all of the time.
This might drive me off the iPhone at last, although I did just get my screen fixed and felt like I ought to be good for two more years.
@futurebird
@canacar Can't you just disconnect from any network (telco & wifi) and try? Does it still work offline?
-
@Phosphenes @tsturm @futurebird @canacar
"Generative AI" is a misnomer. Some useful tools use the same kind of architecture as image or video generators (text-to-speech, for instance), and some use the same kind of transformer architecture as chatbots. But that's all implementation detail, and not really important (it's like arguing about what language a program was written in). What matters is what it's used for, and by whom.
-
@jannem @Phosphenes @tsturm @canacar
I think the "generative" adjective is about whether one uses the training data to correctly match, or to extrapolate.
Sometimes when I dictate, there's a loud noise or I mumble, and I get a bunch of nonsense. The new nonsense reads much more like normal sentences. It's doing a better job of guessing, in that way: I say a sentence, it catches only a few sounds, and it gives me back a whole sentence (the wrong one).
But this same improvement lets it get words right more often.
-
@futurebird @canacar Apple has made a big deal about having to “opt in” to AI-related services that happen off-device. I am certain the speech-to-text model is something like Whisper running on your phone.
Quantized inference is relatively small and efficient. Most of the energy demands are on the training side.
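For a sense of what quantization buys you, here's a minimal sketch -- illustrative only, not Apple's or Whisper's actual pipeline -- of symmetric int8 weight quantization, the standard trick that shrinks a model's memory footprint roughly 4x versus fp32 at the cost of a small per-weight rounding error:

```python
# Illustrative sketch: symmetric int8 quantization of a weight tensor.
# The weight values below are made up for demonstration.

def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage drops from 4 bytes (fp32) to 1 byte per weight, at the cost
# of a rounding error bounded by half the scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

That bounded rounding error is why well-chosen quantization barely hurts transcription accuracy while cutting both memory traffic and energy per inference on a phone.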
@gdupont pointed out I could just disconnect from the internet and Wi-Fi and test it again. So I did.
And it works great! It really must be doing most of the work locally, including the fancier stuff where it goes back and fixes words as you add more context.
That makes me very happy because I like this feature.
-