I use speech to text in Apple's notepad on my phone. Since the most recent OS update I've noticed a big jump in quality.
Wondering if the text is getting brushed up by an LLM?
For example: I narrated a paragraph about ants and it got the word "integument" correct. (I fully expected it to be wrong: "in Ted, good man.")
This happens to be one of the few things I think LLMs do well -- but I would also like to know about all the water I'm destroying.
-
@futurebird @canacar Most text processing is done on device, including voice to text.
@rom Yes, Apple has been good about making sure the user has knowledge of and control over how these things work. I expect local processing, since smaller models can run on newer phones, but Apple also recently announced "Private Cloud Compute," which is supposed to offload compute-intensive tasks to the cloud while preserving privacy. Not sure which one this is, but turning off all network connectivity and checking whether the transcription quality changes should tell you.
@futurebird It seems Apple is not immune to pushing LLM hype into its devices. They did delay rolling out "LLM Siri" and have published papers on LLMs' inability to reason, so their devices may still be a better choice in that regard, especially considering how fully MS and Google are on board with all this.
-
@futurebird @canacar Apple has made a big deal about having to "opt in" to AI-related services that happen off device. I am certain that the speech-to-text model is something like Whisper running on your phone.
Quantized inference is relatively lightweight and efficient; most of the energy demand is on the training side.
How to get Apple Intelligence - Apple Support (support.apple.com)
“To get started with Apple Intelligence features on your compatible iPhone, iPad, Mac, or Apple Vision Pro, update your device to the latest software version, and ensure you have Apple Intelligence turned on under Settings > Apple Intelligence & Siri.”
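On the "something like Whisper running on your phone" point: here is a minimal sketch of fully local transcription using the open-source openai-whisper Python package on a laptop. The model choice and the audio filename are placeholders, and Apple's actual dictation model isn't public, so this only illustrates the general idea, not Apple's implementation.

```python
# pip install openai-whisper   (ffmpeg must also be installed for decoding)
import whisper

# "tiny" is a ~39M-parameter model; it downloads once, then everything
# below runs fully offline on the local CPU/GPU -- no network calls.
model = whisper.load_model("tiny")

result = model.transcribe("dictation_sample.wav")  # hypothetical file
print(result["text"])
```

Quantized builds of models like this (e.g. via whisper.cpp) shrink them further, which is why inference on a phone can be cheap even though training was not.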
-
I'm experiencing stress from wondering where exactly this tech is being injected all of the time.
This might drive me off the iPhone at last, although I did just get my screen fixed and felt like I ought to be good for two more years.
@futurebird @canacar Apple is doing that on the phone; it's a small pre-trained AI model that lives in the "neural engine" or whatever they call that part of the CPU.
There are supposedly some new iOS services that will ask nicely if they can take your data offsite for better processing, but that should be all opt-in.
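For a sense of how a small pre-trained model ends up "living in the neural engine": on Apple platforms, developers convert trained models to Core ML, and the OS schedules them onto the Neural Engine when the hardware allows. A sketch using Apple's coremltools package follows; the tiny network and its shapes are invented for illustration, and Apple's own dictation model is of course not something third parties build this way.

```python
# pip install coremltools torch
import torch
import coremltools as ct

# Stand-in for some small pre-trained network (hypothetical).
net = torch.nn.Sequential(
    torch.nn.Linear(80, 256), torch.nn.ReLU(), torch.nn.Linear(256, 64)
)
net.eval()
example = torch.rand(1, 80)
traced = torch.jit.trace(net, example)

# ComputeUnit.ALL lets iOS/macOS run the model on the Neural Engine
# when possible, falling back to GPU/CPU otherwise.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example.shape)],
    compute_units=ct.ComputeUnit.ALL,
    convert_to="mlprogram",
)
mlmodel.save("TinyModel.mlpackage")
```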
-
@futurebird @canacar That's how I understand it.
Apple's AI push seems to have fizzled out a bit. They will certainly try to expand Siri with more features that require a data center, but speech recognition should still be all on-device.
-
@futurebird I don't know the details of that Apple product, but it probably uses a lot less power than you're imagining.
There's a diminishing-returns problem with the intelligence of LLMs, and the companies chasing those diminishing returns are building massive data centers.
But if you're not trying to build something that can pass as a sentient being, then LLMs don't have to be power-hungry.
There are small models that are actually quite useful for common tasks.
That's getting lost in this rush to try to build the world's smartest AI that can just do anything you ask it to.
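As one concrete example of the "small models for common tasks" point, here is a sketch using Hugging Face transformers with a distilled sentiment model (a real public checkpoint, picked arbitrarily): after a one-time download it runs comfortably on a laptop CPU, no data center involved.

```python
# pip install transformers torch
from transformers import pipeline

# DistilBERT is a distilled (smaller, faster) BERT variant.
classify = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classify("Dictation on this phone got noticeably better."))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```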
-
Why isn't functional speech to text more of a big deal? Seems like a massive tech win that could change workflows all over the place.
-
@tsturm @futurebird @canacar
I believe the Android speech recognition and translation functions are also local models, and the Firefox language translator is entirely local. There are lots of truly useful and significant tools coming from recent ML advances (one, AlphaFold, got a Nobel prize), but they're not LLM chatbots and they don't get all this public recognition or money.
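A sketch of what fully local translation looks like. (Firefox actually ships its own Bergamot/Marian models; the public MarianMT checkpoint below is chosen just to illustrate the same idea.)

```python
# pip install transformers torch sentencepiece
from transformers import MarianMTModel, MarianTokenizer

# A few-hundred-MB German-to-English model; after the one-time
# download, everything below runs offline on the local machine.
name = "Helsinki-NLP/opus-mt-de-en"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(
    ["Spracherkennung läuft lokal auf dem Gerät."],
    return_tensors="pt", padding=True,
)
out = model.generate(**batch)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# -> roughly: "Speech recognition runs locally on the device."
```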
-
@jannem @tsturm @futurebird @canacar
None of those accomplishments were by generative AI, were they? All the crappy AI is generative. Translation and speech recognition are just mapping, with no hallucinations.
-
@futurebird
@canacar Can't you just disconnect from any network (telco & wifi) and try? Does it still work offline?
-
@Phosphenes @tsturm @futurebird @canacar
"Generative ai" is a misnomer. Some useful tools use the same kind of architecture as image or video generators (text to speech for instance), and some use the same kind of transformer architecture as chatbots.But that's all implementation details. That's not important (it's like arguing what language was used to write a specific program). What matters is what it's used for, and by whom.
-
@jannem @Phosphenes @tsturm @canacar
I think the "generative" adjective is about if one is using the training data to correctly match, or to extrapolate.
Sometimes when I do dictation there is a loud noise, or I mumble, and I get a bunch of nonsense. The new nonsense is much more like normal sentences: it's doing a better job of guessing in that way. I said a sentence and it only caught a few sounds, so it gives me a sentence (the wrong one).
But this same improvement lets it get words right more often.
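That "fluent nonsense" behavior falls out of how decoders combine an acoustic score (how well a candidate matches the sounds) with a language-model score (how sentence-like the candidate is). A toy illustration with invented numbers: when the audio is clear, the acoustic evidence dominates and an unusual word like "integument" can win; when the audio is mumbled, the acoustic scores are nearly flat, so the fluency prior picks a normal-looking sentence that was never said.

```python
# Toy decoder: combined score = acoustic score + fluency (LM) score.
# All numbers are invented for illustration.

def best(candidates):
    return max(candidates, key=lambda k: sum(candidates[k]))

# Clear audio: acoustic scores strongly separate the candidates.
clear = {
    "integument":            (0.90, 0.30),  # (acoustic, fluency)
    "in Ted, good man":      (0.20, 0.10),
    "the meeting went well": (0.05, 0.80),
}

# Mumbled audio: acoustic scores are nearly flat, so fluency decides,
# and out comes a grammatical sentence you never said.
mumbled = {
    "integument":            (0.35, 0.30),
    "in Ted, good man":      (0.33, 0.10),
    "the meeting went well": (0.34, 0.80),
}

print(best(clear))    # -> integument
print(best(mumbled))  # -> the meeting went well
```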
-
@gdupont pointed out I could just disconnect from the internet and wifi and test it again. So I did.
And it works great! It really must be doing most of the work locally, including the fancier stuff where it goes back and fixes words as you add more context.
That makes me very happy because I like this feature.
-