i could be wrong, but AI is all based on images and text it scrapes from the interwebs. is that not a basic flaw? monkey-see-monkey-do?
every now and then i do image searches for AI Fails (which really should be more popular than it is) but one funny fail was raw salmon filets floating in a river. the AI text was “salmon in the river”.
AI is like any other tech really, we have to find a problem for it to solve. but like so many other things, it can be easily used for bad too. it’s all creepy and strange but cool!
I think they keep the contents/definition of their “database” of training content as a trade secret. In general, because training takes a long time (a lot of compute power and RAM), and so that it can be made repeatable, it is done against a local snapshot of the data rather than by actively grabbing content online as it goes. I presume the training database is somewhat curated, but it is unclear whether that makes a significant difference to the end result. Certainly there are expert systems trained only on a specific subset of data, which limits them to a specific field of “expertise.” In the longer term, I think training data will be more vetted and rely less on the randomness of content scraped from the Internet.
The ML model is basically simulating a network of neurons and is left free to “make connections” on its own. It’s a form of statistical modelling, and I believe one common approach is to start with a randomized set of connections and let the training data refine them. This could be the source of the “hallucinations”: the training process could allow the model to infer connections that, in essence, define nonsense. It’s possible humans do the same thing (children are known to have make-believe friends, for example), but our culture tends to suppress certain behaviours and call for more of others. Perhaps this is a model for how we need to train AIs too: perhaps they need to interact with “guides” of some sort that suppress some kinds of development and encourage others.
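To make the “start random, let the data refine it” idea concrete, here’s a toy sketch in Python. This is my own illustrative example, not how any particular model is actually built: the “connections” begin as random numbers, and plain gradient descent nudges them toward a simple target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = 2x + 1 from a few noisy samples.
x = rng.uniform(-1, 1, size=(100, 1))
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

# The "connections" begin as random numbers...
w = rng.normal(size=(1, 1))
b = rng.normal(size=(1,))

# ...and the training data gradually refines them (gradient descent).
for step in range(500):
    pred = x @ w + b          # model's current guess
    err = pred - y            # how wrong it is
    w -= 0.1 * (x.T @ err) / len(x)
    b -= 0.1 * err.mean(axis=0)

print(w, b)  # ends up near 2 and 1, regardless of the random start
```

The point is that nothing about the final connections is designed in; they are whatever the data happened to push the random starting point toward, which is also why badly sampled data can push them toward nonsense.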
This may be the case for the publicly accessible LLM services like ChatGPT. So yes, relying on a massive generic web scrape is a flaw there. Another flaw is in the way people use them. A large language model is merely trying to guess the most appropriate next word in a given sentence, and the software will prioritize that over providing accurate information. Your mobile phone does the same thing with predictive text; the only difference is that the autocorrect database doesn’t have a web crawl behind it.
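For what it’s worth, the “guess the most likely next word” idea is simple enough to show in a few lines of Python. This is a deliberately toy bigram counter, nothing like a real LLM’s internals, but it makes the point: it optimizes for plausible, not for true.

```python
from collections import Counter, defaultdict

corpus = "the salmon swim in the river and the salmon swim upstream".split()

# Count which word tends to follow which (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the statistically most common next word --
    plausibility, not accuracy, is all it knows about."""
    return following[word].most_common(1)[0][0]

print(predict("the"))     # -> "salmon"
print(predict("salmon"))  # -> "swim"
```

Scale the corpus up from eleven words to a web crawl and add a lot of machinery, and you have the same basic bet: the most statistically likely continuation, whether or not it’s correct.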
However, ML platforms that are more focused and purpose-built are trained on relevant data. For instance, I know of a company that monitors wind turbines using many thousands of cameras. They used to have an army of interns reviewing footage for certain events. After being trained on the interns’ actions, an ML model was able to identify those events with greater accuracy than the interns did. It’s a really well-trained model, but it won’t draw you a picture of a banana.
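I don’t know what that company actually built, but the general supervised-learning pattern is easy to sketch: features extracted from footage, paired with the interns’ yes/no labels, train a model that then flags new clips on its own. All the numbers and feature names below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend per-clip features (e.g. blade vibration, motion score),
# labelled 0 = nothing happening, 1 = event, as the interns would.
normal = rng.normal([0.2, 0.1], 0.05, size=(200, 2))
events = rng.normal([0.8, 0.7], 0.05, size=(200, 2))
X = np.vstack([normal, events])
y = np.array([0] * 200 + [1] * 200)

# Nearest-centroid classifier: about the simplest "model" there is.
centroids = np.array([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])

def classify(clip_features):
    dists = np.linalg.norm(centroids - clip_features, axis=1)
    return int(dists.argmin())  # 0 = nothing, 1 = event

print(classify(np.array([0.75, 0.65])))  # -> 1, flagged as an event
```

A model like this can beat the humans who labelled its training data simply because it never gets bored or distracted, but it knows nothing outside the narrow task it was trained on.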
The marketing term “AI” is really causing a lot of confusion and problems in this way. Hopefully we’ll get past this in a few years, because machine learning is really not an “AI” in the way people think of that word. It’s my belief that continued development of this software can never lead to a true general artificial intelligence. It’s like trying to improve an orange so much that it becomes an apple (or vice versa, if you prefer apples!).
thank you for that last paragraph! i agree 110%. AI this, AI that. reminds me of everything else really, but one story in particular comes to mind. i sold stereo gear back in the day, and digital media (CD, DAT, etc.) hit that price point where everyone could get one… a speaker company decided to slap the phrase “DIGITAL READY” on their boxes because their speakers could reproduce 20 Hz to 20 kHz, which matched the frequency response of digital CDs.
Honestly, I’m not sure we know what human (i.e. un-artificial) intelligence is. If you don’t really understand how the thing you’re trying to copy works, how do you know when you have a “working” copy? I’m afraid the average human can easily be fooled by a “good enough” copy, and make the leap from “it knows how to form a cogent sentence” to “it understands the meaning of the cogent sentences it forms.” It feels like this sense of “we assume other things that appear to be human must be human” is going to be a huge source of folly.
it’s a funny thing – similar, but i don’t know if “they” call it AI or not. watching NFL games, the announcers sometimes say “according to amazon AWS, you go for it here on 4th and 3 instead of kicking the field goal.” that really makes me chuckle because, unless there’s something going on with the AI calculations that i don’t realize, it sounds like they’re just playing percentages, which sounds like chance, which ends up being 50/50 100% of the time!
and that starts with P and that rhymes with T and that stands for TROUBLE!
If a commentator has to fall back on computer predictions, they aren’t the commentator for me. I want somebody passionate about the sport, who can talk their head off without having to refer to a piece of paper or a computer screen.
I used to love Murray Walker and James Hunt doing F1 in England, and Niki Lauda in Germany, for example; they knew the sport and they knew the drivers.
until '98 or '99, we had max mcgee and jim irwin calling packer games on local radio, and talk about knowledge of the sport – wow. none of the bull, all facts and strategy and insights. if packer fans are the most knowledgeable, it’s because those two trained them over the radio.