TWIG 703: Spicy Autocomplete

Beep boop - this is a robot. A new show has been posted to TWiT…

What are your thoughts about today’s show? We’d love to hear from you!

1 Like

One of the biggest problems with AI, one that Mike glossed over quickly, is the training set.

It was mentioned that in the past, people were punching information into the AI, whereas now the AI is training itself on “the Internet”.

I think this is a big problem with the current AI models. We suddenly have all this data available, “free” and easy to access, so the AIs are just pointed at it and told to get on with it.

But that is an age old problem, as old as information processing in general: garbage in, garbage out.

The Internet is not an authoritative source for anything. It is so full of contradictions, conspiracy theories, abuse and everything nasty about the human race that I can see where the stories come from of rogue AIs deciding that the human race needs to be protected from itself!

Of course, there are some good sources on the Internet and some qualified information, but those are small islands floating in a vile, filth-ridden sea. The data that is fed into the AIs needs to be sorted, so that misinformation, hatred and the like are left at the door and only “real” information is allowed in.
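
A minimal sketch of the kind of curation step I mean: filter a raw corpus against quality heuristics before it ever reaches training. The heuristics, field names and source allowlist below are all invented for illustration; no real training pipeline is anywhere near this simple.

```python
def looks_trustworthy(doc: dict) -> bool:
    """Crude stand-in for a real misinformation/quality classifier."""
    blocked_phrases = {"miracle cure", "they don't want you to know"}
    text = doc["text"].lower()
    if any(phrase in text for phrase in blocked_phrases):
        return False
    # Prefer documents from a curated allowlist of source types (hypothetical field).
    return doc.get("source") in {"encyclopedia", "peer_reviewed", "gov_data"}

raw_corpus = [
    {"source": "peer_reviewed", "text": "Water boils at 100 C at sea level."},
    {"source": "random_blog", "text": "A miracle cure they don't want you to know about!"},
]

training_set = [doc for doc in raw_corpus if looks_trustworthy(doc)]
print(f"kept {len(training_set)} of {len(raw_corpus)} documents")
```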

Mike also pointed out that in some countries (Iran, I think he said) the regime decides what information is approved. This then goes too far in the other direction, because it is done under the guise of deep religious belief.

I have posted this elsewhere, but it reminds me a lot of the Babylon 5 episode “Infection”. Not a particularly good one; I believe JMS had a tight deadline and was very ill when he wrote it. But that is beside the point: the underlying story, or at least the background story, is actually very relevant to the current situation.

The Ikarans were at war with each other and they made an AI weapon to “cleanse the planet” of all non-pure Ikarans. The real problem, of course, was making an AI weapon in the first place, but the other mistake was that they got the religious class to define what a pure Ikaran was, taken directly out of their sacred texts… And nobody on the planet was able to live up to this religious definition of what an Ikaran should be, so the weapon depopulated the whole planet.

Now, that is a little extreme, and our AI can’t currently destroy anything, other than the fragile egos of bloggers asking Bing questions and getting stroppy answers. But it shows very clearly where such AI training could go.

Either we have AIs learning from the unfettered Internet and deciding the human race can’t be allowed to look after itself (The Matrix), or fanatics (religious, political, whatever) get to decide what information is fed into the AI, and it ends up with a very biased view of the world, totally divorced from reality.

Neither extreme is what we want. Some middle ground needs to be found: valid, accurate and neutral datasets without any biases. But, as those datasets will be defined by humans with biases of their own, that is an impossible task.

A lot of hard work needs to be put into how AIs are trained, how they are restricted, and how the law applies to them (not least the willy-nilly vacuuming of copyrighted works into their training sets without permission or compensation).

I’ve been interested in AI since my first seminar back in the late 80s, but I still don’t see that we have reached a real watershed moment. With each new generation we have come a step further, but I still think we have a long way to go before they will be useful for general-purpose work.

I think they will be much more useful in more limited scenarios, such as an example that Brad Sams gave on Paul Thurrott’s First Ring Daily podcast, telling Excel, in plain English, how to perform an analysis of data and how to output it. That is something that can be more easily accomplished.
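
To make that concrete, here is a toy illustration of why the constrained scenario is more tractable: the model only has to emit a small, checkable artifact (a formula, or a few lines of pandas) that the host application can inspect before running. The request and the generated snippet below are invented examples, not Brad’s actual demo.

```python
# Plain-English request: "total revenue per region, highest first".
# A hypothetical assistant only needs to translate that into the few
# checkable lines below; the host app can validate them before running.

import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [120, 80, 150, 95],
})

summary = (
    sales.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)
print(summary)
```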

Likewise, some of the artwork creation tools look interesting, copyright concerns aside, although I feel they all have a sort of uncanny-valley look to them: interesting at first glance, but when you look closer they almost always look artificial. The “soul”, as it were, is missing.

But search? The tests from Bing AI and the “preview” of Bard from Google have both clearly shown that neither is ready to be unleashed on the general public.

The answers need to be 100% accurate every time, which means that at the moment, for every AI search you do, you may need to do dozens of traditional searches to verify that the information returned is accurate! But once we reach that level of accuracy, people will stop double-checking the results and assume they are correct, so if the search results drift away from reality, people won’t notice, because they are taking the results for granted.
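
To illustrate that verification burden, here is a naive sketch: treat the AI’s answer as a list of claims and only trust a claim when enough independently retrieved sources repeat it. The claims, snippets and matching rule are all invented; real fact-checking is far harder than keyword overlap.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def supported(claim: str, sources: list[str], needed: int = 2) -> bool:
    """Naive check: enough sources must contain all of the claim's words."""
    claim_words = tokens(claim)
    hits = sum(claim_words <= tokens(src) for src in sources)
    return hits >= needed

ai_claims = ["water boils at 100 C at sea level"]
search_snippets = [
    "At sea level, water boils at 100 C.",
    "Water boils at 100 C at sea level under standard pressure.",
]

for claim in ai_claims:
    verdict = "verified" if supported(claim, search_snippets) else "needs checking"
    print(f"{claim!r} -> {verdict}")
```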

At the moment, on some subjects, I can do a search (traditional or AI), look at the results, go, “hmm, that isn’t right”, and search further. But put a couple of generations who don’t know how to research for themselves in front of AI results, and who is going to be able to tell whether those results are accurate? Will they even care that the AI is telling them that Genghis Khan and Hitler formed a powerful alliance, which saw them take over most of Asia and the whole of Europe in a dynasty that lasted for nearly 700 years?

3 Likes

I think you’ve nailed the issue with ChatGPT and Bing chat: they’re just referring to an undifferentiated mass of text (i.e. the Internet as a whole). It is much easier to do that than to train only on authoritative materials, however, or so I understand.

Neeva attempts to use only sites it has approved as authoritative, at least for controversial content, and it works, although it eliminates the more spectacular conversations you can get from ChatGPT and Bing.

The former is a parlor trick, the latter actually useful if less entertaining.
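
For anyone curious what that allowlist approach amounts to mechanically, a sketch might look like the following. The domains and result format are invented stand-ins; I have no idea how Neeva actually implements it.

```python
# Restrict retrieved results to a curated allowlist of domains before any
# answer is generated. Domains and results here are made up for illustration.

APPROVED_DOMAINS = {"who.int", "nature.com", "britannica.com"}

def filter_to_approved(results: list[dict]) -> list[dict]:
    """Keep only results whose domain is on the curated allowlist."""
    return [r for r in results if r["domain"] in APPROVED_DOMAINS]

mixed_results = [
    {"domain": "nature.com", "title": "Peer-reviewed climate study"},
    {"domain": "conspiracy.example", "title": "What THEY are hiding"},
]

for r in filter_to_approved(mixed_results):
    print(r["domain"], "-", r["title"])
```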

5 Likes

A WikipediaGPT would be an interesting (if boring) AI chat agent…
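
The grounding half of that is easy enough to sketch: fetch an article summary from Wikipedia’s public REST API and instruct the (here entirely hypothetical) chat model to answer only from it. The summary endpoint is real; the prompt-and-model wiring is just an illustration.

```python
import json
import urllib.parse
import urllib.request

def wikipedia_summary(title: str) -> str:
    """Fetch an article's intro summary from Wikipedia's public REST API."""
    url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
           + urllib.parse.quote(title))
    req = urllib.request.Request(url, headers={"User-Agent": "wikigpt-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["extract"]

# Ground the (hypothetical) model in the fetched text instead of open web data.
context = wikipedia_summary("Alan Turing")
prompt = f"Answer only from the following text, or say you don't know:\n\n{context}"
print(prompt[:300])
```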

4 Likes

…and the statement in your post will now form part of the training data of the next AI. :wink:

2 Likes