Beep boop - this is a robot. A new show has been posted to TWiT…
What are your thoughts about today’s show? We’d love to hear from you!
I am with Steve. If the website has explicitly said that Perplexity (and other AIs) shouldn’t scrape their site, they should respect that, not use an undocumented agent to do it instead.
Whether the AI is using the scraped information to learn from the site or to provide the user with a one-off summary of the site doesn’t matter. The owner of the site has taken time and effort to produce the content and to format it in the way they want it to be shown to visitors.
@Leo’s argument that Perplexity scraping the site and providing a summary in its own words, with its own formatting, is no different to a web browser visiting the site is specious. The former doesn’t take you to the site, doesn’t use the author’s own words and doesn’t display them as the author wanted; the latter does exactly that.
Leo’s argument is that, if he goes to a military compound to look around, the guards will turn him away, because he doesn’t have permission to visit and look at the compound (equivalent to him being Perplexity and he visits a site and robots.txt explicitly tells him to bog off). Now, if he comes back after dark in camo-gear and cuts through the wires and goes through the camp, or he steals a security pass from a member of staff and cons his way onto the military base, that is totally okay.
He is welcome to try his hypothesis out, but I doubt the guards will be too pleased with him, when he is caught…
What Perplexity should do is put up a statement saying that the site owner has explicitly requested that Perplexity does not scrape the site and the user should follow the provided link to the site in their browser, if they want to find out more information about it.
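As a rough sketch of what I mean, here is how a well-behaved crawler could check robots.txt before fetching and show that kind of notice instead of a summary. It uses Python’s standard `urllib.robotparser`. The robots.txt content, the `fetch_or_refuse` helper and the example URL are made up for illustration, and I am assuming `PerplexityBot` as the crawler’s declared user agent name; this is not how Perplexity actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: one named crawler is
# blocked from the whole site, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def fetch_or_refuse(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> str:
    """Return an OK message if robots.txt permits the fetch, else a notice for the user."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch(user_agent, url):
        return f"OK to fetch {url}"
    return (
        "The site owner has asked that this crawler not read the site. "
        f"Please open {url} in your browser instead."
    )


print(fetch_or_refuse("PerplexityBot", "https://example.com/article"))
print(fetch_or_refuse("Mozilla/5.0", "https://example.com/article"))
```

The check only works if the crawler identifies itself honestly, which is the whole problem with using an undocumented agent.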
Why should multi-billion dollar companies have the right to read for free? They are selling the results of the information they have scraped, without paying the content creators. Right to read, Leo? Walk into your local bookshop and try walking out with an armful of books without paying.
Blocking the AI from reading doesn’t affect your right to read, one is a multi-billion dollar business, the other is an individual who has rights, and you can still go to those sites with a browser and read it for yourself.
Edit: we used to have sites blocked, if you went there with a specific browser - mainly Firefox, because the sites were designed to work with Internet Explorer and later, the other way round, IE was blocked, because it couldn’t display the site correctly.
I think that AI needs information, but uncurated information from the Internet isn’t the solution. There is so much misinformation and bias out there, meaning the results are often unhelpful and biased.
Edit: They should be paying for quality training information, I mean, they are selling the results of that training, so they just need to up their prices to cover doing business legitimately.
25 saved passwords is diabolical in a world where there’s a Google Password Manager & Passwords (previously Keychain Access) on Apple platforms. I consider myself a power user and I’ve never felt the need to pay / try any of the popular leading brand password managers.
Great episode!! I actually listened to the entire thing. I usually start Security Now episodes but I rarely finish them due to ADHD distractions.
I used to use LastPass, but switched to 1Password before the major breach at the end of the teens - I had moved everything across and, luckily, deleted every password from my vault a month or two before the leak happened, but I still went through and changed all my passwords.
I regularly switch browsers and platforms (Windows, Linux, macOS, iOS, Android) and use computers at home and work, the latter often dictates which browser(s) we can use. For those reasons, a platform and browser independent password safe is very important to me.
Also, at work, we have a few hundred departmental level passwords (switches, firewalls, servers, SaaS and Cloud services etc.) that everybody in the department needs. Having shared vaults for those sorts of passwords, plus a private vault for personal accounts is a godsend. As is being able to share passwords with my family. My wife wanted to set up Netflix in her hobby area, I just shared the password with her in 1Password, until she had logged the device on.
1Password also allows multiple accounts to be open at one time. The best part is, I now get my Family account for free, as 1Password offer a free Family account for each business account you have, so, until I change employers, I won’t have to pay for my family.
I’m mostly in agreement with your thoughts, and I think Leo is in the wrong. The bottom line is these ML orgs should be good netizens and respect the wishes of rights holders until legislation catches up with reality.
But…
I take issue with your analogy of a web property to a military base. I’ve always thought a public site is more akin to a poster or a billboard located in a public area (relevant XKCD - https://www.explainxkcd.com/wiki/images/f/f5/cia.png), with a robots.txt file sort of like a subtext saying “don’t read this poster if you’re ‘X’.” Is it fair and reasonable to expect any ‘X’ to abide by this subtext? Better yet, is it wise to base business choices on the expectation that this subtext will be remotely effective?
IMO - a site protected by a paywall or even simply requiring a free membership would fit your military base analogy better, and if Perplexity or anyone was actively circumventing that sort of protection then I’d sit up and pay a bit more attention. I would posit that anyone who is concerned about their material being used in ways they don’t want shouldn’t be putting that material up on a proverbial billboard, and should be publishing it in a military base.
Maybe a bit too much from my side, but the same goes for the security in a building. The robots.txt is the security for entering the site: if you are on the unwelcome list, you don’t get to come in. Security usually works the other way round, with a list of who is allowed in, but the point is the same.
But, because the web is the way it is, most can’t afford a security guard and they have to make do with a ‘don’t walk on the grass’ sign for web crawlers. Companies should respect that, plain and simple. If they want to play on the grass, they have to gain permission first, like buying a pass to use the grass.
I haven’t watched/listened to this episode, but since people are talking about Perplexity ignoring website features….
The website of one of my company’s sister companies is getting “attacked” by what appears to be Perplexity. When this occurs, it also affects my company’s website, since it’s housed on the same server. Both of our sites have basic Cloudflare protection (it’s part of the service provided by our hosting provider; our parent company pays for a dedicated server for all of our sites). Apparently, Cloudflare is unable to stop these crawlers, or at least that’s what I’ve heard.
Steve mentioned uBlock Origin on this episode, and today I had a complaint I wanted to share with the forums, so I’m posting it here.
Sites have already started getting better at working around mv3 compliant blockers. It was easy to get used to having no ads on my iPad, but now they’re back sometimes. It’s always humbling to be reminded how quickly you get used to new comforts.
What is interesting is that browsers (Safari on iOS, for example) have built in AI summaries. The line between that and what Leo described is thinner and less clear than one would think.