Will most content producers block AI until monetization kicks in?

It appears that the New York Times is making a move to block AI companies from making unauthorized use of their content to train AI.

I wonder if they’ll start hiding text in their content to entrap and/or confuse AIs that don’t respect their wishes. I think it would be pretty easy to publish a block of text that is gibberish as served, and have a script on the page recover the actual content. If the scrapers turned off scripting they would get text, but it would not be useful, and if they turned on scripting, their computing costs could be driven up significantly (while violating the terms of the site), if the developers are clever.
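To make the idea concrete, here is a rough sketch of what I mean. It is only an illustration: the function names and the `SHIFT` value are made up, the scheme is a trivial letter rotation rather than anything a real site is known to use, and in practice the decoding half would be JavaScript running in the reader's browser rather than Python.

```python
import string

# Hypothetical rotation amount; a real site could vary it per page or per request.
SHIFT = 17

LETTERS = string.ascii_letters
ROTATED = LETTERS[SHIFT:] + LETTERS[:SHIFT]

# Translation tables for both directions of the substitution.
ENCODE = str.maketrans(LETTERS, ROTATED)
DECODE = str.maketrans(ROTATED, LETTERS)


def obfuscate(text: str) -> str:
    """Server side: turn the article into gibberish before it is published."""
    return text.translate(ENCODE)


def deobfuscate(text: str) -> str:
    """Client side stand-in: what the page script would do to restore the text."""
    return text.translate(DECODE)


if __name__ == "__main__":
    article = "Subscribers can read this paragraph normally."
    served = obfuscate(article)
    print(served)               # what a scraper without scripting would collect
    print(deobfuscate(served))  # what a reader's browser would display
```

Something this simple would be trivial to undo, of course. The point is only that the served text and the displayed text need not be the same, and the cost of recovering one from the other can be pushed onto the scraper.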

Rather than engaging in such a war of attrition, it might be smarter for the AI companies to obtain a license, which could be another income stream for these sites that are sorely in need of revenue from content. The best way to do that would be to have an AI API that the site could enable, to prevent the scraping in the first place.

2 Likes

I think this is the right way, at least for now. The problem is, the AI companies are benefitting from the hard work of journalists, bloggers etc. without having to pay them any compensation, regardless of what copyright is on the works.

This has been seen with authors and artists - they complain the AIs have read their works and the AI companies say, “tough, you put it on the Internet, so it is fair game.” Only, in many cases, they didn’t. People made copies of the works and published them without consent, sometimes a fan posting a copy of a favourite image, but often big pirate sites copying books, images, audio and video illegally. That doesn’t seem to bother the AI companies.

Either they have to be held to account for training their AIs on “stolen” goods, or they need to find a way to block sites that are wholesale illegal bazaars, full of illicit goods. If I go on holiday and buy a fake Rolex at a market, I expect Customs to probably seize it when I re-enter my home country. I might be lucky and get away with it, but if I am coming from a country known for its knock-off products, I can expect them to give it more than a cursory glance.

Likewise, if I order goods over the Internet, they will probably be inspected by customs. Illegal substances or knock-off products will be confiscated and destroyed, and I’ll end up having to pay a fine.

AI companies need to put in place the same controls. If a site is known to only hold illegal content, it must be blacklisted from the AI training database. Social media sites, for example, are more difficult: only a relatively small number of posts infringe copyright with images or excerpts of text from copyrighted works, so those will be harder to find and ignore. But nobody said the job would be easy, and it is a job they sought to do.
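The site-level part of that is not hard to picture. A minimal sketch, assuming the crawler keeps a list of domains it has decided never to train on (the domain names and function names below are invented for illustration, and I have no idea how any AI company actually structures its pipeline):

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the real work is deciding what belongs on it.
BLOCKED_DOMAINS = {"pirate-library.example", "bootleg-books.example"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def filter_training_urls(urls, blocked=BLOCKED_DOMAINS):
    """Drop every URL hosted on a blocked domain, keeping the original order."""
    return [url for url in urls if not is_blocked(url, blocked)]


if __name__ == "__main__":
    candidates = [
        "https://news.example/2023/ai-licensing",
        "https://cdn.pirate-library.example/novel.pdf",
        "https://bootleg-books.example/index.html",
    ]
    print(filter_training_urls(candidates))
```

This only covers the easy case of wholesale pirate sites. It does nothing for individual infringing posts on otherwise legitimate platforms, which is the harder problem.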

For all too long, “Big Tech” has flouted the law: “we are too small, we’ll deal with it when we are bigger!” Only to then say, “we are too big, it would cost too much to implement this at scale, but we can poke those wronged in the eye by giving them a paltry payment and let them do our job for us, by scouring the feeds and flagging up anything they think is copyright infringement (or copycat goods, for example, on Amazon).”

These companies should have been held to account when they were still small. They would have found a way to comply with the law, and that method would then have had to scale with the growth of the business. But we let them get away with paying lawyers and fines, until those costs exceeded the cost of actually doing something about the problem.

Now small companies scream it is unfair that they are being held to account and stopped from expanding quickly, which is true, whilst Big Tech reaps the benefits of being able to ignore the law or only partially follow it, which is also true.

The whole system is broken.

1 Like

Always fun replying to yourself :wink: but this article just went up on Ars Technica. It says the New York Times is planning to sue OpenAI for using their content, presumably with the intention of forcing OpenAI to license it properly and pay for it.

3 Likes