
AI firms accused of scraping publisher sites without permission

Jun 22, 2024

Multiple artificial intelligence companies are circumventing a common web standard used by publishers to block the scraping of their content for use in generative AI systems, content licensing startup TollBit has told publishers.

A letter to publishers seen by Reuters on Friday, which does not name the AI companies or the publishers affected, comes amid a public dispute between AI search startup Perplexity and media outlet Forbes involving the same web standard and a broader debate between tech and media firms over the value of content in the age of generative AI.

The business media publisher publicly accused Perplexity of plagiarizing its investigative stories in AI-generated summaries without citing Forbes or asking for its permission.

A Wired investigation published this week found that Perplexity was likely bypassing efforts to block its web crawler via the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard meant to determine which parts of a site are allowed to be crawled.
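As a rough illustration of how the standard works, the sketch below shows how a well-behaved crawler might consult a site's robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent and the `publisher.example` domain are invented for this example and do not describe any real publisher or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler from the whole
# site and keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before every fetch and skips the URL on False.
print(parser.can_fetch("ExampleAIBot", "https://publisher.example/news/story"))   # False
print(parser.can_fetch("SomeOtherBot", "https://publisher.example/news/story"))   # True
print(parser.can_fetch("SomeOtherBot", "https://publisher.example/private/draft"))  # False
```

Nothing in this check is enforced by the website itself: a crawler that never calls it, or ignores the answer, can still request the page.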

Perplexity declined a Reuters request for comment on the dispute.

TollBit, an early-stage startup, is positioning itself as a matchmaker between content-hungry AI companies and publishers open to striking licensing deals with them.

The company tracks AI traffic to the publishers' websites and uses analytics to help both sides settle on fees to be paid for the use of different types of content.

For example, publishers may opt to set higher rates for "premium content, such as the latest news or exclusive insights," the company says on its website.


TollBit says it had 50 websites live as of May, though it has not named them.

According to the TollBit letter, Perplexity is not the only offender that appears to be ignoring robots.txt.

TollBit said its analytics indicate "numerous" AI agents are bypassing the protocol, a standard tool used by publishers to indicate which parts of their sites can be crawled.

"What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites," TollBit wrote. "The more publisher logs we ingest, the more this pattern emerges."
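TollBit has not described how its log analysis works. Purely as a sketch of the general idea, a publisher could compare its own access logs against its robots.txt and count requests from user agents that the file disallows. The log format, the `count_violations` helper, the `ExampleAIBot` user agent and the sample entries below are all assumptions made for illustration.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows one made-up AI crawler everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

# Hypothetical, heavily simplified access log: (user agent, requested path).
LOG = [
    ("ExampleAIBot/1.0", "/news/story-1"),
    ("ExampleAIBot/1.0", "/news/story-2"),
    ("Mozilla/5.0", "/news/story-1"),
]


def count_violations(robots_txt: str, log: list[tuple[str, str]]) -> Counter:
    """Count logged requests that robots.txt disallows for that user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations: Counter = Counter()
    for user_agent, path in log:
        # can_fetch() returns False when the rules forbid this agent/path pair.
        if not parser.can_fetch(user_agent, path):
            violations[user_agent] += 1
    return violations


print(count_violations(ROBOTS_TXT, LOG))  # Counter({'ExampleAIBot/1.0': 2})
```

A check like this only catches crawlers that identify themselves honestly. An agent that sends a browser-like user agent string would pass unnoticed, so real detection would need additional signals.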

The robots.txt protocol was created in the mid-1990s as a way to avoid overloading websites with web crawlers. Although there is no legal enforcement mechanism, historically there has been widespread compliance on the web.

More recently, robots.txt has become a key tool publishers have used to block tech companies from ingesting their content free of charge for use in generative AI systems that can mimic human creativity and instantly summarize articles.

The AI companies use the content both to train their algorithms and to generate summaries of real-time information.

Some publishers, including the New York Times, have sued AI companies for copyright infringement over those uses. Others are signing licensing agreements with the AI companies open to paying for content, although the sides often disagree over the value of the materials. Many AI developers argue they have broken no laws in accessing them for free.

Thomson Reuters, the owner of Reuters News, is among those that have struck deals to license news content for use by AI models.

Publishers have been raising the alarm about news summaries in particular since Google rolled out a product last year that uses AI to create summaries in response to some search queries.

If publishers want to prevent their content from being used by Google's AI to help generate those summaries, they must use the same tool that would also prevent them from appearing in Google search results, rendering them virtually invisible on the web.
