About Swarajya

Swarajya is a publication by Kovai Media Private Limited.

Swarajya - a big tent for liberal right of centre discourse that reaches out, engages and caters to the new India.

editor@swarajyamag.com

Technology

OpenAI Announces Its Web Crawler GPTBot, Tells You How To Block The Bot Collecting AI Training Data

Karan Kamble
Wednesday, August 9, 2023 5:56 pm IST
OpenAI
  • GPTBot, OpenAI’s web crawler, will help improve the ChatGPT maker's AI models.

    OpenAI has announced a web crawler called GPTBot, whose job will be to scour the internet for public data to improve artificial intelligence (AI) offerings, specifically the ChatGPT maker's large language models GPT-4 and potentially GPT-5.

    The name “web crawler” gives away the function: crawling the web.

    Web crawler (or spider) bots scan the web for content. “Their purpose is to index the content of websites all across the Internet so that those websites can appear in search engine results,” technology company Cloudflare explains.

    “Web pages crawled with the GPTBot user agent,” says OpenAI, “may potentially be used to improve future models.”

    GPTBot will, however, steer clear of “sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies.”

    “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI says.

    The AI research and deployment company has given publishers and website owners the option to either fully opt out of GPTBot's crawling or allow it partial access. OpenAI's GPTBot documentation explains how to do that through a site's robots.txt file.
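As a rough illustration of the two options, the sketch below checks two sample robots.txt files with Python's standard-library `urllib.robotparser`. The `GPTBot` user-agent token and the `Allow`/`Disallow` directives follow the pattern OpenAI published for the crawler. The directory names, the `example.com` URLs and the `can_fetch` helper are hypothetical and only serve the example.

```python
from urllib.robotparser import RobotFileParser

# Full opt-out: GPTBot may not crawl any path on the site.
FULL_OPT_OUT = """\
User-agent: GPTBot
Disallow: /
"""

# Partial access: the directory names below are hypothetical placeholders.
PARTIAL_ACCESS = """\
User-agent: GPTBot
Allow: /public-articles/
Disallow: /members-only/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Full opt-out blocks GPTBot but says nothing about other crawlers.
    print(can_fetch(FULL_OPT_OUT, "GPTBot", "https://example.com/story"))
    print(can_fetch(FULL_OPT_OUT, "Googlebot", "https://example.com/story"))
    # Partial access allows one directory and blocks another.
    print(can_fetch(PARTIAL_ACCESS, "GPTBot", "https://example.com/public-articles/a"))
    print(can_fetch(PARTIAL_ACCESS, "GPTBot", "https://example.com/members-only/b"))
```

A robots.txt file is a request and not an enforcement mechanism: it works only to the extent that the crawler chooses to honour it.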

    Although the option to opt out of web crawling by GPTBot is welcome and suggests a respect for privacy, it does put the onus of taking steps to disable access upon publishers and website owners.

    Instead, an opt-in feature, where one is asked for permission, would have been more respectful.

    Besides, the GPTBot has become known only now. It is unclear whether it, or any other such OpenAI web crawler, has already been collecting information and for how long — days, months, or years?

    OpenAI trains its machine learning models on public web data. This choice has led to questions of ethics and legality.

    For one, the aspect of consent for the reuse of information is absent. The source of information isn’t typically highlighted in an ordinary interaction with a chatbot powered by an AI model. A chatbot user also isn't redirected to the source, so the latter doesn't benefit.

    In this scenario, a source of information is forced to compete with a platform that rechannels that same information, while also acting as a one-stop shop for any other information necessary, clearly handing the latter the advantage.

    “Why would any producer of free online content let OpenAI scrape its material when that data will be used to train future LLMs that later compete with that creator by pulling users away from their site?” asks Alistair Barr, writing for Business Insider.

    In addition, some of the information on the web is copyrighted.

    OpenAI’s free use of copyrighted material (text, images, sounds, videos, and more) to improve its models and grow its revenue therefore becomes a contentious issue, and potential grounds for copyright infringement claims.

    Comedian Sarah Silverman sued OpenAI for copyright infringement in July, and she is one of several authors who have taken legal action.

    On the other hand, OpenAI and the Associated Press struck a deal in July under which the ChatGPT maker will license the New York-based news agency’s archive of news stories.

    Tags
    OpenAI
    ChatGPT
    Privacy and data security
    large language model
    GPTBot
    web crawler
