Google Clarifies the “Google-Extended” Crawler Documentation


Google has recently refreshed its Google-Extended crawler documentation and introduced an additional clarification.

Google revised the documentation for its Google-Extended user agent to reflect a product name change and to clarify how the control relates to Google Search. The update is relevant to publishers who are considering blocking Google-Extended, because the revised documentation is clearer about managing content access for AI model training.

Google-Extended User Agent

Introduced on September 28, 2023, Google-Extended provides web publishers with a user agent to regulate the crawling of their websites. Publishers have the option to permit or prohibit the Google-Extended user agent’s access using the Robots Exclusion Protocol, giving them control over whether their content is scraped and utilized in AI training datasets.

Google refers to Google-Extended as a “standalone product token,” a phrase that differs from the “user agent” terminology most publishers are familiar with.

The initial announcement outlined the purpose of the new user agent:

“Today, we introduce Google-Extended, offering web publishers a new control mechanism to manage whether their sites contribute to enhancing Bard and Vertex AI generative APIs, including forthcoming iterations of models powering these products.

Through the utilization of Google-Extended to regulate content access on a site, website administrators can decide whether to facilitate the refinement and advancement of these AI models over time.”

To block Google-Extended, add a rule for the “Google-Extended” user agent to the site’s robots.txt file:

User-agent: Google-Extended
Disallow: /
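
Publishers who want to sanity-check a rule like this before deploying it can test it locally. The following is a minimal sketch, not an official Google tool: it uses Python's standard-library urllib.robotparser, whose matching logic is an approximation and may not mirror Google's own parser exactly. The helper name is_allowed and the example.com URL are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule shown in the article.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper: relies on Python's built-in parser, which only
    approximates how Google interprets robots.txt rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {status}")
```

Under these assumptions, the rule blocks only the Google-Extended token and leaves other user agents such as Googlebot free to fetch the page, because no rule group in the file applies to them.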

Google Changelog

Google maintains a changelog on its developer pages that documents significant updates to its guidance for web publishers and the search marketing community. A recent entry records a change to the Google-Extended documentation.

This revision follows the rebranding of Bard to Gemini Apps and clarifies that the Google-Extended control now applies to Gemini Apps and Vertex AI generative APIs. The updated language also reassures publishers that Google-Extended does not affect Google Search, addressing concerns about the consequences of opting out of AI data collection.

What Changed?

Google’s changelog clarifies that Google-Extended now covers Gemini Apps (formerly Bard) and does not affect Google Search.

The Changelog states:

“Updated the description of the Google-Extended product token
What: With the name change of Bard to Gemini Apps, we clarified that Gemini Apps is affected by Google-Extended, and, based on publisher feedback, we specified that Google-Extended doesn’t affect Google Search.”

The revised guidance drops the Bard brand name and replaces it with Gemini Apps. The following sentence was also added:

“Google-Extended does not impact a site’s inclusion or ranking in Google Search.”

Original news from SearchEngineJournal