Googlebot is a web-crawling software robot (also known as a spider or web crawler) that gathers the information from web pages that Google uses to populate its search engine results pages (SERPs).
Googlebot collects documents from the Web to build Google’s search index. By constantly collecting documents, the software discovers new pages and updates existing ones. Googlebot uses a distributed design covering many computers, so it can grow along with the Web.
The web crawler uses algorithms to determine which sites to crawl, how often to crawl them and how many pages to retrieve from each. Googlebot starts with a list of URLs generated from previous crawl sessions. This list is then supplemented by sitemaps supplied by webmasters. The software follows the links on every page it scans, noting new sites, site updates and dead links. The information collected is used to update Google’s web index.
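The crawl loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea (a queue of known URLs, topped up from a sitemap and from links found on each page), not Google's actual algorithm. The names `crawl`, `LinkExtractor` and `FAKE_WEB` are hypothetical, and a small in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, sitemap_urls, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML, or None for a dead link."""
    # Start from previously known URLs, then add the sitemap entries.
    frontier = deque(dict.fromkeys(list(seeds) + list(sitemap_urls)))
    seen = set(frontier)
    index, dead = {}, []
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            dead.append(url)  # note the dead link and move on
            continue
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return index, dead


# Toy in-memory "web" (an assumption for the example, not real pages).
FAKE_WEB = {
    "https://example.com/": '<a href="/a">A</a> <a href="/gone">Gone</a>',
    "https://example.com/a": '<a href="/">Home</a>',
    "https://example.com/new": "<p>Listed only in the sitemap</p>",
}

index, dead = crawl(
    seeds=["https://example.com/"],
    sitemap_urls=["https://example.com/new"],
    fetch=FAKE_WEB.get,
)
```

Here the page at `/new` is reached only because the sitemap lists it, and `/gone` ends up in `dead` because nothing answers at that address. A real crawler would also need politeness delays, scheduling and robots.txt checks, which this sketch leaves out.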
Googlebot creates an index within the limits set by webmasters in their robots.txt files. If, for example, a webmaster wishes to prevent certain pages from being accessed by Google, they can block Googlebot in a robots.txt file located in the site’s top-level folder. To prevent Googlebot from following any link on a given page, the webmaster can add the nofollow robots meta tag to that page; to prevent the robot from following individual links, the webmaster can add rel="nofollow" to the links themselves.
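As a rough illustration of how a robots.txt rule is interpreted, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content and the `example.com` URLs are made up for the example, and the result shows how this particular parser reads the rules, which may differ in edge cases from Googlebot's own handling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/,
# and place no restrictions on any other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given user agent may fetch a given URL.
blocked = parser.can_fetch("Googlebot", "https://example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "https://example.com/public/index.html")
other = parser.can_fetch("OtherBot", "https://example.com/private/report.html")

print(blocked, allowed, other)
```

Note that robots.txt controls crawling only. The nofollow meta tag and the rel="nofollow" attribute live in the HTML itself and are not covered by this sketch.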
A webmaster may notice visits every few seconds from computers at google.com that identify themselves with the Googlebot user agent. In general, Google tries to index as much of a site as possible without saturating the site’s bandwidth. A webmaster who finds that Googlebot is using too much bandwidth can set a crawl rate from the Google Search Console home page; the setting remains in effect for 90 days.
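One way a webmaster might check how often Googlebot visits is to filter the server's access log by user agent. The sketch below assumes logs in the common "combined" format. The sample log lines are invented for illustration, and `googlebot_hits` and `mean_interval_seconds` are hypothetical helper names.

```python
import re
from datetime import datetime

# Invented sample lines in combined log format (not real traffic).
LOG_LINES = [
    '66.249.66.1 - - [10/Mar/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Mar/2024:10:00:02 +0000] "GET /a HTTP/1.1" 200 128 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
    '66.249.66.1 - - [10/Mar/2024:10:00:04 +0000] "GET /a HTTP/1.1" 200 128 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Mar/2024:10:00:10 +0000] "GET /b HTTP/1.1" 200 256 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Timestamp, request line, status, size, referrer, user agent.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return the timestamps of requests whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits.append(
                datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            )
    return hits


def mean_interval_seconds(times):
    """Average gap between consecutive requests, or None with fewer than two."""
    if len(times) < 2:
        return None
    ordered = sorted(times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


hits = googlebot_hits(LOG_LINES)
interval = mean_interval_seconds(hits)
```

A user agent string is easy to spoof, so a match here only suggests that a request came from Googlebot. Confirming it would need an additional check, such as a reverse DNS lookup on the requesting IP address, which this sketch does not attempt.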
During a presentation at the SearchLove 2011 conference, Josh Giardino claimed that Googlebot was in fact the Chrome browser. This would mean that Googlebot can not only crawl text pages, as crawlers do, but also run scripts and render media as web browsers do. This capability could enable Googlebot to find hidden information and perform other tasks that Google has not acknowledged. Giardino went so far as to say that Googlebot could be the original reason why the company created Chrome.