“It’s Impossible To Crawl The Whole Web”


In response to a question about why SEO tools don’t show all backlinks, Google Search Advocate John Mueller says it’s impossible to crawl the entire web.

He made the statement in a comment on Reddit, in a thread started by a frustrated SEO professional.

They asked why the SEO tools they use weren’t finding all of the links pointing to a site.

It doesn’t matter which tool the person is using. As we learn from Mueller, it’s not possible for any tool to find 100% of a website’s inbound links.

Here’s why.

There is no way to “properly” crawl the web

Mueller says there is no single correct way to crawl the web because the number of URLs is effectively infinite.

No one has the resources to keep an endless number of URLs in a database, so web crawlers try to determine what’s worth crawling.

As Mueller points out, this inevitably means some URLs get crawled over and over again while others are never crawled at all.

“There is no objective way to properly crawl the web.

It is theoretically impossible to crawl all of it, as the number of actual URLs is effectively infinite. Since no one can afford to keep an infinite number of URLs in a database, all web crawlers make assumptions, simplifications, and guesses about what is actually worth crawling.

And even then, for practical purposes, you can’t crawl all of that all the time. The internet doesn’t have enough connectivity and bandwidth for that, and it costs a lot of money if you want to access a lot of pages regularly (for the crawler, and for the site owner).

Past that, some pages change rapidly while others haven’t changed in 10 years – so crawlers try to save effort by focusing more on the pages they expect to change, rather than those they expect not to change.”
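To make that prioritization concrete, here is a minimal sketch of a change-frequency-based crawl scheduler. It is purely illustrative – the CrawlScheduler class, its adaptive interval rule, and the example URLs are assumptions made for this article, not Google’s actual scheduling logic.

```python
import heapq
import time

# Illustrative sketch only (not Google's algorithm): a crawler with a limited
# per-cycle budget revisits pages it expects to change soon, and pushes pages
# that look static further into the future.
class CrawlScheduler:
    def __init__(self):
        self.queue = []            # min-heap of (next_crawl_time, url)
        self.change_interval = {}  # url -> estimated seconds between changes

    def add(self, url, estimated_change_interval):
        self.change_interval[url] = estimated_change_interval
        heapq.heappush(self.queue, (time.time(), url))  # new URLs are due immediately

    def record_crawl(self, url, content_changed):
        # Naive adaptive estimate: shrink the interval when the page changed,
        # grow it when it did not, so seemingly static pages are revisited less often.
        interval = self.change_interval[url]
        interval = interval * 0.5 if content_changed else interval * 2
        self.change_interval[url] = interval
        heapq.heappush(self.queue, (time.time() + interval, url))

    def next_due(self, budget):
        # Return at most `budget` URLs whose scheduled crawl time has arrived.
        due = []
        while self.queue and self.queue[0][0] <= time.time() and len(due) < budget:
            _, url = heapq.heappop(self.queue)
            due.append(url)
        return due

scheduler = CrawlScheduler()
scheduler.add("https://example.com/news", estimated_change_interval=3600)          # changes hourly
scheduler.add("https://example.com/about", estimated_change_interval=86400 * 365)  # rarely changes
print(scheduler.next_due(budget=10))  # both are due on the first pass
```

The point of the sketch is the trade-off Mueller describes: with a fixed budget per cycle, pages that keep changing get revisited often, while pages that look static drift toward the back of the queue.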

How web crawlers determine what is worth crawling

Mueller further explains how web crawlers, including search engines and SEO tools, figure out which URLs are worth crawling.

“Then there’s the part where crawlers try to figure out which pages are actually useful.

The web is full of junk that no one cares about, pages that have been spammed into uselessness. These pages may still change regularly, they may have proper URLs, but they’re just destined for the landfill, and any search engine that cares about its users will ignore them.

Sometimes it’s not even obvious junk. More and more, sites are technically fine, but just don’t reach the “bar” from a quality standpoint to merit more crawling.”
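As a rough illustration of that “quality bar,” a crawler might gate URLs on a handful of signals before spending budget on them. The signals, weights, and threshold below are invented for this sketch; real search engines use far richer (and undisclosed) criteria.

```python
# Hypothetical quality gate -- the signals, weights, and threshold are invented
# for illustration, not how any search engine actually scores pages.
def worth_crawling(page_signals: dict, quality_bar: float = 0.5) -> bool:
    """Return True if a page clears a minimal 'quality bar'."""
    score = 0.0
    if not page_signals.get("is_spam", False):
        score += 0.4
    if page_signals.get("has_original_content", False):
        score += 0.4
    if page_signals.get("inbound_links", 0) > 0:
        score += 0.2
    return score >= quality_bar

# A technically valid page can still fall below the bar and be skipped.
print(worth_crawling({"is_spam": False, "has_original_content": False, "inbound_links": 0}))  # False
print(worth_crawling({"is_spam": False, "has_original_content": True, "inbound_links": 3}))   # True
```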

Web crawlers work with a limited set of URLs

Mueller concluded his response by saying that all web crawlers operate on a “simplified” set of URLs.

Since, as mentioned earlier, there is no single correct way to crawl the web, each SEO tool has its own way of deciding which URLs are worth crawling.

That’s why one tool can find backlinks that another tool misses.

“Therefore, all crawlers (including SEO tools) operate on a very simplified set of URLs. They have to figure out how often to crawl, which URLs to crawl more often, and which parts of the web to ignore. There are no fixed rules for any of this, so each tool has to make its own decisions along the way. That’s why search engines index different content, why SEO tools list different links, and why any metrics built on top of these are so different.”
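The following toy example shows why those choices lead different tools to report different backlinks. The link graph, URLs, seeds, and page budget are all made up; the only point is that two crawlers that start in different places and crawl a limited number of pages end up seeing different subsets of links to the same target page.

```python
from collections import deque

# Toy link graph (all URLs invented). Several pages link to https://target.example/.
LINK_GRAPH = {
    "https://blog-a.example/":  ["https://target.example/", "https://blog-b.example/"],
    "https://blog-b.example/":  ["https://target.example/"],
    "https://forum-c.example/": ["https://blog-b.example/", "https://target.example/"],
    "https://dir-d.example/":   ["https://forum-c.example/"],
    "https://target.example/":  [],
}

def crawl(seed, budget):
    """Breadth-first crawl capped at `budget` pages; returns pages found linking to the target."""
    frontier, seen, backlinks = deque([seed]), set(), set()
    while frontier and len(seen) < budget:
        page = frontier.popleft()
        if page in seen or page not in LINK_GRAPH:
            continue
        seen.add(page)
        for link in LINK_GRAPH[page]:
            if link == "https://target.example/":
                backlinks.add(page)  # this page is a backlink source
            frontier.append(link)
    return backlinks

# Same budget, different seeds -> different (partly overlapping) backlink reports.
tool_1 = crawl("https://blog-a.example/", budget=3)  # finds blog-a and blog-b
tool_2 = crawl("https://dir-d.example/", budget=3)   # finds forum-c and blog-b
print(tool_1, tool_2, sep="\n")
```

Neither crawler is wrong; each simply made different simplifications, which is exactly the point Mueller makes about search engines and SEO tools.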


Source: Reddit

Featured Image: rangizzz/Shutterstock




