Fixing crawl budget issues

[Chart of crawl budget issues]
Posted: Wed Jan 29, 2025 5:38 am
So imagine an e-commerce site where we've got a laptops page. We might be able to filter that by screen size, say a 15-inch screen, and by specification, say 16 gigabytes of RAM. There are a lot of possible permutations there, which can lead to a very large number of URLs when actually we've only got one page, or one category as we think about it: the laptops page. Similarly, those same parameters could then be reordered to create other URLs that do the exact same thing but have to be crawled separately.
Similarly, results might be sorted differently, there might be pagination, and so on and so forth. So you could have one category page generating a vast number of URLs.

Search results pages

A few other things that often come about are search results pages from an internal site search. Especially if they're paginated, they can generate a lot of different URLs.

Listings pages

If you allow users to upload their own listings or content, then that can build up over time to an enormous number of URLs. Think about a job board or something like eBay, which probably has a huge number of pages.
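To make the scale concrete, here is a minimal Python sketch (the parameter names and facet values are hypothetical, purely for illustration) showing how a handful of filters, sort orders, and pagination on a single laptops category multiply into hundreds of distinct crawlable URLs:

from itertools import product

# Hypothetical facets for a single /laptops category page.
screen_sizes = ["13-inch", "15-inch", "17-inch"]
ram_options = ["8gb", "16gb", "32gb"]
sort_orders = ["price-asc", "price-desc", "newest", "best-selling"]
pages = range(1, 11)  # ten pages of pagination

urls = set()
for size, ram, sort, page in product(screen_sizes, ram_options, sort_orders, pages):
    # Each combination is a separate URL Googlebot has to crawl,
    # even though to us it is all still "the laptops page".
    urls.add(f"/laptops?screen={size}&ram={ram}&sort={sort}&page={page}")
    # Reordering the same parameters yields yet another URL with identical content.
    urls.add(f"/laptops?ram={ram}&screen={size}&sort={sort}&page={page}")

print(len(urls))  # 720 URLs from one category with only a few facets

Add a couple more facets, such as brand or price bands, and the count quickly reaches tens of thousands, which is where crawl budget starts to be wasted.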
Solutions and whether they allow crawling, indexing, and PageRank

So what are some of the tools that you can use to address these issues and to get the most out of your crawl budget? As a baseline, if we think about how a normal URL behaves with Googlebot, we say yes, it can be crawled, yes, it can be indexed, and yes, it passes PageRank. So URLs like these, if I link to them somewhere on my site and Google follows those links and indexes the pages, probably still have the top nav and the site-wide navigation on them.
So the PageRank that's passed through to these pages is essentially recycled round. There will be some losses due to dilution when we're linking through so many different pages and so many different filters, but ultimately we are recycling it. There's no sort of black-hole loss of leaky PageRank.

Robots.txt

Now, at the opposite extreme, the most extreme solution to crawl budget you can employ is the robots.txt file.
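As a rough illustration of what that looks like in practice (the paths and parameter names below are made-up examples, not directives from any real site), a robots.txt that keeps Googlebot out of faceted and internal search URLs might read:

User-agent: *
# Stop crawling of faceted, sorted, and paginated parameter URLs
Disallow: /*?*screen=
Disallow: /*?*ram=
Disallow: /*?*sort=
# Stop crawling of internal site search results
Disallow: /search

The trade-off, which the baseline behaviour above hints at, is that a URL disallowed here is never crawled, so any PageRank flowing into it is not recycled back into the site, and the URL can still end up indexed from its links alone.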