SEO Crawling Myths: Why Crawl Budget Isn’t Your Problem

Edward Sturm | 01:14:09 | Apr 26, 2026
The speaker debunks the myth that crawling can be optimized as a single, magic fix, explaining that crawling is a complex, multi-stage process driven by site authority (and colored by your own point of observation) rather than a simple one-off event.

Crawl budget isn’t a lever you pull to rank higher; authority, internal linking, and smart content strategy matter far more than tinkering with sitemaps or crawl frequency.

Summary

Edward Sturm sits down with David Quaid to bust crawling myths that still float around SEO circles. Quaid argues crawling isn't a single, optimizable event and that sitemap fixes are often overestimated, especially for sites with weak authority. He explains how Google uses page-level prioritization, discovery vs. high-priority crawling, and how authority (CTR, pogo-sticking, clicks on a topic, and links from other pages and external sites) drives how aggressively pages are crawled and indexed. The discussion covers why more crawling doesn't automatically mean better indexing, how internal linking can pull new pages into faster crawls, and why pruning pages usually isn't the solution for bigger sites. Quaid also tackles cannibalization, duplicate content, and the pitfalls of over-specializing content, offering practical guidance on when to merge, rework, or remove pages. Across the hour, the emphasis stays on realistic, authority-driven SEO rather than quick-fix sitemap hacks. The host and guest also reference John Mueller, Matt Cutts, and the evolving discourse around quality signals, backlinks, and the role of authority in modern search.

Key Takeaways

  • Authority governs crawl priority: pages with higher topical authority are crawled and indexed more frequently, while low-authority pages languish in lower pools.
  • Internal linking matters more than sitemap edits: linking from high-authority pages can pull new content into faster discovery and indexing.
  • Crawling is multi-staged, not a single event: Google uses discovery mode, QDF (query-deserves-freshness) crawling, and tiered refresh cycles, with different pages in different crawl pools.
  • Cannibalization isn’t a penalty; it’s a problem of two pages competing for the same keyword in the index, often fixable by removing or reworking the weaker page or changing slug/target keywords.
  • Pruning is rarely the right first move for large sites: focus on building and redistributing authority through smarter content architecture and hub pages.
  • Duplicate content can cannibalize visibility when two pages map to the same slug or keyword; evaluate slug differentiation and canonical signaling rather than broad deletions.
  • HTML sitemaps can still pass authority and sometimes help more than XML sitemaps for discoverability; consider including a footer sitemap on large sites.

Who Is This For?

Essential viewing for SEO professionals managing large sites or websites with thousands of pages, who struggle with crawl issues, cannibalization, and authority distribution. The episode helps demystify crawl budget and reframes optimization around authority and content strategy rather than quick sitemap tweaks.

Notable Quotes

"“Can you optimize crawling? Not really.”"
Quaid starts by debunking the idea that crawling can be directly optimized like a resource allocation problem.
"“Authority is just a count of click-through rate, pogo sticking effect, clicks on a topic and links from other pages.”"
Definition of authority and why it matters for crawl and index decisions.
"“Red flags on crawl budget aren’t solved by sitemap fixes.”"
Sitemaps are rarely the cure-all for indexing issues in practice.
"“If you have no topical authority, you may not even get your sitemap looked at.”"
Highlights the importance of authority over mechanical sitemap submission.
"“Cannibalization isn’t a penalty. It’s a problem of two pages competing for the same keyword in the index.”"
Clear explanation of why duplicate content topics matter and how to fix.

Questions This Video Answers

  • How does Google actually prioritize crawling across a large site with thousands of pages?
  • Is crawl budget a real ranking factor or mostly a myth for most websites?
  • What is page-level authority and how can I increase it for new pages?
  • How can I identify and fix cannibalization without harming overall traffic?
  • Do HTML sitemaps still help SEO, and how should I implement them effectively?
Tags: SEO Crawling, Crawl Budget Myth, Page Authority, Internal Linking, Cannibalization, Duplicate Content, XML Sitemap, HTML Sitemap, John Mueller, Matt Cutts
Full Transcript
We're doing SEO crawling myths with the one and only David Quaid. Hey David, I have a question for you. Can I ask this question, David? Yeah. So, David, can you optimize crawling? Not really. No. It's just one of these old myths that persists, and it's probably the most discussed topic I've had online this week and last week, and I think it needs to be tackled, right? There's a lot of content from Google and a lot of content off Google that says you can, and there's a lot of content from Google that says you can't. I think it comes from a misunderstanding and an oversimplification of what crawling is. It goes back to the early days of "this is what a spider is, and the spider tries to understand your website" — such a naive, childish video — and from that it spawned all these things. People think crawling is a single-process event: you put up a sitemap, it looks at all of your URLs, it goes after them. You also have the age-old problem of point of observation. If you're a person on a web team looking after a website, you're at a big company, right? Three web engineers is, after taxes, something like half a million dollars. If you've got three web engineers looking after your website, that's not a mom-and-pop shop; that's a big company. So if you're at a big company with lots and lots of authority, and you realize, hey, the sitemap's been broken for years, none of the URLs are accurate, and you put up a new sitemap and Google goes off and does a lot of crawling — that's a great fix and a great solution, and it makes sense that you want to share it with people. But it's a very unique situation. You'd have to assume that everyone else's sitemap is broken too. And when I see people asking questions like "I can't get crawled," "I can't get indexed," or "I'm stuck at crawled, not indexed" — we've covered crawled/not indexed before — people reply, "Oh, do you have a sitemap?" That's like 30 to 36% of the replies. And then it's "maybe your sitemap's broken," and "maybe it's this," and it's kind of like: look, the status is crawled, not indexed — not "we couldn't find the damn thing." So it's been found, it's been retrieved, which means there's no technical impediment. I saw two people sparring on Reddit the other day, and it went on and on until one person said, "I think you're trolling me," because it was, "Oh, but it could be this, it could be that." You're just jumping from gap to gap. If it's crawled and you can see the page, there's no technical impediment. So stop saying it, and stop thinking sitemaps will fix it. Stop thinking you can optimize sitemaps or crawling events. Now, if you've got a 10-million-page site, first of all, you're going to struggle to get that indexed. But the third problem in all of this is the absence of authority. I did see another debate where someone was trying to talk objectively and then they just lost it and gave up. They said, "There's no such thing as this magical authority." And I'm like, it's not magical — and without it, you would need magic, right? Authority is just a count of click-through rate, pogo-sticking effect, clicks on a topic, links from other pages, and links from external pages.
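As a rough illustration of that definition — a minimal sketch with made-up weights, not Google's actual formula — page-level authority could be modeled as a weighted count of exactly those signals:

```python
from dataclasses import dataclass

@dataclass
class PageSignals:
    ctr: float            # click-through rate in search results (0..1)
    pogo_rate: float      # share of clicks that bounce straight back (0..1)
    topic_clicks: int     # clicks on the topic across the site
    internal_links: int   # links from other pages on the same site
    external_links: int   # links from pages on other sites

def authority_score(p: PageSignals) -> float:
    """Toy authority score: a weighted count of the signals Quaid lists.
    The weights are invented for illustration only."""
    return (
        100 * p.ctr
        - 50 * p.pogo_rate          # pogo-sticking counts against the page
        + 0.1 * p.topic_clicks
        + 1.0 * p.internal_links
        + 5.0 * p.external_links    # external links weighted heaviest
    )

print(authority_score(PageSignals(0.04, 0.30, 1200, 25, 8)))  # -> 174.0
```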
And I think people always see that advice in a negative light and say I'm pro-backlinks. I'm really not pro-backlinks, and I'm not ultra white-hat either; I'm just somewhere in the middle of "this is how Google works," right? I know a lot of people don't like PageRank, and when LLM search came out, I think a lot of people who hated Google, hated PageRank, and hated playing that game thought it was their savior. I've seen, for example, a lot of CMOs go on ads for products they shouldn't be advertising — products that have lost huge amounts of traffic due to scaled AI content. And you know what? It's not that difficult to figure this out. Any competent SEO with Ahrefs or Semrush could have figured it out for you in ten seconds, and probably wouldn't even have charged you. So you can't get out of this technical appreciation of Google. It's not just good content and catchy headlines; it's just not like that for everyone. It may be like that if you're writing for Microsoft. I can't imagine Microsoft writing a blog post on anything that wouldn't rank, because their web presence is so broad, they have so many subdomains, and they've got an excellent SEO strategy. And of course, they're Microsoft — they were the number one accessed website when I was a software engineer, one of the most clicked websites in the world. I would say they're definitely a nearest seed in PageRank terms. The other thing I think people believe is that they get assigned a spider, and that's just not how the web is crawled. Matt Cutts did a very, very good video — I can't find it anymore — that basically came out with Caffeine. Caffeine was where Google looked at the problem of how to crawl the whole web: how do we manage hundreds of millions of pages, if not billions, back then? This is 14, 15 years ago. And they realized the most important thing to crawl frequently is news articles. That's another myth inside this crawl-optimization thing: people think more crawling equals better indexing outcomes, and I don't know why. These are software systems. People still have this language, right? Like Google's assessing you. Sometimes — you've seen me on X — I make jokes like "the website content appreciation committee is meeting in the Danny Ponzi library, $10 cover charge, but you get $100 if you bring ten people, because it's the Ponzi library." We have to get away from this talk that Google's assessing you and evaluating you and you're bouncing up and down. It's not an evaluation cycle. It's just the way the numbers fall into place, like they do in poker or whatever.

This method of marketing is so effective, I had to make sure it wasn't against Google's rules before I kept doing it. It's a form of SEO I call Compact Keywords. Whereas most SEO focuses on putting up articles to answer questions — how, what, when — Compact Keywords focuses on putting up dozens of pages that sell to searchers who are actually looking to buy.
These pages rank on Google and convert so much better than normal that when I discovered this years ago, I couldn't believe it was allowed. It's less work, too: the average Compact Keywords page is only 415 words. Compact Keywords is a 13-hour deep course on getting sales with SEO. A customer recently said, "Each lesson is dense with information. You're giving years' worth of experience boiled down into 15-to-30-minute lessons with no filler or fluff. I feel like I'm gaining a new superpower." Compact Keywords is about setting up an SEO funnel that brings you sales for years and years. It works with AI, it's less work than traditional SEO, and it makes way more money. You can get it now at compactkeywords.com. Back to the podcast.

And so, you don't get assigned a crawler. Crawlers work in two modes. First, there's a discovery mode. If you've got a sitemap like CNN's — a news sitemap, say a single news feed just for US politics — a crawler will hit that sitemap every second just to see if it updates. If it updates, it grabs the URL and puts it in a high-priority crawl list; it doesn't go after the page itself. Bots in discovery mode find new URLs and put them in the crawl list, and if a URL doesn't meet the topical-authority bar, it gets deprioritized. A separate system handles those incoming URLs. So the way Google triaged the web was to say: news, Discover, QDF (query-deserves-freshness) stuff on high-authority sites — we're going to crawl this, and every hour that slice of the web will be refreshed, which is pretty bold, right? You need lots of data centers all around the world to do that. Then they said the next tier down we'll get to in about 12 hours, and then everything else after that. And crawling happens at a page level, not a domain level. You can test this — don't take my word for it. Log into Search Console, look at your performance report, and go through your pages top to bottom by clicks. Inspect them, and you'll see the indexing date get older as the clicks go down. Now, the indexing date will differ depending on whether you updated the page. There's a bunch of tests done when a page comes in: is this important enough to index? Did the file change? Did the structure change? If it's very high authority, it doesn't have to pass the other ones. Sometimes you might update your favicon or your page title, and you just don't have enough authority, so it's not going to update. But if you look at the pages with no impressions and no clicks, they'll have the least recent index date. The way Google throttles is that the pool you're in has a ratio of crawler bots to pages. If it's one to two, you're getting crawled every second turn. If you're one in 100,000, then maybe on the 29th of May that page will get crawled. That's how they do it. And so there are some broken philosophies out there, and I see a lot of SEOs who don't like Google quoting John Mueller — or who say that if you quote John Mueller, you're not a good SEO. I don't really understand that. I'd much rather get my advice from John Mueller than from purple_pineapple2823. Right. Yeah.
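To picture the pool-and-ratio throttling described here, a toy model helps. The pool sizes, bot counts, and one-fetch-per-second turn below are invented for illustration, not Google's real scheduler:

```python
# Hypothetical crawl pools: (name, crawler_bots, pages_in_pool).
# A page's revisit interval scales with pages/bots: a 1:2 ratio means
# every second turn; a tiny bot share of a huge pool means months.
POOLS = [
    ("news/QDF",   1_000,        2_000),
    ("tier-1",     1_000,   43_200_000),
    ("everything",    10,  100_000_000),
]

TURN_SECONDS = 1.0  # assume each bot fetches one page per second

for name, bots, pages in POOLS:
    interval = pages / bots * TURN_SECONDS  # seconds between visits to one page
    print(f"{name:>11}: one crawl every {interval:,.0f}s (~{interval / 86_400:.2f} days)")

# news/QDF:    one crawl every 2s        (~0.00 days)
# tier-1:      one crawl every 43,200s   (~0.50 days, the "12 hours" tier)
# everything:  one crawl every 10,000,000s (~115.74 days)
#
# Internal links from high-click pages act like a pool upgrade: a newly
# discovered URL linked from a "news/QDF" page gets queued in a higher
# pool and is picked up far sooner than one sitting in "everything".
```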
And so one of the thoughts out there is that if you've got 100,000 pages and a thousand of those are paginated pages, then by removing them you reduce the number of pages and increase your crawl budget. You're not. Your pages that are getting clicks will always be in a high-priority queue; they will always be crawled. That's why internal linking is so important: going back to our last video, if you put a link to a new page into one of those pages, you pull it out of that low-priority queue. It gets discovered faster. That doesn't mean it moves pools on its own, but if it gains authority, it'll probably move to a higher pool — all the better. And when you look at the lower pages that are just sitting at crawled, not indexed, that's because of authority. If those pages are getting crawled and not indexed, or getting indexed and not ranking, you don't have authority. It comes back to one of my favorite indexes, "urgent care," which is something like 450 million pages. What's the point of adding more pages to that index if they have no authority? I also think Google is partly to blame for the situation we're in. They've removed most of the videos Matt Cutts did about authority and backlinks — they're all gone unless they're hosted on an agency account. The sitemap and crawling documentation all talks about quality, and we know their yardstick for quality is, unfortunately, backlinks or internal links. And again, I'm not saying this because I want to sell backlinks. I don't sell backlinks. I don't buy backlinks. A lot of people do, and I think it creates an unfair market situation. It's not my fault, and it's not my problem — well, it is my problem; I've got to deal with it like other SEOs. Yeah. But to be fair, lots of people who buy backlinks mess it up. And I think where people like you and me and the people we talk to — the Gagan Ghotras of this world, everyone in our network, Sean Anderson, and other wonderful SEOs like Joy — our heart is with the small businesses. We're all small-business owners, and our heart is with the business owners who have to compete in this arena with all of this competing information and disinformation. I shared a screenshot with you yesterday and today of all the people who have reached out on X and LinkedIn saying, you know, Edward's done such an amazing job, love the work you and Edward do. It's great to see, and it's clear people need a solid, reliable source of information. I think Google's been a little absent in that. By removing authority from the conversation, they've created a vacuum that has let schema and llms.txt hype in, and it also makes a joke of them, because other incredibly capable SEOs are saying: how can you listen to Google and take them seriously when a lot of what they talk about isn't really accurate? And that's not a wrong statement, right? Maybe Google wants that.
Do you ever wonder if Google doesn't want you to know how to do well on Google organically? Absolutely. If you look at the SEO starter guide, it says things they don't want you to focus on: they don't want you to focus on E-E-A-T, it's not a ranking signal. And while PageRank is fundamental — the dictionary definition of fundamental is "cannot exist without" — they don't want you to obsess over it; in other words, don't go buy backlinks. That is the most contrarian statement, because you can't deny it's fundamental, yet there they are trying to downplay it. So yeah, you're 100% correct: they're absolutely trying to throw the cat among the pigeons. Because Matt Cutts — I've been sharing him so much on this show recently — Matt Cutts was so helpful to so many SEOs. Do you think the reason there's not another Matt Cutts at Google is intentional, or just kind of an accident? Accidentally intentional, and I'll explain why. John Mueller comes from the web industry; Matt Cutts was an early founder and builder of algorithms inside Google. What I loved about Matt Cutts: when I was an engineer at Dell, the senior engineers who built Dell — not a lot of people know this, but when we built Dell in the 1990s, Dell wrote its own version of Unix, called Dell Unix Release 5, and they tested their systems against SAP and things like that. These guys built this MacGyver-like system overnight, and it was so fast. It ran on Novell servers and Unix machines they'd built themselves, and their passion for file-level engineering was exactly the same as Matt Cutts's. Matt Cutts talked about the actual engineering, and that's why I think we all liked him. The problem is that the people who talk now are given limitations and specific things to address — and it's not intentional; I think Google wants to engage with the SEO community. It used to be a lot more wishy-washy. There were no Webmaster Tools for a long time; every now and then Google would change course and pretend SEO wasn't even a thing. There doesn't seem to be a happy medium for them: they go all in on telling you how it works, then back off. Maybe that's partly because they can't deal with link farms and PBNs, because they're engineered so well. People do such a good job of engineering link farms that mirror real sites that it's very difficult to detect them, except by looking at weird heuristics like strange, unnatural link patterns. Competent black hats do such a good job. Yeah, absolutely. And I'm sure a lot of people who do that don't even consider themselves black hat. A lot of the people I hear talking about it are saying: I need my business to exist; it's existential. So maybe that's why Google doesn't penalize as much. Yeah. But you don't need a PBN for your business to exist. No, but I mean buying backlinks from link farms — that's why people engage in link buying. They feel they have to, that they have no choice: either I go bust, or I get caught in three years and get busted. They just don't realize they don't have to play that game.
I think they feel they have to — but they don't realize they don't have to. Yeah, they don't realize they don't have to do it. And to be fair, I just want to say: when John Mueller responds to questions, I love it, and I share it all the time. Yeah, John Mueller is super active in the SEO subreddit, which is awesome. He's always giving useful responses, and his responses on social media too. Yes — whenever Barry writes about it, I go and look, and it's always inspirational. But I want to correct myself, because I can't tell other people "be aware of your own point of observation" without being aware of my own point of observation too. I'm not exactly in a high-stakes e-commerce space where only three sites are collecting 90% of the clicks. I can't really tell people what they do or don't have to do; I don't think it's fair of me to say that. I'm not here to judge people. No, no. And in more competitive niches, like iGaming, you're going to skirt the rules a bit. Absolutely. I'm not making excuses for it. But the thing is, most business owners are not in these crazy niches — they just think they are. They think their searches are so competitive. I see that all the time too: someone will message me and say, "Do you think I could do SEO for a keyword like this? I imagine it's really competitive," and they just don't really know how to evaluate competitiveness. It's interesting — I found myself in a conversation this week with very, very competent SEOs, lots of experience, on a very big website we're working on; a very interesting project. And we found ourselves talking about the fact that a page had something that made the company look alive — it was advertising jobs, other jobs were closed, and maybe that's why the page went down. And I said: okay, look, we're in tinfoil-hat territory here. It just doesn't make sense; it would mean companies that aren't advertising jobs have no signs of life. It comes back to this "this is how Google thinks" evaluation-committee stuff, and when you're going down that street, you've got to stop, turn back, and look for rational answers. Occam's razor is a good friend. The simplest answer is the best answer. Yeah. So, can you optimize crawling with authority? Absolutely, for sure. And that's again why your links, and your control of links within pages, are so important. Reducing pages on your site isn't going to help. Even if you argued that if 100 million other sites did the same thing, maybe those pools would get smaller — realistically, that's not going to change anything. And Google removed the last webmaster control anyway, which was throttling your crawlers. Just building links to large sites doesn't make pages get indexed. But yes, you can 100% optimize with authority.
It is authority-driven, and I think we have to be clear on that. That's not magic, and it's not us pushing the PageRank algorithm over content quality; it's just reality. The pages with the most clicks get crawled the most. And if your pages don't have authority and you put links in them to each other — I saw someone sharing a page that said, oh, if you have your pillar pages and they link to this page and that page and this page, it sends all these signals. Stop inventing signals to create pillars that support your ideas. That's not how it works. You're not doing anything magical. These spiders are the DHL of the internet: they go off and fetch pages, the way a courier fetches tracking documents and reports their status. That's all they do. Just because a page is crawled doesn't mean it gets indexed. And if it gets indexed, it's indexed — it doesn't have to be indexed 100 times. You're not making an appeal; it's not that the indexer doesn't trust you. These algorithms have to process pages in milliseconds, and you go through the entire pipeline in milliseconds. Maybe you don't go through the spam ones — you're not going to get caught for scaled content in the same pass; that comes days later, once you've hit a certain heuristic, and then you get nailed. But at page level there's no assessment. It's not waiting; it's not trying to understand you. If your page is "Hyundai brake pads for a 1998 Elantra," guess what? Your page is about brake pads for a 1998 Elantra. There's nothing confusing about it. It wouldn't be confusing, just less informative, if the page were called "About Us" or "Services" — that's just a waste of architecture. And I think the other issue is the words "architecture" and "technical SEO." Technical SEO to me used to mean: how do you design a big site like Indeed or Amazon? It's come to mean fixing every error — as if fixing all your errors earns you a gold card. When you look at how it was at Kemp — we were what, a 250-person company, with about ten people across web engineering, SEO, and paid search, because we were 99% reliant on Google — in an organization like that, the web team is only fixing the tickets. And as the person in charge of that part of marketing, when we had our marketing get-together, I wanted the web team to know they'd done a good job, because they've got multiple stakeholders: sales, support, product, product management, corporate marketing, partner marketing, field marketing. So I wanted to show them that when they fixed our web tickets — internal links, a new sitemap, a new section — our traffic went up. And I think some web engineers walk away from that thinking they are the SEO. But publishing hygiene is restorative. In other words: you buy a Mercedes, you're driving around and everything's going well, then you notice the fuel economy is dropping, it's not accelerating like it used to, it splutters. You go to a mechanic, he says your spark plugs are worn and changes them. You don't now own an AMG — you've just fixed your Mercedes.
And so when people say "my page isn't being crawled" or "I'm not ranking" or "I don't get SEO," the second most common response is "look after your tech stack." This is just nonsense. If you search for it, there are blog posts that will literally tell you Google assesses your tech stack. That's third-party information; it's someone's opinion; it's subjectivity. And this is why we seem so bad at understanding objectivity, and why I think denouncing John Mueller and the Google Search team is not good: they are firsthand sources. If they get it wrong, or we interpret it wrong, or it's open to interpretation, that's fine — but I don't think we can just dismiss them. I don't want the only people with a voice to be the people saying Google has tech-stack preferences: if you're using WordPress it really likes it, if you're using Wix it doesn't. It's not like that. There are no secret codes in the HTML. Google doesn't process the whole HTML document; it looks for certain things in it, and different algorithms and different parsers do different things with it. It might take your page description; it might take the OG description and use that; it might not; it might write its own; it might use AI to write a new snippet for you. There are lots of things it can do. But this idea of looking after your tech stack — it's important for big websites; I'm not undermining the value of it. What I'm saying is: if you take something that's optimized and it breaks and you fix it, that's restorative. And at the same time, if somebody's just built a new ten-page Wix website, it's probably not broken, and it's probably not the sitemap. Again, the sitemap is not a to-do list. Google doesn't follow it from A to Zed. If you're very highly authoritative, yes, absolutely: put a page in there and it will go and fetch it and index it, 100%. If you have no topical authority, it may not even come back to your sitemap. You can delete your sitemap, publish a new sitemap, put "urgent" in your sitemap, change the date and time in your sitemap — it doesn't matter. It's going to look at the URL and go, "I just don't care," because it's in a pool. And it's not because Google doesn't love you or you built a terrible website; it's just a lack of authority. So how do you use this authority-and-crawling discussion to think about content pruning, for the websites out there with tons of pages? Look at the root cause first. On a lot of sites I see, the slugs are ridiculously long, or the slugs are off-center — for example, the site is about kettles and the blog post is about a company retreat. The blog post on the company retreat is not going to do anything, right? Keep it on the blog, or put it on LinkedIn. Pruning it isn't going to change anything. It doesn't dilute your authority; you can't dilute authority like that. You dilute authority by putting too many links in a page, or by linking from a page with no authority. So pruning doesn't deliver much, and having a thousand pages or a hundred pages on a site doesn't necessarily add that much overhead. And if it does, you've got other issues.
But don't worry about trying to get a gold star in Google Search Console. The reason Search Console monitors your site and gives you feedback is that Google runs the largest web crawler and they're trying to cater for everyone: we're crawling these pages, and if we can't access a page, we should tell you. But they don't differentiate between page types, which is vital. They don't pick up that it's your parameters that are broken, or that it's UTMs, which I think is silly. I think it speaks to the age of the Google systems that they can't fix those things, because it would mean rebuilding too much. The fact that those errors persist — and that cannibalization and duplicate content persist — shows the age and the basicness of the systems, which Gary Illyes talks about a lot. So if you're worried about all these pages choking up your system: forget about it. Google is just giving you a warning: look, we can't access this because of a 5xx error or a 4xx error. If it's crawled, not indexed, it could be a rendering issue, but that's unlikely; it's more likely the page is off-topic, it's not linked to from an authoritative page, or you don't have enough topical authority. If you look at the profile of people who present with this problem, they're almost all brand-new sites with no links. And to all the people who keep posting "it could be this, it could be that": I don't know if you think you're being helpful, I don't know if you think you're looking smart, I don't know if you're fishing for business, but you're just delaying the real solution for the person — if they have no backlinks, or if they aren't publishing pages that get clicks. For example, somebody starts a website and wants to rank number one for "furniture." That's not going to happen, crawl optimization is not going to fix it, and pruning is not going to fix it. For big websites, it's much more important to figure out how to stretch your authority. We spoke before about links decaying, roughly 85% per link hop. So if you've got a 100,000-, a million-, a 5-million-page site, you've got tiers and tiers and tiers of pages. Not all of those pages are going to rank, and if they don't rank, authority is dying and they become endpoints. You need a marketing brain, not a web-engineering brain, to figure out: how can I land traffic on this page, reinvigorate it, and apply power back into the grid? Think of your link network as a grid. If you think the best way to light your house is to wire every room in parallel, that's not an efficient use of energy, right? Actually, I saw on the SEO subreddit that content pruning from our episode with Lars Lofgren and Gagan Ghotra — the HCU episode — was being discussed. Do you remember that? Yes, it was a great conversation, and I saw people sharing it on X last week. I'm still not certain: if nothing is getting indexed, I don't think removing pages will help with indexing, unless it was a volume-targeting vector Google used. Also, I've seen people claim there have been hundreds of thousands of recoveries, which I find very hard to believe.
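Before going on, it's worth putting numbers on the link-decay figure mentioned above. Assuming the 85%-per-hop loss Quaid cites — so 15% of authority survives each link level, an illustrative figure, not a documented Google constant — the drop-off across tiers looks like this:

```python
# Authority surviving at each link depth, assuming roughly 85% dies
# per hop (15% survives). Arbitrary starting units; illustrative only.
SURVIVAL = 0.15
start = 1000.0  # authority units at the hub page

for depth in range(5):
    print(f"depth {depth}: {start * SURVIVAL ** depth:10.2f}")

# depth 0:    1000.00   (hub page)
# depth 1:     150.00
# depth 2:      22.50
# depth 3:       3.38
# depth 4:       0.51
#
# Past two or three hops almost nothing is left, which is why deep tiers
# become "end points" unless hub pages re-inject authority closer to them.
```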
I'm sure black hats have recovered, but I also think the only way to recover an HCU-hit domain is to move domain, and I wouldn't say moving 10,000 pages is a good first step — that sounds like scale. Maybe instead of content pruning, people need to think about how to organize their sites effectively, so they can build authority into different hub pages and from there into subpages, so authority flows to all of their pages. You make such a good point — thanks for bringing that up. Actually, one of the best videos you did was how you jumped from position 50 to 1 by republishing your slug. I think that's vital. It always comes down to these basic principles, and when you try to overcomplicate it — "woe is me, I have a special scenario, I have this URL structure" — no. You're being dealt with like every other page. You're getting the same crawlers; there are so many crawlers, they're rotated. People are so certain about it, too, which is really crazy. You hear the certainty: "I know I have a 100% different scenario from everybody else." You don't. I think a lot of people who think they got hit by HCU — or a lot of people who did get hit — really got an authority loss. For example, if Google zeroed your backlinks, a lot of people think, "Oh, I should just get back to ranking," and it's like: no, you've lost that power station in your network; your authority is down. So if you had pages targeting only very high-value keywords — very short slugs, like /hyundai — you're not going to rank for that anymore. If you think that by fixing your 404s you'll magically bounce back, that's a very naive thought, or a very lazy one. You're going to have to republish pages, like you were talking about. You're going to have to say: look, this slug isn't going to work in a lower-authority world. You're effectively going down to your cornerstone, finding your watermark, and working back up. That's a lot of work, and that's what people overlook. It's easier to look for a special kill switch that turns it all back on magically — maybe it's buying hundreds of backlinks, I don't know; it's not my solution. But if your site has lost authority, the pages that were ranking in highly competitive niches aren't bouncing back. You don't own those rankings; you don't have a right to them. And I think that's part of the problem with the "good content" thing. People say stop writing content for bots. I don't think people do that. Maybe people are lazy; maybe people write bad content; but I don't think people actually write for bots. One of the interesting things is that there's no subreddit, no X community, where people post their designs or content for others to evaluate. And there's a simple reason why: everyone knows every other designer is going to be out for blood. They're going to destroy your design — "He put About Us in the top right, that's so stupid; when we tested the last 17,000 brands…" It becomes an ego thing, and everyone knows it, because it's subjective.
So if you write an article and you ask a hundred other people who think they're competing with you to review it, they're all going to find a problem with it. There is no objectively "good content," so don't assume people's content is bad. If I put that content on my site, it'll rank — I promise you. A lot of people also say things like, "Oh, it's thin content." I'm like, no, I'm pretty sure it's not thin content. We had fun with the fun-influencers page, right? That's not thick content. "Is David Quaid always right?" — that's not very thick content. People Also Ask content is not very thick content. And then people go, "No, no, but there are circumstances — you can overcome it with authority." I'm like: that's because it is authority. Then they say, "Oh, maybe it doesn't have enough information gain." What's the information gain in "what is a fun influencer"? It's kind of like the god-of-the-gaps argument when agnostics and theists debate — which I always think is a good way to learn about argument structure and logical reasoning. What about duplicate content in all of this? With duplicate content, the problem is really the document name and the emphasis Google puts on the document name. For example, say you have mysite.com/hyundai-brake-pads and mysite.com/parts/hyundai-brake-pads: those two pages are 100% going to cannibalize. Or say it's tires, and you have the UK spelling, "tyres," and the US spelling, "tires": they're synonymous, semantically the same, and they're going to cannibalize. The problem with duplicate content is that the two pages block each other in the results. It's not an algorithm issue. Ryan Jones is adamant that cannibalization isn't real; I'm very adamant it is, because 30% of the projects I work on are just de-cannibalization projects, and it's because Google isn't actually that good at semantic understanding — it gets confused. We have pages where Google thinks there's an implied word, and even though the page doesn't contain that word, Google decides it belongs in another index where that word exists, and there's already a page in that index. It's not an algorithm; it's not a penalty. So when does duplicate content become a problem for sites? Only when Google can't tell which page to show and they block each other. That Matt Cutts video is still up — it's called "How does Google deal with duplicate content?" — and it says: we just don't care; 35% of the content we index every day is duplicative; it's not an issue; we're not trying to save money. They block each other — that's what I was talking about. If you've got two pages in the same index, the indexer is not the tool that delivers the final results. If you think about the SERP, typically each domain gets one result on a page, maybe an indented second one. With cannibalization, the two pages are in the same page of results, and the parser that prepares the results removes one of them. So there are two parsers:
one tailors the document for Google's requirements, and a second parser builds the SERP results, tailoring them for you as the user. Between the two, one picks one page and the other picks the other page, and in the final results neither is shown. So both are "ranking" — they both show up in Search Console — but neither makes it to the user. That problem in Google engineering is as old as Google, it hasn't been resolved, and that's why there's a duplicate-content issue. Funnily enough, a lot of people — especially European SEOs, who don't understand American engineering — think otherwise. There's a famous quote from a World War II general: America doesn't solve problems, it overwhelms them. Europeans, on the other hand, are all about saving cost: take your key card out of the slot in your hotel room and the lights turn off. In America, the air conditioning runs 24/7 so it's always cool when you come in. So when people say, "Oh, you can't have duplicate or thin content, because Google wants a return on investment, it needs to save money" — there's no sign Google wants to save money there. I think I've said this before: an hour-long 4K video takes as much storage as something like 11 billion HTML documents. Crawling is not a cost-saving exercise for Google; it's one of the few areas where they happily spend, because they need to index the whole worldwide web. That is their value to society, to their users, to their shareholders. So duplicate content is not the issue. Cannibalization is the issue: two pages that, at a slug level, target the same keyword. Sometimes that's synonymization; sometimes that's topical authority lending keywords to pages. Can you explain "at a slug level"? Would a subfolder change that, or would adding a word to one of the slugs change it? It depends. Words like "best" and "top" are adjectives, so "best car parts" and "car parts" are sometimes the same slug. Here's an example: you have mysite.com/tires-for-school-buses and mysite.com/school-buses/tires-for-school-buses. The pages are mostly similar, with some differences; mysite.com has a lot of topical authority; these are fine pages. But the result is that neither page gets shown in the search results. The subfolder isn't going to change anything at all. And the problem is that Google will automatically test between two pages, which is probably why they can't resolve the engineering problem behind cannibalization. So if you had school-buses-ohio and school-bus-ohio, it's the same slug. If you had bus-for-school-kids-ohio, it's the same slug. They'll end up in the same index together — they'll end up in lots of indexes; they'll end up in an Ohio index back at page 10,001. Just to be clear: the result is that neither page will show? It could be. It could also be that one page has a longer history: if one page has been alive for 11 years, the other page may never get a chance to interrupt it, because it has click-through history. But that's the very explicit version of the problem.
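A rough way to screen for this before publishing — a minimal sketch, with a hand-rolled canonicalization map standing in for the semantic matching Quaid attributes to Google — is to normalize slugs and flag collisions:

```python
import re

# Hypothetical slug-collision screen: drop filler words and ranking
# adjectives, collapse a toy synonym/plural map, then compare token
# sets. A real screen would need a stemmer and semantic similarity.
DROP = {"best", "top", "for", "the", "a", "and"}
CANON = {"tyres": "tire", "tires": "tire", "buses": "bus"}

def slug_key(url: str) -> frozenset:
    slug = url.rstrip("/").rsplit("/", 1)[-1]   # subfolders don't differentiate
    tokens = re.split(r"[-_]+", slug.lower())
    tokens = [CANON.get(t, t) for t in tokens]
    return frozenset(t for t in tokens if t and t not in DROP)

pages = [
    "https://mysite.com/tires-for-school-buses",
    "https://mysite.com/school-buses/tires-for-school-buses",
    "https://mysite.com/best-school-bus-tyres",
]

seen = {}
for url in pages:
    key = slug_key(url)
    if key in seen:
        print(f"possible cannibalization: {url} vs {seen[key]}")
    else:
        seen[key] = url

# Both variants collapse to {tire, school, bus}, so the second and third
# URLs are flagged against the first -- the "same slug" case above.
```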
The other case is the thing that catches people out. I'm just trying to think of a broad, random example — let's pick data centers. Say you have server data centers, white-labelled data centers, failover data centers; and then you might call some pages server centers, and then data service centers; and then you might have another search phrase, like storage data centers. The problem is that if Google thinks "data" and "storage" are synonymous, both pages enter each other's index, and that's where the problem starts. So it doesn't have to be school-buses-ohio versus school-bus-ohio. It could be that "data" and "storage" are synonymous, or "NAS" and "storage" become synonymous, or "network" and "storage" become synonymous — and then the end-result parser doesn't show the page, so it can't get clicks, but it starts affecting its click-through rating. Or one page drops because it didn't have a positive click rating, the other page starts to rank, and then Google automatically starts a new test auction a week later, and the cannibalization starts all over. What would you say to websites that go, "Oh my gosh, I have a very specific niche, I'm always writing about the same thing, though I write about it in different ways — does this mean I can't keep writing about it, because now I'll cannibalize all my stuff? How should I go about this?" I think you need to figure out whether the word is required in the slug or implied by the slug — which I know sounds weird. For example, if you have a lot of topical authority around data centers, you may not need the words "data center": you might be able to call the page "server storage," and data center is implied. Figure that out first. Then see whether you're already ranking for the phrase. For example, say you wrote about network storage, and now your product managers come and say: look, we need a page for NAS storage; we've got a hard-disk unit that plugs straight into the network. Don't assume you don't already rank for it. And if you're ranking on page three and think a specialized page might do better, be careful: if you're already ranking, you're now adding another page to that index — you're introducing a new chance of cannibalization. That's often where cannibalization comes from: people hyper-specialize. So if you're in a very narrow topic — load balancing, for example — the way to differentiate is to make sure the keywords in the slug are truly differentiated, like adding "nginx," or adding "IIS." And you can test that by doing the search and seeing whether only nginx pages come back. If web-server pages also appear, then you know "web server" is now synonymous with "nginx," and that's problematic. Do you ever find — actually, someone messaged us on X, I don't know if you saw the comment — a big problem with the FAQ-page strategy… sorry, it was the People Also Ask strategy. People Also Ask, right? Right — if you've got those questions on other pages as H2 titles. Yeah, that could be problematic.
Well, not necessarily, because that's different from the slug: here you're creating a specific page for this one thing, while on the other page it's just an H2 among lots of other topics. So it depends: does that H2 on that page rank? Say the H2 on that page ranks — it's already ranking — and then you put up a People Also Ask page going after the same thing you had in the H2, and the PAA page goes deeper into the topic, is more informative, and is more relevant to the query, because you have it in the page title, in the H1, and in the URL slug, while all the other page has is the H2. What's the result? Say the page with the H2 has rank history, but its relevance for this phrase is very low — call it 10 out of 100. That page ranks for other keywords and has a lot of authority — call its authority 1,000. Your new page comes along and its authority is only 100, but it has a higher relevance score, because the phrase is in the title and in the URL — it's effectively the document name, so the whole document is relevant to just that phrase. The other page is relevant to a lot more things, and because of its age and its ranking for its primary keyword, it has a lot more topical authority. Because of its low relevance, though, the H2 only gets to use 10% of those 1,000 authority points, which is 100. The new page, which only has 100 authority points, is 100% relevant. For that phrase, the two are perfectly equal — they will absolutely cannibalize each other, all day, every day. So the PAA play is for people who are still developing authority. If you have another page that isn't ranking, remove it before you add your PAA page, or you're building a problem for later: eventually that PAA page will hand that topic's authority to your site, and the other page will start to rank, because it'll have more authority even with its lower relevance score. But if you're already ranking for the phrase, you don't need the PAA page, because you don't need to grow authority for it — you already have it. So avoid that and move on to something else. How much do you think about "am I already ranking for this?" when you put up any sort of content? Because a lot of people just want to go deeper on topics, and they're like: I don't even want to have to think about that stuff — I had a new idea, I want to attack this problem from a different angle with a different piece of content. There are two situations where that happens. One is where you have a change in content developers on a 10,000-page blog or site — that's why you have SERP reports, so you can see that you're already ranking. The other is where a page starts generating a lot of long-tail keywords and you're not ranking for all of them, and the temptation is to start splitting out specialist pages. That's where you've got to be careful. So: one, have a SERP report, and make sure you don't tread back over keywords you've already written for. And the SERP tools generally have cannibalization reports — they're not 100% reliable, but they're pretty good.
They're maybe 50/50. The other thing you can do is notice where you had a rank position and then dropped: cannibalization should be one of the things you check for. And the easiest way to test it is to go to Removals in Search Console, do a manual removal request, and remove the suspect page; the other page should bounce back within 12 to 24 hours at least. If it does, take the removed page and find another home for it — find another index to put it in. Anyway — we were talking about striking-distance keywords, keywords where you're ranking below position seven, or was it three to seven? What if you have a page ranking for a keyword, but you think: I don't want to add more about this keyword to this page; I'm going to create a new page just targeting this keyword, going deep on a keyword you're already ranking for, but not in positions one to three. Is that a viable strategy? Well, one thing you're describing is that sometimes you might go deeper and add more content — and that alone is not going to make any difference. No, I'm saying for people who just want to explore the topic. I get you. What I'm trying to say is that I don't want anyone to think going deeper automatically makes the one page better; there's no way to guarantee that. They want to do it for visitors and their brand and all that. Totally get it. Again, you've got to go and look at the page and see what it's ranking for. If it's ranking on page five, six, or seven, that's a safe zone. Positions 3 to 7 are too dangerous; at that point I would not try to dig into specialist territory. If you're ranking 3 to 7 for a keyword, that's not a good time to specialize — it's a good time to develop more topical authority in other pages, or in that page. If you don't want to add more to that page, find related keywords that don't cannibalize. Doubling down on that keyword is just not a good idea; that's how cannibalization happens. Say you have a page about crawl budget and it also ranks — position four — for "PageRank decay," because there's a short H2 about PageRank decay in the crawl-budget article. Then you put up a new article that's entirely about PageRank decay, and you take the section out of the other page — or turn it into a link to your new page. Could that be a good solution — the link? It can be. The problem is if the old page still contains the word and has previous click-through-rate history; the presence of the word can be enough. So, not always. That's what makes this a very difficult area. And again, looking at the discussions I've had with Ryan Jones and other people: where you have a lot of authority, you build one page and it ranks for 10,000 keywords, and a lot of people jump to the conclusion that you only have to put in one keyword and Google knows it also means all these other things. As your authority diminishes, that breadth goes down very, very quickly, because you're not authoritative enough to rank for the bigger keywords. And that's probably why it doesn't affect the high-authority folks.
It's not something they see. Then, when you start to lose authority — after a Google core update or a spam update or whatever, you've lost some backlinks, you've lost some authority — your pages start losing traffic. It's because of issues like this: one way of targeting a keyword was highly effective under the authority you had, and now it just doesn't work anymore. These are things you have to look for: did I over-specialize in keywords? How do I build new pages? You've got to make sure a new page doesn't cannibalize. You've actually got to put effort into it, and it is difficult. And the other side — you said this SEO, Ryan Jones, was saying this isn't a thing? Yeah, he was saying to me that cannibalization is like E-E-A-T — not really a thing. What was his argument? I think he thought cannibalization was being claimed as an algorithmic penalty or an algorithmic update. And he's a very busy person, so he was probably jumping in and out of the conversation. But it's a very real issue. It's not an algorithm update; it's a duplicate-content issue in the sense that Google thinks the two pages belong in the same index. Nothing is being penalized. It's a problem with how Google returns the results: if a page is blocked from appearing in the results, people can't click on it, people don't see it, and that's the problem — you no longer get clicks from it. So the solution is to remove it from pages where it's not the best fit, and then you can create a dedicated page for it. Yeah — once a page is cannibalizing, the only treatment is to remove it. You have to eject the page from the index. You can't just remove the section from the page? If it's just a section, maybe — but normally, with cannibalization, one of the pages has the keyword in the slug. Right, but I'm talking about the scenario where you have a page that's ranking for many different keywords. Yes: if a page is ranking for many different keywords and another page could overlap, then as long as the keyword is not in the slug, it's unlikely to enter into cannibalization — the other page will just win. The problem is where you have a page that's also ranking for a keyword, and you then build a specialist page with more relevance and put the keyword in the slug: that page has to enter that index, because the slug is the document name. That's what causes cannibalization, and the only remedy is to eject the page. You just have no choice. If you have two H2 tags on two different pages, one page is going to have slightly more relevance to the H2 than the other, even if it's very, very remote. It's the fact that the keyword is in the slug that normally causes pages to enter indexes, and that's where cannibalization starts. Why is the solution to remove the first page and not just remove the section? If it's just a section, that might be OK. But again, like I said, the page now has historical click-through-rate history, which is very difficult to deal with. Maybe that keyword is in the footer — which is very common — and that's enough for that page, because it's got so much historical authority built up that even a 1% signal is enough.
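Restating the worked example from this exchange in code — the scores are Quaid's illustrative numbers, not a real Google formula:

```python
# Hypothetical scoring from the exchange above: effective score for a
# query = authority x relevance. Numbers are from the illustration only.
old_page = {"authority": 1000, "relevance": 0.10}  # strong page, phrase only in an H2
new_page = {"authority": 100,  "relevance": 1.00}  # weak page, phrase is the slug/title

for name, p in (("old page (H2 only)", old_page), ("new PAA page", new_page)):
    print(f"{name}: {p['authority'] * p['relevance']:.0f}")

# old page (H2 only): 100
# new PAA page:       100
# Equal effective scores -> the two pages trade places during testing
# and cannibalize each other; neither holds the position.
```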
How much should new websites that want to do SEO worry about this? People who are getting stressed out, thinking, "I just want to put out content and build links." The most common issue for small businesses is service companies that rank their homepage for their core service, say web design or SEO, like my own site. I can't build a page on my site called "SEO expert New York" or "SEO consultant New York" or "SEO agency New York." Even though none of those keywords appears on my homepage, my homepage ranks for something like 125 variations of them. I cannot build a new page unless I'm willing to suspend my homepage and eject it from the index by noindexing it, which would be a terrible idea.

If you're ranking for "roofer," and say your domain name is McN Roofers in Tampa, and you now want a specialist page like "best roofers in Tampa," you're going to struggle with that page, if it even gets a chance to rank, because of the slug. In other words, when you add the page with that slug, you're forcing it into the same index your homepage is already ranking in. That's the problem: that's the only index it can live in, because it exactly matches the slug. If you're already ranking, and this is what always happens, people say, "I'm in 11th or 12th place, or even fourth or fifth; what if I develop the specialist page and link to it?" If you can get a ton of backlinks, say your brother works at Microsoft and can get you a link on the homepage, then you're probably not going to have to worry about it; you'll get so much authority it will overcome the collision. But for most people, adding the page will be problematic, and I think that's where most cannibalization happens.

Say you're ranking really well for fish tanks in Ohio, and you decide to take the family business nationwide, so you build 50 town pages. Google says, "I don't really see the town being a differentiator; the towns are all in Ohio, so I'm going to treat them all as the same." Those 50 pages all start cannibalizing each other. So suddenly you take your website, you add pages, you do the right thing, and now you're locking yourself out of Google. You've got to be careful. Again, the way to check is: if I do a search for a town name, am I already ranking under the auspices of being the best fish-tank supplier in Ohio? If so, you don't need to build the page, or you've got to be careful about building it. Try it, watch what happens, and be ready to remove it if it doesn't work.
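Here's a sketch of that "check before you build" step, assuming you already have (keyword, url, position) rows from Search Console or a rank tracker; every name and threshold here is illustrative:

```python
def prelaunch_check(proposed_keywords, current_rankings, threshold=10):
    """Warn when an existing URL already ranks for a proposed page's keyword."""
    for kw in proposed_keywords:
        hits = [(url, pos) for ranked_kw, url, pos in current_rankings
                if ranked_kw == kw and pos <= threshold]
        if hits:
            print(f"{kw!r}: already ranking via {hits}; a new page risks "
                  "cannibalizing -- skip it, or be ready to remove it")
        else:
            print(f"{kw!r}: no existing ranking; a dedicated page looks safer")

# e.g. the Tampa roofer scenario: the homepage already ranks for the phrase.
prelaunch_check(
    proposed_keywords=["best roofers in tampa"],
    current_rankings=[("best roofers in tampa", "/", 4.2)],
)
```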
How do you decide when it's okay to just remove a page? You said the click-through-rate history for a term can become a problem. When can't you simply remove a section from a page and put up another page, or leave the section in place and interlink it to the new page? How do you tell the difference? If you had a page that consistently ranked for a phrase over time, and then it becomes intermittent, and the other page becomes intermittent, and the average position starts to drop, then essentially those pages are blocking each other, and you can see it at the keyword level.

That's why I think site audits are so problematic. It's very difficult to do a real SEO audit without understanding the strategy of the company, and it's very difficult to do that in an hour. It takes me months. Most cannibalization, unless it's a five-page site where they added four new pages that cannibalize each other, and I've found this for photographers and for every kind of business, usually for B2B companies we discover down the road. We don't know it's actually the issue; it comes up when I start testing keywords. It's very difficult to diagnose. I've equally seen people have Claude misdiagnose cannibalization issues and hallucinate, and I've had to say: you overreacted to that problem and pruned way too much, and you can actually go back and publish the content again. I think the idea that an SEO can do a site audit in an hour, a couple of hours, or a day is itself very naive. Understanding the strategy, the intent, the competition, and the mindset of the user takes weeks of osmosis: getting in and sitting with the marketing team, understanding that "SoC" is sometimes "system on a chip" and sometimes "security operations center," and that Google can't really tell the difference. That's what I mean about BERT: a lot of technical SEOs will say, "Google is semantic, you don't have to target keywords, you target key phrases and patterns." But BERT is not an SEO tool, BERT is a user tool, and BERT just doesn't understand that those two "SOC"s are different phrases.

Once websites, SEOs, and marketers actually do the work of really focusing on de-cannibalization, it should become apparent. I assume most SEOs have a SERP report: your competitive phrases, your money key phrases, your branded terms. If you start losing a keyword that's vital to lead generation, one that describes one of your services, where people landing on that page convert at 10%, and you've been putting out a lot of content, then start looking into that page and why it fell. Maybe cannibalization is the reason.

Another way to look at it: some SERP tools keep a track record. The problem with Search Console is that if a page no longer ranks, it doesn't show in the report anymore, whereas SERP tools do a pretty good job of historical lookback. You can ask: over the last 7 days, the last 3 weeks, 28 days, 6 months, whatever, what did I lose? If you see a key phrase, or a variation of one, that you lost, drill down into that page and that keyword and see whether the keyword has started to bounce. So if you're tracking keywords like "top web design course," "best web design course," and "best course for web designers," and one of them starts to get erratic, filter on that keyword in Search Console and look at the number of pages ranking for it, especially in the last 24 hours. If several pages are ranking around the same position, or splitting it, or taking turns on different days over the last week, so that if you focus on one page it's up and down, and the next page is up and down on alternating days, that's what it looks like. The only way to really diagnose it is to go to the keyword level.
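If you'd rather do that keyword-level drill-down programmatically than in the UI, here's a minimal sketch against the Search Console API via google-api-python-client, assuming you've already completed OAuth setup and hold credentials in `creds`:

```python
from googleapiclient.discovery import build

def daily_pages_for_query(creds, site_url, keyword, start, end):
    """Return (date, page, avg position) rows for one exact query."""
    service = build("searchconsole", "v1", credentials=creds)
    body = {
        "startDate": start,             # e.g. "2026-03-01"
        "endDate": end,                 # e.g. "2026-03-28"
        "dimensions": ["date", "page"],
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "query",
                "operator": "equals",
                "expression": keyword,
            }]
        }],
        "rowLimit": 1000,
    }
    resp = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
    return [(r["keys"][0], r["keys"][1], r["position"])
            for r in resp.get("rows", [])]
```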
The problem with cannibalization is that pages can keep ranking for their main headline keyword while the loss doesn't show up; it doesn't look like they're losing anything, because it's just one keyword. The page might still rank for a while, or show as ranking, or drop in ranking, and it's because of cannibalization. Where you see two pages rank for the exact same keyword, or near-identical variations of it, you've got an issue.

You might also see it where you've got multiple teams writing about the same lettuce salads all the time. Say you're a big lettuce provider from New Mexico, and you keep adding posts on the best vinaigrette, Italian vinaigrette lettuce salads, rocket salads; overlap is bound to creep in. When you see a new post rank, I call it the failed rocket launch: if it goes up, comes back down, and then flatlines, it probably didn't have enough authority to cannibalize. If it starts going up and down and the orange position line in Search Console is broken, that's another clear sign of cannibalization. So it's difficult to diagnose, and I wouldn't trust too many AIs with it; I see a lot of people just dumping their data into them. But in those instances where you see the bottom blog posts not performing, you're going to lose nothing by acting: if you see the same keyword across three blog posts and one of them has only ever had three clicks, just remove it. If the average position then normalizes and flatlines, or starts to go up, you know you've corrected the issue.

So there are two ways to diagnose it. One is at the primary-keyword level, because there are so many variations of keywords that you can't track everything. The other: if you start looking at a page, and then at a keyword it's ranking for over time, and you notice the orange line starts to slip, look at those periods. If you see a second page appearing, that's a sign of cannibalization, and you might want to address it.
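And here's a sketch of the "taking turns on alternating days" test itself, consuming rows shaped like the ones the previous snippet returns; the 25% share threshold is an assumption of mine, not anything Google publishes:

```python
from collections import defaultdict

def looks_cannibalized(rows, min_share=0.25):
    """Flag a query where two or more URLs alternate as the best-ranking page."""
    best_by_day = {}                    # date -> (page, best position seen)
    for date, page, position in rows:
        if date not in best_by_day or position < best_by_day[date][1]:
            best_by_day[date] = (page, position)
    days_won = defaultdict(int)
    for page, _ in best_by_day.values():
        days_won[page] += 1
    total = len(best_by_day) or 1
    contenders = [p for p, n in days_won.items() if n / total >= min_share]
    return len(contenders) >= 2, dict(days_won)

flagged, tally = looks_cannibalized([
    ("2026-03-01", "/crawl-budget-guide", 4.0),
    ("2026-03-02", "/pagerank-decay", 5.1),
    ("2026-03-03", "/crawl-budget-guide", 4.4),
    ("2026-03-04", "/pagerank-decay", 6.0),
])
print(flagged, tally)  # True {'/crawl-budget-guide': 2, '/pagerank-decay': 2}
```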
I want to do a quick true-or-false recap of things we've already discussed; it's going to be a nice way to finish the episode. I think I have six items. True or false: you can optimize crawling? False. True or false: reducing pages equals more crawl budget? Crawling is a single event? I'd like to take "Things John Mueller Said" for 200. No, it's a multi-staged event; it can take up to five different crawlers to get to one page. More crawling equals better indexing and higher rankings? Changing dates equals more crawling? It can do, but it's largely false.

First of all, sitemaps only start to work when you're highly authoritative for something and you effectively get a listener bot. There's no such thing as a listener bot; what I mean by it is something like IndexNow, and that kind of instant pickup still happens for sites that are highly authoritative. If CNN writes a page, they don't have to go ping Google; they just add a new page and it gets indexed, because they're in such a small pool that a bot is crawling their sitemaps every second. If they change the dates on their pages, the bot will automatically pick it up, and the pages get added to a high-priority queue, which means they'll get crawled within seconds. If you have no authority on your site, Google won't even want to check your sitemap for another three weeks; that's your next index date. If you're middle-of-the-road authority, where most people are, say the 35-to-75 region, and you change dates too often, Google will ignore your lastmod date. In other words, it says, "Oh, it changed, I'll go look at the page." Part of the indexing process stores a CRC check, which gives an overview of what the page is without doing an entire word-for-word comparison; that would take too long. It says, "Okay, I'm going to reindex it, because of the authority of the page and because lastmod changed." But then it looks back and sees that over the last eight changes there was only a one-byte difference, and it goes, "Okay, stop trusting this." Then you fall down, and suddenly your index cycles get further and further apart. There's no going back from that. So you can do it, but it'll cost you down the road.

There's a process to understand from crawl to indexing that goes: does it have authority? Does it have authority for this index? When was it last indexed? What's the file size? At any one of those points, it can be passed straight to indexing. Then: is the file size large enough? In other words, if you've got pages that aren't very authoritative but were only a one-liner, and you add a hundred lines of text, a page like that might get reindexed. I think that's where people get the freshness myth from, and the "well, I added more content to my page" idea. It's because you have to pass over so many thresholds to get indexed. Some pages have to pass through all of those checks: was it a significant update? Did lastmod change? Does it need to pass a CRC check? If you just changed your page title and you have low authority, you're not going to get reindexed; that's when you have to request a manual crawl, which is one of the few good reasons for doing one. I don't think people realize those checks are applied to every site; more authoritative sites just need to pass fewer of them. Those are the stages pages have to go through.
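To make those staged checks easier to picture, here's a deliberately simplified model. The 35-to-75 authority band and the eight-version lookback come from the conversation above; everything else is an assumption, and none of it is Google's actual implementation:

```python
import zlib

class RecrawlDecider:
    """Toy model of the staged recrawl checks described above (illustrative only)."""

    def __init__(self):
        self.versions = {}  # url -> list of (crc32, size) per observed "change"

    def should_reindex(self, url, content, lastmod_changed, authority):
        if not lastmod_changed and authority < 35:
            return False    # low authority, no claimed change: wait for next cycle
        crc, size = zlib.crc32(content.encode()), len(content)
        history = self.versions.setdefault(url, [])
        history.append((crc, size))
        recent = history[-8:]
        # A cheap fingerprint stands in for a word-for-word comparison: if the
        # last eight "changes" barely moved the content, stop trusting lastmod.
        if len(recent) == 8 and max(s for _, s in recent) - min(s for _, s in recent) <= 1:
            return False    # lastmod gaming: index cycles stretch out from here
        return True
```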
Last one, which we just touched on: sitemaps. Do they solve crawling? XML sitemaps do if you have authority, and otherwise do nothing. And please go look at the Google dev guide; it's really clear. It says that if your site is really small and all of your internal pages are linked, you don't need a sitemap: we can find all your pages. "Discovered, not indexed" means Google can find your page, so a sitemap is not going to help. HTML sitemaps, which aren't an official thing, because an HTML sitemap is just a page, used to be very common and are now very uncommon; Wix doesn't natively support them and Webflow doesn't natively support them. But HTML sitemaps actually pass authority, and if you put one in your footer, even though the links get only incremental authority, it will solve a lot more problems than an XML sitemap. Do you recommend HTML sitemaps? Always. So you have one in a primary position? I think so. I used to have one.

I'm looking now. What would you call it, just "sitemap," in the footer? I'll search the page for "sitemap." No, I don't see it; I wanted to see how you do it. You see those sometimes. Sometimes we just build manual ones where we link to the main service pages and product pages, or avoid "services" and "products" altogether if you can. I don't know why, after 28 years of web design, we still do that; I think it's terrible practice. Microsoft don't do it and Apple don't do it, and that's good enough reason.

Wait, what do you mean? A lot of website designs still come out with "services" or "products," or "products and services," in the main navigation, with everything grouped under them. If you're a product company, a lot of people just automatically add a products menu, and I think it's superfluous. You're either selling a product or a service; there's nothing else. And like I said, Microsoft and Apple don't do that, so I don't know why everyone else does. It just seems to be this unchanged thing in web design where every agency by default, and I don't know if it's the agencies or the clients, still insists on a products-and-services page. You've got one or the other: if you're an SEO company, you don't really have products, you have services, so why bother creating the extra layer?

What's the issue, though? If you're a local business, what's the problem with having a services subpage that lists your services, with each service page targeting a different keyword? What I mean is: don't make people jump from your homepage to a services page. List the services on your homepage and get straight into it. There's no need to put them on tier three; bring them straight in. Or figure out the most important ones, link directly to those, and then fan out from there. Exactly.

Cool. I think this has been an awesome conversation, David. I hope this helps a lot of people. Me too. Thanks for having me. David, you are always welcome on this show; I enjoyed the last one too. Did you see the image that person posted of us listening, where I have the glass of coffee and you're listening intently? No, I missed that. Somebody shared it on X. I want to read the person's comment, it was so good; I think I even reshared it. Shout-out to John Livingston, who said: "Interesting episode. Photo taken during 'the approach of this as an SEO problem is unhelpful,'" and he tagged you in it, adding, "As always, I'm a fan of all your podcasts and material, as well as the great David Quaid." It's such a great image. You're going to have to put it into the podcast. I'll do it; I'll go looking for it after this. All right. Cool.
David, thank you again for coming on the show. This is episode 1,026 of the Edward Show: 1,026 days in a row doing this podcast, the best search engine optimization podcast and the only daily search engine optimization podcast with no days missed. If you watched this on YouTube, thank you so much for watching. If you listened on Spotify or Apple Podcasts, thank you so much for listening, and I will talk to you again tomorrow. Bye now.
