Why 70% of Your Ecommerce Pages Aren’t Indexed (And How to Fix It)

Edward Sturm | 01:07:43 | Jun 2, 2026
Discusses the challenge of indexing over 1 million product pages on a large e-commerce site and introduces the topic of improving indexing, crawl efficiency, and overall visibility in Google.

For massive ecommerce catalogs, focus on authoritative hub pages and rethinking crawl authority rather than just pruning pages or chasing sitemaps.

Summary

Edward Sturm hosts a deep-dive conversation with David Quaid about indexing challenges on sites with millions of products. They argue that because Google is changing how authority flows, and because of how crawlers process vast catalogs, sheer page count isn’t the bottleneck; what matters is topical authority, crawl efficiency, and strategic hub pages. Quaid explains concepts like crawl pools, discovery versus “FedEx” fetch modes, and why shrinking a site’s page count rarely improves crawlability. The discussion then pivots to practical structure: use hub-and-spoke pages, rank product groups alongside individual products, and route authority through targeted pages rather than forcing a rigid tiered taxonomy. They also tackle real-world realities like catalog naming, slug relevance, and the pitfalls of over-optimizing folders. Throughout, Edward pushes for a marketing-minded approach to SEO (building content that joins related products and improves user journeys) rather than chasing perfect canonical structures. The episode closes with tactical checks a million-plus product site owner would perform first, plus a candid take on AI-generated content and its place in ecommerce SEO today. David’s insights are full of concrete examples (BMW bulbs, JFK cargo pants, hub pages) and carry a recurring emphasis: your page name and topical relevance matter far more than you might think.

Key Takeaways

  • Indexing on large catalogs hinges on top-level topical authority, not merely sheer page volume.
  • Crawlers operate in pools; 90% of URLs end up in a bottom pool with minimal crawl frequency, so growth in indexed pages requires smarter authority shaping over pruning.
  • Use hub pages that aggregate related products and topics, then link individual products to those hubs to efficiently pass authority.
  • Slug and document naming carry substantial weight for discovery; if the slug isn’t clarifying the topic, Google may decide not to crawl or index the page.
  • Avoid relying solely on sitemap-driven indexing; focus on pages that actually attract traffic and convert, then tier authority to those pages.
  • When designing URLs, parent folders are less important for ranking than the content on the page itself; prioritize clear, keyword-relevant page content and internal linking.
  • A marketing-minded SEO approach, creating content that links related products (e.g., blog posts around car restoration that tie together BMW bulbs, LEDs, and accessories), can dramatically boost crawlability and indexing.

Who Is This For?

Essential viewing for ecommerce SEO teams and web engineers managing catalogs with millions of SKUs. If your site struggles with indexation, this episode helps you rethink category structure, hub-page strategy, and how to allocate authority to where it actually drives traffic and sales.

Notable Quotes

“Crawlers and the way you architect for crawlers will determine how authority actually shapes across your site.”
Quaid frames crawl architecture as the key driver of how topical authority flows across a catalog.
“Authority isn't stretched by the number of pages you have. Authority is stretched by the number of links in a page.”
A core principle challenging the naïve view that more pages equal better indexing.
“The document name has so much importance. If you have a nonsafe URL, Google may not fetch the page.”
Emphasizes the risk of poor slug naming for discovery and indexing.
“You want to bypass the four layers of folders and get content ranking closer to the tier where the traffic actually lives.”
Advocates for hub pages and smarter content placement over strict hierarchical depth.
“Google isn’t going to ban AI content; it’s going to judge content by topical authority and usefulness, not a blanket label.”
Touches on AI-generated content and the evolving evaluation criteria in search.

Questions This Video Answers

  • How can I structure hub pages to boost indexation for millions of ecommerce SKUs?
  • What is crawl pool behavior and how does it affect indexation in large catalogs?
  • Should I prune or prune-and-republish pages to improve crawlability for ecommerce sites?
  • What role do slugs and topical authority play in Google indexing large product catalogs?
  • How can I use blog or hub content to support product pages for better indexing and traffic?
Google Indexing, Crawling and Crawled Discovery, Hub-and-Spoke SEO, Ecommerce SEO, URL Structures, Topical Authority, Canonicalization, Faceted Navigation, AI Content in SEO, Technical SEO Best Practices
Full Transcript
"We have a large website with over 1 million products and we are currently having a hard time getting pages indexed. Only about 30% of our pages are indexed. How can large e-commerce sites improve indexing, crawl efficiency, and overall visibility in Google?" This is a question I got from Greg, a listener of the podcast. Thank you, Greg. Today we are tackling how very large sites, especially e-commerce sites with over 1 million products, can do SEO. And Greg is having a hard time. He says only about 30% of his pages are indexed. Joining me for this podcast is the one and only, drum roll please, here we go, Mr. David Quaid. Welcome, David.
So good to be here again. Thank you.
Thank you, David. David, how can large e-commerce sites with over 1 million products do SEO?
That is such a great question. I think this question is going to become bigger and bigger. Um, I'm seeing it all over Reddit and X at the moment, seeing a lot of people talk about it. I think this has got roots in both what Google is rolling out in the core updates and how it's managing spam and quality in the search engine. I think that Google will continue to tweak the mechanism rather than try to tackle the superficial layer. Right? So in other words, if you think, say, AI posts are causing an issue, right, like they're causing a lot of spam, I don't think Google's going to ban AI posts. I think it's going to tackle how those posts get authority, and that's why people are seeing problems with pages getting indexed. Google is changing the way authority flows across a site, how it ebbs and flows. So those rules about how a link carries authority, how authority is related, all of those are getting tightened in a way. And I was having this conversation with Charles Floate on X before I joined.
In a way, it's kind of like this: one way to go after spam link farms and PBNs, for example, could be to go door to door and go, "Hey, you're a link farm. We're shutting you down." Right? Like the Prohibition-era type mentality. Another way is to actually force them to either become really, really good content sites or force them out of the index, right? And it can't do that by actually judging content. And so how they get authority, how they spread authority, and how effective they become, I think, is how Google's doing it. And so that's why I think people are going to see challenges with indexation and how to resolve it.
The next step I want to jump into is that crawlability and crawl efficiency is definitely another topic that requires a deep dive, right? I think that, especially within the web developer community, they still look at website efficiency in terms of how crawlers engage and interact with a site. And I think that's more symptomatic than prescriptive, right? I think that crawlers and the way you architect for crawlers will determine how authority actually shapes across your site. If you try to come up with sort of like a global north star that says if you link every page to this page then Google will understand all your pages and the site will come together cohesively and magically, I think that's a little naive, and I think that falls into the spider fairy tale, like the spider goes across all your pages and it looks at all your... Firstly, if you look at what Google has been talking a lot about, how crawling works, and we've talked about this on the show before, crawling works by pools, right? So you have pools where URLs are dumped into, right? So bots are in one of two modes: discoverability and, like, FedEx mode. FedEx mode means they go and fetch the page, take the page, break it up into constituents, and feed Google systems. That could be the snippet builder, the indexing service, whatever, right?
When they're in discovery phase, all they're doing is collecting your URLs and putting them into URL processing managers, who put them into a pool. So, if you've got a page with five million URLs, 90% of those URLs are likely to be in a bottom pool, which means they're in a swimming pool where there are 10 crawlers for every 100,000 pages, which means they're getting crawled on the never. And whenever I ask people who are in this mode of thinking why they think more crawling equals better indexing, they will say like, "No, no, I'm not thinking that." But you have to be thinking that, because for a page to get indexed it only actually has to be crawled once. And so if you maintain that, then you can't also be in the other camp.
So the second thing I think that people fall into, and I see this in lm a lot, right, is that somehow you can control or improve crawling by reducing the number of pages. This is like the pruning idea, and in both the PageRank camp and the web development camp it has two issues, right? One is authority isn't stretched by the number of pages you have. Authority is stretched by the number of links in a page. That's just like a fundamental cornerstone of how it's done. And that's going to be a fundamental of the mechanics that Google's tweaking at the moment. So unless you can find a way to reduce all of the pages of the pool that your lower pages are in, you can't improve crawlability. Right? If you're in a pool where there's 100,000 pages per bot, or maybe there's 100 million pages per bot, you can take 100,000 pages off your site, but it's not going to make a sizable difference. It's not going to improve how often or when your pages get crawled. And then because of dampening and because of topical authority flow, the usual structure, and also the structure that people talk about, like natural links where like your homepage is a whole bunch of branded links, those aren't helpful either, right?
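The pruning argument above is essentially arithmetic, and a rough sketch may make it easier to follow. The pool model is David's description, not documented Google behavior, and every number here is illustrative: the pool size and the crawler count are assumptions chosen to match the "10 crawlers for every 100,000 pages" ratio mentioned in the conversation.

```python
def pages_per_crawler(pool_pages: int, crawlers: int) -> float:
    """Average number of URLs each crawler has to cover in a shared pool."""
    return pool_pages / crawlers


# Hypothetical shared bottom pool: 100 million URLs from many sites.
pool_pages = 100_000_000
# Assumption: 10 crawlers per 100,000 pages, as in the conversation.
crawlers = pool_pages // 100_000 * 10

before = pages_per_crawler(pool_pages, crawlers)

# Prune 100,000 of your own URLs; the rest of the pool is unchanged.
after = pages_per_crawler(pool_pages - 100_000, crawlers)

relative_change = (before - after) / before
print(f"before: {before:.0f} pages per crawler")
print(f"after:  {after:.0f} pages per crawler")
print(f"change: {relative_change:.2%}")
```

Under these assumptions the load per crawler barely moves, which is the point being made: pruning your own pages does little when the pool is shared with everyone else's.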
So, you've actually got to step out and you've got to go back to authority shaping um and you've got to look at where pages get clicks. And so I like to use the idea of like imagine like you have a power station and a bunch of cities around the power station and you've got a whole lot of substations in between and you have electricity flowing. If the electricity flows through an industrial estate and then gets through to your housing estate, the draw on that electricity is going to be enormous, right? And you can't solve that by just building more and more pylons, right? Which is like more and more links. You've got to become very very efficient, right? because the pylons themselves draw current out of the network, right? And if you look at it today, that's the US is still running on 110 volts and Europe and the rest of the world runs on 220 because um Tesla realized that 220 was more inherently more efficient than 110. And so if you look at how links die, even if you got a link from your cousin at Microsoft, within four links, it's mathematically dead on arrival, right? cannot overcome a four. You mean within four clicks? Within four clicks on your site. Yeah. Four clicks. Four clicks deep. This method of marketing is so effective, I had to make sure it wasn't against Google's rules before I kept using it. It's a form of SEO I call compact keywords. Whereas most SEO focuses on putting up articles to answer questions, how, what, when, compact keywords focuses on putting up dozens of pages that sell to searchers who are actually looking to buy. These pages rank on Google and convert so much better than normal that when I discovered this years ago, I couldn't believe this was allowed. It's less work, too. The average compact keywords landing page is only 415 words. Compact Keywords is a 13-hour deep course on getting sales with SEO. A customer said, "Compact Keywords contributed to a $4,000 sale within the first 6 weeks." 
Another customer said, "Give it to a junior employee. Have them follow it exactly as Edwards laid out. You don't have to do anything and you're going to gain a six-figure SEO level employee just by having them go through this course. Compact Keywords is about setting up an SEO funnel that brings you sales for years and years and years. It works with AI. It's less work than traditional SEO and it makes way more money. You can get it now at compactkeywords.com. Back to the podcast. So, if you've got 5 million So, let's build it backwards, right? If you got 5 million pages, Yeah. What? Yeah. What's the parent for those? How many are in a category? How many category subcategories do you have? How many parent categories do you have? You're immediately creating four tiers. So that means you've got to create tiers within tiers that actually receive Google traffic, right? That or all of your authority is dead. And there's just no other way around it. You can prune it, you can no index them. It's not going to improve crawlability until you get authority to those pages. There's also a lot of talk about like, oh well, they're thin pages or Google recognizes they're duplicate pages. Google doesn't have the time to do that, right? It doesn't care if like why would it care? Why would 10 duplicate pages be okay, but a thousand duplicate pages wouldn't be that's a subjective opinion, right? So, uh, Google has to apply the same system to every page and it doesn't have a bunch of if this, then that, oh well, we'll let that one slide case points, right? So if you're struggling to get um above a certain number of pages indexed, you've got to look at how are those pages targeting those keywords. Uh for example, have you have you gone from your naming culture and started giving those pages names that you don't actually get clicks for? 
If that's the case, then you've got to find some way to bring them back into the fold or you've got to expand into a new topical authority cluster and from there you've got to spread it, right? So the network also doesn't carry electricity that's compliant with your pages. Right? So authority is not a universal number. It's a per topic number. And if you've got pages that include products that are where the naming naming culture, right, like the slug, the page title isn't related to the category because you're using different keywords to avoid cannibalization or you're using different keywords to target what people are searching for. um then you effectively cut them out the fold. And so if you've noticed that the number of pages you have that are not indexed anymore going up, then you're in the exact same boat. You've got the exact same problem. And it's very difficult to do. And I think I read a few years ago that there are a couple of websites that study like what the percentage of indexation is on large sites and for a lot of sites this is like 42% is good or 38% is good because of that problem. The other thing is you look at if you try to look at behind the scenes, right? I think a lot of people look at the at the shop front, right? So they go to Amazon, they go to the TV section, they go like, "Oh, well, I'm doing the same thing Amazon's doing, right? I've got like the same clean URL structure. What does clean URL structure mean? Doesn't that doesn't really mean anything, right? That that could mean that could mean you're just using two words. That doesn't from an efficiency point of view, it doesn't help, right? So if you actually go back and do what does a clean what does a clean URL structure mean to you? I have an answer. Clean URL structure. Mhm. To me it means I don't use words that aren't helpful to what people are looking for. Yeah, me too. You have subfolders that will make the slugs make sense, right? 
But you have to remember that the parent folder doesn't contribute to the slug.
But it could help with UX.
It definitely helps with UX architecture.
Okay, let me ask you this. Can you have topical authority apply to a parent folder?
It depends. So let's take BMW light bulbs, right? How are they cataloged? They're cataloged by the chassis number, the year, the model number. So you have like station wagons, cars, SUVs, and a lot of the parts are shared, but they're actually given new part numbers for each one, right? The part number can be useful from a cataloging process because you're guaranteed a unique part number, right? The part number catalog idea can be useful because people search for the part number. But if people start searching for like, oh, I need a light bulb for 2022 X5, then you've got to build that into other parts of the page. Now, you can build that into the parent folder, but let's say the parent folder is BMW/lights/SUVs/part number. The BMW from the parent's parent, you know, the grandparent, doesn't actually contribute to the document name. However, if all of the pages have the word BMW and they're all in the same folder, then BMW can be inferred across all the pages, right? So, if you've got a competitor who's been doing this in the past, a lot of people think like, oh, I can just go and look at what my competitor does. You can, to a point, except that your competitor might be doing something for 10 years and that's grandfathered in, and you come along and copy it today, but you don't have the BMW search phrase as part of the keywords. It doesn't get grandfathered in. And so that's the problem with points of observation.
And so can you talk about also how you said that the subfolder name doesn't contribute to the slug?
So let's do something very simple, like local services for example, and your keyword is Austin roofing repair, and you serve multiple locations, multiple cities, Dallas, Austin, so you have a subfolder for Austin. Now do you do your slug as just roofing repair or Austin roofing repair?
I would do it as Austin roofing repair.
And so you're saying that, like you have all these different services within Austin, all of the slugs would still have Austin in them and you wouldn't...
You wouldn't actually use cannibalization. Cannibalization is like coming back to mind here, where if you've got Austin inferred you don't need to do it, right? If we come back to the BMW part, because it might be easier: if BMW is in the page text and you're already ranking for BMW and it's inferred across your site, you don't need it. If it's not, then you need to add it to the slug. So, that's the problem with giving universal advice. It's like your site's DNA is dependent on how old your site is, how long it's been getting clicks, what the makeup of those clicks are. And so it doesn't matter if your competitor is doing X or if I do Y. What's happening to your site, and whether or not that will create cannibalization, or whether or not that will influence or infer a search phrase, or whether the search phrase is already inferred, is going to be up to you and your site. And the way to find that out is to actually go and look at the page. And if you see BMW in all of the search phrases, you can rest assured it's inferred. If it only ranks for the part number, then it's not working.
Right. Back to Austin. If you've created a page called roofing for Austin roofing repair.
Yeah. Let's say within roofing repair you have corrugated iron, Perspex, steel.
If the pages only rank for search phrases with corrugated iron and steel and they're not ranking with Austin, then Austin's not inferred. If Austin is being inferred, then you don't have to include it, right?
Oh, you're talking about a page that's already up and seeing what it's ranking for, or are you talking about a new page?
No, I'm talking about a new page. So your domain name, if your site's name is like Joe's roof repair in Austin, right, then Austin can be inferred in all the pages without it needing to be part of the slug. Okay? And so that's the challenge that you're looking at. So, what I'm saying is if you're now at a point where you've got five million products and three million aren't being indexed, you have to look at: are the words that people are searching for in the slug, and is that why they're not being indexed? Right? So, do I need to give more information? The answer might be...
Do you need to give more information in the slug or just throughout the page?
So the slug is the document name.
Because I'm thinking... Okay, let's go back to e-commerce, and you have a section of things for airports. Maybe you have /airports and then /outfits, no, sorry, let's be better: pants, airport pants, and then shirts, and then within shirts you have /blue and then /red. So now you have /airports/shirts/red.
So the parent folder name doesn't join the document name. So if you have like /jfk/pants, it's not "JFK red pants."
What if it's also in the page title and the H1?
If it's in the page title, well, the relevance is lower and that might be enough. And that might...
You mean the relevance is higher?
The relevance is higher. The relevance is highest in the slug.
Right.
Yeah. So the page title is next, then the H tags are next. The first tag is the highest.
So, again, does that mean that you're doing /airport/airport-shirts/airport-shirts-red? Is that how you're...
The parent slugs don't make any difference, right? Except that the pages inside them share keywords, right? So, let's just say you need to rank for JFK cargo pants, right? JFK/cargo.
But each subfolder could be its own landing page for multiple products.
It could. The safest way to do it is to put JFK in the slug, or if JFK is in the domain name, the domain name will actually join the slug. The parent folders won't. That's the problem with parent folders. So you've got to look at your site's DNA, right? If your site ranks for anything to do with the word JFK, then you probably just need to give enough information in the cargo pants. If your page ranks for everything to do with BMW... However, if you're selling car parts for every single brand of car, then BMW might not be inferred, and putting it in the page title might not be enough. So, where you've got pages that aren't being joined, you need to look at, step one, do I need to republish and add more information to the slug? Step two, do I need to make the parent page rank? And the way around it, and if you go and look at how Amazon and eBay do it, is they create what we call like saved search pages or sync pages that get lots of traffic in, like for example, plasma TVs or 42-inch TVs, and they're typically just a list of blue links. And those pages get tons and tons and tons of hits from actual people. Then those pages pass authority on to their child pages.
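As a rough illustration of the two checks described above (is the search term in the document name, and is it already inferred from what the site ranks for), here is a minimal sketch. It encodes David's stated model, in which the domain and the final slug count toward the document name and parent folders do not; that model is his claim, not documented Google behavior. The function names, the `parts.example` URLs, the sample queries, and the 0.8 threshold are all hypothetical.

```python
from urllib.parse import urlparse


def document_name_terms(url: str) -> set[str]:
    """Terms from the first host label and the final path segment only.

    Parent folders are deliberately ignored, following the model in the
    conversation. Host parsing is simplified for illustration.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    host_label = host.split(".")[0]
    slug = parsed.path.rstrip("/").split("/")[-1].lower()
    terms: set[str] = set()
    for chunk in (host_label, slug):
        terms.update(t for t in chunk.replace("_", "-").split("-") if t)
    return terms


def missing_terms(url: str, target_phrase: str) -> list[str]:
    """Words of the target search phrase absent from the document name."""
    have = document_name_terms(url)
    return [t for t in target_phrase.lower().split() if t not in have]


def is_inferred(term: str, ranking_queries: list[str],
                threshold: float = 0.8) -> bool:
    """True if the term shows up in most queries the site already ranks for.

    The threshold is an arbitrary illustrative cut-off.
    """
    if not ranking_queries:
        return False
    hits = sum(term.lower() in q.lower().split() for q in ranking_queries)
    return hits / len(ranking_queries) >= threshold


deep_url = "https://parts.example/bmw/suvs/led-bulb"
republished_url = "https://parts.example/bmw/suvs/bmw-x5-led-bulb"
queries = ["bmw x5 led bulb", "x5 headlight bulb bmw", "led bulb part 12345"]

print(missing_terms(deep_url, "bmw x5 led bulb"))
print(missing_terms(republished_url, "bmw x5 led bulb"))
print(is_inferred("bmw", queries))
```

In practice the list of ranking queries would come from a search performance report; here it is hard-coded. If a term is neither in the document name nor inferred, the conversation's "step one" (republish with more information in the slug) is the suggested fix.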
Whereas what most people do is they try to build pages like in tiers homepage master category lower category subcategory division category lower category. Um you need to plan for traffic at the end closest to the end tier and that's what I think the difference between web developers and web builders. Web marketers will think, "Oh, I need to get traffic at these tiers." Whereas web engineers are trying to get crawling over by pruning the site or by trying to encourage more bots to crawl in and that just isn't going to work. So, and I don't know as well if they will look at republishing, right, and adding to the document name, but that's essentially how you fix it. So, probably the most important thing is yeah, is to get pages with traffic as a parent page or have multiple parent page, right? You can have a page that's in your cate, you know, your sort of like categorical architecture and then you can have group pages that for example talk about home cinema that combine TVs, furniture and surround sound systems, right? And then relink to constituent parts. How should large e-commerce sites think about different category filters, sorting parameters, and faceted navigation? Um, that's a difficult because Google um, one, Google handles parameters so badly and two, they removed parameters. I think um unless you've got a lot of traffic coming to some of your search pages, try to not have them. Try to have fixed pages with fixed results so they're not constantly moving around, right? or have pages with fixed parameters of the most common searched keyword combinations link to output pages or link directly to like grouped category pages and then also how you get links. I think there's also like a lot of talk around like natural link profiles. The natural link profile I think is a bit of an urban legend created by people to make them feel safe about buying links. 
And so I worry about its real about having any real credibility, but you need unnatural link patterns because you need links to get to these outer tier pages, right? If you're like eight categories down, having lots and lots of PR links to your homepage just isn't going to filter there. And building more and more links to your homepage doesn't solve that problem, right? An 85% tax after two runs is so heavy. And so that's why you need to create articles out that link directly to these product pages. Another way around it and why I think blog posts are so important is because you could write an article that joins different components. Like instead of just talking about BMW lights or visiting JFK airport or whatever, you talk about how an experience and that starts to join related products across different product categories, right? Like instead of talking about what light bulbs you need for a service, talk about car restoration. Talk about like all of the different car parts that go into car restoration. And all of those things bring new traffic into those pages and then sort of like refresh the authority going out to those pages, brings them back in, makes them more relevant. That way when they get crawled, Google has a reason to index it. That's the way to think about it. And I think that's like a that's like the oldest way to think about SEO. Yeah. You're saying that you have different hub pages for either individual products or groups of products. and and and and you want you want the individual products to to rank and you want the groups of products to rank. And so instead of directing links to one individual product page or group product page, you do it to a hub page that house all of these so you can spread some authority to them. And then maybe for the for the keywords that are more competitive, then you might build a link to that individual product page or group product page targeting the competitive keyword. Absolutely. 
If you talk about, for example, how BMW owners are starting to change from traditional bulbs to blue xenon bulbs, or now changing to colored LED bulbs, and then you do a survey amongst all your buyers and they're like, "Yeah, we want to keep our BMWs looking up to date, so we're changing the color of the LEDs," and then you find that like 85% of owners prefer to keep their BMW updated than buy a new BMW, maybe you can find someone who wants to write an article about it and then link to that blog post. And what you're doing is you're bypassing that tiered, sort of canonical approach, and you're just building a page that joins just the end tiers, but that page has direct backlinks and ranks in Google for a particular set of phrases. Instead of trying to pass authority from 1 2 3 4 5 6 7 8, you're just doing it at level seven directly down to level eight.
Right. Right.
That's how to solve the problem. And I don't think enough web engineers think that way. And it's just a small change in thinking. It's not very complicated.
Well, that's not even an engineering thing. That's more of a marketing thing, to build links to a specific hub page.
Right. Exactly. And I think that when you look at a site that's product-led and a site that's managed by tiers, they're trying to overcome that by making the tiers. And you've got to bypass that. So you've got to step out of that thinking into a marketing frame of mind, right? But it's not a very difficult thing to do. It doesn't mean that you have to go hire a PR agent.
Do you think building links to hub pages is unnatural?
I don't think it's unnatural. No, I think it's very natural. I think it's an important thing.
I think it's natural. So you're like, "Oh, I hate the term natural backlink profile." I use that on the show like every episode.
But I think there's ways to do it. Okay. So an example that I give is, if you're a real business, what are some things that real businesses do that look natural? It's like, okay, they have foundational links, they have social media links, they maybe submitted to some directories, they've shared their homepage on social media as well. This is like posts, and yeah, these are no index, but this is part of a natural backlink profile. You have a mix of nofollow links and dofollow links, you have a mix of spammy links and non-spammy links, and it's very few exact match anchor text, and then mostly it's probably just a brand name or a naked URL.
Absolutely. So, if you've got a company that's got a SaaS product and you're now creating an app for the AWS marketplace or the Azure marketplace, Microsoft's going to want to link to that product page, right? Not the product category page, but the actual page where they can learn about how your product is deployed in their marketplace. That's a natural link. What I think I was trying to say earlier is that people have built an idea of natural link profiles that says, well, 90% of your links need to be branded and to your homepage. And I'm trying to challenge that idea.
We are on the same page. Yeah. Think about having a clear site architecture where you can minimize the steps. You can minimize the clicks to get to pages that you want to rank on Google, and that way you can direct authority to one of these hub pages, which allows it to flow to different pages that you want to rank. And then something else to keep in mind is that the more pages you have in a hub page, the more authority that hub page needs, because you're spreading this authority to a bunch of different pages.
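To make the "fewer clicks, fewer links" reasoning concrete, here is a toy model, offered only as a sketch. It assumes each hop keeps a fixed fraction of a page's authority and splits it evenly across that page's outgoing links. The 0.85 retention value is borrowed from the damping factor commonly quoted for classic PageRank; it is an assumption, not the speakers' figure, and the starting authority values and link counts are invented for illustration.

```python
def authority_after_hops(start: float, hops: int,
                         retention: float = 0.85,
                         links_per_page: int = 10) -> float:
    """Authority reaching one page after `hops` clicks in a toy model.

    Each hop keeps `retention` of the authority and divides it evenly
    among `links_per_page` outgoing links. All parameters are
    illustrative assumptions.
    """
    value = start
    for _ in range(hops):
        value = value * retention / links_per_page
    return value


# Route A: homepage authority pushed down through eight tiers.
via_tiers = authority_after_hops(start=1.0, hops=8)

# Route B: a hub page one click above the products, earning its own
# links and traffic but with only 1% of the homepage's authority.
via_hub = authority_after_hops(start=0.01, hops=1)

# The same hub linking out to 20 versus 200 product pages.
small_hub = authority_after_hops(start=0.01, hops=1, links_per_page=20)
large_hub = authority_after_hops(start=0.01, hops=1, links_per_page=200)

print(f"via eight tiers:  {via_tiers:.2e}")
print(f"via hub page:     {via_hub:.2e}")
print(f"hub, 20 links:    {small_hub:.2e}")
print(f"hub, 200 links:   {large_hub:.2e}")
```

Even with a far weaker starting point, the one-hop hub delivers much more to each product page than the eight-tier route in this model, and a hub with ten times as many links passes a tenth as much to each page, which is why a larger hub needs more authority of its own.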
If you have lots of pages in that one hub page, now you you need more authority because it's getting spread between the pages. In other words, if I just sell lights for Mercedes-Benz, it's going to be easier for me to have Mercedes-Benz as an inferred keyword in every page because maybe I have it in in my domain name, but I have it across my whole site. Now, when you step up a a level and I'm I want to sell lights for American cars and European cars and cars from from Asia, I now have to I now have a bigger problem, right? And so, if you're if you're coming back to the design philosophy and expect that, let's say somebody's got a BMW dealership in in Florida, they don't want to have to go from the top level and go, "Oh, am I talking about American cars or European cars and then down to Mercedes?" They just want to get to Mercedes, right? or to BMW, right? Or whatever, right? Or Hyundai. So you can't you have to then start thinking about how do I get content ranking closer to that tier rather than up here. I need to get I need to get clusters over here getting traffic and sending traffic to those links. And that's what I think is missing in those large e-commerce sites. Can you give another example? So like let's say you're Yeah. Let's say you're an e-commerce site and you have faceted navigation and you you have like different categories. So what would what would URLs look like? Let's say just like maybe even give an example that's uh with four parent folders. What would what would what would a good what would a good URL structure look like for that? I I don't think it matters. I think what matters is like you have something like blog posts or or sections of blogs, right? Maybe you have like blogs for different uh clusters and you start building thematic content, right? And you know like a lot of people in SAS for example understand that they've got like their money page which is like what their SAS product does. 
It's an overview of their SaaS product, what problems it solves, who the ICP is, and then they start building use case pages: this could be very good for companies of 50 to 100 people, this could be very good for companies that are trying to get attribution, or companies that are trying to enable their sales team. When you've got these big e-commerce sites, you need to remember that the user doesn't care about how vast your collection is. You need to start building themes around those collections and try to jump over the four layers of parent folders. Sometimes those folder organization things are only useful to you because you have so much inventory, right? They're not actually helping the user, and they're not helping Google, right? So the thing is, it doesn't matter if you have four levels of folders. It's how can you bypass them as fast as possible. In other words, if you were driving from Miami to Palm Beach, do you really have to drive through Hollywood and Fort Lauderdale and Boca, or can you just get on I-95 and drive directly to West Palm Beach? Right. That's really what you want to do, right? And have an exit for West Palm Beach so that you don't have to drive through every town. You don't have to make the user drive through every town. You're still talking about authority and hub pages. Mhm. Yeah. But I think there are some people who might listen to this episode and be like, "Okay, do I need to do forward slash shirts, slash blue shirts, forward slash blue shirts with sparkles, slash long blue shirts?" Adding more words to the parent folders doesn't do anything, is what I'm trying to say, right? These are people who would also want the parent folders to rank themselves. I guess that's one way of doing it, right? But you can put that in the page. You don't have to have it in the folder name, right? 
You you within the folder you're saying you can put that in you you'd put that in the page title. You would put that in the H1, right? But so let's just go back. So, I don't think there's any point in having a folder naming structure that's got blue shorts/blue shorts/ blue shorts short or cargo. Yeah, that's what I'm saying. It's redundant. There's no point repeating blue. That's what I'm saying. It's redundant. It's very redundant. But you can have a page at any point of that that's got a whole bunch of words in a slug, right? You're saying you can or cannot can. Oh, yeah. You absolutely can. Sure. The only time your p your the only time the folder is going to dictate the pages of a slug is for the root page of a folder. Say that one more time. The only time the parent folder name is going to dictate the words of a slug is for the root page because the root page typically doesn't actually have it's just forward slash page, right? So root in. So I I I still think in Unix terms, right? Can you give an example? Um um I need a spreadsheet but basically the root page of a folder is effectively the folder name. Okay. The child pages of the folder are the folder name forward slash or the sl words. Yes. Exactly. Yeah. Yeah. But the but the root page actually is the page that doesn't actually get any words in its document name. That's the only reason I would have words in the folder. You're saying you're saying wait the you're saying the root page which is the same thing as like a subfolder is does not have any words in its document name. Yes, that's that's what makes it a root page. So root page is just the parent folder forward slashno. That is the root that is technically the root page. And that's because of how Unix and and DOSs were designed. And that's how Google actually still runs today. So typically what makes a folder is that the words between a forward slash and a forward slash unless it's a trailing forward slash, right? That's a folder name. 
But the root page actually doesn't have a name. And that's why you would typically have words in the folder name is to give the root page a name. You typically used to have a root page. So basically, if you if you're on a category page and you delete the slug, that should bring you back to root page. It doesn't always, but that's that's technically what should have happened. David, wait. So I'm So I'm basically saying the names in the parent folder are kind of useless. Okay. But kind of what I'm saying. What? That they're That's kind of what I'm saying. they you're well you're saying that they're useless and they're not useless because sometimes you want something the parent folders are their own pages that you want to rank but you could also build a page inside those folders right and give them lots and lots and lots um keywords yeah that's what I'm saying so like um okay just you have forward slash okay you have forward slashshirts forward slashblue and forward slashsparkles so The So the one in the middle is is for is for/shirts/blue. And maybe you want that to rank for blue shirts. And so then you would put blue shirts or you you would put Yeah, you put blue shirts in the page title. You'd put it in the H1. You would have a a selection of blue shirts on the page and then you would click on the blue shirts on each page and then it would go into either a single product landing page or multiple products. And that's what I'm saying that that middle subfolder is also its own page that you want to rank. It is, but it can also have a another more complicated document with its own name, right, within itself. I think that's where this would be much more useful if we had a Google sheet. Yeah, maybe we should have a Google sheet. We could I guess we could do it. I mean, you could literally start sharing your screen right now if you wanted to if you wanted to open it up. 
So David is now sharing his screen if you are listening on Spotify or Apple podcast and I'm going to narrate. Okay, you tell me what you give me the the folder structure you want. Okay, we have shirts and then and then we have blue and then we have sparkles. So that's so that's three. So that's so forward slash like this. Yes, exactly like that. Yeah. Yeah. And what's the what what product are we selling? Give me a page that would exist here. Okay. In the middle we are selling blue shirts. I mean every one of them is its own page. So So shirts blue blue and we have all the blue shirts in here. Yep. And it Yeah, basically they'd have all the my blue shirts. Yeah. And that would be its own page. And so you would have you would have the page title be be blue shirts. You would have the H1 be blue shirts. Yeah. Pretty much. I'm going to do this, right? I'm gonna go shirts. Oh, okay. So, you have So, you have forward slashshirts slash t1 and then forward slashc1 and then you have then you have uh how come you're using underscores and not dashes? I've always preferred them. But I think Google says use dashes. Okay. So you So you have So you're putting the vinyl blue shirts with sparkles nested within two extra subfolders T1 and C1. Yeah. So like um table one category 1. And so also I can then take forward slash shirts slasht1 and have okay but what if you also wanted just you just wanted blue shirts and you can just do this. What I'm saying is that the the the t1 here and the blue here aren't helping like this page. Uhhuh. like the sparkles page blue here and shirts here isn't helping this page. I understand. So is there are there are there instances where it becomes redundant the amount of of because you can always go deeper within within keywords. You can always make a keyword more longtail. And so let's say that you are nesting different products within within subfolders for for for example in forward forward/shirts which is a subfolder. 
Are there instances where you could be too redundant or not really? Um I'm sure there are. Um but but but it's hard for you to think of them. So it is harden for me to think of them I think. So basically so basically what you're saying is if you take shirts slasht1 slc1 and then you have your blue-ashirts with-sparkles. You could can you can you click on that and then and then click on a put in another forward slash into the URL and then write like blue shirts with sparkles and sunglasses and that would be a perfectly acceptable uh No, no, no, no, no. You have to add it. No, no, no. First, first put a forward slash. Okay. And then and then put in again blue shirts with sparkles and sunglasses. And so like deal. Yeah. And so basically something like this actually would be better for SEO is what you were saying. No, I'm saying that the blue shirts with sparkles here. So this becomes Yeah, but Okay. Yeah, but let's say you want let's say you want to rank for blue shirts with sparkles and then you also want to rank for blue shirts with sparkles and sunglasses. The thing is blue shirts with sparkles like you want that's just a little bit you need to So basically you're saying you need to repeat it here again. Yeah. Well I'm I'm not saying you I'm not saying it either way. I'm I'm asking you I'm asking your Yeah. I'm asking your take. I I got you. I got you completely. In other words the this But why are you why are you doing 105 C9? I'm saying I'm saying that you that you want to rank for both blue exactly I'm saying that these two URLs are exactly the same. Oh yeah. Okay. That's what I'm saying. And and this is not the same. This and this are not the same. These two are the same. And this and this are not the same. In other words, sparkles and sunglasses isn't inheriting blue shirts. I understand. I just wanted to spell it out in case someone else didn't see that. 
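A minimal sketch of the folder-versus-slug distinction discussed above, assuming hypothetical example.com URLs modeled on the ones typed during the screen share. It only splits a URL path into parent folders and the final document name; it illustrates the speaker's claim that a slug carries only its own words, and it is not a description of how Google actually parses URLs.

```python
import re
from urllib.parse import urlparse


def split_url(url):
    """Return (folders, slug). The slug is '' for a folder's root page."""
    parts = urlparse(url).path.split("/")
    # parts[0] is '' because paths start with '/'; the last part is the slug
    # (or '' when the URL ends with a trailing slash, i.e. a root page).
    folders = [p for p in parts[1:-1] if p]
    slug = parts[-1]
    return folders, slug


def slug_words(url):
    """Words in the document name only; parent folder words are not included."""
    _, slug = split_url(url)
    return [w for w in re.split(r"[-_]+", slug.lower()) if w]


# Hypothetical URLs for illustration.
parent = "https://example.com/shirts/t1/c1/blue_shirts_with_sparkles"
child = parent + "/sparkles_and_sunglasses"
root_page = "https://example.com/shirts/"

print(split_url(parent))     # folders and slug of the sparkles page
print(slug_words(child))     # 'blue' and 'shirts' are absent from this slug
print(split_url(root_page))  # root page: folder name only, empty slug
```

Under this reading, a page that should rank for "blue shirts with sparkles and sunglasses" would need those words in its own slug, because nothing is inherited from the folders above it.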
But you're and you're saying that you can't actually really be too redundant with URL structures with the slug name. No, you can't with the slug name and Yeah. Okay. With the and the name of the the subfolders. You can't really be too redundant. I don't think so. So, so, so I'm going to I'm going to give an example and like let's say let's say that uh you were auditing an e-commerce site and you had basically like someone who someone who was doing this. This is kind of what we were saying earlier. Um, blue shirts and then and then and then blue shirts sparkles and and then so so we're being super redundant here in the URL structure and then even and then even another one which is let's say base ball cat. So, you're saying that this would be a perfectly acceptable URL structure and that this wouldn't be too redundant. So, yeah, from an SEO point of view, it won't make any difference, right? It would it would actually be advantageous because like let's say that each one of these subfolders you want to you want to rank you want to rank for blue shirts you want you want to rank here for for excuse me for you want to rank for blue shirts with sparkles and sunglasses and this would actually be a fine URL slug it could be what I was trying to say is that you don't actually have like so if you take this Right. The the where you have these folders. What I was trying to say earlier is that you don't have to have this doesn't have to be a file name. in this case, blue shirts is T1 and blue shirt sparkles is C1. You can still have C you can still have T1 rank for blue shirts, right? being sure it's T1. Yeah, but you're saying that the that the URL slug is very important. What exactly? But you can still do this. it what I mean is No, I understand the I understand you're saying you're saying that you could you could obscure it with more subfolders in between and that's not going to affect the relevance of the main page. This isn't this wouldn't be a subfolder. 
This would be a page of T1. This would be a page of T1, but if you added if you added a forward slash and then added another page after it, then it becomes a Yeah, but then you but you don't have to add a for No, you don't you you don't I'm I'm talking about I'm talking about people who want to have like a clear who who are using hub pages and just want to have like an organized URL structure, right? But you can you can what I'm saying is this you can still have blue shirts rank. You don't have to have blue shirts continue out as well, which is what you're saying here. You don't have to. No, absolutely. You don't have to do that. I I'm trying to give I'm trying to give the craziest example just just to to make sure just to fully understand what you were saying. And I just want to make sure that people understand that you can still um have T1 here and blue shirts here and then still have forward slasht1 slashz1 slashgroup five right slash blue shirts sunglasses and sparkles right and but and the most important thing is the amount of clicks that it actually takes to get to each page because that's how you are spreading your authority, right? And the best way around it is to build blog picnic ideas and then say for your holiday weekend, buy our blue shirts with sunglasses and combo deal. 
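The click-count point can be sketched with a breadth-first search over an internal link graph. Everything here is an assumption for illustration: the page paths, the blog post, and the idea of measuring depth from the homepage are hypothetical, and click depth is only a rough stand-in for how authority spreads.

```python
from collections import deque


def click_depths(links, start):
    """Minimum number of clicks from `start` to every reachable page (BFS)."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical tiered site: the product sits five clicks below the homepage.
product = "/shirts/t1/c1/group5/blue-shirts-sunglasses-and-sparkles"
blog_post = "/blog/holiday-weekend-picnic-ideas"
site = {
    "/": ["/shirts/", "/blog/"],
    "/shirts/": ["/shirts/t1/"],
    "/shirts/t1/": ["/shirts/t1/c1/"],
    "/shirts/t1/c1/": ["/shirts/t1/c1/group5/"],
    "/shirts/t1/c1/group5/": [product],
    "/blog/": [blog_post],
    blog_post: [],
}

# Same site, but the blog post now links straight to the product page.
with_shortcut = {**site, blog_post: [product]}

print(click_depths(site, "/")[product])                 # depth through the tiers
print(click_depths(with_shortcut, "/")[product])        # depth via the blog post
print(click_depths(with_shortcut, blog_post)[product])  # depth when the post is the entry page
```

If the blog post earns its own backlinks and rankings, visitors and crawlers start at the post, so the product is one click away regardless of how many folders sit above it.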
And then by linking this to this page, you forget about the hierarchy and you just go from this page directly up to this page, right? All right, that's the shortcut. David Quaid, if you inherited a 1 million product site tomorrow, what would be the first three things that you would check? Good question. I think I would start by looking at how many pages aren't indexed anymore, or how many pages are in "discovered", and use that as an idea of the scale of the problem, right? Because discovered and crawled are two different issues, right? Discovered means that Google is aware of the page, like it's in a sitemap, but it actually didn't care enough to go crawl it, right? And I think that's another thing we didn't touch on. I think a lot of engineers think that if they build XML sitemaps, that somehow overcomes all of this, and it doesn't. Right? So discovered, not indexed is actually worse than crawled, not indexed, because it means that Google found the URL, lots of bots found it on other pages, but it didn't request fetching of the page at all. The crawl managers were not interested in this page. The only data that they have at this point is the slug, right? And that's why the slug is so important, and is still so important 30 years later: the only data they have is the pages linking to it and the slug. So, for example, let's say your page was called the reverse of what we just did, right? Let's just say the page is called T137A, right? For blue shirts. The slug isn't giving Google any helpful information, right? These used to be called unclean URLs or non-SEO-friendly URLs. That means that Google's like, you don't have this T137A in your topical authority. I'm not even going to bother. I'm not going to fetch the page. I have no interest in it just based on the document name. That's why the document name has so much importance. 
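As a rough illustration of the T137A example, here is a sketch that flags URLs whose slug contains no descriptive word. The heuristic (a token counts as descriptive if it is letters only and at least three characters long), the function names, and the example.com URLs are all assumptions made for this sketch; it is one way to triage a list of discovered-but-not-indexed URLs, not a test Google is known to run.

```python
import re
from urllib.parse import urlparse


def slug_tokens(url):
    """Lowercased tokens of the last path segment, split on '-' and '_'."""
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    return [t for t in re.split(r"[-_]+", slug.lower()) if t]


def is_descriptive(token):
    # Assumed heuristic: letters only and at least three characters.
    # 't137a' fails (contains digits); 'blue' passes.
    return token.isalpha() and len(token) >= 3


def flag_opaque_slugs(urls):
    """Return URLs whose slug has no descriptive token at all."""
    return [u for u in urls if not any(is_descriptive(t) for t in slug_tokens(u))]


# Hypothetical URLs, e.g. exported from an indexing report.
not_indexed = [
    "https://example.com/shirts/T137A",
    "https://example.com/shirts/blue-shirts-with-sparkles",
    "https://example.com/shirts/sku_0042",
]

for url in flag_opaque_slugs(not_indexed):
    print(url)
```

A flagged URL points to the naming problem described in the conversation; an unflagged URL that is still not indexed points more toward the topical authority side.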
So, you've put it in your sitemap and Google's like, I don't care. So, to all of the people on Reddit who feel compelled to say, "Hey, build a sitemap, it'll solve it": this is why it doesn't. So I would look at all of the nomenclature in the discovered, not indexed set, and I would say, okay, here we've got a naming problem or we have a topical authority problem. Either we fix the names or we fix the topical authority. Then I would look at which pages have the highest amount of traffic. Are they ranking or losing rank? And I would want to fix those. I would want them all to be stable or gaining traffic. And then I would look at how many pages we have that are indexed and gaining impressions but not gaining clicks, and try to fix those. And then I would say, look, how many of these pages do I need people to land on, or how many of these pages are people going to navigate towards? Right? So if I'm a BMW dealer and I need to buy lights, I only need you to land on the category manager and then you're going to do a search, and I don't need you to find the page, I need you to find the parent page. Or are there a lot of impressions for that particular product, and the only way to win is to have the product page be where people land? Then I'm going to focus on those product pages. And then I'm going to go back to my keyword universe and say, have I got products that have a huge amount of search, a high cost-per-click value, and a high basket value that aren't ranking? Those need to rank. If nobody's searching for them, they don't need to rank. And then I might even pare back and say, I just don't care about these pages. That would be the easiest way to do it. You said at the beginning of this podcast that Google is changing how authority flows between pages on a site. How are they changing how authority flows? What did it look like before and what does it look like now? Such an interesting question. 
um very hard to def to describe because I'm still trying to sort of like find what the parameters are and the the sort of like end points and start point. So these are the parts of the system that we don't know, right? We don't know exactly how Google um relates. But what I'm seeing is that in in so a lot of people have been saying I've been losing traffic, right? Or I've been losing clicks. Um is it AIO's? Is it because um I'm not I've been penalized or I've been hit by something it but what's actually happening is their pages used to rank for a lot of keywords, right? Right? And there's a lot of unknown keywords in that makeup. And every month they're going down. I call this like a step change. They're just going down. They're not hitting a wall, right? A penalty is typically hitting a wall. It's like that. Bang. Your traffic goes straight down. Within like 24 hours, it's just straight to zero. If it's a step down, it's because your pages are no longer ranking for like these high root keywords and they're just going down and they're not as relevant anymore. So that's what I mean by Google changing it and it's very very difficult to describe like it's either happening to you or it's not right. Well, we've talked about this I think on the show is that Google has tightened its algorithms around topical authority and so and so actually the the keywords that you're losing rankings for might not even be beneficial to you. you might not be getting conversions from from ranking for these keywords or you might but they they're just not big enough to show up right in in in the sphere. And so now you've got a bit of work to do. You've got I think pages that are going through step changes need to be republished. They need to be rehoused. You need to say well maybe I've got keywords in here that I that are outside of my relevance and so I'm no longer getting that traffic. Why have that keyword in your slug? Remove it. Repub. And the only way to remove it is to republish it. 
If it's in your page title, that's easy because you can just take it out of your page title, right? But if it's in your slug, then you have to remove it from the slug. Exactly. If it's in the page title, Google will ignore it. But if it's in your slug, Google often can't ignore it because it's part of your document name. And so if you're seeing step-down changes, even if it's impressions and not clicks, and also, I think, for people who bought a lot of backlinks, especially sites that relied a lot on them, I think we're seeing more and more where people are looking at their backlink makeup and they're seeing that the number of lost backlinks and the number of won backlinks are starting to become mirror images of each other. So if they're generating 100 a month and they're losing 90, something's going wrong. In other words, that same step change that's happening to you is happening to websites that link to you. So even if you're not buying links, and you had a link from someone else's blog post and they're losing authority, their fringe pages are becoming de-indexed and you're losing backlinks. And so that's a sign that your topical authority has taken a massive step change down. And so you may not have had to worry about precision, right? Because you were just getting so many backlinks. So you'll typically see, for example, people saying, "Oh, Google's really smart. It understands semantic search. So I could type in this or I could type in this and they loosely mean the same thing and I get both clicks." That actually only happens for very high topical authority sites. Not very high DA sites, very high topical authority sites, right? Where you start to lose authority, the bridges across the semantic web actually roll back. That's the tightening effect that we see, right? Whereas people who think that Google's smart at semantic, that should apply equally across the board. 
And because it doesn't means that it's not actually semantic, it's actually topical authority related. Do you think that that Google is asleep at the wheel as so many SEOs have have claimed? No, not at all. Not at I I I I see some remarks about SEO and like let's say like I don't like I don't like pages that are prescriptive and saying like you must have a money page and you must you know one of the things I hate is that like people say like oh your blog post must be informational and your money page must be salesworthy right why can't your blog post be salesworthy I don't understand that right I don't I think that would I'd assume that you would think the same as that right why I believe the same. No, it's more so that typically speaking blogs are informational, but you can have a blog that sells, right? But there's nothing in inside Google and there's nothing inside consumer's mind that says the other way around. And in fact, that's the way it used to be, right? So when people give that very prescriptive, right? I think that's like a that's that's part of where people don't get subjectivity and objectivity, right? That's where it's it's too blurred in their mind. So, let's just say I didn't like that idea and I said, "Look, I think Google's going to get rid of people putting in this prescriptive ideology, right? That's childish. Google's not going to do that. Google's not going to come up with all of these like, oh, we don't like prescriptive advice, so we're going to get rid of these." That doesn't make any sense. Google doesn't work at that level. So, if you're hoping that some spam tactic that you don't like, maybe it's listical pages, right? Maybe we're all sick of seeing people put their own name in the top list of top SEOs, right? I know I was sick of it when I saw it. That's why I started doing it, right? Um, and I I openly admit to doing it. Like, I don't think I'm the world's best SEO. 
I I created those lists to show that LLMs will accept the reality they're given. That's why I think LLMs are a bad place to get SEO advice because you're getting people who give prescriptive advice. You get people who are giving advice that's ideological. In other words, I want you to believe this whether it's true or not. I want you to believe this because that's my preference, right? And I'm guilty of that as much as the next guy. So, let's say I don't like listicles or I don't like people um giving a particular advice and I think it's spammy, right? Google's not going to go and get rid of pages that look like that. It can't do that. the the algorithm would become heavily bloated, right? It would you'd have all these if then forks, right? The same pages from CNN have to go have to the same algorithm has to has to be passed over pages from CNN and link farms and spammy websites and your competitors websites. So, it has to deal with the mechanics at a basic level, right? It's kind of like if you're driving your car and it starts making a clocking noise, a knocking sound. Mercedes can't go and put soundproof paneling in the engine, right? cuz that'll cause it to overheat and cause the car to weigh too much. You got to fix the fundamental root problem and that's typically authority and authority. Authority and relevance. Exactly. Yeah. What's what's your take on uh I think a lot of people want to know what's your take on AI generated product content with e-commerce sites? Um I've always I've always seen a like I've always seen a positive to that, right? 
Rather than people creating lots and lots of cookie-cutter elements and copying and pasting them in, if you had a really good AI tool, that could potentially give you a better product and say, hey, you know what, yes, this is the same part number as this, or here's a link to a video that shows you how to install this. That could be really, really useful, right? And again, I know that people don't like a lot of AI content, and I see people giving a lot of advice saying, oh, Google's going to clamp down on AI content. It's not, right? And you can tell, because that same advice that's being indexed on Reddit is being indexed alongside other advice. You are seeing, though, that YouTube just announced that it's putting AI labels on videos using AI, and they're prominent AI labels. Right. Right. I think that's also a great idea. I think some people are allergic to AI content. Well, it's also very helpful. You and I have talked about these anonymous AI channels putting up fake reviews: they're doing fake reviews of competitors, they rank for keywords like competitor plus review, and then in the video and in the description they suggest their product as an alternative product. This is getting cited by ChatGPT. ChatGPT doesn't know that it is a fake review, and all these videos, literally their entire channels are review channels, and the entire channels are also AI. And so maybe, I mean, this is probably wishful thinking, but maybe seeing a big fat AI label on it will actually make you leave the review and then pogo-stick, and then it won't do as well in Google, and then it won't get cited as much in ChatGPT. But I think that's probably a bit of wishful thinking. But yeah, it may be, because, look, they're two different engines, right? 
Google is a relevance engine and requires two different types of authority whereas YouTube generally allows requires um percentage view watch, right? And so it that's where the labels may help, right? Um I also think that it's also their way of dealing with like scaled YouTube content. Yeah, that's what I'm talking about. That's what I'm talking about. Um, and I think that's very very helpful because like you said, I think that is a very big play because YouTube is one of the is like one of the best parasite platforms there is right now. 100%. And it's very it doesn't have the same spam standards as Google um content has, right? It it gets away with a lot more. So, I think it needs a different um a different mechanism to deal with it. And and I'm hoping that if people see that I I think I would love to think that people would abandon that content faster and that would be amazing. Yeah. I mean, but it has to has to stop working. Oh, you're saying you're you would hope that visitors would abandon that content. Would it? Yeah. Yeah. I agree. I agree. But but I don't know if it's wishful thinking or not. I I think it stands a good chance because like I think if if you look at what's happening, a lot of people love AI advice for SEO, but then if they're reading a page about something else, they don't like AI advice, right? Um if you know it's AI advice and you've asked AI for advice, you love it. But if you if you're looking for human advice and you get AI advice, you hate it, right? That's the difference. And I think um you know I was reading a thread this morning where people were talking about like oh I think AI is good at giving this type of advice and not that advice and I I've started to see that people are see LLMs as univocal and I don't ever see LLM as univocal. I see LLMs as uh very much driven by the pages they're synthesizing at that moment that could literally contradict each other on the next question. And that's why I don't see it as univocal, right? 
Because if you and I are talking about something, you'll tend to stay on track unless I manage to convince you otherwise. Whereas an LLM will just wander. It'll just meander through, right? It'll just contradict itself happily all day because of the pages it's reading. And that's why I worry about people getting advice from LLMs. And if you look at the GEO propaganda that's coming out, they go, "Oh, well, it must know that this blog post is written by someone who's not really good at SEO, whereas, oh, look, this blog post is written by someone else who's much more knowledgeable and has much more trust." Those things don't exist, and they're actually not good ways of judging content. And that's why I think it's terrible when people say, "Oh, well, Google has LLMs. It can determine whether content is trustworthy." It's not about trust, it's about opinion. You know, if you trust the person's opinion, you'll apply it. But LLMs don't have an opinion. Can you imagine the eyeballs on the search team right now? The search team is funding 99% of the company's ambitions. What do you mean? So if you look at Google, right? Oh yeah, yeah, yeah. Yes, I know exactly what you mean. The cloud team is not funding AI, right? The search team is funding the ads team, which is being cut down faster than Meta. The search team is funding the search team. It's funding the shareholders. It's funding the executive team, because they don't do anything productive, right? And I don't mean that in a negative way. The search team is funding all of the non-profitable parts of Google. It's funding the data centers, the IT team, the legal team, and the HR team. And it's funding 40% of AI, because they're going to market to borrow money, right? Can you imagine the eyeballs on the search team? If the search team says, "We want a rooftop bar," I imagine the search team gets a rooftop bar. 
Yeah, that's why the claim that they're asleep at the wheel is kind of crazy. It's very crazy. And I know that there was a lot of talk about the PPC team taking over the search team. I would imagine that a lot of what happens in the search team arrives on the CEO's desk by 11:00 in the morning. I'm pretty sure Sundar Pichai is sitting in, and the CEO of Google LLC, like, his best friend is the CEO of Alphabet.
Last question. We'll just wrap with this, which is kind of everything that we've talked about on this episode. Just within a few sentences: what makes Google decide a product page is worth indexing, and what makes it not get indexed? It's very simple. Does the document name match the topical authority of the site or the incoming links? That's it. End of story. Nothing else.
David Quaid, once again, thank you for coming on the show. I'm sure we'll have a lot of questions. Drop them in the comments or send me an email. Preferably drop them in the comments. I will try to answer all the comments on the YouTube videos. Please comment or please ask us new questions. We'll make new videos from new questions and we'll answer the questions. And yeah, David, always great having you on the show. Thank you again. And thank you to Greg, who asked this question, which inspired this episode. Yeah, thank you. Great question. And quick congratulations to friend of the podcast Harpreit Singh, who just... Congratulations. Yeah, who just had a baby boy. Fantastic. Congratulations, buddy. That's great. Congratulations to Harpreit Singh. They're amazing. And Harpreit's actually going to be coming back on soon. We have a show scheduled on Google Merchant Center. It's me, Harpreit, and Goagan for that one on Google Merchant Center.
Best practices and mistakes that people make. Congrats. Yeah, it's going to be good. Congratulations to Harpreit. Wow, baby boy. It's amazing. This is episode 1,063 of the Edward Show. 1,063 days in a row doing this podcast. Can you imagine having conversations like this every day or doing solo episodes every day for 1,063 days in a row? And I thank all of you. I thank David for being part of this. I thank all of you for helping me on this journey 1,063 days in a row. If you watched this on YouTube, thank you so much for watching. If you listened on Spotify or Apple Podcasts, thank you so much for listening, and I will talk to you again tomorrow.
