How AI Search Engines Work | 1.1. AEO Course by Ahrefs

Ahrefs | 00:08:33 | Apr 29, 2026
AI search uses both training data and real-time retrieval to answer questions, with training data providing static knowledge and retrieval supplying fresh information.

AI search blends training data with real-time retrieval, expands queries into many subqueries, and ranks content probabilistically based on consensus, freshness, and authority.

Summary

Samo breaks down how AI search engines actually work, emphasizing that understanding the underlying mechanics is crucial for meaningful optimization. He explains two information sources: training data, which powers static knowledge, and real-time retrieval, which fetches fresh pages via APIs when needed. He then introduces retrieval-augmented generation (RAG) and query fan-out, where a single query is expanded into multiple long-tail subqueries that the AI searches in parallel. Samo highlights that AI responses are probabilistic rather than fixed, so whether a brand gets cited depends on patterns, consensus, freshness, and authority. He notes that content can influence AI by being widely mentioned (training data) or readily retrievable (SEO for real-time queries). A key shift is from one-to-one to one-to-many queries, making topic and niche coverage essential. The video also covers how to observe fan-out queries through the AI Responses report in Ahrefs Brand Radar, while warning that fan-out queries are synthetic and not a reliable keyword list. Finally, Samo teases the next lesson, which compares AI Overviews, ChatGPT, Perplexity, and Google's AI Mode side by side and finds surprisingly large differences between them.

Key Takeaways

  • AI search uses two information streams: training data and real-time web retrieval, combining learned patterns with fresh pages.
  • Retrieval-augmented generation (RAG) kicks in when training data is outdated or the query is too specific, pulling pages via APIs for synthesis.
  • One query becomes many in AI search due to query fan-out, which expands the prompt into dozens of long-tail subqueries simultaneously.
  • In practice, the average prompt triggers 9–11 fan-out queries, with some triggering up to 28, revealing the topics the AI considers important.
  • Content earns AI citations through consensus, freshness, and authority; 76% of AI Overview citations come from pages already ranking in Google's top 10.
  • SEO remains foundational for answer engine optimization (AEO), but 14% of pages cited in AI Overviews don’t rank in Google’s top 100, creating opportunity for brands that don't dominate traditional search.
  • AI visibility is probabilistic rather than a fixed ranking; the same question asked multiple times can yield different sources and weights depending on temperature and other signals.
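The 76% and 14% figures are shares computed over a set of citations. As a minimal sketch, assuming you already have the URLs one AI answer cited and the ranked Google results for the matching query (collecting them is out of scope), the two shares for a single query could be computed like this. The URL normalization rule and all data are assumptions for illustration, not Ahrefs' methodology.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal (assumed rule)."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    return host + parts.path.rstrip("/")


def citation_overlap(cited: list[str], serp: list[str]) -> dict[str, float]:
    """Share of cited URLs in the Google top 10, and share missing from the top 100."""
    if not cited:
        return {"top10": 0.0, "outside_top100": 0.0}
    # Map each normalized URL to its best (lowest) SERP position.
    position: dict[str, int] = {}
    for rank, url in enumerate(serp, start=1):
        position.setdefault(normalize(url), rank)
    ranks = [position.get(normalize(url)) for url in cited]
    top10 = sum(1 for r in ranks if r is not None and r <= 10)
    outside = sum(1 for r in ranks if r is None or r > 100)
    return {"top10": top10 / len(cited), "outside_top100": outside / len(cited)}


# Hypothetical data: 100 ranked results and the URLs one AI answer cited.
serp = [f"https://site{i}.com/page" for i in range(1, 101)]
cited = [
    "https://www.site3.com/page/",  # position 3
    "https://site8.com/page",       # position 8
    "https://site42.com/page",      # position 42
    "https://newcomer.io/guide",    # not ranked at all
]
print(citation_overlap(cited, serp))  # {'top10': 0.5, 'outside_top100': 0.25}
```

Aggregating such per-query shares over many queries is one way headline figures like these could be produced; the lesson does not describe the exact methodology behind Ahrefs' numbers.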

Who Is This For?

This is essential viewing for SEOs and content strategists who want to optimize for AI-driven search beyond traditional Google rankings, and for brands that want to appear in AI-generated answers and understand how to influence their AI visibility.

Notable Quotes

"AI outputs are built on probabilities on top of probabilities on top of probabilities."
Samo, quoting Patrick Stox, emphasizes the probabilistic nature of AI responses and why rankings aren’t fixed.
"The AI doesn’t just search for the exact thing you typed in. It fans out into dozens of smaller long-tail subqueries."
Highlights the fan-out mechanism that expands a query into many subqueries.
"There are two sources of information for AI search engines: training data and real-time retrieval."
Key foundational distinction driving how AI generates answers.
"76% of AI overview citations come from pages already in the top 10 of Google."
Shows the strong link between traditional authority signals and AI citations.
"Despina from Ahrefs wrote that fan-out queries aren’t traditional long-tail keywords; they’re synthetic and inconsistent."
Cautions about treating fan-out lists as a fixed keyword target.

Questions This Video Answers

  • How does retrieval-augmented generation (RAG) influence AI search results for brands?
  • What is query fan-out and why does one query expand into many subqueries in AI search?
  • Why is AI visibility probabilistic and how does that affect SEO strategies?
  • What percentage of AI citations come from top Google results and why does that matter for SEO?
  • How can brands get cited in AI Overviews if they don’t rank in Google’s top 100?
AI Search, Training data, Real-time retrieval, RAG (Retrieval-Augmented Generation), Query fan-out, AI citations, Consensus, Freshness, Authority, Ahrefs Brand Radar
Full Transcript
Hey, it's Samo, and welcome to the first module, which is on how AI search actually works. In this lesson, I'm going to walk you through the three things you need to understand about how AI search engines find, evaluate, and cite content. Because here's the thing: if you don't understand how AI search works under the hood, every optimization tip you hear is just going to feel like a random list of tactics. And I don't want that for you. I want you to understand the why behind every strategy we cover in this course. Let's get started.

So where does AI actually get its information? This is probably the most important distinction to understand. AI search engines have two sources of information, and they work very differently. The first is training data. This is the massive collection of text that the AI was originally trained on: books, websites, PDFs, social media, YouTube transcripts, basically a snapshot of the internet. So when you ask ChatGPT, "Who is the CEO of Apple?" and it instantly says Tim Cook without searching anything, that's coming from training data. It already learned that pattern. But here's the problem with training data: it's static. It gets updated maybe every 6 months or so. So if you launched your product last week, the AI doesn't know about it yet, not from training data at least.

And that's where the second source comes in: real-time retrieval. This is where RAG, or retrieval-augmented generation, comes into play. It sounds complicated, but Patrick Stox said it best: "Then you've got the retrieved pages, which is like a secondary process. So you've got the trained LLM data, and then you've got the data where it goes out and fetches a bunch of relevant pages, and those come with other probabilities."
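The two-source split can be sketched in a few lines. This is a minimal, hypothetical illustration, not how ChatGPT or AI Mode is actually built: the routing rule in `needs_retrieval`, the in-memory `WEB_INDEX` standing in for a live search API, and the word-overlap ranking are all assumptions made for the example. It only builds the prompt a model would receive; no model is called.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


# Hypothetical stand-in for a live search API: a tiny in-memory "web".
WEB_INDEX = [
    Page("https://example.com/launch", "Acme launched the Acme Mic 2 podcast microphone"),
    Page("https://example.org/review", "Review of the Acme Mic 2 for podcast recording"),
    Page("https://example.net/history", "Acme company history and early products"),
]


def needs_retrieval(query: str, is_recent_topic: bool) -> bool:
    """Toy routing rule (assumption): retrieve for recent topics or very specific questions."""
    return is_recent_topic or len(query.split()) > 8


def search_web(query: str, k: int = 2) -> list[Page]:
    """Rank pages by how many query words they share (a crude proxy for web search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in WEB_INDEX]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, is_recent_topic: bool) -> str:
    """Return the text a model would see: the bare question, or the question plus sources."""
    if not needs_retrieval(query, is_recent_topic):
        # Training-data path: the model answers from what it already learned.
        return f"Answer from memory: {query}"
    # Retrieval path (RAG): fetch pages, then ask the model to answer from them.
    pages = search_web(query)
    sources = "\n".join(f"[{i}] {p.url}: {p.text}" for i, p in enumerate(pages, start=1))
    return f"Answer using only these sources and cite them.\n{sources}\nQuestion: {query}"


print(build_prompt("Who is the CEO of Apple?", is_recent_topic=False))
print(build_prompt("what podcast microphone did Acme launch", is_recent_topic=True))
```

The second call shows why ranking for the right query matters: only the pages the retrieval step returns ever reach the model as citable sources.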
For example, when ChatGPT or Google's AI Mode needs fresh information, or when the question is too specific for training data alone, it goes out and searches the web using APIs, pulls back a bunch of pages, reads through them, and then generates a response based on what it found.

Now, why does this matter for you? Because it means there are two ways to influence what AI says about your brand. The first is to be mentioned so widely across the web that you're baked into the training data itself. The second is to make sure your content shows up when the AI searches the web in real time. And guess what? We already know how to do that. That's SEO. The skills you already have from traditional SEO (ranking in Google, earning backlinks, creating quality content) directly influence whether AI picks up your pages during real-time retrieval.

Now, here's where it gets interesting. The AI doesn't just search for the exact thing you typed in. Let me explain. Search engines used to work one to one: one query, one set of results. Then they evolved to many to one, where different queries like "Sydney plumber" and "plumbing service in Sydney" could return the same results. But AI search has flipped the model to one to many. One search gets expanded into many, and this technique is called query fan-out. For example, when someone enters a prompt like "Plan me a 5-day trip to Japan in November," the AI fans it out into dozens of smaller long-tail subqueries, things like "best neighborhoods to stay in Tokyo," "November weather in Kyoto," and "Japan Rail Pass worth it?", all running simultaneously behind the scenes. It then pulls information from multiple sources across the web and combines it into one complete answer. In fact, research from Seer Interactive and Nectiv found that the average prompt triggers 9 to 11 fan-out queries, with some going as high as 28. And ChatGPT's deep research mode ran 420 searches for a single query about buying a red phone case.
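The fan-out, parallel search, and merge steps described above can be sketched as follows. This is a hypothetical illustration: `fan_out` hard-codes the lesson's Japan example instead of having a model generate subqueries, and `search` returns made-up URLs in place of a real search API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fan_out(prompt: str) -> list[str]:
    """Hypothetical expansion step. A real engine has the model write subqueries on the fly;
    here they are hard-coded for the lesson's Japan example."""
    expansions = {
        "Plan me a 5-day trip to Japan in November": [
            "best neighborhoods to stay in Tokyo",
            "November weather in Kyoto",
            "Japan Rail Pass worth it",
        ],
    }
    return expansions.get(prompt, [prompt])


def search(subquery: str) -> list[str]:
    """Stand-in for a web search API call; returns made-up result URLs."""
    slug = subquery.lower().replace(" ", "-")
    # Every subquery also surfaces one broad guide that covers the whole topic.
    return [f"https://example.com/{slug}", "https://example.com/japan-guide"]


def fan_out_and_merge(prompt: str) -> list[tuple[str, int]]:
    """Run all subqueries concurrently, then count how many subqueries each URL answered."""
    subqueries = fan_out(prompt)
    with ThreadPoolExecutor(max_workers=8) as pool:
        result_lists = list(pool.map(search, subqueries))
    hits = Counter(url for results in result_lists for url in results)
    return hits.most_common()


for url, count in fan_out_and_merge("Plan me a 5-day trip to Japan in November"):
    print(count, url)
```

The hypothetical `japan-guide` page answers every subquery, so it collects the most hits. That is the intuition behind covering a whole topic rather than one keyword; how real engines weight such hits is not described in the lesson.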
So if your content ranks for those niche-specific queries, your brand has a much better chance of being included in the AI's final response. And this is a huge shift from traditional SEO, where you could optimize one page for one target keyword and call it a day. In AI search, you need to be relevant across an entire topic, and I could even argue across an entire niche. Because if your page about how to start a podcast only covers the basics but doesn't mention equipment, hosting, or promotion, the AI is going to find someone else's page that does.

Now you might be wondering, can I see these fan-out queries? You sure can. In the AI Responses report in Ahrefs Brand Radar, you can see the fan-out queries for ChatGPT and Perplexity prompts. But there's an important caveat. Despina from Ahrefs wrote in her guide on query fan-out that these aren't like traditional long-tail queries. They're synthetic, generated by AI in the moment. They're inconsistent: the same prompt can trigger different fan-outs every time. And over 95% of them have zero search volume, because real humans would never type them. So don't think of fan-out queries as a new keyword list to optimize for. Think of them as a window into what topics the AI considers important for a given question. We'll get into exactly how to use this strategically in module 2. For now, we need to talk about how AI decides who to actually cite.

In traditional search, rankings are relatively stable. If you're number three for a keyword today, you're probably going to be somewhere around there tomorrow. But AI citations are probabilistic. Patrick Stox explained this really well. He said, "AI outputs are built on probabilities on top of probabilities on top of probabilities. The training data creates patterns. The retrieved pages add their own signals, and then there's a temperature setting that introduces randomness so the AI doesn't generate the exact same answer every time."
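To make the temperature idea concrete, here is a minimal sketch of temperature-scaled softmax sampling. It is a simplification: real language models apply temperature to token probabilities during generation, not to a list of candidate sources, and the scores below are hypothetical. It only shows why a higher temperature spreads probability across more options, so repeated runs pick different sources.

```python
import math
import random


def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {name: s / temperature for name, s in scores.items()}
    top = max(scaled.values())
    # Subtracting the max before exp() avoids overflow without changing the result.
    weights = {name: math.exp(v - top) for name, v in scaled.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_source(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one source at random, in proportion to its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical relevance scores for three candidate sources.
scores = {"yourbrand.com": 2.0, "competitor.com": 1.8, "other.com": 1.0}
rng = random.Random(7)  # fixed seed so the demo is repeatable
for t in (0.2, 1.5):
    probs = softmax_with_temperature(scores, t)
    runs = [sample_source(probs, rng) for _ in range(5)]
    print(f"T={t}", {k: round(v, 2) for k, v in probs.items()}, runs)
```

At the low temperature the top-scored source dominates; at the higher one the weaker sources get a real share of the draws, which mirrors the "no fixed position" behavior described in the lesson.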
Now, what that means in practice is that if you ask the same question five times, you might get cited three out of five times. Or the AI might mention your competitor twice, you twice, and someone else once. There's no fixed position to rank for. This is why we talk about AI visibility rather than AI rankings. It's more like a probability distribution than a leaderboard.

That said, there are patterns in what gets cited more often, based on the data we've studied at Ahrefs. Consensus matters. If multiple sources on the web say the same thing about your brand, AI is more likely to repeat it, and the more places your brand is mentioned in a consistent way, the higher the probability the AI picks it up. Freshness matters. AI-cited content tends to be about 25% fresher than what you'd see in a traditional SERP. The AI is actively looking for recent information, especially for topics that change. Authority still matters. Pages that rank well in traditional search have a major head start. Our data shows that 76% of AI Overview citations come from pages already in the top 10 of Google. So I'll hit that gong one more time: SEO is the foundation of AEO. But it's not only about Google rankings. 14% of pages cited in AI Overviews don't rank in Google's top 100 at all, and for platforms like ChatGPT, the overlap with Google's results is even lower. So there's real opportunity for brands that aren't dominant in traditional search to still show up in AI.

So let's tie all of this together. AI search engines pull from training data and real-time web search. They expand your one query into dozens of subqueries through fan-out. They merge and score results from all of those searches. And then they generate a response that's probabilistic, not fixed, based on patterns, consensus, freshness, and authority. Understanding this process is what makes every other lesson in this course make sense.
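The "cited three out of five times" example suggests a simple way to measure visibility as a probability rather than a position: repeat the same prompt and count, per brand, the share of runs that cite it. A minimal sketch, assuming the cited brands for each run have already been collected (collection is out of scope, and the brand names are hypothetical):

```python
from collections import Counter


def visibility_share(runs: list[set[str]]) -> dict[str, float]:
    """Fraction of runs in which each brand was cited at least once."""
    if not runs:
        return {}
    # Each run is a set, so a brand counts at most once per run.
    counts = Counter(brand for cited in runs for brand in cited)
    return {brand: n / len(runs) for brand, n in counts.most_common()}


# Hypothetical results of asking the same prompt five times.
runs = [
    {"yourbrand", "competitor"},
    {"yourbrand"},
    {"competitor", "otherbrand"},
    {"yourbrand"},
    set(),  # this run cited none of the tracked brands
]
print(visibility_share(runs))  # {'yourbrand': 0.6, 'competitor': 0.4, 'otherbrand': 0.2}
```

Five runs is a tiny sample; because answers vary from run to run, a share measured over more runs gives a steadier estimate.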
So when I tell you to earn more brand mentions, it's because of how consensus drives probability. When I talk about topic coverage, it's because of query fan-out. When I say that SEO is the foundation, again, it's because 76% of citations come from pages already ranking well. Now that you understand the mechanics, the obvious next question is: do all AI platforms work the same way? And the answer is not even close. Not even close. In the next lesson, we're going to compare AI Overviews, ChatGPT, Perplexity, and Google's AI Mode side by side. And the data on how different they are is pretty surprising. I'll see you in the next lesson.
