More Cores, Less Cache — And It Still Got Faster | Cloudflare Gen 13
The episode opens with a recap of Cloudflare’s latest posts, highlighting post-quantum encryption, the new ability for agents to create Cloudflare accounts, buy domains, and deploy, plus coverage of the Q1 2026 Internet Disruption Report and Rust-based reliability improvements to Workers.
Cloudflare’s Gen 13 servers double compute density while trimming L3 cache, enabled by FL2 and a tighter hardware-software collaboration that cuts latency and power usage.
Summary
Cloudflare’s GK Lao and Victor Guang walk through Gen 13, a hardware-software collaboration story rather than a simple hardware refresh. The team chose high-core-count CPUs with smaller L3 cache to maximize throughput, enabled by a rewritten request-handling stack (FL2) that is more memory-efficient and faster than FL1. Gen 13 delivers roughly 50% more throughput and up to 2x the compute density of Gen 12, with stronger per-watt performance that helps Cloudflare scale its global footprint. The engineers detail how memory, storage, and networking were redesigned alongside the CPU choice, including 768 GB of memory, 24 TB of internal storage, PCIe Gen 5, and support for two PCIe GPUs to accelerate AI workloads. They emphasize the inseparability of hardware and software roadmaps: FL2’s improvements reduced latency penalties that were previously tied to cache size, validating the joint design approach. Security also evolves, with memory encryption, PCIe encryption, chassis intrusion protection, and post-quantum readiness discussions. Austin lab experiments and AI-assisted tooling helped accelerate decision-making, underscoring Cloudflare’s focus on scalable, cost-efficient infrastructure. The takeaway is clear: the Gen 13 win comes from cross-functional alignment, not just faster CPUs.
Key Takeaways
- Gen 13 uses high-core CPUs with smaller L3 cache to boost core density (from 96 to 192 cores) while leveraging FL2 to reduce latency and reliance on large L3 caches.
- FL2, a Rust-based rewrite of the request handling stack, dramatically lowers latency and improves efficiency compared to FL1, enabling better use of Gen 13 hardware.
- Memory expanded to 768 GB and storage to 24 TB, with PCIe Gen5 and dual Nvidia GPU support, enabling larger AI models and faster data movement.
- Security and safety features improve with memory encryption, PCIe encryption, chassis intrusion protection, and an upgraded chassis design to deter tampering.
- A joint hardware-software design approach, tested in the Austin lab and guided by AI-assisted analysis, yields higher throughput and better power efficiency, illustrating the importance of cross-team collaboration.
Who Is This For?
System architects, data center operators, and developers curious about how Cloudflare scales infrastructure with Gen 13—showing why software rewrites and hardware choices must align for real-world performance and cost efficiency.
Notable Quotes
"The more u the higher the fan speed is, the more power it consume and it's exponentially more."
—Illustrates why cooling and power considerations are non-linear and impact hardware choices.
"The best takeaway is that don't evaluate the hardware design in isolation from the software road map."
—Emphasizes the core message: hardware and software must be planned together.
"FL2 matured, we tested it on the Gen 13 hardware and we found out that the latency penalty dropped dramatically."
—Shows how software rewrites unlock performance even with high-core CPUs.
"Security is something we take very seriously at Cloudflare."
—Sets the tone for Gen 13’s security enhancements (encryption, intrusion protection).
"We doubled compute density in terms of hardware and reduced cache, while the software element pushed performance further."
—Summarizes the combined hardware-software gains of Gen 13.
Questions This Video Answers
- How does Cloudflare Gen 13 differ from Gen 12 in terms of core count and L3 cache?
- What are FL1 and FL2 in Cloudflare’s server stack?
- Why did Cloudflare choose smaller L3 cache with more cores for Gen 13?
- How do memory encryption and PCIe encryption improve server security?
- What role does AI play in evaluating new server hardware at Cloudflare?
Cloudflare Gen 13, Gen 11/12/13 servers, FL1 vs FL2, Rust-based server rewrite, memory encryption, PCIe Gen5, NVIDIA GPUs, security (chassis intrusion, post-quantum), throughput per watt, AI workloads in data centers
Full Transcript
But that one is really non-intuitive when you think about it. Adding one more fan should technically push the fan power up, but it turns out the fan curve is nonlinear, it's super-linear: the higher the fan speed is, the more power it consumes, and it's exponentially more. So if we use four fans to cool the CPU, they would have pulled 50 to 100 watts, versus if we use five fans, it turns out to be just 30 watts. It's counterintuitive because we only need to spin the fans at 10% to 20%, versus at 40% or 50% duty cycle.
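The back-of-the-envelope math behind that counterintuitive result can be sketched with the fan affinity laws: airflow scales roughly linearly with fan speed, while fan power scales roughly with its cube. A minimal sketch, where all wattages and duty cycles are illustrative assumptions, not Cloudflare's measurements:

```python
# Fan affinity laws (rough model): airflow ~ speed, power ~ speed**3.
# So five slower fans can move the same air as four faster fans
# while drawing noticeably less total power.

def total_airflow(num_fans: int, duty: float) -> float:
    """Relative airflow; scales ~linearly with duty cycle per fan."""
    return num_fans * duty

def total_power(num_fans: int, duty: float, max_w: float = 25.0) -> float:
    """Total watts; per-fan power scales ~cubically with duty cycle."""
    return num_fans * max_w * duty ** 3

# Same cooling target: 4 fans at 50% duty match 5 fans at 40% duty in airflow.
assert total_airflow(4, 0.50) == total_airflow(5, 0.40)

p4 = total_power(4, 0.50)  # 4 * 25 * 0.125 = 12.5 W
p5 = total_power(5, 0.40)  # 5 * 25 * 0.064 = 8.0 W
print(f"4 fans: {p4:.1f} W, 5 fans: {p5:.1f} W")
```

The exact savings depend on each fan's real power curve, but the cubic relationship is why running more fans at lower duty cycles can win.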
That's definitely interesting. Hello everyone, and welcome to This Week in Net. This is the May 1st, 2026 edition and, of course, many places are celebrating a holiday, which is the case for me; that's why I'm recording on a Thursday. This week we're going deep into something that looks like a hardware story, and it is in a way, but really isn't just that. We're talking about Cloudflare's Gen 13 servers and why this wasn't simply a new generation of machines: it only really worked because the request-handling layer underneath was rewritten first. More on that conversation in a moment.
Let's do a check on the latest in the Cloudflare blog, and there's quite a bit, starting with post-quantum encryption now going generally available for IPsec. Post-quantum encryption is really more important than ever; we had an episode only about that two weeks ago, you can check that out. This is another layer that was added. There's also something quite new this week: agents can now actually create Cloudflare accounts, buy domains, and deploy, a cool blog post you can read. We also look in the Cloudflare blog this week at the Q1 2026 Internet Disruption Report.
That means shutdowns, for example, the Iran one that is now over two months of complete shutdown in the country, where just a few whitelisted pieces of equipment can access the internet. There were power outages, even attacks on infrastructure. There's a lot to dig into there. Plus making Rust Workers more reliable, that's also a blog post, and a rethink of bots versus humans on the web, a very cool blog from Tibo from our research team. There's also, of course, last week's full recap of everything launched during Agents Week, and there's a This Week in Net episode about that too. And now, without further ado, here's my conversation with GK Lao and Victor Guang from Cloudflare's network and infrastructure strategy team.
And as usual, I'm your host, João Tomé, based in Lisbon, Portugal. Hello GK, hello Victor, how are you doing? Doing good, doing good, how are you? I'm good. And for those who don't know, where are you based? I'm GK, I'm based in Austin, Texas. Victor? I'm Victor, I am based in the San Francisco Bay Area. Can you give us a run-through, I always like to start here, of your job at Cloudflare, and when did you start? Victor, want to start? Sure, I started in June of 2024, so it's been almost two years for me here at Cloudflare.
And your role? I am a power system engineer on the hardware team. Yeah, and I have been at Cloudflare since November 2021, similarly as a hardware systems manager, helping Cloudflare launch Gen 11, Gen 12, and now Gen 13. One of the things that's also important in this area is that servers are quite important to making the cloud, and Cloudflare is really well known for its network, its global network that is always expanding. Can you give us a run-through of the importance of the actual servers, Gen 11, Gen 12, Gen 13?
Sure, yeah. So Cloudflare has about 330 PoPs worldwide, or more, and all of these serve customer requests, right? In order to serve those customer requests, the software stack needs to run on some hardware. Instead of renting from the likes of AWS or GCP, Cloudflare as a security company wants to own its own hardware, to make sure we have control and that we serve our customer requests securely, and that's where the hardware team comes in, to design our servers specifically for Cloudflare's workload. Our Gen 11 servers were based on AMD Milan CPUs, and they offered the best performance for Cloudflare's workload at that time. As time went on, we had Gen 12 as the CPU refresh happened.
Gen 12 is based on AMD Genoa-X, and now, in 2026, we have launched Gen 13 to better serve Cloudflare's workload with more efficiency. One of the things that the two blog posts we wrote mention is how this processor doubled the computing power but also cut a key resource by 83%. These numbers are relevant, especially now that AI is around and people really value compute more than ever, I would say. What are the main efficiencies, and also computing power gains, from Gen 13 that we could highlight?
Yeah. When we looked at Gen 13, we had a couple of options available in the market. Out of the CPU options available to us, the one that offered the highest core density came with significantly smaller L3 cache compared to what we have in Gen 12. For everyone: L3 cache, you can think of it as a fast memory that the processor has access to. For context, we were looking at AMD Turin CPUs for our Gen 13 servers, as well as the Intel variants, Intel Emerald Rapids. The one that offered the highest core increase, going from 96 cores to 192 cores, came with significantly smaller L3 cache per core, from 12 megabytes to 2 megabytes.
Other options offered 4 megabytes per core, but with lower core counts. What happened is that our software, the request-handling-layer software, FL1 specifically, which is our major workload, is sensitive to the size of this fast memory, the L3 cache, because it's an 83% cut, it's one-sixth of it, right? So it has a significant impact. But it turns out, as we designed the hardware and worked with the performance and software teams, it's not just the hardware, it's also about the software architecture, and today we'll talk a little bit more about how the hardware-software co-design had an impact overall on Gen 13.
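The per-core numbers behind that "one-sixth" comment work out as follows. The total-L3 figures below are assumed spec-sheet-style values chosen to be consistent with the per-core sizes quoted in the conversation, not numbers stated in the episode:

```python
# Per-core L3 = total L3 / core count. Totals are assumptions consistent
# with the 12 MB/core (Gen 12) and 2 MB/core (Gen 13) figures above.
gen12_cores, gen12_l3_mb = 96, 1152    # Genoa-X-class part with large L3
gen13_cores, gen13_l3_mb = 192, 384    # high-density Turin-class part

per_core_12 = gen12_l3_mb / gen12_cores   # 12.0 MB/core
per_core_13 = gen13_l3_mb / gen13_cores   # 2.0 MB/core
cut = 1 - per_core_13 / per_core_12       # ~0.83, the "83% cut" headline
print(per_core_12, per_core_13, round(cut * 100))
```

So the headline "cut a key resource by 83%" is exactly the 12 MB to 2 MB per-core L3 change.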
One of the things that I think a general audience will value is this perspective: of course, we are present, we have data centers in over 330 cities around the world, but in what way are the CPUs an important part of that? In what way are the CPUs relevant, and do these new generations allow capabilities that weren't around before? So the new CPU actually gives us much more throughput, 50% more performance compared to our last generation, Gen 12, and that serves our workload worldwide.
So that is the heart of the server that processes all the requests that come in through our network, of course, and if it has more computing power and is more efficient, it means we can handle more requests, not only more customers. We know that internet usage is usually growing, so it allows us to support that extra usage by users and by companies, but also, with AI, there's more bot traffic, more automated traffic around, so those capabilities are quite important in this situation.
Right, exactly. Yeah, as technology advances, the CPU vendors will keep pace with it. So when we refresh the hardware, we make sure that our hardware roadmap follows the industry and is capable of doing the latest and greatest things, whether that's raw CPU power to process more requests, or technology updates in terms of encryption algorithms, different types of encryption, making sure that all the buses are secure, making sure that we have upgraded memory speeds to support the larger amount of processing power that the CPU has, and things like that.
One thing that is mentioned in the two blog posts that came out, which I suggest anyone read, there are diagrams and a lot of details for those who want to understand, is the trade-off between more cores and less cache. What does that actually mean, and why is that a hard question, or decision, in this situation? What can we explain, even for those who are not hardware engineers, about this more-cores, less-cache trade-off? Sure, I can talk a little bit about that.
That's a very good question. I mentioned a little bit about how the Gen 13 versus Gen 12 CPU options have smaller cache, or fast memory. Think of it like a scratch pad: if you have a large L3 cache, you have a large scratch pad; if you have a small L3 cache, you have a small scratch pad. Thinking of the CPU as a worker doing work, if it needs to go retrieve information and it has a big scratch pad with a lot of notes written on it, it's very fast for it to go find the information and come back to continue working. Compare that with memory, the DRAM modules on the server.
Think of memory as a bookshelf with books: you have to go find the book, find the information, come back, and do the work. That's the analogy we can use. CPUs operate at a much faster pace than a human does. Cache, the scratch pad, has an access time of about 50 nanoseconds, and a nanosecond is a billionth of a second. Memory access is seven times that, about 350 nanoseconds. So every time the workload needs to fetch information, if it can fetch from cache it's much faster; if it needs to go out to memory, it's a seven-times delay for each memory retrieval.
And each workload can have thousands, millions of memory accesses; if any one of them is delayed, it impacts the overall experience, right? So when we were designing Gen 13 and comparing workloads, we looked at FL1, our core request-handling layer prior to 2025. It's written on NGINX and Lua logic, and it has a heavy reliance on a big scratch pad. When we put the FL1 workload on Gen 13 CPUs, what we noticed is a significantly higher cache miss rate, and what that means to a user is higher latency: when you load a web page, it may be slower; when you go through any link through Cloudflare, it may take slightly longer. Through our research, and Victor can share more, we looked at three different CPUs and compared core counts and latency. The safer path, of course, would have been to go with the 128-core Turin CPU, because that one actually has more cache per core.
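Those latency figures plug straight into the classic average-memory-access-time model. In this sketch, only the 50 ns / 350 ns latencies come from the conversation; the miss rates are illustrative assumptions:

```python
# AMAT model: hits pay the cache latency, misses additionally pay the
# DRAM round trip. Latencies are the figures quoted in the discussion.
CACHE_NS, DRAM_NS = 50, 350

def avg_access_ns(l3_miss_rate: float) -> float:
    """Average memory access time for a given L3 miss rate."""
    return CACHE_NS + l3_miss_rate * (DRAM_NS - CACHE_NS)

generous_l3 = avg_access_ns(0.05)  # big scratch pad, few misses:  65 ns
tiny_l3 = avg_access_ns(0.40)      # small scratch pad, many misses: 170 ns
print(generous_l3, tiny_l3)
```

Over the thousands or millions of accesses per request mentioned above, that per-access gap compounds into the visible latency penalty FL1 showed, and it's the penalty FL2's smaller working set clawed back.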
It actually provides 4 megabytes of cache per core, compared to the 192-core part, which only provides us with 2 megabytes. But overall, when we were evaluating our workload and picked the 192-core part for Gen 13, the decision came from the fact that FL2, the new Rust-based rewrite of the software stack that GK mentioned earlier, FL1, was already in progress. As FL2 matured, we tested it on the Gen 13 hardware, and we found that the latency penalty dropped dramatically; it improved very, very well.
So this gave the team the confidence we needed that we are less reliant on the L3 cache compared to the FL1 workload. One of the things one of the blog posts mentions, which I think is quite interesting, is the Gen 13 business impact: for example, up to two times the throughput versus Gen 12, 50% better performance per watt, and 60% higher rack throughput versus Gen 12. Those will be impactful for businesses that depend on Cloudflare, of course, for the Cloudflare business itself, and for general users as well, right? Because we mentioned latency and how users are impacted by that, because things are just faster and there's more compute power as well.
There's a balance between latency, compute power, and what we want specifically, right? That balance is quite important. Yeah, the balance is very important, because designing hardware, or designing a lot of things in general, is all about trade-offs. There are typically multiple axes you can improve on. For the server hardware case, it would be: can I, with this new hardware, serve a lot more requests? Typically that means you need to burn a lot more power, or you may have to compromise on latency, tolerating a bit more latency to serve more requests. You typically don't get all three axes perfect; you may have to compromise on some, or evaluate the trade-off. In our case, fortunately, with the hardware we serve more requests for a little more power.
So we improved on both efficiency and throughput, but with FL1 we compromised on latency. Fortunately, we had the FL2 rewrite already underway, for efficiency reasons, for modularity reasons, and so the software team could continue to offer new products and services to customers. That rewrite helped improve on the axis that hardware improvement couldn't: it helped drop the latency. So combining the hardware and software co-design actually helped us achieve improvement on all three axes. It's about thinking through the trade-offs and working with cross-functional teams to provide support in areas that your own organization, or your own design, could not improve alone.
Maybe two different workstreams working together can cover the shortcomings of the other, and at the end you get a win-win. We've been mentioning FL2 and FL1 in terms of software, and, as you mentioned, it's the version of software that handles requests in these data centers, on these servers. This new version, correct me if I'm wrong, was rebuilt to be faster, safer, and also less dependent on a huge CPU cache, right? Am I correct? Yes, that's correct. Yeah, I think as Cloudflare scaled, as Cloudflare evolved as a company, FL1 served us well for 10 years, taking Cloudflare from a startup all the way to a company that does billions of dollars of revenue on an annual basis.
As Cloudflare looks to the next 10 years, as we need to continue to scale, offer best-in-class services, and serve our customers efficiently, the software team put together an effort to rearchitect our request-handling layer so it can scale even further and serve our customers securely. That comes with the Rust rewrite, switching the language from Lua and NGINX to a Rust-based server, which itself helps improve security: it's a memory-safe language. But it's not just the language change, right?
It also means rearchitecting, thinking about how we design the software pieces to scale efficiently and serve our customers. And because of that thorough thought process and due diligence, the software became more efficient after the rewrite and able to utilize hardware resources more efficiently. One thing I always find interesting in this part is the marriage between hardware and software on servers, and the importance of them being, in a sense, one: how the roadmaps for hardware and software evolve at different paces, and how our teams work to make the best of the two put together.
Sure, of course. So going forward, the lesson is that we cannot design hardware, or make a hardware selection, in isolation from the software roadmap, and vice versa. The same thing goes for software: when they develop it, they have to keep in mind how the hardware is changing. So when the team evaluates new hardware, they have to evaluate it both against the current software stack and against anything planned for future releases, so that the collaboration between the two teams can surface any constraints or issues that we'll have to solve together.
One of the things also important here, and it's mentioned in the blogs, is the components. The CPU gets most of the attention, but the team had to redesign almost every other component: memory, storage, networking, power. In what way was that redesign important, and what lessons did we learn from it? Sure. Memory-wise, we actually doubled our total capacity: we went from 384 gigabytes to 768, and with 12 memory channels populated for maximum bandwidth, we achieved, at its peak, a 33% increase over our previous generation.
We were still able to maintain a 4 GB per core ratio, and since FL2 uses memory a lot more efficiently, doubling the capacity provided a lot of headroom for future workload growth. Storage-wise, we expanded the internal storage from 16 terabytes to 24 terabytes by adding a third drive, and upgraded from PCIe Gen 4 to PCIe Gen 5, which provides lower latency and better bandwidth. The additional storage supports growth in many of our workloads, including the CDN cache, Durable Objects, containers, etc.
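The memory sizing is easy to sanity-check from the figures above: capacity doubled, but so did core count, so the per-core ratio is unchanged.

```python
# Memory-per-core stays constant across generations because both
# capacity and core count doubled (figures from the conversation).
gen12_mem_gb, gen12_cores = 384, 96
gen13_mem_gb, gen13_cores = 768, 192

assert gen12_mem_gb / gen12_cores == 4.0   # 4 GB/core on Gen 12
assert gen13_mem_gb / gen13_cores == 4.0   # still 4 GB/core on Gen 13
print("per-core ratio unchanged at 4 GB/core")
```

The "headroom" point follows from FL2 using less memory per request than FL1 did, so the same 4 GB/core now stretches further.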
And for a note, those have been growing a lot with AI, Durable Objects even more so. We had Agents Week, and now people are using those capabilities not only internally, in products from Cloudflare, but also externally, which makes sense. That's correct. And we also added the capability of a front drive bay. The Gen 13 chassis actually supports up to 10 U.2 PCIe Gen 5 NVMe drives in the front. What that means is we can use the same chassis to support both compute workloads and storage workloads. If we need to, in the future, do a field upgrade of a compute node, we can just use the same chassis, populate the drives in the front, and it very easily expands its storage and becomes a storage server.
Yeah, on the networking side, as you can imagine, as the CPU now processes more requests, there's more network traffic coming into the server, and at the same time, as the requests get processed, there's more network traffic going out of the server. So we looked at our NIC. On Gen 12, we have a 2x25-gig networking card on the server. When we look at production metrics, we're seeing that, even on FL1, it's already running at 50% utilization at P95; P95 is the 95th percentile. As you can imagine, if FL2 becomes more efficient, that utilization will go up, which would still be fine on Gen 12. But Gen 13 is going to be up to two times Gen 12's performance, right? What that means is the 25-gig ports would be saturated if we stuck with 25 gig. So we looked at the industry: is 50 gig the size we upgrade to, or do we need to go even higher for future compatibility, so that we don't have to re-evaluate every year or every generation?
When we surveyed the market, 50 gig is actually not a common industry standard; a lot of companies shipping today have jumped straight to 100-gig ports. So we looked at the 2x100-gig option, the 2x200-gig option, and beyond. This comes back to the cost-versus-throughput trade-off: you can get something much bigger, but it comes with significant cost. The sweet spot turned out to be 2x100 gig, because once we change the port on the NIC, the upstream networking gear, the ToR switches, the routers, have to be upgraded as well.
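The saturation argument can be written down directly from the numbers above; the doubling factor is the quoted Gen 13 throughput gain, while treating P95 utilization as directly proportional to request throughput is a simplifying assumption:

```python
# Why 2x25G would saturate on Gen 13: FL1 already drives ~50% utilization
# at P95 on Gen 12's 2x25G link, and Gen 13 serves up to 2x the requests.
gen12_link_gbps = 2 * 25
p95_demand_gbps = gen12_link_gbps * 0.50        # ~25 Gbps today at P95
gen13_demand_gbps = p95_demand_gbps * 2         # ~50 Gbps with 2x throughput

assert gen13_demand_gbps == gen12_link_gbps     # 2x25G exactly full, no margin
headroom = (2 * 100) / gen13_demand_gbps        # 2x100G leaves 4x headroom
print(headroom)
```

So a 2x50G card would leave essentially zero P95 margin, which is one way to read why the team jumped to 2x100G rather than the next increment.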
So we worked with the network hardware team, as well as our network team at Cloudflare, to make sure the entire stack is upgraded to support Gen 13, as well as the new generations going forward. And that's networking, right? That's the way the server does networking? Correct, yeah, that is networking: any request coming in or going out that traverses the network to the server. In addition to networking, we also added additional PCIe card support. In Gen 12, we have support for one PCIe card.
That's what we used to enable installing NVIDIA GPUs to support our Workers AI project. In Gen 13, we expanded that support to two PCIe cards. So now it can support two high-powered NVIDIA GPUs, in order to serve larger models, or serve models with lower latency. And not just GPUs for Workers AI: it also enables flexibility for supporting future accelerators that we don't know of today, or for us to consider other options, say a time card, or DPUs, or SmartNICs to accelerate our software stack, if we were to look into those products further in the future.
I'm showing here an image from the blog, which is the Gen 13 server specifically, and storage, memory, CPU, a lot of the things that we've been talking about are all here. Any guidance on the visual differences from Gen 12 to this one, any changes that we can spot here? Yeah, I think the big one would be the one that Victor mentioned, the front drives. Victor, do you want to talk a little bit more about that? Yeah, the thing I just mentioned: from the picture, the difference in the chassis you can tell is the 10 front drive bays that support storage workloads. That also helps us reduce the number of SKUs that we have to serve across the global supply chain, right? We no longer have to buy a storage server and then another one as a compute server; now it's merged into one. You can just buy the same chassis, populate what you need for your workload, and then deploy it that way.
The other thing you can probably see in the picture is that we added an extra fan to help cool the 500-watt CPU. I think in Gen 12 it was just four fans; with the increased thermal design power of the CPU, we added an extra fan in there to increase the power efficiency of the server. Yeah, that one is really super-linear: with the extra fan we only need to spin the fans at 10% to 20%, versus at 40% to 50% duty cycle. That's definitely interesting. This is the blog we were mentioning.
There's the other one also: Launching Cloudflare's Gen 13 servers, trading cache for cores. One question that I think is also relevant in these types of hardware stories is security. Security is often invisible to end users, but Cloudflare is protecting a huge chunk of the internet. So what's new in the Gen 13 security story, and how does the physical design of the server play into that? We thought about that, right? Yeah, that's a very good question. Like you say, João, that's something we take very seriously. Security is something we take very seriously at Cloudflare.
So when we designed the server, we also looked at how we improve the security posture at the hardware component level. The AMD Gen 13 options, the Turin CPUs, offer more protection. Since the Gen 10 platform, which was based on AMD Rome and introduced in 2020, we have had memory encryption; fleet-wide today, Cloudflare has memory encryption. In Gen 13, AMD added PCIe encryption to their CPUs. For those who are not experts, memory encryption means that the memory we have is fully encrypted. That means more security, of course. Correct, yeah.
Think of it as data in transit and data at rest. Data at rest is the data stored on the SSDs, things you don't access actively as you work on a workload but do want to store securely; those are covered by SSD encryption, or storage encryption. Data in transit has many components. Data that transits through the network is encrypted by network technology. When it lands on a server, going from the NIC to the CPU, it's still technically encrypted, because those packets just pass through. Once it's getting processed on the CPU, the CPU needs to look at the actual data, so it decrypts it and works on it. As it works on it, it may need to move some data from its cache to memory, store it temporarily, and later fetch it again to process. That portion is taken care of by memory encryption, which makes sure that data is secure.
No threat actor can intercept in the middle and figure out what is going between the CPU and memory. What is happening in Gen 13 is that this perimeter is extended. Think of the days when we have GPUs and DPUs: the traffic going from the CPU through the PCIe bus to the GPU was, in the past, not encrypted. It was just plain data, unless the kernel had enabled support to encrypt every packet, and in the past there was no hardware support for that. Today AMD has enabled it: every piece of data going onto the PCIe bus, whether it's NVMe, GPUs, or DPUs, provided the other side supports it, comes out of the Turin CPU encrypted.
So now the entire internal bus system is encrypted; all data flowing within the server is encrypted. And there's intrusion detection as well, right? Yeah. What does that do? That's a very good question. Because Cloudflare has PoPs worldwide, some in very remote locations, these PoPs may have different security postures. Some have very high standards, some have security that is just good enough, but inherently that's only the data center's security posture. There are threat actors, especially nation states, that can break into even high-security facilities, so we cannot fully trust even the workers at those locations.
We want to know when any of our servers gets opened up. If it is scheduled maintenance, we can compare it to the maintenance schedule and know this server is supposed to be opened when we get the signal. But if it is not scheduled maintenance, we want to get that signal. That is why we introduced chassis intrusion detection in our Gen 12 platform, and in Gen 13 we improved the posture further. On Gen 12 we did a pentest; a pentest is penetration testing, where sort of a white-hat hacker hacks the system in order to identify what can be improved.
One of the things identified was that the intrusion switch was too close to a flat edge of the server, where someone could slide a credit card in to keep the intrusion switch pressed down and then open the server without triggering it. So we've now moved it to an area where the cover is curved, so there's no way for a slid credit card to block it, and we have it on both sides, so you cannot just open one side versus the other, and things like that. Those issues are discovered as we do penetration testing and look through the mechanical design, to identify how we can secure the server further.
That's definitely interesting. One thing we already mentioned, actually, is that one of the blogs cites 50% better performance per watt. How important is that number for a company running infrastructure at Cloudflare's scale? What is the result of that in terms of improvements? GK, do you want to take that? You're talking about it from the cost perspective? Exactly, in the sense of how important it is for a company running infrastructure at scale to have those savings. Yeah, I can briefly talk about it.
So when we design hardware, the reason Cloudflare wants to bring hardware design in-house is not only so we can control how the hardware is designed, but also to make sure we have better control of the cost structure and more visibility into where we spend our money. As we think about the cost structure of a server, there are two components: capital expenditure, the cost of the server itself, and operating expenditure, the cost of operating the server, which includes data center space and power. And when we talk about power, say the server consumes 300 watts to serve 3,000 requests.
If you can do those 3,000 requests at half the power, say 150 watts, in a new generation, that's a win; that saves Cloudflare money long term. So when we say a 50% performance-per-watt improvement, what that translates to in the real world is that Cloudflare spends less money operating the servers to serve the same number of requests. As Cloudflare continues to scale up we need to serve more and more requests, but we make sure operating expenditure doesn't go up linearly with the number of requests we serve; it can increase sublinearly. And with AI that is relevant, because people are definitely doing much more with those tools, in terms of requests, in terms of websites. That's really interesting. Zooming all the way out, what's the single biggest lesson from the Gen 13 program, and what would you want someone outside Cloudflare to take away from this story specifically?
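The opex arithmetic described above can be put in a few lines. This is a back-of-the-envelope sketch only: the wattage and request figures are the illustrative ones from the conversation, not real Cloudflare numbers, and the electricity price is an assumed placeholder.

```python
# Sketch of the performance-per-watt opex math from the discussion above.
# All figures are illustrative; the electricity price is an assumption.

def annual_power_cost(watts: float, usd_per_kwh: float = 0.10) -> float:
    """Cost of running a server at a constant power draw for one year."""
    hours_per_year = 24 * 365
    kwh = watts / 1000 * hours_per_year
    return kwh * usd_per_kwh

REQUESTS = 3000   # requests per second served in both generations

old_watts = 300   # older generation: 300 W to serve 3,000 req/s
new_watts = 150   # newer generation: same work at half the power

old_cost = annual_power_cost(old_watts)
new_cost = annual_power_cost(new_watts)

# Performance per watt doubles when the same work takes half the power.
old_req_per_watt = REQUESTS / old_watts   # 10 req/s per watt
new_req_per_watt = REQUESTS / new_watts   # 20 req/s per watt

print(f"old: ${old_cost:.2f}/yr  new: ${new_cost:.2f}/yr")
print(f"power-cost savings per server: ${old_cost - new_cost:.2f}/yr")
```

Multiplied across a fleet the size of Cloudflare's, a per-server power saving like this is why opex can grow sublinearly with request volume.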
I suppose the biggest takeaway from designing this generation of hardware is: don't evaluate the hardware design in isolation from the software roadmap, because the best performance is usually unlocked where the two intersect, and when the two teams are collaborating, that's when you get the best results. Makes sense. Yeah, neither the hardware nor the software alone would have produced this result; it's a collaboration. And getting to a win-win situation, where you can win on all the axes you're trying to improve, typically takes more than one team or one area of focus.
And in this case the results are quite important, right? The way we doubled compute density on the hardware side while reducing cache, and also the software element. You can see that without the software part, the hardware wouldn't be as performant or as relevant in this situation, which is interesting. Exactly. It's not just metrics we're tracking that improved; this has, like you mentioned just now, real-world implications, where Cloudflare now needs to spend less money to do more. And that's the big part for Cloudflare: as new hardware generations come out and efficiency improves, we can serve customers more effectively at a much lower cost basis compared to the previous generation.
One of the things I also find interesting is that while building this, of course, we have remote teams in many parts of the world, but we also have a lab in Austin, right? How was that lab relevant in this case? Yeah, we have a lab in Austin where we put evaluation servers. When we first get servers from our vendors, we can take samples, put them in the Austin lab, bring them up for initial evaluation, and run some initial benchmarks. The lab is very useful for us because we don't have to put an untested server straight into production.
We get a sort of staging environment where we can do initial testing and benchmarking, as well as look at things that can be improved, before we put the server into production for final confirmation of the performance and to make the final decision. Anything you want to add there, Victor? Yeah, the lab is super useful when we want to test new hardware and try things out, even things we don't yet know we want. It's a safe space for us to play around with the hardware, and it allows us to brainstorm and come up with designs we probably wouldn't have thought of originally.
Ideas come when you're able to play with hardware in the lab. Interesting, interesting. I need to go there one of these days; now I'm even more curious. One thing before we go that I think may be relevant in terms of the planning, design, and execution of this: how relevant was leveraging AI and LLMs? People are using more tools to build stuff, so they're building more. In what way was that important, and do we have numbers in terms of growth that we could potentially share? For AI work, when we designed this hardware, we did use AI a little bit to help, because we ran a lot of experiments and gathered their results to learn what knobs we could turn to improve performance. AI was already effective even though 2025 was, quote unquote, the "baby age" of AI; at that time it already helped with analyzing data and summarizing it effectively, so we knew where to focus our attention. It was definitely a speedup, I'd say 30% to 50% of the time. Today, with significantly more capability and better models available to us, it can do a lot more; in some cases it reduces the work by 80%. You can do a lot more today because it has the memory of what you have told it to do, it learns along the way, it can consume a lot more data and make sense of it properly, and it has access to many more tools within Cloudflare to understand what the data represents. It is definitely a big time-saver. Victor, do you have anything to share in terms of using AI for hardware evaluation? Yeah, and like you mentioned, AI allows us to start building tools for things we used to have to run very, very manually.
You had to run one step after the other. Now, with AI, we're able to develop tools for a lot of automation, with a lot less manual work. It definitely speeds things up and helps us look through things in a lot more detail than we were able to before. Makes sense, makes sense. And there's a blog post, not related to hardware in particular, about how Cloudflare's network recently passed a major milestone: we crossed 500 terabits per second of external capacity. I guess the servers are also playing a role in those metrics, because traffic has been increasing a lot as well, and even DDoS attacks are really big these days.
Yeah, I think as the internet and its capabilities grow, having a more powerful server is sort of a double-edged sword, right? You can process more, but at the same time attackers can also do a lot more with it; DDoS attacks can be amplified much more effectively as well. But fortunately, Cloudflare always plans with that in mind. It's amazing that we have achieved 500 terabits. I'm pretty sure the next 500 terabits will take much less time, with much more capable servers, than it took to get from where we started to 500 terabits today.
We definitely always plan to make sure we have enough capacity to serve customer growth globally as well as to absorb any DDoS attack targeting us. We are very confident that we have enough throughput and capacity worldwide to withstand anything and to serve our customers effectively as we grow. Maybe just ending by projecting the future: what is coming in this area, be it hardware or software? What are the next steps for your teams? What can we say? Yeah, on the hardware front there's definitely a lot of growth. Even recently, as we looked at Gen 11, Gen 12, and Gen 13: in Gen 11 we only had two CPU options, one from Intel and one from AMD. As we moved to Gen 12, there were three options from AMD, two to three options from Intel, and ARM variants coming up.
Similarly, in Gen 13 there's a proliferation of options, because there are so many points of improvement that can be made, so vendors are coming up with ways to improve on all the axes. As we move to Gen 14, there will be even more options for us to evaluate, and we'll pick the one that's best for Cloudflare. It's a very exciting time for us, because we also get to learn which changes in CPUs actually benefit us the most, so we can spec out what ideal Cloudflare hardware would look like. On the software front there are very exciting changes as well. As we think about growing to the next level, a lot of architecture is being thought through on how we scale further, and there are discussions about rewriting things that are bottlenecking us. In the AI world as well, Cloudflare is doing a lot of work to support the growth in agents and MCP, and to build the tools that allow developers to build on Cloudflare and make the internet better overall.
Yeah. And just to add to what GK mentioned on the hardware side: we're no longer just thinking about server-level design; we're actually starting to look at rack-scale design. We want to deploy in racks, so the selection logic is going to be driven by performance per watt, throughput per rack, supply chain reliability, and so on. That allows us to scale a lot bigger for future growth. Yeah, and software modularity will play into that: when we think about rack scale, the software can be selected to run on a specific rack with specific hardware types, so we can easily move it around and scale effectively.
There has been some discussion around this. First, you mentioned something relevant: there's apparently more interest from operators we can count on for hardware, which is good. We can see that compute is really important these days, so there are more players around; that's important. But there are also some concerns about the availability of some components. Any concern for the future on any of those? There's mention of memory, for example. Yeah, today in 2026 there's definitely a supply shortage, or a lot of consumption, as companies build up their AI infrastructure. For Cloudflare, what we typically look at is: we know this is going to happen; we saw it during COVID, and today it's the AI infrastructure build-up. When we design hardware and qualify it, we make sure we have at least two vendors for each component or each design we have, and that helps ensure supply continuity.
But beyond the two vendors, as the need arises we can qualify more. We work very closely with our supply team and our logistics team to make sure that, if we need servers in six months, we know where we are in sourcing those components. We have direct relationships with the memory vendors, say Samsung, Micron, SK hynix and the like, plus SSD vendors, NIC vendors, and PSU vendors, to make sure we have visibility into securing supply for our capacity. So it's a lot of planning and a lot of collaboration with our partner teams as well as partner vendors to make sure we can continue to fulfill our capacity demand. Planning ahead is really important, and it's the basis of the team's work; COVID is definitely an example of that in many aspects, which is interesting as well, right? Yeah, COVID, and the AI infrastructure build-up: what we learned from them is that we need to plan further ahead. We have better relationships with vendors now, not just signaling what we're building today but also signaling what we're thinking for the future.
We'd like to see technology improvements in this area or that area, or signal to them: for 2026 this is the expected capacity, for 2027 this is the expected capacity, help us plan better on your end as well. We can collaborate to secure supply as much as possible, or design new components that improve efficiency not only for Cloudflare but for all of their customers as well. And beyond that, we also have to be very involved in where the industry is going. We have to understand industry trends. We have to know what the next big thing is.
What is the thing people are talking about? Today AI is the buzzword. Whatever drives the industry is going to help us decide what we're building and what to expect in the coming years. Exactly. Yeah, in our case we have different products tied to that, and with that growing, it means Workers AI and a bunch of products already mentioned here, Durable Objects, sandboxes, will have specific use cases. If people are using those more, you need to adapt to that, right? Correct. Building something that is a bit modular, like Gen 13, where we can add a front drive bay if we need to, or add a GPU when we need to, is critical. Another aspect we think about, especially recently with Google's announcement that quantum day is coming sooner:
we have also included in our Gen 14 roadmap making sure that we are quantum-compute secure, or quantum-ready, on the management layer, in addition to the data plane layer, where Cloudflare has committed since 2022 or earlier that everything needs to be post-quantum-cryptography ready. So we are also looking at the hardware layer to make sure the hardware has native post-quantum support and the firmware is post-quantum secure, so that no threat actor in the future can break into it with quantum compute capability.
That's quite important. I did a full episode last week about that with Bas Westerbaan from our team. He's a researcher, and usually he's not the concerned type of person; now he is, because quantum computers definitely have a due date, in a sense, of 2029, earlier than expected. So the hardware there is quite important. It's no longer fictional; it's becoming real really quickly. Exactly. This was great. Thank you, Victor. Thank you, GK. Thank you very much, João. It was a nice conversation. And that's a wrap. It's done. Thank you.