The library that will save your scrapers.

John Watson Rooney | 00:09:14 | Feb 18, 2026

If your scraper trips on one bad HTTP request, learn two practical retry patterns in Python using decorators and the stamina package.

Summary

John Watson Rooney walks you through resilient scraping techniques that prevent a single failed HTTP request from tanking your whole run. He starts with a handcrafted retry decorator, showing how to loop a function call, catch exceptions from HTTPX or requests, and pause between attempts. He then elevates the approach by wrapping the decorator in a class to add max attempts, a delay, and a fail-safe return of None to signal a retriable URL. Rooney also demonstrates logging with structlog and explains how failed URLs can be queued or logged for later retry. Moving to a more advanced path, he introduces stamina (built on tenacity) for async contexts, using a context manager to retry only the HTTP requests. He even experiments with a custom status-code error to make failures explicit without crashing the program. By the end, you'll see how to collect successful data while cleanly handling or storing failed URLs for later processing. If you're building scrapers that need to survive flaky networks, Rooney's dual approach (decorator-based and stamina-based retries) gives you concrete patterns you can adopt today.

Key Takeaways

  • A handcrafted Python decorator can implement a three-try retry loop with a sleep delay between attempts for robust HTTP requests (a minimal sketch follows this list).
  • Wrapping the decorator in a class lets you enforce a max_attempts (e.g., three) and prevent endless retries that exacerbate failures.
  • Returning None after exhausting retries enables the caller to log, queue, or skip failed URLs instead of crashing the program.
  • Using structlog for logging provides a lightweight, simple solution without configuring a full logger setup.
  • Stamina, built on tenacity, offers an async-friendly retry context manager with a capped attempt limit, ideal for async HTTP clients like rnet.
  • Decoupling retries from business logic helps you handle failures by queuing URLs or applying different proxies/fingerprints on retries.
  • Creating a custom status-code exception can make failure handling clearer than using bare exceptions.
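
As a rough illustration of the first takeaway, a handcrafted three-try decorator might look like the sketch below. This is a minimal reconstruction based on the video's description, not Rooney's exact code; the function names are assumptions, and the ValueError stands in for whatever raise_for_status() would throw.

```python
import time
from functools import wraps


def retry(func):
    """Call func up to three times, sleeping a second between attempts."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(1, 4):
            try:
                return func(*args, **kwargs)
            except ValueError as exc:
                print(f"attempt {attempt} failed: {exc}")
                time.sleep(1)
        print("failed after three attempts")
    return wrapper


@retry
def make_request():
    # Stands in for an HTTP call whose raise_for_status() raises on a bad status.
    raise ValueError("bad status")


make_request()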

Who Is This For?

Essential viewing for Python developers building robust scrapers who need reliable retry logic, especially those using asynchronous HTTP clients like rnet or HTTPX.

Notable Quotes

"If you've ever had your scraper fail because of one failed HTTP request, you'll know how much of a pain it can be."
Sets up the problem Rooney aims to solve with retries.
"This simple decorator is just going to go through three iterations. It's going to sleep after each one and yeah, it's going to say failed and then yeah, nothing else is going to happen."
Demonstrates a basic three-attempt retry loop.
"We're going to have a max number of attempts. Now, this is very important because if you don't have this, your code will retry over and over and over again."
Emphasizes the necessity of a capped retry policy.
"Then I have the decorator function itself. This is a wrapper and we're going to say no for attempt in range one to our maximum attempts..."
Describes how the max attempts are applied in the code.
"Now retries and logging really do go hand in hand."
Highlights the companion role of logging in retry strategies.

Questions This Video Answers

  • How do I implement a simple retry decorator in Python for HTTP requests?
  • What is Tenacity and how does Stamina compare for async retries in Python?
  • How can I retry only specific parts of my scraper without wrapping everything in a decorator?
  • What are best practices for handling failed URLs after retries in web scraping?
  • How does rnet support retries and impersonation for scraping tasks?
Python decorators, Retry patterns, HTTPX, requests, Tenacity, Stamina, rnet, asynchronous scraping, logging with structlog, web scraping reliability
Full Transcript
If you've ever had your scraper fail because of one failed HTTP request, you'll know how much of a pain it can be. In this video, I want to show you how we can build retries into our web scraping logic so you can easily implement them in your own code. I'm going to show you two different ways. But first, we'll talk a little bit about what a decorator is. We'll zip through that, and then I'll show you both approaches: a handwritten decorator that I'm going to write, and a cool Python package.

So this is my initial retry decorator, which I've written to demonstrate how a decorator and this logic work. Essentially, we have a function down here where something could go wrong, and this is going to be our HTTP request. Now, if you've ever used HTTPX or requests, you'll know there's a raise_for_status method. That's a neat way of saying: if the status is a bad status, raise an exception. And it's that exception that we can use to activate our retries through our decorator. So this function here is going to raise a ValueError every time. What we're going to do is write a new function that wraps our function, runs it each time, and handles the exception based on what we choose. This simple decorator is just going to go through three iterations, sleep after each one, say 'failed', and then nothing else is going to happen. So if I run this one, we can see we're getting our error: trying to run, trying to run, failed after three attempts. This is how we're going to build our initial decorator.

So I'm going to come over to a different part of my code, to this file here. Now, this looks a lot more complicated, but essentially the part with the decorator and the wrapper is pretty much exactly the same as the simple version we were just looking at; I've just wrapped it inside a class. My extractor class has a decorator class and a class method to include that decorator. What we're doing here is saying we're going to have a max number of attempts. Now, this is very important, because if you don't have this, your code will retry over and over and over again, causing you even more issues than if it had just failed in the first place. We also have a delay in here, a time.sleep delay in between each request, and we can choose whatever numbers we want. Then I have the decorator function itself. This is a wrapper, and we're going to say: now, for attempt in range one to our maximum attempts, and I've done plus one here, we're going to try to return the value from the function. Our function down here that we're going to put this on is our fetch URL function. It takes our URL and makes the request, and we check the response; this is from the rnet Python package. If it's a bad response, we're going to raise an exception. That exception is then intercepted by the except in our wrapper up here, and we log that it failed. If we're still under our maximum number of attempts, we apply our sleep delay and try again. Once we're equal to the maximum, we return None. Now, this is important, because what can happen if you're just using a standard retry decorator is that it will try however many times you ask it to, but then nothing will happen afterwards. And that's not going to do anything for us.
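Rooney's on-screen code isn't reproduced in the transcript, and he's using the rnet client. A rough, hypothetical reconstruction of the class-wrapped decorator he describes, using HTTPX instead of rnet (the Retry class name and details are assumptions), might look like this:

```python
import time

import httpx


class Retry:
    """Retry decorator with a capped attempt count and a fixed delay,
    returning None once the attempts are exhausted."""

    def __init__(self, max_attempts: int = 3, delay: float = 1.0):
        self.max_attempts = max_attempts
        self.delay = delay

    def __call__(self, func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, self.max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except httpx.HTTPError as exc:
                    print(f"attempt {attempt} failed: {exc}")
                    if attempt < self.max_attempts:
                        time.sleep(self.delay)
            return None  # signals a retriable URL to the caller
        return wrapper


@Retry(max_attempts=3, delay=1.0)
def fetch_url(url: str):
    response = httpx.get(url)
    response.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx
    return response
```

Note that Rooney wraps his version inside an extractor class with a class method; this standalone form shows the same capped-retry, return-None idea.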
We want to retry X amount of times and then decide: we're going to store this URL to try again another time, put it back into a queue if we're using a queue system, or just skip over it altogether. That's what's happening here. When I say returning None, what that means is that when we actually call this function down in the part that runs our code, we can say: if the response exists, i.e., if the response is not None, we can do what we need to do with it. In this case, I'm just logging out the URL and the headers. Otherwise, we append the URL to a failed URLs list and return that back out.

So if I run my function here, we see that initially we get our successful attempts, and that's because I'm reaching out to httpbin and getting a 200 response. When I try a 404, we can see we do one, two, and then three attempts, and it fails. Because it's failed, it skips on; we store that URL by returning None and catching the URL afterwards. I'm going to let this finish, and at the end we'll have a list of the URLs that failed all three retries. You can see them down here under this warning: this is the list of the URLs that failed for us. Within real-world code, we could then do something with these URLs. What's happened is that we've tried to handle whatever status code we decided we want to catch and work with, in the best way that we can. We've retried it, and maybe we would have put in different proxies or different fingerprints for each of those retries. And if it still fails, then we log it out so we can do something with it later and we're aware of what's going on.

Now, retries and logging really do go hand in hand. In this one I'm using structlog, which is a cool package that I've just started using. It's a very, very simple way to handle logging without having to set up your own loggers. So if that's something you're interested in, I would definitely check out structlog. It seems pretty cool so far.
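Continuing the hypothetical sketch above, the caller-side pattern described here (collect successes, store failures, log with structlog) might look like the following; the run function and its shape are my assumptions, built around the fetch_url sketched earlier:

```python
import structlog

log = structlog.get_logger()


def run(urls: list[str]) -> list[str]:
    """Fetch each URL, keeping the ones that exhausted their retries."""
    failed_urls: list[str] = []
    for url in urls:
        response = fetch_url(url)  # the decorated function sketched earlier
        if response is not None:
            log.info("fetched", url=url, headers=dict(response.headers))
        else:
            failed_urls.append(url)  # queue or store these for a later pass
    if failed_urls:
        log.warning("failed after retries", urls=failed_urls)
    return failed_urls
```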
Let's move on to a different way of working with retries. I'm going to introduce you to a Python package, too. It's called stamina, by Hynek. He's very active in the Python community, and it's built on tenacity. So let's have a look at it over here. I'm importing it in. Now, as I said, stamina is a wrapper around tenacity, but it gives you some sensible defaults, makes it just that little bit easier to use, and adds some extra features. I'll link to the package down below so you can check those out. It's been pretty cool. One thing that I really wanted to try was working with an async client, using stamina to handle the retries there. And the way that I came up with doing this is actually using the retry inside a context manager rather than a decorator. Now, this has positives and negatives. Obviously, a decorator makes it easy to decorate any function that might fail and could be retried, but in this instance, the only things we want to retry are the HTTP requests. So using this context manager to do that is, I think, a pretty good option here. It's going to be inside my main fetch function.

I've got max attempts equal to three; this could be whatever you want it to be. Then we're going to do async for attempt in our stamina retry context. If I go to the source code here, we can see that this context manager yields iterators that allow us to retry the context within it. It's pretty cool, in my opinion; so far it looks really nice. So we've got this here, and I'm going to say with our attempt, and this is where our actual request code is going to be. And of course, we're using rnet and asynchronous code here, so this all fits in nicely.

One thing that I did do here is create my own status code error. This was when I was working with it initially and wanted to try a few different things. It turned out not to be particularly necessary or overly useful, other than being a clearer error about what's going on than a bare exception, so it's a bit of a better option. You might want to consider writing your own exceptions when you get to that point.

So if we look back into our async attempts, we're logging the URL, and we're saying: if our response is not okay, we're going to return the status code if our attempt number is equal to the maximum number of attempts. This is just our way of returning something when we reach the maximum number of attempts. In the last one, with the decorator, we were returning None; we could have done that here. In this case, I chose to return the response status code, and I'll show you that when this runs in just a second. Then we say else, we're going to raise our status code error. And if everything's good, we're going to return the text.

So what I'm going to do here is: I have four URLs which we're going to reach out to asynchronously using our client. We're going to be impersonating as well. We don't need to do that for this, but I'm really enjoying using rnet at the moment, which is good for impersonation of TLS fingerprints. Then we're going to try and get the information, and I'm just going to print the data. Now, we know from looking at these httpbin URLs that some of these are going to fail, so let's see how it's handled here. Let's clear the screen and I'll do uv run main.py.

So we can see our stamina retry here. Now, there are lots of different ways you can manage the retry and the backoff within stamina. I'm just using the defaults here as an example, but I definitely recommend you check out the GitHub, and he's done a video on it too, which explains the differences between tenacity and stamina and what he wanted to include. It's definitely worth looking at. I think it's the retry package that I'm going to be using going forward.

So if we look at what has been returned from my code: we didn't fail. Our code did not crash just because we hit a 404 or a 504, whatever that is. I think I just made that error up. What we've done is returned the status code after the maximum number of attempts have happened. We can see that up here: we're having retry number one, two, and so on, and it's failing on these ones. Maximum retries reached, passing on this URL, and then, as I chose, we return the status code. Now the information that comes out of my function includes the URL and either the data (the JSON data) or our bad status code. So we can review this, we can review the logs, and we can see what's happened.
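The transcript doesn't include Rooney's actual stamina code, and he's using rnet with impersonation. A sketch of the same pattern using stamina's retry_context with httpx.AsyncClient instead (the StatusCodeError class and the control flow are reconstructed from his description, so treat the details as assumptions):

```python
import asyncio

import httpx
import stamina


class StatusCodeError(Exception):
    """Custom exception that makes a bad HTTP status explicit."""


async def fetch(client: httpx.AsyncClient, url: str, max_attempts: int = 3):
    # Retry only the HTTP request itself, not the surrounding logic.
    async for attempt in stamina.retry_context(on=StatusCodeError, attempts=max_attempts):
        with attempt:
            response = await client.get(url)
            if response.is_success:
                return url, response.text
            if attempt.num == max_attempts:
                # Out of attempts: hand back the bad status instead of crashing.
                return url, response.status_code
            # Raising the custom error triggers stamina's backoff and retry.
            raise StatusCodeError(f"{response.status_code} for {url}")


async def main() -> None:
    urls = [
        "https://httpbin.org/status/200",
        "https://httpbin.org/status/404",
    ]
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

The key design point from the video survives the client swap: a URL that keeps failing returns its status code rather than raising, so one bad response never crashes the whole run.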
We don't have our whole code finishing and crashing just because we make one HTTP request that fails. We can retry requests, and then we can handle the failures however we want to. If you've enjoyed this video, you'll want to watch this one next, which goes into more detail about how I actually get data when I'm scraping sites.
