I Tried Laravel AI SDK with 5 LLM Providers: Speed, Cost, and Issues

Laravel Daily | 00:19:32 | Feb 25, 2026
Chapters: 9
The creator outlines testing the Laravel AI SDK against a realistic CMS workflow to understand costs and time for frequent AI operations.

Laravel Daily tests Laravel AI SDK across 5 providers to compare speed, cost, and reliability for real CMS tasks like titles, tweets, translations, and images.

Summary

In this hands-on run, Laravel Daily's author benchmarks the Laravel AI SDK against five providers to reveal real-world costs and latencies for common CMS tasks. He demonstrates an article title suggestion flow, tweeting from article content, translating large passages, and generating featured images, all within a Filament-based setup. The video emphasizes that API calls to AI agents are not uniformly fast or accurate, highlighting the variability across providers, models, and even individual prompts. You'll see an under-the-hood look at how logs capture tokens, costs, durations, and success/failure for each operation. The creator compares model families (e.g., Grok, DeepSeek, Gemini, OpenAI GPT-5, Claude) and notes that cheaper or faster options often trade off precision or speed, with longer prompts, different formatting, or occasional timeouts. He offers practical takeaways, such as when to favor cheaper models for text tasks, and shows how image generation can incur meaningful costs and occasional safety-related failures. The video also shows how system prompts and model choices affect output style, from shorter Claude Haiku titles to more expansive GPT-5.2 results, and warns about unexpected failures requiring graceful error handling. Finally, the creator points to the source code and project examples available for premium Laravel Daily members, inviting viewers to experiment with their own configurations. Overall, the video is a pragmatic guide to managing expectations and costs when integrating AI into real-world Laravel apps.

Key Takeaways

  • DeepSeek and Grok offer the cheapest text-generation options, often delivering acceptable results at roughly 0.1 cent per prompt for titles and tweets.
  • Grok image generation is priced per image and can be relatively expensive, with OpenAI image generation offering colorful results but higher per-image costs.
  • Translation tasks can time out or slow down significantly depending on the model, with 23-60 second durations observed for some providers, underscoring the need for UX cues like websockets or progress indicators.
  • Model speed and output quality vary dramatically within the same family (e.g., Gemini Pro vs. Gemini 3 Pro), so testing multiple runs and fallback strategies is essential.
  • Opus and similar high-end models may deliver strong results but at noticeably higher costs (e.g., translation sometimes around 3 cents per prompt).
  • Using Laravel AI SDK to unify calls and implement fallbacks can save development time, but you must manage user expectations about latency and inconsistency across providers.

Who Is This For?

Laravel developers and product engineers evaluating AI-powered features (titles, tweets, translations, and images) for client projects or SaaS apps. If you need cost- and time-forecasting across multiple providers, this video helps set realistic expectations and informs provider selection.

Notable Quotes

"Hello guys, in this video I want to show you how I tested Laravel AI SDK in a realistic project, in terms of how much it costs and how much time it takes to call the AI agents for typical operations like suggesting titles in the CMS, generating a featured image, creating a tweet, or translating to a different language."
Opening overview of the test scope and tasks used to benchmark the AI SDK.
"Under the hood it's much more complicated than it seems: different providers have different models, different behavior, different tricky parts to know about."
Noting the variability and complexity across providers/models.
"The duration of the same operation ... Gemini 3 Pro is 22 seconds, or for example I was surprised that GPT-5 mini was much slower than GPT-5.2."
Illustrates unpredictable latency across models.
"This is the new log entry for Claude Haiku, and the duration in milliseconds was 2 seconds, which is relatively fast, right? But look at the result from other models."
Shows how performance varies and how logs are used for comparison.
"Be really cautious about AI models sometimes being too expensive, some operations taking too long, sometimes failing, and just manage the expectations of your client."
Practical caution about costs and reliability when communicating with clients.

Questions This Video Answers

  • How do I compare latency and cost across multiple AI providers in Laravel?
  • Which AI provider offers the best cost/performance mix for title generation in a Laravel CMS?
  • What are realistic turnaround times for translation or image generation using AI in Laravel apps?
  • How can I implement fallbacks and error handling when an AI provider times out or fails?
  • What are the caveats of using AI image generation in production and how to budget for it?

Full Transcript

Hello guys, in this video I want to show you how I tested Laravel AI SDK in a realistic project, in terms of how much it costs and how much time it takes to call the AI agents for typical operations like suggesting titles in the CMS, generating a featured image, creating a tweet, or translating to a different language. With this video, I want you to have the answers for your clients, or for yourself, if they ask you for some AI capabilities on their website and you wouldn't be able to reply and predict how much it may cost or how long it would take. The problem is that for clients it may sound very simple: they use ChatGPT or Claude, and they expect the results of any prompt to come very quickly, in a few seconds, and always pretty accurate. But in reality, when you call AI agents via API, that's not always the case. So let me show you the AI SDK in action. When you call an agent with a prompt via Laravel AI SDK, or with Prism for that matter, under the hood it's much more complicated than it seems: different providers have different models, different behavior, different tricky parts to know about. So I will show you the results of this log table that I have, calling various providers for various operations and logging the tokens, the cost, the time of completion, and whether it succeeded at all. So let's dive in. First, the overview of the features. This is a Filament project, but this video is not about Filament; that was just my personal choice.
For example, to suggest the title for this blog article. This blog article, by the way, is from my Laravel Daily, one of my latest articles, from when I got back from Laracon India and blogged about the (at the time upcoming) Laravel AI SDK. So I've put this article into the Filament CMS, and then I will ask it, for example, to suggest the title. This is the system prompt for the agent, and I will show you the code in a minute. Then there's a choice of providers and models, and I created API keys for five of the providers. So I topped up xAI, which I hadn't tested earlier, and DeepSeek as well; that one is also new to me. And for each of the providers, I've chosen the models that are appropriate, or the latest versions, for each operation. For example, for text operations the models are different than for image operations, which we'll get to in a few minutes. For now, I will just show you how it works. If we choose Anthropic and, for example, Haiku, we generate the titles, and it should take a few seconds for Haiku. Then you can choose one of the titles, apply the selected title, and it is applied. This is the behavior of pretty much every feature that I will demonstrate. For "create tweet", for example, you also choose the provider and you may edit the system prompt. For "generate featured image", the models are different: Gemini has Pro and Flash, OpenAI has just one, and xAI has just one model actively supporting images. And at the bottom, I have another text area for translations. I'm a native Lithuanian speaker, so that's why I chose that language to test, along with a choice of model to provide the translation for a pretty long article. So I will show you the results of all of those. Now, I will briefly show you the code. This is the Filament action button for generating those titles, and this is the main thing with Laravel AI SDK.
So we have an agent called TitleSuggest, and we have instructions and a prompt with the provider and the model. Inside of that agent we have these instructions, and we also have a system prompts database table locally, so if you, for example, want to edit that system prompt, it will be saved for future uses. Now, I've tried the same prompt for all the models that I showed earlier, and this is the list. I thought I would compare those suggested titles somehow, but you know what, they are all pretty good. The OpenAI models produced longer titles, and the GPT-5.2 ones are probably more profound, I would say. For Haiku, you immediately see the shorter length; Sonnet is similar, and Opus gets longer, which is kind of logical. Then DeepSeek Chat is also good and cheap; we'll get to that in a minute. And this is the result with Gemini Flash and Pro. It's hard to even compare those titles; they are all pretty good, even Gemini 3 and 3.1 Pro, and Grok is also here, and it also works. So my first impression, and my first conclusion from all those models: they are all pretty good. So it doesn't make sense to spend tokens and money on, for example, Gemini 3 Pro or Opus 4.6, because for that kind of text analysis with the titles you can use cheaper and faster models. But how much cheaper and how much faster? This is the database table of logs of all those operations, and you saw how it worked just a minute ago. This is the new log entry for Claude Haiku, and the duration in milliseconds was 2 seconds, which is relatively fast, right? But look at the results from other models. The duration of the same operation with the same prompt may be very different among the models: Gemini 3 Pro took 22 seconds, and, for example, I was surprised that GPT-5 mini was much slower than GPT-5.2. Then I tried to estimate the cost of each prompt based on the official pricing, which I have put in the config here, and what Laravel AI SDK returns is the amount of tokens.
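A log table like the one described here can be set up with a plain Laravel migration. This is a minimal sketch; the table and column names below are my own guesses for illustration, not the author's actual schema:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('ai_operation_logs', function (Blueprint $table) {
            $table->id();
            $table->string('feature');   // e.g. "title_suggest", "tweet_writer"
            $table->string('provider');  // e.g. "anthropic", "deepseek", "xai"
            $table->string('model');
            $table->unsignedInteger('input_tokens')->nullable();
            $table->unsignedInteger('output_tokens')->nullable();
            $table->decimal('estimated_cost_usd', 10, 6)->nullable();
            $table->unsignedInteger('duration_ms')->nullable();
            $table->boolean('succeeded')->default(true);
            $table->text('error')->nullable();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('ai_operation_logs');
    }
};
```

Token counts and duration come back with each SDK response, while the estimated cost is calculated on your side from the provider's published pricing.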
So those two columns can be taken from the result of the AI provider, and this one can be calculated. As you can see, almost none of the models surpass 1 cent of dollar value, except for Claude Opus: for that one prompt, Claude Opus took three cents. Which are the cheapest ones? Grok is the cheapest. And actually, let's order by that. In terms of price, you could compare Grok and DeepSeek. DeepSeek in general was a new thing for me, as I said, and this is my dashboard: in my whole testing, I didn't even use one cent of value. I topped up $2 via PayPal. I had heard of DeepSeek being cheap, but I didn't realize it was that cheap. So if you're looking for a really cost-effective model, my first impression is that DeepSeek is good enough and much cheaper. Same with Grok. And I will show you other operations with the same models in a minute. If we compare the other prices, there's a clear difference with the mini models, so Flash, mini, and Haiku: the price is in thousandths of a cent. Then there's another family of models like Pro, GPT-5.2, and Sonnet along the way. And Opus is by far the most expensive. The next operation I want to show you is creating tweets; we will cover four operations in this video. For the tweet, let's choose xAI, because, well, it's Twitter and Grok. To generate the tweet, this is the system prompt: generate a tweet from that article. It should take a few seconds, and this is the suggested tweet. Of course, you can change the system prompt to, for example, not use hashtags or emojis, but in general, the model does work. And again, a quick look at the code, with roughly the same pattern: a TweetWriter agent with instructions, with the provider and the model. Then we shorten the response to 280 characters, just in case. And inside of that agent, it's roughly the same thing: custom instructions on top and the system prompt. These are the results generated by each model. And the first failure here was this.
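The cost estimation described above is simple arithmetic over the token counts the SDK returns. A minimal sketch in plain PHP, assuming a config-style pricing array; the dollar figures below are placeholders for illustration, not the real provider prices:

```php
<?php

// Hypothetical pricing table: USD per 1 million input/output tokens.
// These numbers are placeholders, not the providers' actual rates.
$pricing = [
    'deepseek-chat' => ['input' => 0.27,  'output' => 1.10],
    'claude-opus'   => ['input' => 15.00, 'output' => 75.00],
];

// Estimate the cost of one prompt from the token usage returned by the SDK.
function estimateCostUsd(array $pricing, string $model, int $inputTokens, int $outputTokens): float
{
    $p = $pricing[$model]
        ?? throw new InvalidArgumentException("Unknown model: $model");

    return ($inputTokens / 1_000_000) * $p['input']
         + ($outputTokens / 1_000_000) * $p['output'];
}

// Example: a title-suggestion prompt with 1,200 input and 300 output tokens.
printf("Estimated cost: $%.4f\n", estimateCostUsd($pricing, 'claude-opus', 1_200, 300));
```

Storing the estimate alongside the token counts in the log table makes it easy to order operations by cost later, as shown in the video.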
I don't have the screenshot, but in GPT-5 mini yesterday, while testing, I got this "unknown finish reason". And this is another takeaway from this video: any model, at any point, with any prompt, can just fail. Any provider may have bad days or just be down for some time; it happens with LLMs. That's why you need try-catches everywhere, and you need to gracefully show human-language errors to your users. But then I tried again, and it did generate a tweet. This was a tweet by GPT-5.2. Claude Haiku started with an emoji and ended with an emoji. Sonnet added hashtags. Opus was good, even including the code and italic font in the suggested tweet. DeepSeek Chat was brief, with em dashes everywhere in the text; did you notice that? And I was surprised by the Gemini models. This is a tweet by Gemini Flash: well-structured, not just emojis. I would probably swap those for plain dashes or a list, and I would probably remove a few things, but generally this is much more readable than one wall of text. Same with Gemini Pro. The downside was that it didn't obey the rule of 280 characters: for Gemini 3 and 3.1 Pro, this was the link stripped out by my PHP function because the tweet was too long. But I did like the format of those tweets better. And then this is the tweet by Grok. In general, the models all delivered pretty well; it comes down to personal preference and the system prompt for how you want the tweet formatted. Now let's take a look at the cost and duration of that operation. This is the same database table, in this case for the tweet-writer feature. Look at the duration first: for such a summary it's at least a few seconds for some models, and GPT-5 mini was actually one of the longest; again, I was surprised by GPT-5 mini being much slower than GPT-5.2. Also, the Gemini Pro models are the slowest, surpassing 10 seconds. And in terms of cost, you see the column ordered by estimated cost; the results are roughly similar to the previous operation of suggesting titles.
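The "any model can fail" point translates naturally into a fallback chain across providers. A minimal sketch in plain PHP, where each provider call is wrapped in a closure; the closure bodies here simulate providers and are illustrative, not the SDK's actual API:

```php
<?php

// Try each provider callback in order; return the first successful result.
// Each callback either returns a string or throws on failure/timeout.
function generateWithFallback(array $providers): string
{
    $errors = [];

    foreach ($providers as $name => $call) {
        try {
            return $call();
        } catch (Throwable $e) {
            // Record the error and move on to the next provider
            // instead of surfacing a raw exception to the user.
            $errors[$name] = $e->getMessage();
        }
    }

    throw new RuntimeException('All providers failed: ' . json_encode($errors));
}

// Usage with fake providers simulating one "bad day" and one success.
$tweet = generateWithFallback([
    'grok'     => fn () => throw new RuntimeException('unknown finish reason'),
    'deepseek' => fn () => 'Laravel AI SDK benchmarked across 5 providers.',
]);

echo $tweet, PHP_EOL;
```

In a real app, the final `RuntimeException` is where you would show a friendly, human-language error rather than the provider's raw failure reason.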
In fact, the operation of summarizing text and producing a short result is roughly comparable whether it's a tweet or a set of titles, and the cost, I would say, is roughly similar. So again, we have DeepSeek and Grok as the cheapest. Then we have the Flash, mini, and Haiku models. Then the family of mid-tier models: GPT and Gemini Pro. And then we have Sonnet and Opus, again at the top of the most expensive ones. The third text-based operation I tested was translation; then we'll get to generating the featured image in the fourth section of this video. So now: translate to Lithuanian, though it could be any other language, and again we choose the model. This is where things started to get really interesting, because this operation is much harder than summarizing, and not all the models could deliver on it. For example, let's try xAI with Grok, and I will show you how it works. We submit, and the operation takes much longer: it's not 3 seconds, it's not 5 seconds. The fastest model was actually Grok, which delivered in 23 seconds, if I remember correctly. And I will deliberately make no pauses here; I will just keep talking while it is loading, to make the point that some AI operations can take very long. They should definitely run in a queue, and you may also use websockets to inform the user. But yeah, in this case, I talked long enough to have the translation. Of course, you probably don't speak Lithuanian and cannot really evaluate it, but I can say it's actually pretty good. I would probably not publish it as is, and it would need a few minutes of editing, but generally the formatting is good, and even terms like ChatGPT, AI SDK, or API are left untranslated. So it's pretty good. But here are my general notes on that task.
Look how many models did not deliver within 60 seconds and just timed out. That 60 seconds was just my local PHP configuration, and I deliberately decided not to change it; it serves as a kind of benchmark: if a model cannot deliver in a minute, it's really slow. Also, I didn't even try Opus, because I thought it would be far too expensive. These models did succeed; interestingly, Gemini 3 Pro succeeded, but 3.1 Pro surpassed 60 seconds and timed out. For some models it may be hit or miss, even within the same model. Let's take a look at duration and cost. This is the list of models that did deliver, and for Grok I tried twice: once during this video, and once yesterday. These are the durations, as you can see: 24 seconds, 31, 42, and 48 for Gemini Pro. This is why Gemini 3.1 Pro failed: it's right on the edge of 60 seconds. As for cost: the quality of the translations is hard to evaluate, and I didn't dig deeper; they all seem okay. But the price, again: for Grok and DeepSeek it's 0.1 of a cent. Then we have the Flash and Haiku models at roughly 1 cent of a dollar. And then the Pro models are more expensive, around 3 cents for one translation. That said, if I had a CMS and could use that translator for 3 cents, I would gladly do so as a user. And waiting one minute for a translation would be fine for me, as long as it's reliable. So again, if you do the queued version, with websockets and the UX to update the user with the translation, or inform them if there was some kind of failure, then the cost of such an operation is pretty much okay, in my opinion; except, again, for Claude Opus, which I didn't even dare to try here. And finally, we get to the generate featured image feature. I Googled which providers have models that support image generation. For Gemini, the top models are currently Pro Image and Flash Image, which are basically both Nano Banana, just Pro or Flash.
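Moving a near-minute translation off the web request and into a queue, as suggested above, is standard Laravel. A minimal sketch of such a job; the class name and the `translate()` helper it calls are illustrative placeholders, not the author's code or the SDK's API:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class TranslateArticle implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Give the job its own generous timeout instead of relying on
    // the 60-second PHP limit of a web request.
    public int $timeout = 180;

    // Slow providers can be hit or miss; allow one retry.
    public int $tries = 2;

    public function __construct(
        public int $articleId,
        public string $targetLanguage,
    ) {}

    public function handle(): void
    {
        // Hypothetical helper wrapping the actual AI SDK agent call.
        $translated = translate($this->articleId, $this->targetLanguage);

        // Persist the result, then notify the UI, e.g. by broadcasting
        // an event over websockets so the user sees progress live.
    }
}
```

Dispatching it with `TranslateArticle::dispatch($articleId, 'lt')` frees the request immediately, which is exactly the UX pattern the video recommends for long operations.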
For OpenAI, I tried GPT Image, because GPT-4o was somehow restricted; it didn't allow me to use it for images, but this model did deliver. And for xAI, there's the Grok Imagine model. Let's try Grok, and you will see how it works. We generate the image, and it doesn't actually take that long; the text translations are longer. And this is the result. Well, the quality is questionable, and this is definitely not Taylor Otwell; actually, it doesn't even try to depict Taylor. Also, repeating the text on top is pretty cringe, but yeah, it delivered something. I saved the other images from yesterday's testing. This was the image generated yesterday by the same Grok Imagine, which looks better; still pretty cringe, and the text is not in the right place, but it may work for some use cases. So that's Grok. Then this was Gemini 2.5 Flash, and with the Flash model, what I noticed is that it makes typos, for example "unveos". I wouldn't use Gemini Flash for anything more serious. This was the result from Gemini Pro, which is, I guess, the best of them all, but I would still debate a few things here: the dates are incorrect, and there's too much text here and there; I would probably focus on just one of those areas. But it was okay. And then finally, OpenAI GPT Image. I liked that one the most, as the most colorful and the most lively. But this is definitely not Taylor; more like myself, probably. Overall, not a bad job, I would say. Also, I made a few notes here. With Gemini Pro, it failed to produce the image several times until it finally succeeded, but only once. Many times, when I was prompting and re-prompting, this was the log error, just saying the model could not generate the image based on the prompt provided. Then I retried with the same prompt, and one time in five it did deliver. So this is another thing with AI models: again, you're not guaranteed to get results, and specifically with images there are quite a few safety checks.
For example, if you mention some celebrity; even the Taylor Otwell name here may have triggered some kind of safety check on Gemini's side, which is probably a good thing. But again, don't assume it's guaranteed to return something. And one more note on Grok image pricing: I realized that it's priced per image, not per tokens. Let's actually go look at the prices and durations. This is the table, and two numbers probably stand out. The duration of OpenAI GPT Image is on the edge of the 60-second timeout, but you saw that the image is pretty complex. Another pretty complex image came from Gemini 3 Pro, which took 20 seconds to generate but cost 16 cents. Generating images with Nano Banana Pro is actually pretty expensive. This is the official documentation: the image cost per million output tokens is $120. And, let me zoom in, they themselves calculated it as equivalent to roughly 10 to 20 cents per image, which makes sense, because in terms of AI, electricity, and so on, the operation is expensive. So if you propose to your clients, or in your projects, to generate images, be aware of the costs. So yeah, this was my experiment about what's under the hood when you call something like this with Laravel AI SDK, Prism, or in fact any other AI client wrapper, whatever you call it. Laravel AI SDK is a great tool to unify those calls, especially if you want to work with several models or providers or have fallbacks; it's very good at that. But still, be really cautious about AI models sometimes being too expensive, some operations taking too long, and sometimes failing, and just manage your client's expectations: it's not like ChatGPT, where you get answers almost instantly.
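The per-image arithmetic from that pricing page can be sanity-checked directly. A small sketch in plain PHP; the only input taken from the video is the $120 per million output tokens rate, and the implied token count is derived from the quoted numbers, not taken from any documentation:

```php
<?php

// Rate quoted in the video from the official docs:
// $120 per 1 million output tokens for Nano Banana Pro images.
const USD_PER_MILLION_OUTPUT_TOKENS = 120.0;

// Cost of one generated image, given its output-token count.
function imageCostUsd(int $outputTokens): float
{
    return ($outputTokens / 1_000_000) * USD_PER_MILLION_OUTPUT_TOKENS;
}

// Working backwards: how many output tokens a given image cost implies.
function tokensForCostUsd(float $costUsd): float
{
    return ($costUsd / USD_PER_MILLION_OUTPUT_TOKENS) * 1_000_000;
}

printf("1,000 tokens  -> $%.2f per image\n", imageCostUsd(1_000));
printf("16-cent image -> ~%.0f output tokens\n", tokensForCostUsd(0.16));
```

This lines up with the documentation's own "roughly 10 to 20 cents per image" estimate: at that rate, the 16-cent Gemini 3 Pro image from the video implies a bit over 1,300 output tokens.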
And if you want the source code of that project so you can try it out yourself, I will put it, after shooting this video, in the Project Examples section on Laravel Daily, where we collect a lot of GitHub repositories for various demos from this channel and elsewhere. This is available for premium members of Laravel Daily; in addition to courses, we also offer these codebases as part of the membership, so you would get access on GitHub. The link will be in the description below. That's it for this time, and see you guys in other videos.
