GPT 5.5 is NOWHERE NEAR Opus 4.7 for Coding (Proof)
The creator notes it has been a couple of hours since GPT 5.5 dropped and announces a testing approach focused on backend coding with Harbor, noting that the model is not yet available via the OpenAI API.
A candid, no-holds-barred test of GPT 5.5 versus Claude Opus 4.7 for backend coding, with live builds and strong skepticism about hype and pricing.
Summary
Income Stream Surfers' video pivots from hype to hands-on testing of GPT 5.5, with a seasoned coder's eye on backend work. The creator notes GPT 5.5's availability in Europe and immediately questions its value against Claude Opus 4.7, citing benchmarks and real-world coding experience. Harbor Build and Harbor Copilot figure prominently, including a tease about Harbor Build pricing and CMS features. The test runs include setting up a dev workflow, exploring Codex and CLI prompts, and attempting to replicate Harbor Build with different models. Across the tests, Opus 4.7 consistently outperforms GPT 5.5/5.4 in backend coding, according to the creator's live observations. OpenAI's tools are praised for their usage limits, but not for reliability in these scenarios, while Claude emerges as a comparatively smoother alternative in some prompts. The video closes with a sharp verdict: GPT 5.x is not beating Opus 4.7, and the hype around 5.x may be overblown. Viewers are invited to check out Harbor's free trial and follow for Harbor Build updates.
Key Takeaways
- GPT 5.5 is described as overrated by experienced users, with the creator arguing that Opus 4.7 still excels at backend coding.
- In live tests, Opus 4.7 reportedly outperforms GPT 5.4/5.5 across several backend tasks, according to the speaker’s observations.
- Harbor Build and Harbor Copilot are positioned as high-cost, ambitious tools with CMS features and Stripe integration that the creator teases releasing pricing for.
- Codex and CLI workflows are used to compare GPT 5.x results against Claude Opus 4.7, highlighting interface quirks and model behavior differences.
- The creator promises future content on Harbor Build and monetization details, including a free trial and pricing changes for early adopters.
Who Is This For?
Developers and AI enthusiasts who want real-world comparisons between GPT 5.x, Claude, and Opus 4.7 for backend coding, plus investors or users curious about Harbor Build’s ecosystem and pricing shifts.
Notable Quotes
"Okay, so it's been about two and a half hours since GPT 5.5 dropped and I have just got it in Ireland in Europe."
—Opening remark establishing the timing and regional availability of GPT 5.5.
"I can tell you right now that there is nothing that GPT 5.4 is better than Opus 4.7 at."
—Strong claim that Opus 4.7 beats GPT 5.4 for backend tasks.
"This model seems absolutely terrible and I've hit the usage limits guys."
—Direct critique of GPT 5.x performance and usability under real usage.
"Change the model plus front end to GPT 5.4. I cannot believe how bad this is compared to Opus 4.7."
—User attempts to force the test to use GPT 5.4 and contrasts with Opus 4.7.
Questions This Video Answers
- How does GPT 5.5 compare to Opus 4.7 for backend coding in real-world tests?
- Is Harbor Build worth the price for a developer workflow with CMS and Stripe integration?
- What are the limitations of GPT 5.x when working with Codex or Harbor Copilot?
- Why do some creators prefer Claude over GPT for coding tasks?
- What are the key differences between the Codex CLI and Harbor Build in practice?
Full Transcript
Okay, so it's been about two and a half hours since GPT 5.5 dropped and I have just got it in Ireland in Europe. So, shout out to OpenAI for finally letting me use this model. I've been looking online. It does not look like a particularly good model, guys. But I am going to run a couple of tests. And specifically, I want to do some backend coding, because if I just do the standard front-end test, which I'll most likely also do, I want another one that's backend code, because otherwise people will comment and say you're not testing the model properly.
Now, with this still not being available in the API, and I already released a video on my thoughts on the blog post, I kind of called it in the video that I made earlier today when I said that this model was most likely overhyped. They got me with the hype, honestly. And then I came and looked at the benchmarks, and just seeing that Opus 4.7 is lower than GPT 5.4, I already know that there's some [ __ ] going on, right? I don't care how good your benchmark is.
There is no way that GPT 5.4 is better than Claude Opus 4.7 at basically anything, right? Especially backend coding. Now, this is coming from someone with a lot of experience with Opus 4.7. Okay, I would argue I've done more coding than most people with Opus 4.7 since it dropped. I have implemented Harbor Build. I made over 100 backend and front-end changes to a client build that we are doing. I made Harbor Copilot with an overlay. I've done so much coding with Opus 4.7. I can tell you right now that there is nothing that GPT 5.4 is better than Opus 4.7 at.
And I can say that with some confidence. By the way, guys, Harbor Build is coming. So go and subscribe to Harbor, because we might be putting the pricing up with the release of Harbor Build, because this thing is so damn expensive to run. If we do release it, well, when we release it, I don't know exactly how we're going to release it. It has a CMS built into it, so you can literally add your own pages inside Harbor, which is absolutely crazy. Obviously, you can blog directly from Harbor as well.
You can control your products from Harbor. It will automatically add Stripe. You can sell things with this. Yeah, I don't know how to release this, guys, but this is what Opus 4.7 is capable of. So, what I want to know is: how good is this new 5.5 model? So, just go get a trial for Harbor, guys. If you lock in the pricing, I will give you that pricing forever. Like, it doesn't matter: founder, non-founder, whatever. It doesn't matter. Just go and sign up, get a free trial, and then when we do increase the price of Harbor, which we most likely will have to with Harbor Build coming, that increase will not apply to users who have already subscribed, right?
So, definitely go and check out Harbor. Go get a free trial. Let me know what you think, guys. People absolutely love this tool. And it actually works as well. Just going back to the live site real quick. If I just go to billing and plan, you can see here: 224 pages published, 70,000 impressions, 720 clicks. I'm being as transparent as possible. This is just click data from published pages, right? We don't actually collect all Search Console data. We just collect Search Console data based off of the pages that people have published inside Harbor. So, go and check out Harbor.
There's a link in the description and in the pinned comment. Let's jump into the video. Okay. So, the first thing I'm going to do is just set this going with the normal prompt, because I want to see how it does, basically. So, we'll just click here. We'll do a new folder, GPT 5.5, and then open. And then we'll go and grab the prompt. Okay. So, first things first, let's just run the prompt right here. This is from my school community. And then at the same time, I'm going to open up Harbor and we're going to see, you know, what we can do here.
I don't know what it is specifically with GPT models. They always just say, "Okay, well, I can't find a Next.js project here." It does say you're inside a Next.js project. It never pushes back. It just says, "Okay, I'm going to basically search through your entire computer," and then it finds all of the different projects I've made with these models, etc. It's kind of weird that it does that. It's only GPT. Claude doesn't do that. The Chinese models don't do that. Gemini doesn't do that. But every single GPT model does, it doesn't matter which.
It just says, "Okay, well, I'll just read your entire computer." It's like, "That's not what I want. I want you to make a fresh project." But whatever. It's almost like it follows the instructions too carefully, right? Okay. So, let's just open up Harbor. The only issue is I am 80,000 lines ahead of main in Harbor. So, I just have to work out quickly how I'm going to do this. So yeah, just some quick checks to make sure that I'm not going to break anything here, because if I lost Harbor Build, you would probably never see this video.
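The "don't break Harbor Build" safety check described here boils down to snapshotting the current state on a branch before doing anything risky. A minimal sketch with plain git (the repo, branch names, and commit messages are made up for the demo, not taken from the video):

```shell
# Demo setup: a throwaway repo standing in for the real Harbor checkout.
cd "$(mktemp -d)" && git init -q harbor && cd harbor
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "initial"
git branch dev                              # pretend a dev branch already exists

# The actual safety net: snapshot current work, then branch off dev.
git add -A
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "WIP snapshot before GPT 5.5 test"
git branch backup/pre-gpt55-test            # cheap local copy of this exact state
git switch -qc experiment/gpt55 dev         # do the risky work on a throwaway branch
```

If the experiment goes wrong, switching back to `dev` and deleting `experiment/gpt55` restores the original state, and the `backup/...` branch can be deleted once everything checks out.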
Um, so just going to make sure that Harbor Build is secure and I can just, you know, dev immediately from the dev branch, basically. So let's just see what this comes up with. So I'm just going to say: I want to continue development, let's say from the dev branch. Start by creating a backup of all of the schema in Convex dev at the moment, so that we can easily roll back if something goes wrong. I want you to refactor the entirety... actually wait, I want you to add a drop-down picker to select either Anthropic (current implementation) / Claude, and OpenAI.
Let's say one of the latest GPT models, 5.4 I think is good; look online to back all this up. And then I just need to check the name. Yeah. So, it's the Agent SDK. It would have to be the TypeScript one, the TypeScript SDK, obviously, because I'm on Convex. So, I want you to copy the implementation of Harbor Build, but use OpenAI instead, remembering to make sure to use skills, etc. "Make no mistakes," do we say? No, probably not. I don't want to meme. Um, so yeah, we'll just link to this, and everything should be here as far as I know.
I don't know if they support skills, which would be really stupid if they didn't: OpenAI Agent SDK skills. Agent skills, does it support them? Yeah, looks like they might be supported. Um, make sure to look up how skills and tool use are implemented in the Agent SDK. First, start by telling me if this is even a possibility / fully possible to replicate. Okay, so let's press enter here. I'd be really curious to know if this is any good at backend coding, because the front-end stuff doesn't look that good right now. I've seen some tests online.
It looks pretty standard, to be honest with you. But yeah, we'll see what this does, guys. I'll be completely unbiased with this. Um, but I'm not expecting bags and bags, to be frank. So, let's see how this is getting on. Um, this hasn't done anything yet. Okay, it does say the production build is underway. Um, that's fine then. So, we'll just let this build. It does just say thinking though, so I'm not really sure what it's thinking about. Okay, there we go. The production build passed and generated 108. Wait, already?
Wait, that's crazy. That was super fast. What the hell? Wait, that can't be done, can it? Okay guys, so I mean that is done. Um, that was absurdly fast. Obviously the technical build is going to be fine here. I'd be very, very disappointed if it wasn't. But I mean, I don't know, dude. I don't know. It just looks weird, but also really good, but also weird. And the SVGs are kind of strange here. Looks super GPT-generated. I don't know how to describe that to people, but um, yeah, the technical build is obviously going to be perfect.
You would expect it to be perfect. It's a frontier model. This benchmark is pretty out of date at this point. I just use it for certain things, and people have asked me to continue making these videos, so I will continue making these videos. But overall, weird build, I have to say. Very, very strange build. Um, the Codex app is pretty nice. I have to say I do like the Codex app. Um, I still prefer Claude. Don't know why. I just prefer Claude, the app. I mean, I also prefer Opus 4.7. I can tell you that already, because Opus 4.7 output doesn't necessarily look like it's Opus 4.7-generated, right?
Um, and having a perfect technical build in this benchmark is nothing really impressive at this point. I have to say, even the Chinese open-source models get this perfect. But overall, I mean, a pretty interesting build. So, the real test will come from here. Uh, so I'll just say yes here. So, apparently the repo is already on dev, but it's 247 commits behind origin dev. I don't actually know what that means. I'm such a bad developer. Well, I'm not a developer. And it has several existing modified untracked files. I'm treating those as user work and won't overwrite them beforehand.
I'll isolate exactly which files need edits. Okay, that's fine. So now, that was super fast, by the way. Like, that was probably one of the fastest builds I've ever seen. Yeah. So this is kind of the story that I'm seeing across the board, right? Like, yeah, this is just some Twitter bait obviously, but I mean, yeah, it's interesting. And it says that Opus 4.7 is still the best that they've seen. I am seeing the same thing as well. To be honest with you, there's nothing here that says, you know, Opus has been taken over by anything.
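As an aside, "247 commits behind origin dev" just means the remote dev branch has commits the local copy hasn't pulled yet; it isn't a problem by itself. The two counts can be checked with plain git, as in this sketch (the repo layout and commit messages are made up for the demo; it assumes a remote named `origin` and a branch named `dev`):

```shell
# Demo setup: a "remote" repo with a dev branch, plus a clone that falls behind.
tmp="$(mktemp -d)"
git init -q "$tmp/remote" && cd "$tmp/remote"
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "base"
git branch -m dev                       # make dev the remote's branch
git clone -q "$tmp/remote" "$tmp/local"
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "newer work on origin"

# The actual check: refresh origin refs, then count divergence each way.
cd "$tmp/local"
git fetch -q origin
git rev-list --count origin/dev..dev    # "ahead": commits only the local branch has (0 here)
git rev-list --count dev..origin/dev    # "behind": commits only origin/dev has (1 here)
```

When "ahead" is zero, a plain `git pull` fast-forwards the local branch up to date.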
So, we also have it inside the Codex app now, which is pretty cool, finally. So, we can do codex, uh sorry, the CLI: cd into the folder and then run the Codex CLI. I might just run the same prompt just to see what happens. And also, you get to see a little bit more of what it's doing. Honestly, guys, this just sums up GPT. I swear I said start from scratch and it started reading. I said, "Do you know what start from scratch means?" in a very normal, calm manner, "said Dumbledore calmly." And then again, "said Dumbledore calmly": no, do not delete anything, because it tried to delete something; just start from scratch in this directory. I don't know what it is with GPT. It's like, "I must follow all orders exactly as they were given to me." I don't know, it's just weird. There's no... what, why is it gone? Oh my god, it still went back to Davidson. No, not that. No, the directory you're inside now.
What is it doing? I don't know. I don't know if this is just GPT. Like, people are saying it's ch... Honestly, there are some people on YouTube who are absolute... I don't even want to get involved with this, but like, they're obviously buddy-buddy with these AI companies. You can just tell. And these AI companies give them the models way too soon, and they say that it's something new and something's changed and blah blah blah. But like, it's the same GPT model. Literally, they always do this. What directory are you inside now? Like, it's still going inside the wrong one.
No, wait. I'm inside... I'm literally getting gaslit. Are you sure? No, you are not. It says here it's in... Okay. I want you to only work inside codex CLI and nothing else. Okay, [ __ ] get on with it. What is this gaslighting? By the way, shout out to GPT for gaslighting me there. What the hell? It made me actually think that I was in the wrong directory for a second though. That was crazy. Okay, finally it's getting on with it. Jesus. And then this one is creating the wrong thing. It's creating a GPT 5.4 writer, which is also not what I wanted at all.
Jesus. Oh, and now this one is scaffolding the app instead of running npm or npx create latest. It's a joke, honestly. Is this thing not... like, people might say it's the prompting, whatever, but if I said this to Claude it would not [ __ ] this up, right? Oh, I put "harbor bold" instead of "harbor build". Right. Okay. So, RIP, that is actually my fault. But I would still argue that Claude would pick this up. I'm actually going to see if that's... So, to be completely fair, what I'm going to do is start a new session, as if I were inside a new Claude, and I'm going to press enter here.
I just want to see if it picks up that I mean Harbor Build and not, like, the whole thing, right? Like, it was trying to do Harbor Writer, but that's not what I want to test. I want to test specifically Harbor Build, right? So, let's just see if it manages to pick this up, because this is the real test for people. Can it do something with a half-assed prompt? Because at the end of the day, not everyone wants to sit and write prompts all day. Okay, so interesting, guys. This came up with a different answer.
It didn't actually notice that I meant Harbor Build. It looks like, uh, from what I can see, it doesn't mention Build here. So, let's say I meant Harbor Build. But what it did say was whether it's feasible, right? Yes, but with significant caveats, which is important. So, let's see what happens if I say I meant Harbor Build. Okay, so to be fair to OpenAI, that was my issue. What is this? Automatically compacting context. Now I'm getting... bloody hell, it's a joke. I can't even do a normal test without getting compacted. Crazy. Now I just want to do this with Claude instead and see how it actually builds stuff.
But let's see what happens here. Okay, guys. This is the CLI result. Um, pretty similar, to be honest with you, to the Codex app. Maybe a little bit better, or, I don't know. It's about the same, honestly. It's got a lot of text there, but that might be good for SEO, to be honest with you. The prompt does prompt for SEO. Uh, there's no English button. Uh, very interesting. That is probably the first time I've seen the lack of the language switcher. That is very, very specifically prompted. So that's a big fail, actually, not having that.
Um, I don't know. Yeah, this model does not seem very good, guys. I'm just going to call it as I see it. This model seems absolutely terrible, and I've hit the usage limits, guys. So I can no longer use the model. This is for 20 bucks a month, though. I have to say, shout out to OpenAI for being a little bit more generous with their usage limits. They have to be, because they're getting completely smoked by Opus 4.7, in my opinion. But still, credit where credit's due. They do give you a lot more usage. So, if you do want to use a serious LLM, a SOTA model, so to speak, then I would definitely say Codex is the way forward.
Okay, so this is the kind of final test that we're going to be doing in this video. I have another video that I want to make about the writing capabilities of this model, which we can very easily test by putting it inside the writer inside Harbor. But for now, let's go to Harbor Build and... Okay, that's okay. We'll give it a little bit of time. Yes, I do have Claude Code fixing Codex's mess. Yes, that is ironic. We'll see how this build goes once this finishes. So, it didn't even build the drop-down, guys. I don't know what to tell you.
This model seems absolutely god-awful. I'm just going to be honest. Um, yeah, I don't know what they're doing over there at OpenAI. Basically, there's no way to even select, um, the model, right? That was the whole point. Okay, so as usual, Claude Code to the rescue. It decided to use GPT 5.1. I specifically said use GPT 5.4. So that's crazy. Uh, change the model plus front end to GPT 5.4. I cannot believe how bad this is compared to Opus 4.7. And you have people on Twitter saying this is better. Like, the amount of pain this has put me through to just press build here is absolutely absurd.
Absolutely absurd. Like, I literally am bored of trying to get this to work. I think Opus would have one-shot this. By the way, guys, honestly, I'm just going to leave the video there. This is a terrible model. I don't know what they're doing. They built all this hype up. People were saying this was going to be better than Mythos. This is OpenAI's Mythos. It's a joke. This is absolutely terrible. I've tested this through and through, and it's not a good model. Thank you so much for watching. If you are watching all the way to the end of the video, you're an absolute legend.
Go and check out harborseo.ai if you want to sponsor the channel and I'll see you very, very soon with some more content. Peace out.