Stop letting your agents write Markdown.
Chapters
The host critiques Markdown for being limiting as AI tools grow more capable, setting up the debate between sticking with Markdown and adopting HTML.
HTML is increasingly favored over Markdown for agent work, with Claude Code and CopilotKit enabling rich, interactive HTML artifacts that improve readability, collaboration, and UI for AI-assisted tasks.
Summary
Theo from t3.gg argues that Markdown's limitations are driving a shift toward HTML when working with AI agents. He cites a Claude Code team member's push for HTML as a more expressive output, and highlights how HTML enables richer diagrams, interactive components, and better structuring of specs, PRs, and code reviews. The video showcases concrete examples from Thariq and Karpathy, plus hands-on demonstrations of HTML artifacts generated by Claude Code and CopilotKit. Theo notes that HTML can present tables, SVGs, code snippets, and interactive widgets in a single file, making it easier to read, share, and verify, especially across Slack, git history, and Jira-integrated workflows. He also discusses practical onboarding tips, such as starting prompts from scratch to generate HTML artifacts and iterating through explorations, implementation plans, and verification. Throughout, the emphasis remains on staying "in the loop" with AI outputs and considering design systems or UI-focused HTML as a durable pattern. The sponsor segment introduces CopilotKit as a full-stack solution for building interactive AI-powered apps with custom components and hooks. Theo also reflects on the future of output formats, suggesting that we may converge toward even richer visual interfaces and interactive videos, while acknowledging that MDX/HTML design systems could bridge current gaps. Subscribe for future experiments and results as he trials more HTML skills and compares them with peers like Matt Pocock and Karpathy.
Key Takeaways
- HTML offers a significantly richer information medium than Markdown for agent-generated content, including tables, SVGs, CSS styling, code snippets, and interactive UI elements.
- Claude Code can read your codebase and context (Slack, git history, MCPs) to generate comprehensive HTML artifacts that organize information and support verification tasks.
- Using HTML artifacts enables new collaboration workflows, such as side-by-side option exploration, implementation plans in HTML, and HTML-based code reviews with inline diffs and annotations.
- CopilotKit provides a framework to build interactive AI-powered front-ends with custom components, hooks, and the ability to render dynamic UIs that respond to user input and model outputs.
- There is a trade-off: HTML diffs can be noisier to review than Markdown, but the gains in readability, navigability, and interactive capability often outweigh the downsides for complex specs and PR reviews.
- A practical onboarding approach is to prompt from scratch to generate an HTML artifact, then iterate by expanding explorations, mockups, and data flows before implementing in a new session.
- The future of AI output formats may move toward interactive, visual experiences (interactive videos, simulations) beyond static HTML, but HTML remains a strong starting point and collaboration tool.
Who Is This For?
Software engineers, AI practitioners, and engineering managers who want to understand the practical, actionable benefits of HTML over Markdown for AI-assisted workflows, especially when using Claude Code and CopilotKit.
Notable Quotes
"Markdowns become the dominant file format used by agents to communicate with us."
—Theo notes Markdown is prevalent but has growing drawbacks for complex AI workflows.
"I have started preferring HTML as an output format instead of Markdown."
—Core claim: HTML can better convey complex information and interactions.
"HTML can convey much richer information compared to markdown."
—Explains the breadth of HTML’s expressiveness (tables, SVGs, code, interactivity).
"The interaction part is really strong, too, though."
—Highlights the bidirectional, interactive capabilities of HTML artifacts.
"Stay in the loop."
—Thariq closes his article by emphasizing keeping AI-driven outputs in a human-check loop via HTML artifacts.
Questions This Video Answers
- How can I start using HTML artifacts with Claude Code instead of Markdown?
- What are practical benefits of HTML for AI-driven code reviews and PRs?
- Can CopilotKit render interactive UI components inside HTML for agent outputs?
- What are the trade-offs of HTML vs Markdown for large specs and design docs?
- How do you structure an HTML artifact for efficient collaboration and verification?
HTML for agents, Claude Code, CopilotKit, Markdown vs HTML, Code review HTML, UI/UX for AI artifacts, SVG in AI artifacts, Design systems HTML, JavaScript/HTML integrations, AI-generated documentation
Full Transcript
I recently published a video about all the flaws in Markdown. And you all really loved that one. Uh, kind of. You definitely weren't super rude to me in the comments about how I'm click baiting and terrible and then watch the video and realize I actually had things to say about Markdown. But yeah, it seems like Markdown's a sensitive topic for a lot of people because it does have a lot of strengths. I mean, I've been a Markdown fan for the better part of two decades now. I understand. But I feel like we're overusing it. And it seems like I'm no longer the only one who feels this way.
Thariq from the Claude Code team just published an article, Using Claude Code: The Unreasonable Effectiveness of HTML. He then followed up with: HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me instead. This article is why. I have seen some fun hacks around using HTML with tools like Claude Code. I actually learned a bunch of these from Simon Willison way back when we were doing GPT testing together. I have been amazed at some of the clever things people figure out with HTML and how they use it to be more effective working with agents.
And Thariq and I aren't the only ones who feel this way. Karpathy actually came out and said that this works really well, just asking it to structure responses as HTML. He had a lot to say as well. We're going to go into all of this and more. Agents clearly like Markdown and HTML, but there's one more important thing that agents love: today's sponsor. Building most things with AI is pretty easy, but there's one thing that's surprisingly hard: building AI apps with AI. It turns out all the weird things like persisting threads, streaming chats, getting UI to render dynamically on the fly, and more are just kind of hard to do.
Believe me, I've tried a lot. T3 Chat and T3 Code show just how hard it is to get this right. And if CopilotKit was there when we started, it would have made my life much easier. They call themselves the enterprise-ready front-end stack for agents, but I would go further than that, because it's not just front end. It's a real full-stack solution. Here's a simple app we built with CopilotKit. We're going to ask it to compare the latest models from OpenAI and Anthropic. You probably expected it to just render a really bad markdown table, right?
Nope. You can write custom components in whatever framework or tool system you want. Obviously, we're big React guys here and their React stuff is great. So here it just rendered a custom component listing all of the different models and their prices. But more importantly, it rendered a comparison option here where I as the user can interact, make changes, hit submit, and now it's going to do another pass where the model gets all of the information back and renders another dynamic UI all based on components that we wrote. The code couldn't be simpler. They provide a use component hook that you give a name, description, and parameters, which are just the things it needs to pass.
You know, classic Zod schema here; if you've been around for a while, you're already familiar with all this. And then you give it the custom component that actually renders the results. They also have a hook for interactive components right here, which is just as easy to use. Getting the front end and the back end to interact properly in systems like this is incredibly complex. CopilotKit couldn't make it easier, even if you don't want to use their components. I know Ben's already building his own wrappers around it in Svelte. Build better interactive experiences with agents at soyb.v.link/copilotkit.
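To make the shape of that pattern concrete, here's a rough sketch of what's being described: a component registered with a name, a description, and a Zod schema so the model can render it with structured arguments instead of a markdown table. The hook name `useRenderableComponent` and its signature are illustrative stand-ins, not CopilotKit's actual API; check their docs for the real hooks.

```tsx
// Illustrative sketch only; `useRenderableComponent` is a made-up stand-in
// for the CopilotKit hook described above. The real hook names and
// signatures live in the CopilotKit docs.
import { z } from "zod";

// Stand-in declaration so the sketch type-checks; the real hook comes from the library.
declare function useRenderableComponent<S extends z.ZodTypeAny>(opts: {
  name: string;
  description: string;
  parameters: S;
  component: (props: z.infer<S>) => JSX.Element;
}): void;

const ModelComparisonParams = z.object({
  models: z.array(
    z.object({
      name: z.string(),
      provider: z.string(),
      inputPerMTok: z.number(),  // USD per million input tokens
      outputPerMTok: z.number(), // USD per million output tokens
    })
  ),
});

function ModelComparison({ models }: z.infer<typeof ModelComparisonParams>) {
  return (
    <ul>
      {models.map((m) => (
        <li key={m.name}>
          {m.name} ({m.provider}): ${m.inputPerMTok}/M in, ${m.outputPerMTok}/M out
        </li>
      ))}
    </ul>
  );
}

function Chat() {
  // Register the component so the agent can choose to render it with structured args.
  useRenderableComponent({
    name: "model_comparison",
    description: "Show a pricing comparison for the given models",
    parameters: ModelComparisonParams,
    component: ModelComparison,
  });
  return null; // the actual chat surface is omitted in this sketch
}
```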
Let's see what Thariq has to say about the unreasonable effectiveness of HTML for agents. Markdown has become the dominant file format used by agents to communicate with us. It's simple, portable, has some rich text capability, and it's easy for you to edit. Claude has even gotten surprisingly good at using AI to make diagrams inside of markdown files. I would disagree. I can't tell you how many times I see it mis-pad and have certain layers just push out too far past the borders. It is what it is. As agents have gotten more and more powerful, I have felt that markdown has become a restricting format.
I've found it difficult to read a markdown file of more than 100 lines. I want richer visualizations, color, and diagrams, and I want to be able to share them easily. I'm also increasingly not editing these files myself, but using them as specs, reference files, brainstorming outputs, etc. When I do make edits, I'm usually prompting Claude to edit them, which removes one of Markdown's largest benefits. I've started preferring HTML as an output format instead of Markdown, and increasingly see this being used by others on the Claude Code team. This is why I want to start with some examples.
You can see a bunch here, this HTML effectiveness GitHub page. Just be sure to come back and read more after. We will do just that. Oh, yep. I can already smell the Claude Code design skill all over this one. Something I've been meaning to do for a bit: I am finally tired of the front-end design skill. And as such, away it goes. I have now removed the front-end design skill because I am so tired of this stale copy-paste design format that everything ends up with. Cool. The unreasonable effectiveness of HTML: 20 self-contained HTML files an agent produced instead of a wall of markdown.
Each one trades a document you'd skim for one that you'd actually read. Open any of them directly in a browser. Grouped by the kind of work that they replace. I can tell from this copy that this was written by an agent. I recently got gifted a sub to Pangram Labs from Slasher, and it apparently is really good at detecting AI-generated text. From my brief experimentation, it was. So, we're going to start with this article and take a look to see if we are reading AI slop or not. Okay, the site's super broken, but apparently 100% of the text is human written.
Good start. But if we switch over to this site, I'll just grab different sections of the text. Yep, just as expected, the HTML is entirely AI generated. People are already asking, so I'll make it clear. This wasn't an ad. I just have a friend that works at the company now and was skeptical of how good Pangram was at AI text detection. And thus far, it's had a pretty close to 100% hit rate for my test experience. I don't know if the product's any good. I don't know anything about it beyond it being sent to me by a friend.
And yeah, it seems cool and it's fun to quickly check if something's AI generated or not cuz I can usually tell. But now I have confirmation, to some extent. It's all lossy. It is what it is. Just wanted to see. So we're not reading AI slop when we read his article, but we're absolutely reading AI slop when we read these HTML files. Let's go back to reading these HTML files. He shows some examples with exploration and planning. When you're not sure what you want yet, ask the agent to fan out across several directions and lay them next to each other so you can point at one instead of reading three sequential walls of text and trying to hold them all in your head.
Once you've picked, turn the pick into a plan that the implementer can actually read. So here it made HTML for three ways to implement debounced search. It generated an inline useEffect plus setTimeout, a custom useDebounce hook, and a tiny external library it could use, with pros and cons under each. It had a recommendation at the bottom. I'm okay with this just being in the chat, but I can see why in certain cases this would be worth breaking out into an HTML file.
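For reference, here's roughly what that middle option, the custom hook, tends to look like. This is a minimal generic React sketch, not the exact code from the generated artifact:

```tsx
import { useEffect, useState } from "react";

// Minimal debounce hook: returns `value` only after it has stopped changing
// for `delayMs`. A rough sketch of the "custom useDebounce hook" option above.
function useDebounce<T>(value: T, delayMs = 300): T {
  const [debounced, setDebounced] = useState(value);

  useEffect(() => {
    const id = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(id); // cancel if value changes before the delay elapses
  }, [value, delayMs]);

  return debounced;
}

// Usage in a search box: only fire the query for the debounced value.
function Search() {
  const [query, setQuery] = useState("");
  const debouncedQuery = useDebounce(query, 300);

  useEffect(() => {
    if (debouncedQuery) {
      void fetch(`/api/search?q=${encodeURIComponent(debouncedQuery)}`);
    }
  }, [debouncedQuery]);

  return <input value={query} onChange={(e) => setQuery(e.target.value)} />;
}
```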
The next example is visual design directions for the no-tasks-yet state. So, this is it mocking four potential directions for the UI. This I could see being useful. I've already done a trick similar to this before, and I've shared it in many videos: when I want different design renditions for something, I don't ask the model to do one and then ask again for it to do another. I ask it for four distinct options that it will put on different routes so I can take a look at all of them, just because I want to see them all. And if you ask it to generate them all at once, you'll get more variety than asking one after another, because otherwise it's going to just go down a similar path.
Another fun example that he has here is an implementation plan, a full roadmap with a nice little UI for it. It's fine. I think there's a little too much going on for it to beat the markdown. I might just be crazy here. I personally like just reading markdown. So, this I don't necessarily love, but I could see why it would be valuable in certain places. How about the PR writeups? I like the idea of basically anything that gives you a better hierarchy for PRs. I also like the idea of being able to open and close parts of this to make it easier for me to visualize things.
Like, I see some of the potential here. I'm not in love with this yet. So, I'm going to skip back to the article and see how I can be convinced by Thariq's words instead of his examples. So, why HTML? We start with information density. Here's a set of the different types of information a single HTML file can carry: tables, design, illustrations, code, images, spatial workflows, and interactions. Good luck getting it to include images, especially if it can't generate them. I cannot tell you how many times I've seen models hallucinate base64 encodings or URLs for images, even to this day.
It's actually insane. Actually, on that note, I had a funny one recently where I was asking GPT 5.5 to find me images of a specific thing that we were talking about. I think it was literally for something I was installing in one of my doors, like a carpentry task. I watched in its reasoning trace: the user asked for us to find images on the internet, but it's in our system prompt that we must generate an image whenever the user asks. So, we're going to generate images. And then it went and generated a shitty image instead of finding the one that I was looking for.
So, these models still suck at outputting images. They're better at recognizing the contents of them, but actually finding an image and putting it somewhere? Good luck. Have fun. Back to the article. HTML can convey much richer information compared to markdown. It can of course do simple document structure like headers and formatting. But it can also represent all sorts of other information: tabular data using tables, design data with CSS, illustrations with SVGs, code snippets with script tags, interactivity using HTML elements with JavaScript and CSS, workflows using SVG and HTML, spatial data using absolute positions and canvases, and images using image tags.
I'd go so far as to say that there's almost no set of information that Claude can read that you cannot fairly efficiently represent with HTML. This makes it a highly efficient way for the model to communicate in-depth information to you and for you to review it. I found that in the absence of being able to do this, the model may do more inefficient things in markdown, like ASCII diagrams or, my favorite, estimating colors with Unicode characters like in this screenshot from Claude Code. Also notice the uh padding issue I mentioned before, where it just [ __ ] up the rows and doesn't have the right number of spaces.
So the actual border is just offset slightly. I see this all the time. The next section is visual clarity and ease of reading. As Claude is able to do more complex work, it's also writing larger and larger specs and plans. In practice, I found I tend to not actually read more than a 100-line markdown file, and I'm certainly not able to get anyone else in my org to read it. But HTML documents are much easier to read. Claude can organize the structure visually to be ideal to navigate, with tabs, illustrations, links, etc. It can even be mobile responsive so that you can read it differently based on your form factor.
There's also the ease of sharing. Markdown files are fairly hard to share since most browsers do not render them natively well. You often have to add them as attachments to emails or messages. With HTML, as long as you upload the file, for example, to S3, you can share the link easily. Your colleagues can open it wherever they wish and easily reference it. I have a feeling those examples do not work well on mobile. Yeah. So, wherever they want is not necessarily true as you see here. So, not everywhere they want. They say it can be mobile responsive.
So, you can read it differently based on your form factor, but the examples that were shared here weren't. And the more of those things you address, the more bloated the HTML gets, and at some point you're just burning output tokens. He does also say that the chance of someone actually reading your spec, report, or PR write-up is much higher if it's in HTML. The idea I have to reconcile here is how much of that is because the markdown is too bloated and the HTML makes it more readable, versus how much of it is that the HTML is novel enough that it's easier to jump in and read because you're being handed fewer things like it.
Like, if everyone who is spamming us with giant markdown files was to switch over to spamming us with HTML links, how much more of that would we actually be reading? I think it would be better, but I don't think it's as much better as it seems right now. The novelty of HTML is a big portion of the value right now. The interaction part is really strong, too, though. I will give him that. HTML can allow you to interact with the document. For example, you might want to ask it to add sliders or knobs to adjust the design, or allow you to tweak different options in the algorithm to see what happens.
You can also ask it to let you copy these changes into a prompt to paste back into Claude Code. He has a bunch of examples of these types of playgrounds and two-way interactions that he's built. There's also data ingestion. Why use Claude Code to make HTML files instead of Claude or Claude Design, for example? One of the biggest reasons is all the context that Claude Code can get. For example, when writing this article, I asked Claude Code to read through my code folder and find all the HTML files that I've generated, group and categorize them, and then make an HTML file with all the diagrams representing each type.
The diagrams you see in this article are a direct result of that. Besides the file system, Claude Code can find additional context using your MCPs like Slack, Linear, etc., your web browser with Claude in Chrome, your git history, etc. Actually, a valuable point here. I have done a lot of this type of thing where I have a bunch of data on my computer or in some folder somewhere and I just ask an agent to go through it and visualize it to make something more useful. I've done this many times. I'm beginning to think, especially when he mentions the HTML files that he has just scattered across his computer, that something we might need here is a better skill that describes the shape of the HTML, tells it it can use Tailwind or whatever, and gives it a structure for what directories to leave it in so you don't clog up your git history.
I'm going to think through what a good skill for this would look like, and play with that in the future. I'll report back for sure. The next point he has is that it's joyful. Making HTML documents with Claude is just more fun and it makes me feel more involved and invested in the creation and that itself is enough. Honestly, that's the biggest point here. If this makes you engage more with the output of your agent so that you get better results and are more involved in the process of steering it, there's a lot of value there.
It's silly to say it that way, but if the thing makes you more excited to build and focus and iterate, that is good. It's the same reason I've been in favor of things like Vim. If setting up Neovim makes you want to code more and build more, awesome, do it. I'm all in favor of things that get developers excited to be more involved in the process and to create more things. That is always what I push for. So, how do you get started? I'm a little bit afraid that people will read this article and turn it into a /html skill or something.
Look, I didn't pre-read. I am going to do this, to be clear, but I'm not going to do just /html. I'm going to do something very different here. I have plans. We'll get to those plans later, probably not in this video, but make sure you're subbed if you're not. By the way, I'm amazed half y'all still haven't hit that button. If you want to see my future HTML skill video, or honestly more exciting, the video I'm going to do trying all of Matt Pocock's skills, definitely make sure you're subscribed, because there's a lot of fun content in the pipeline.
While there might be some value in a skill like that, I want to emphasize that you don't need to do much to get Claude to do this. You can just ask it to make an HTML file or make an HTML artifact. The trick is knowing what you want the artifact to do and how you might use it. You may over time make a skill, but for now, I suggest that you just prompt from scratch to get the hang of how to use it in different cases. I absolutely agree. In general with skills, you should always start by asking the model to do the thing.
See what it gets right and wrong. Figure out what parts you have to include every time, what parts you don't, and then slowly figure out what those are so you can build a skill to do it right. And also on that same note, one of the best ways to test a skill is to just go copy the text for said skill, paste it into your input box, and hit enter and see what happens. Like it's the same thing. The first use case he has is for specs, planning, and exploration. HTML is a rich canvas for Claude to dive into a problem.
When I start working on a problem, instead of a simple markdown plan, I expect to make a web of HTML files. For example, I might start with asking Claude code to brainstorm and create some explorations of different options. I would then ask it to expand more into one, maybe make mockups or code snippets. Finally, when I'm feeling good, I'll ask it to write an implementation plan. When I have a plan, I'll create a new session and pass in all of those files for it to implement. Notice that he says he'll create a new session instead of compaction.
Definitely not, cuz the compaction is kind of garbage in Claude Code. Or to be fair, it got really bad at GPT 5.5. I hate the compaction now, and I used to trust it dearly with 5.3 and 5.4. So yeah, I'm starting more sessions than I ever did. Even for a simple thing: just a few days ago I did the uh Azure bench thing, a very simple benchmark just measuring inference speeds on Azure. I did like 30-plus threads to make this, obviously all in T3 Code, I wonder why. But yeah, I realized it's better to just make a shitload of branches and a shitload of threads instead of trying to massage an existing long thread into the shape you want.
I just copy paste the parts I like into a new thread and start from scratch. But it is cool that the HTML files are a good thing to hand to an agent. I bet you could just take one of these HTML files, give it to the agent with no additional context, and it'll usually figure out what you want. Makes sense. Also useful for verification. When he's verifying the results of the work, he'll ask a verification agent to read the HTML files, and it will have a much broader context on what it actually needs to verify.
That's a really useful point. He gives some example prompts here, too: "I'm not sure what direction to take the onboarding screen. Generate six distinctly different approaches, varying layout, tone, and density, and lay them out as a single HTML file in a grid so I can compare them side by side. Label each with the trade-off it's making." I like this a lot. Again, having the model generate distinctly different options in one pass gets you a much better variety than you might otherwise get. The next example he has is: create a thorough implementation plan in an HTML file.
Be sure to make some mockups, show data flows, and add important code snippets that I might want to review. Make it easy to read and digest. Again, just getting used to prompting the model and seeing how it responds to these different scenarios. There is no one guide. And even if there was, as model updates happen, things change so fast. You got to just feel it out with these things. The next section he has is for code review and understanding. Code could be difficult to read in a markdown file. Eh, if you have a good like markdown parser, even one in like VS Code, and I'll just demo this quick.
Let's uh check the skill.md file. Okay, cool. This file is a markdown file. It's actually already quite readable in my opinion. If I make the font bigger, it's even more so. Watch this: typescript, const x equals "sub nerds please sub". It does syntax highlighting in code blocks as long as you label it with the language. If I delete that, it stops. I'm amazed how people don't know this. In most markdown parsers, when you do the triple backtick, you can put the file extension for the language and it will do syntax highlighting for that language.
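Concretely, the difference is just the language label after the opening fence. The first block below renders as plain text, the second gets TypeScript highlighting:

````markdown
```
const x = "sub nerds please sub"
```

```typescript
const x = "sub nerds please sub"
```
````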
If you include this in the browser via HTML, it now has to figure out how to render the diffs and how to render the code blocks. The process of it figuring out how to do this, much less doing it properly, is as if not more expensive, both mentally for me as the person consuming it and more so for the agent generating this and generating the tokens, for something markdown already does just as well. To each their own, but I don't necessarily agree with that piece. I will say rendering diffs sucks in markdown. Code can be difficult to read in a markdown file.
Don't agree there, as I just established. With HTML, we can render diffs, annotations, flowcharts, modules, etc. I don't really want these things in my code snippets that I'm reading, outside of like a comment. I do want diffs though, so not having diffs sucks. Use this to understand code that the agent has written, to get code review, or to explain a PR to somebody reviewing your code. I find this often works better than a default GitHub diff view. And I attach an HTML code explainer to every PR I make now. Interesting. I see potential for this part.
One of my favorite things a sponsor built recently, and this is not something they're paying me to say, this is just a thing I genuinely think is cool: it's Devin Review. So, if you take github.com and change that to devinreview.com for a given PR, it will do an analysis of the pull request. But the coolest parts are that it will actually change the view to be organized based on importance and grouped based on the related changes. So instead of it just having all of the files in order, it'll create its own hierarchy of where things are.
Apparently I am out of credits on my original comped account. I'll have to deal with that later. Julius has been abusing all of these tools. Clearly, I'm not using it that much if it's broken right now, but the couple times I did, I actually found it really nice and I'm going to go reup because I've enjoyed it a lot. The idea of changing how we review code to be better shaped based on the sheer volume of code that we're reviewing is a good idea and it's one I like because going through all of your code in alphabetical order depending on where it is in your project has always been [ __ ] stupid and it's more stupid than ever now.
Doing this yourself as a developer by telling the model to generate HTML, summarizing the code, showing the changes, and making a better hierarchy? That's a good idea. I like this. The other cool thing is that you can steer the model towards the parts you want the reviewers to focus on. So, in this example prompt he has, he says, "Help me review this PR by creating an HTML artifact that describes it. I'm not very familiar with the streaming and back pressure logic, so focus on that. Render the actual diff with inline margin annotations, color code findings by severity, and whatever else might be needed to convey the concepts."
Well, I like this idea a lot. Then we have the designs and prototypes. Claude design is based on HTML because HTML is incredibly expressive at design. Even if your end surface isn't HTML, Claude can sketch out a design in HTML and then write it in your language of choice, be it React, Swift, etc. I love that React's a language here. Long debate from the olden days of webdev, but React is kind of a language. You can also prototype interactions like animations, actions, etc. Consider asking Claude to make sliders, knobs, etc. to tune in exactly what you're looking for.
Yeah, this is a little much for a simple checkout button, but I get why you would want to care this much. This section is one of my personal favorites: using this for reports, research, and learning. Claude Code's incredibly good at synthesizing information across multiple data sources and converting it into a report for readability. You can prompt Claude to search your Slack, your codebase, git history, the internet, etc., and use it to generate extremely readable reports for yourself, for your leadership, for your team, etc. This is actually a crazy hack for those who are looking for ways to stand out using AI at your company.
Give an agent access to your Jira and ask it to make a nice-looking HTML page of what got done every week, and then present that during your standup. Like, give people a resource to look at that makes the leaders feel like they understand what's going on. It's stupid, but this is a really simple way to look cool to your team and to your leadership. Making them feel more informed by running a simple prompt. Really powerful. Yeah, chat's already understanding: "Wait, I don't think my manager allows me to use the Jira API yet, but I can do this."
Management will go crazy over this. Yeah, you guys get the idea. Always happy to sneak in these hacks for career growth where I can. There's a lot of fun ones here if you know where to look. You can assemble things like this into long-form HTML documents, an interactive explainer, or even a slideshow or deck. Ask Claude to use SVG for diagrams to help visualize it. I have not seen these models get super good at SVGs just yet, so you might not have the best luck there, but maybe. Somebody just said that they got Claude Code access at work last week.
Thanks for the idea. Anytime. Yeah, you can do this without having one of those either. You can spin up some pretty simple API calls to DIY this, too. If you're curious how you would do this without Claude Code, maybe watch my video on how harnesses work. I have a whole dedicated video on how Claude Code works. That's the more generic one: how to get AI to write code and do real stuff. You can build your own solution internally that doesn't require Claude Code and has all of these same powers. So definitely worth considering if you haven't done stuff like that yet.
An example that he has here is he made a post on prompt caching. In order to do that, he asked Claude to prepare an in-depth research file in HTML for him to read on all of their changes to prompt caching after reading the git history. That's actually really cool. And again, like your prompt doesn't always need to be honest about who you are. Sometimes it's useful to take a different perspective to steer the model the right way. Even if you are fully aware of how the rate limiter works, telling the model that you don't actually understand it is a really powerful way to get it to write and create artifacts for an audience that isn't you.
I find a lot of devs actually get hung up on this. They don't know how to talk to the agent outside of their own voice. They only know how to tell it what they specifically want and know and do, not how to move past their own knowledge to get the model steered towards something that isn't for them. It's easy to make an agent create things for yourself. It is a very different way to think to steer agents to make things for others. And this is a silly but really good example here: just saying I don't understand how a rate limiter works.
Read the relevant code and produce a single HTML explainer page, a diagram of the token bucket flow, the three to four key snippets annotated, and a gotcha section at the bottom. Optimize it for someone reading it once. I do this all the time. I would argue almost half of my prompts are me pretending to be dumber than I am so that the model will steer in the right direction for my users and then I take the reins back whenever it goes where I don't want it. Here's another fun one, and I've actually done this a few times before.
Using HTML to create custom editing interfaces. Sometimes it's hard to describe what you want purely in a text box. In this case, I'll ask Claude to build me a throwaway editor for the exact thing that I'm working on. Not a product or reusable tool, but a single HTML file purpose-built for this one piece of data. Yep. A lot of devs still aren't over the idea that code is expensive, so you should only write it for things that get reused. Code is cheap now. Writing a bunch of code just to play around with some data to make a good decision about other code you're writing is absolutely worth it.
Of the code I "write" right now, write obviously in quotes, I would say about 70-plus percent of it immediately gets thrown out after I run it one time, if even that. More devs need to internalize this way of thinking, because you can make much better stuff for your end users if you build custom tools to get there that are only ever used once and thrown away. The trick here is to always end with an export, a copy-as-JSON or copy-as-prompt button that turns whatever I did in the UI back into something that I can paste into Claude Code.
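A minimal sketch of that escape hatch, assuming the throwaway editor is a small React page holding its state in a plain object. The names here are made up for illustration:

```tsx
// Illustrative sketch of the "copy as JSON / copy as prompt" buttons for a
// throwaway editor. `EditorState` is whatever shape your one-off tool holds.
type EditorState = Record<string, unknown>;

async function copyAsJson(state: EditorState) {
  await navigator.clipboard.writeText(JSON.stringify(state, null, 2));
}

async function copyAsPrompt(state: EditorState) {
  // Turn the UI state back into something you can paste straight into the agent.
  const prompt =
    "Here is the configuration I settled on in the throwaway editor. " +
    "Apply it to the implementation we discussed:\n" +
    JSON.stringify(state, null, 2);
  await navigator.clipboard.writeText(prompt);
}

function ExportButtons({ state }: { state: EditorState }) {
  return (
    <div>
      <button onClick={() => void copyAsJson(state)}>Copy as JSON</button>
      <button onClick={() => void copyAsPrompt(state)}>Copy as prompt</button>
    </div>
  );
}
```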
Ready for a silly example of this? When I was trying to waste as much of GitHub's money as possible with my abuse of Copilot, I wanted a good way to visualize that. And uh yeah, Anthropic makes a really shitty broken site, so I can't show you apparently right now, even though it generated me a nice little web app that I could use inside of Claude to calculate how much money I had spent and then import and export a CSV that just took upwards of a minute and a half to [ __ ] load despite being a single JavaScript file.
God, I've been so nice this whole video. I will take this brief tangent to emphasize just how [ __ ] terrible the engineering at Anthropic is. It is genuinely [ __ ] hilarious how bad these guys are at code. That was insultingly bad. It makes sense that he only mentions artifacts once in this article, because they're [ __ ] useless and no one should use them, because the actual implementation on the Anthropic site is [ __ ] garbage. Which is why I tried it for this, cuz I wanted to once again see if it's made any progress. The answer is it [ __ ] hasn't.
The models can make a nice little artifact that lets me have this fancy interface for putting in values. I can import from the table. I can export the table to this view and very easily copy paste the CSV. I made this so I could quickly blast through all of the data for all the agent runs I had, to see how much money I was able to waste of GitHub's. But god damn, Anthropic cannot code for [ __ ] You get the idea though. Creating a UI to play with these things, plus an export format that you can paste other places, be it state that you want to reuse later or values that you want to pass to your agent.
Very powerful. Thinking in this way will be beneficial. And that's really what I want to make this video for: to push you guys towards thinking in terms of inputs and outputs beyond just the flows of your code. How do you use new code for one-off problems that lets you import and export data that is valuable to your agents? You can build much more effectively as a result of thinking this way. It has fundamentally changed how I build what I build, how I manage my team, how I process data, all of these things. Realizing that every time you have a database, a CSV, a JSON blob, that is your data from something that you did, or even just a prompt with some complex values in it, generating code to help smooth out the edges and give you the info you're looking for is super valuable.
He has an FAQ section here, and I'm sure a lot of you guys have the same questions. The first is the big question; I've seen this a lot. Isn't this less token efficient? I've even seen people going as far as saying that Thariq's only writing this in order to get people to waste more tokens to make Anthropic more money. I think those people are dumb. It's a funny joke, but it's not very real. While Markdown does often use fewer tokens, I found that the added expressiveness of HTML and the much higher likelihood of me reading it means that I get overall better outputs.
With the 1 million token context window in Opus 4.5, the increased token usage is not really noticeable in the context window. That's a silly point to make. Next, he has: when do you use markdown? For now, I've honestly stopped using markdown altogether for almost everything, but I'm probably far on the HTML maximalist side of things. That's a little insane to me. Moving off of Markdown entirely is crazy. How do you view the HTML files? He just opens them in a browser locally. You can even ask Claude to open it, or upload to S3 if you want a sharable link.
Fine. There's evidently an opportunity for a microservice for this, by the way. Wink. What about version control? This is honestly one of the biggest downsides of HTML. HTML diffs are noisy and hard to review when compared to markdown. How do I get Claude to match my taste and not make it all ugly? The front-end design plugin helps Claude make good HTML files, but to match your own company's style, you can create a single design system HTML file by pointing Claude at your codebase. You then use that design system file as a reference for other HTML files.
Odd how they didn't call out the skill for this, but there's a direction here where you could make a better primitive that is still HTML, but lets the agent make things look better, be more portable, have better version control, better hosting, and better context on what else is going on. There are ways to address this, but until we move from claude.md to claude.html, this is just an experiment as far as I'm concerned. I do like Thariq's ending point, which I'll get to in a sec, because I want to remind you guys we still have to read Karpathy's thoughts as well.
So don't skip just because I'm at the end of this. He ends with stay in the loop. All of the above is to say that I think the real reason that I use HTML is that I feel much more in the loop with Claude. I'd begun to fear that because I'd stopped reading plans in depth, I would simply have to leave Claude to make its choice. But I'm happy to say instead that I feel more in the loop than ever before when I use HTML. I hope that you do, too. I'm going to try this a bunch and report back on how I feel.
So keep an eye on my channel if you want to see that. But Karpathy's already made the move and I want to hear what he has to say. This works really well, by the way. At the end of your query, ask your LLM to structure its response as HTML, then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc. More generally, in my opinion, audio is the human-preferred input to AIs, but vision, like images, animations, and video, is the preferred output from them.
I think I agree. I am always surrounded by my team, so I don't get to use audio input as much as others do. And I'm not like Prime; I don't walk around talking to my phone to send texts. I respect the balls he has to do that. The number of times I've been hanging with Prime and he just pulls out his phone and voice-to-texts something to his wife or to his kids is adorable, and I respect it. I'm too insecure to do that myself. But there are some people who will just prompt all day, who will speak to their phone to talk to their family, to send prompts, whatever else, and all power to them.
Around a third of our brains are massively parallel processors dedicated to vision. It's the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage of that. I totally agree. I think vision is how people like to process info, and it's really nice to be able to listen to music and watch something that my computer did at the same time. Text is a little more process-heavy, but yeah, you get the idea. He lists these examples of things: raw text, which is hard and effortful to read.
Markdown, which is a bit easier to read once you have bold, italic, headings, tables, all of those types of things; the current default. Then you have HTML, which is still procedural with the underlying code, but a lot more flexible on the graphics, layout, even interactivity side. Early, but a forming new good default. And eventually we get further and further until we have interactive neural videos and simulations. Interesting hypothesis he has here, where he thinks we're going to progress to more and more visual, interactive outputs for models. Very curious to see where this goes.
IMO the extrapolation, though the technology doesn't exist just yet, ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact and procedural software 1.0 artifacts like interactive simulations may be woven together with neural artifacts like diffusion grids, but generally something in this direction of the recently viral... uh, what is this that he's linking? Interesting. Imagine every pixel on your screen streamed live directly from a model. No HTML, no layout engines, no code, just creating this UI dynamically. Very interesting. There are also improvements necessary and pending at the input: audio nor text nor video alone is enough. I feel a need to point and gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
Okay, I cannot tell you guys how often I've done exactly this. I'm going to just show screenshot edit. I use Shottr; there's lots of other screenshot utilities. I'm going to just highlight this section, copy this, paste it to Codex. Cool: what are your thoughts on this? Paste image. Actually, I can't see the image here because I'm using a terminal. No one should use terminals for these types of things. So I will just hop over here instead. Paste the image. What are your thoughts on this? And now I can actually see the image I just pasted, because using GUIs is nice.
Shout out T3 Code. Fully open source. Works with all of your terminals. This is just a different view for the same thing. Yeah, I can't believe we're all just living with pasting images in a UI where we can't even see them. The amount of cope... and like, copy pasting is broken too because it has to handle all the formatting and the terminal. Why are we still using [ __ ] terminals? Yeah. And here we get a much better formatted output in a UI that is much less bad. Something I'm now thinking about is what it would look like if in T3 Code we asked for HTML output similarly and it could actually output custom UI in the body here.
I might play with this idea in the near future, but part of T3 Code being open source is that you can go add this yourself. If you're interested in what it would look like to let agents respond with HTML inside of your agentic coding experience, you don't have to wait for me to see it. And maybe you can click a share button to send it to somebody else. There you go. Fun opportunity. Play with it. Karpathy closes with the following: the input and output mind meld between humans and AIs is ongoing, and there's a lot of work to do and significant progress to be made way before jumping all the way into Neuralink brain-computer interfaces and all of that.
For what's worth exploring at the current stage, hot tip: try asking for HTML. We have a shitload of exploration to do as an industry here still. We are still figuring out the output formats that make sense. We're not even close to figuring out the interfaces. The fact that we're still so attached to our terminals shows how far we are from getting this all right. And in a world where we can make better output formats and better UIs with better control and customizability, we can make better software as a result. That said, HTML is just the starting point.
And while Karpathy seems to think that the direction we're going in is interactive videos and simulations, I think we have some important steps to go through first. As someone in chat has correctly identified, I think MDX is an untapped market. All roads lead to React.