I Found The Layer OpenAI and Stripe Are Fighting Over.

The talk argues that the real strategic shift is not just AI clicking buttons, but defining and controlling the work primitives behind those actions, including who can perform them and how results are checked. It outlines three layers—access, meaning, and authority—that govern how agents interact with computers, and explains why semantic understanding matters for durable value.

A concepts-first look at how AI agents should understand work, not just click buttons, and why semantic meaning—more than tool access—will define the next platform wars.

Summary

Nate B. Jones argues that the real battleground in AI agents isn't merely button-clicking or browser control, but who defines the semantic meaning of work. He outlines three layers—access, meaning, and authority—and explains why the deepest leverage comes from semantically meaningful units of work (like a calendar invite or a refund) rather than generic API access. Codex-style computer use is essential for agent reach, but it's a shallow bridge unless agents understand context, intent, and consequences. Jones contrasts the hyperscalers' model-centric approach with semantic-first strategies seen in Perplexity and enterprise tools, stressing that durable software must expose meaningful work primitives, permissions, and reviewability. He calls for richer connectors, contracts, and governance to prevent costly missteps when agents act autonomously. The roadmap, he says, is not just better models but software engineered to be agent-readable—where actions have defined meanings, permissions, and verifiable outcomes. In short, the future hinges on meaning over mere access, with notable implications for CRM, commerce stacks, and browser vs. computer strategies.

Key Takeaways

  • Three-layer framework for agents: access, meaning, and authority—each layer determines how an agent interacts with tasks like calendar changes or refunds.
  • Semantic work primitives (not just UI actions) let agents understand what a task really is, who owns it, and what success or failure looks like.
  • Codex, MCPs, and plugins enable access, but durable autonomy comes from exposing semantically meaningful units of work with permissions and reviewability.
  • In practice, a calendar invite or a refund becomes risky when the agent doesn’t grasp human intent, timing, or downstream effects across teams.
  • Hyperscalers lean into model-driven tools, while Perplexity and enterprise players push for semantic meaning and cross-app workflows to create durable AI-enabled software.
  • The future platform fight will hinge on which layer—models, browsers, identity, or domain semantics—owns the meaning of work and how it can be safely automated.
  • A practical takeaway: add robust connectors and plugins to expose richer semantics, not just broader access to tools.
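To make the framework in these takeaways concrete, here is a minimal sketch of an action being checked against all three layers before an agent executes it. Everything here is illustrative: the class names, fields, and policy are assumptions for the sketch, not an API from the talk.

```python
from dataclasses import dataclass, field

@dataclass
class WorkAction:
    """A semantically meaningful unit of work, not just a button."""
    name: str                 # e.g. "issue_refund"
    owner: str                # who owns this unit of work
    reversible: bool          # can the outcome be undone?
    consequences: list = field(default_factory=list)  # declared downstream effects

@dataclass
class Agent:
    reachable_tools: set      # the access layer: what the agent can touch
    granted_actions: set      # the authority layer: what it may do

def can_execute(agent: Agent, action: WorkAction, tool: str) -> tuple:
    """Walk the three layers in order: access, then meaning, then authority."""
    if tool not in agent.reachable_tools:
        return (False, "no access: agent cannot reach this tool")
    if not action.consequences and not action.reversible:
        # The meaning layer: an irreversible action with no declared
        # effects is one the system does not really understand.
        return (False, "no meaning: irreversible action with undeclared effects")
    if action.name not in agent.granted_actions:
        return (False, "no authority: action is not permissioned for this agent")
    return (True, "ok")

refund = WorkAction("issue_refund", owner="support", reversible=False,
                    consequences=["moves money", "notifies the customer"])
bot = Agent(reachable_tools={"stripe_connector"}, granted_actions={"issue_refund"})
print(can_execute(bot, refund, "stripe_connector"))  # (True, 'ok')
print(can_execute(bot, refund, "browser"))           # blocked at the access layer
```

The design point is that the refusal messages differ by layer: an agent blocked at "access" needs a connector, one blocked at "meaning" needs richer semantics, and one blocked at "authority" needs a permission grant.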

Who Is This For?

Product leaders, AI strategy stakeholders, and engineers building agent-enabled workflows who want to shift from tool access to meaningful, governance-ready automation.

Notable Quotes

"The real primitive here is not the ability of the agent to use the computer. It's not even the browser tab for web browsing. The real primitive, the foundation on which we're building, is a semantically meaningful unit of work."
Defines the core concept: meaningful units of work drive durable autonomy.
"If there's a connector, use the connector. If there's a proper protocol, use the protocol. If the system exposes a typed object and a permissioned action, use that."
Emphasizes the hierarchy of meaning over generic UI actions.
"The future is software where the button is no longer the primitive. The primitive is the action behind it, described, permissioned, reviewable, and reversible where possible."
Encapsulates the long-term vision for agent-ready software.
"Three layers to keep in your head: Access, Meaning, and Authority. Those are all layers that agents can touch."
Outlines the architectural model for agent interaction.

Questions This Video Answers

  • How do semantic work primitives change risk when AI agents act autonomously in business processes?
  • What is the difference between browser-based access and semantic access for AI agents?
  • Which companies are leading the push toward semantic meaning in AI workflows (e.g., Salesforce vs SAP) and why does it matter?
  • How can startups design software to be agent-readable from day one?
  • What are MCPs and why are plugins crucial for durable AI-enabled workflows?
AI Agents, Codex, Claude, Perplexity, Semantic meaning of work, Computer use, MCPs, APIs, Browser vs Computer, Workspace governance
Full Transcript
This is a piece about the strategy that we have to build as product leaders when we think about where agents play best. And what I'm asserting is that the work primitive is what really matters. A lot of us are assuming that the agent's ability to use the computer sort of levels the playing field, because we can all put our programs out there and the agent can use them, or we can build an MCP server and it's just going to work. I want to suggest that there's a deeper strategy in play that some of the hyperscalers understand and that needs to be more widely shared and understood. I don't want us to get stuck in a world where we just build demos that look good in a Twitter video and we're not thinking about this more carefully. So, let's dive in and understand what happens under the surface when you see an agentic workflow, and what we mean by controlling a work primitive, because it's a new term. We're going to define it, we're going to explain what it means, and we're going to explain why it's valuable. Let's jump in. When an AI agent opens a browser and moves through tabs and clicks buttons and fills out a form or checks your calendar, and it can do all of that now, it feels like the model has crossed a line. And I will say specifically, Codex computer use can do that. It is no longer just answering questions. It's doing real work for you. But I think that the visible work that the model does is distracting us from the platform shift underneath. The future is not an AI that gets really good at clicking buttons for you. That's the bridge. The real fight is over who defines what the button means. Because once agents start acting inside companies, the question is not just "can it click a button for me." The question is: does the system understand what kind of work is being done, who's allowed to do it, what could go wrong, and how the result is checked? I've seen this personally just in using Codex computer use.
I feel like I'm running into friction points that I never would have expected to see, because I am now using an agent on my computer at the same time as I am on that computer. And so I'm trying to figure out what it looks like when we have a different set of permission states for agents versus people. Let's jump into the details here. There are three layers to keep in your head: access, meaning, and authority. Those are all layers that agents can touch. Computer use lets agents access parts of the computer. Semantic work primitives give agents meaning. So there are three layers to keep in your head as we go through this video: the layer of access, underneath it the layer of meaning, and deeper still the layer of authority. Computer use is what I've been messing with. It gives agents access. Semantic work primitives give agents a real sense of meaning. And the companies that control those primitives are the ones that end up with real platform power. So there are three levels, right? And that sounds abstract. So we're going to start with something really simple. Imagine an AI agent moving a calendar invite. I've had Codex do that on the screen. That looks like changing a time and clicking save. But the action is not really "click save." It may notify five people. It may move prep time. It may break a commitment someone made to a customer. It may turn a private conversation into a meeting that now conflicts with something more important. The human sees a calendar event and brings all of that context with them. The software sees fields in a database, right? The agent sees that it needs to fill out the calendar and just do the job. It doesn't necessarily understand the human intent behind the meeting. And making that human intent behind the meeting more legible is what I mean by a semantic work primitive.
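The calendar example above can be sketched as a semantic work primitive: a typed object that declares its own consequences instead of hiding them behind a save button. This is a hypothetical illustration; the fields and the review policy are assumptions for the sketch, not any real calendar API.

```python
from dataclasses import dataclass

@dataclass
class MoveInvite:
    """Not 'click save': a reschedule with its consequences made explicit."""
    event_id: str
    new_time: str               # ISO timestamp, e.g. "2026-01-10T15:00"
    attendees: list
    customer_commitment: bool   # does this event back a promise to a customer?
    conflicts: list             # events the new time would collide with

    def side_effects(self) -> list:
        """The meaning layer: enumerate downstream effects so the agent
        (or a human reviewer) can weigh them before acting."""
        effects = [f"notify {len(self.attendees)} attendees"]
        if self.customer_commitment:
            effects.append("breaks a commitment made to a customer")
        effects += [f"creates a conflict with {c}" for c in self.conflicts]
        return effects

    def requires_human_review(self) -> bool:
        # A toy policy: anything touching customers or creating conflicts
        # escalates instead of executing silently.
        return self.customer_commitment or bool(self.conflicts)

move = MoveInvite("evt_42", "2026-01-10T15:00", ["ana", "ben", "chi"],
                  customer_commitment=True, conflicts=["board prep"])
print(move.side_effects())
print(move.requires_human_review())  # True
```

An agent that receives an object like this, rather than a screenshot of a calendar, can weigh intent and consequences; that declared enumeration of effects is the kind of legibility a semantic work primitive provides.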
It's a fancy word, but it basically means: does the computer understand what it's doing, and what we humans need it to do, when it does a task, or is it just filling in the fields? And that's a big difference. The same thing happens with checkout. A button that says buy is not just a button. It represents money, user consent, tax, merchant identity, fraud risk, fulfillment, returns, card security, and maybe a dispute a few weeks from now. Or take deleting a file. One file might be harmless cleanup. Another might be the only copy of a signed agreement. On the screen, those actions can look identical. In the work, they're very different. So, yes, agents need to use computers. They need browsers. They need desktops. They need to survive inside software that was built for people. But computer use is not a long-term moat. Computer use is how agents reach the old world, right? The thing that makes agents really valuable long term is the layer that tells the agent what it is touching and why it matters. And right now we're getting hints of that. The auto-review feature in Codex is basically there to guard human intent and ensure that the agent using the computer is actually using it to do the right task. I love it. It works pretty well, but it feels like an initial draft in that direction, because it's very much a guardrail tool. It's there to guardrail the agent and keep it from doing something it shouldn't. That's good. I want it to do its job, but that's different from positively ensuring that agents have the semantic meaning they need to really deeply understand my calendar. Calendars are complex things. I want it to deeply understand the email context for a relationship I've had for three and a half years with someone when they write one message. That's a larger piece of context. And look, I get it. Most of the world is not agent native. And the fact that we have computer use is hugely helpful.
The fact that we have jumped in just a few months to the point where it's useful is a godsend. Companies are full of software that assumes a human is sitting there interpreting everything, right? Internal dashboards, procurement tools, shared drives, government websites, Excel workflows, the whole thing. All of computing assumes a human will use it. If an agent cannot use a computer visually in that world, it cannot reach so much of our work. It is stuck inside the clean, modern, API-friendly part of the world, which is much smaller than people in tech want it to be. So computer use is absolutely necessary. It is the universal adapter for the messy middle period. It's kind of like screenshots, right? It is just going to be a universal adapter. But a universal adapter is typically a shallow interface. A screenshot can show the agent what is on the screen, but it does not automatically reveal the structure underneath. A browser can reach almost every web app, but it does not automatically know the domain meaning of each workflow. A desktop controller can click a button, but it does not automatically know whether that button is reversible, whether that button is financially material or dangerous. The agent can guess, and the guesses are getting much, much better, but guessing is not a strategy for high-consequence work. If an agent is helping you summarize an article, a bad guess is probably something you can fix. If it is deciding whether to issue a contract, that's a different thing entirely, right? If it's deciding whether to email a customer, that's a different thing entirely. Or spend money; you have to be sure. And this is where the hierarchy of meaning becomes clear. Agents should use the richest semantic interface available. If there's a connector, use the connector. If there's a proper protocol, use the protocol. If the system exposes a typed object and a permissioned action, use that.
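That preference order, connector over protocol over typed action over raw UI control, can be written as a simple fallback chain. A minimal sketch, with illustrative interface names rather than any real agent framework:

```python
# Richest semantic interface first; raw UI control is the fallback of last resort.
PREFERENCE = ["connector", "protocol", "typed_action", "browser", "desktop"]

def pick_interface(available: set) -> str:
    """Return the most semantically rich interface the target system exposes."""
    for kind in PREFERENCE:
        if kind in available:
            return kind
    raise ValueError("no interface to this system at all")

print(pick_interface({"browser", "typed_action"}))  # typed_action
print(pick_interface({"desktop"}))                  # desktop: nothing richer exists
```

The chain only reaches the browser or desktop when nothing richer is exposed, which is exactly the hierarchy-of-meaning argument in miniature.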
Only fall back to a browser or desktop control when the richer interface doesn't exist. This is not just engineering preference; this is how things should be architected. And as far as I can see, this is generally how the hyperscalers have built their models. Codex works this way. Claude prefers to work through MCPs when it can. And I think that's correct. Ultimately, it is that hierarchy of meaning that ensures we get the richest possible experience for any given task. So we're not likely to have as many issues as long as we have as many connectors as possible plugged into our preferred AI systems, which, by the way, is, pun intended, a plug for adding plugins to your ChatGPT, to your Codex, to your Claude. Make sure it has those rich tools if they are available to you. And increasingly, for so much of our work, we have MCPs or APIs that are already pre-built as plugins for these tools. You should add them. That is just a very practical takeaway here. If you want your agent to not have to use computer use all the time, add the plugins, add the connectors. All of that is there just to facilitate access, right? The model needs access to tools. The agent needs access to the browser. The assistant needs to access your files. You get the idea. But access only gets the agent into the workspace. It doesn't make the work understandable. The next layer that we are just getting to now is meaning. What is this object? What action is being proposed? Who owns it? Who's allowed to change it? What happens if the action succeeds? What happens if it fails? Is it reversible? Does it touch the money? Does it touch customer data? Does it touch production? Does it create an obligation outside the company? Does it require approval? Can another agent review it? Can the system tell whether the outcome is correct? These sound like governance questions, but they're really product questions. The more clearly a system can answer those correctly, the more autonomy it can support.
The less clearly it can answer them, the more the human has to sit there supervising. This is why I think describing the agent's power to write as just "trusted write access," which is the engineering term, is too small a way of picturing what we're doing here. Trust is not a switch. An agent might be trusted to read but not write, draft but not send, stage but not deploy, recommend but not approve, change a sandbox but not production, write in one space but not another. All of those distinctions depend on semantics. If it cannot tell the difference between issuing a refund from your chosen Shopify shop versus issuing a refund from your Stripe, you're going to have problems as well. If it cannot tell the difference between staging and production, and by the way, there were real production systems deleted as a result of exactly that issue, then it shouldn't be anywhere near the deploy button. So the real primitive here is not the ability of the agent to use the computer. It's not even the browser tab for web browsing. The real primitive, the foundation on which we're building, is a semantically meaningful unit of work. A refund, a reschedule, a payment authorization, a compliance exception, a meeting brief: all examples of this, right? Those are things that agents need to understand as units of work. Human software hides them behind buttons and forms, but humans have always understood them intuitively. Agent-native software needs to expose them directly. This, by the way, is also why coding agents arrived first. It is very tempting to say that coding agents worked first because code is text and language models are good at text. That's part of it, but it's not the whole story. Coding agents worked first because software development already has unusually rich work semantics. A codebase is not just a pile of text files.
It has modules and dependencies and tests and type systems and linters and package managers and Git history, etc., right? It has all of these things. That means the agent can perceive state, act on state, observe feedback, and revise its actions. It can inspect the repo. It can edit a file. It can run a test. It can see the error. It can change the implementation and hand the result back. The loop is powerful because the work environment itself gives the agent semantic feedback. The human doesn't have to answer "is this right?" every 30 seconds. If the test is failing, the agent can just tell it's wrong. In other words, when we are talking about coding tests, we are not just talking about verification artifacts. We're talking about semantic meaning artifacts. They tell the agent what world it's operating in. Most knowledge work is not like that yet, right? A strategy doc doesn't have tests. A calendar has events, but the importance of those events is hidden behind politics and priorities and relationships. A sales process might depend on unwritten account history. Often, it does. A procurement decision may depend on budget, timing, and risk tolerance, which isn't written down. Agents can help in those domains. They already do, but the environment doesn't give them the same density of meaning that a codebase would give them. This is why coding is a wedge. Not because all work automatically becomes coding or every worker becomes a coder. Coding is a wedge because code is legible enough that an agent can facilitate and participate in it without a human being a full-time supervisor. So once you see the world that way, products like Codex stop looking like coding tools and start looking like labs for where the future of work is going to be. And that's where the product strategy starts to get really interesting. The model is still central, right? Better models definitely matter. Faster models matter.
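That perceive-act-verify loop can be sketched in a few lines. The `run_tests` and `propose_fix` callables here are stand-ins for the real environment and the model call; this is an assumption-laden sketch of the loop's shape, not any agent's actual implementation.

```python
def agent_loop(run_tests, propose_fix, max_attempts: int = 3) -> bool:
    """Keep revising until the environment itself says the work is done.
    run_tests() -> (passed, feedback); propose_fix(feedback) revises the code."""
    for _ in range(max_attempts):
        passed, feedback = run_tests()   # perceive: the tests are semantic feedback
        if passed:
            return True                  # the environment confirms success
        propose_fix(feedback)            # act: revise using the failure output
    return False                         # give up and escalate to a human

# Toy environment: the "code" passes after two revisions.
state = {"revisions": 0}
def run_tests():
    return (state["revisions"] >= 2, "AssertionError: expected 3, got 2")
def propose_fix(feedback):
    state["revisions"] += 1

print(agent_loop(run_tests, propose_fix))  # True, after two fix attempts
```

The point is that the loop terminates on a signal from the environment, not on a human saying "looks right"; better models make each revision smarter, but the tests supply the meaning.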
Reasoning matters, but the model alone is not the product and hasn't been for a while. Because to do work, a model needs to be in a harness that can enable it to access and operate against a system that has semantically meaningful units of work. And if you want it to be non-coding work, then the non-coding work has to be semantically meaningful. So harnesses really matter. Harnesses help the agent access the work. But you also have to make sure that the work being accessed is actually structured in a way that makes sense. The whole point of an agent doing the work is to reduce the amount of attention I have to spend coordinating the work. If I still carry all of that harness intuition that makes the semantic meaning of work legible inside my head, I'm not getting very far. If I carry all the meaning of my three calendars and the agent can't figure it out, we're not getting very far. And I want to be blunt here. I know that this is a hard problem, but it is exactly the hard problems that are valuable to solve. This is basically a free roadmap if you are a startup, because as a startup you want to be in a position where you can solve problems that are not easy for someone else to come and grab, and one of those classic problem shapes is: make the semantic meaning of work legible to agents today. Don't just rely on a standard MCP interface. Try to break it. Understand where it's not working. Understand where it connects to levers but the agent doesn't know how to reliably drive those levers from a prompt, because there's something else about understanding the task that isn't there. I get super passionate about this, because if we don't have agents that understand the meaning of work, we get bad calendar invites, decks that feel like they're off on tone but we can't explain why. We get refunds issued to customers that shouldn't be issued.
All kinds of things go wrong, not because the agents can't control the system, but because the semantic meaning of your work is not available. Now, in the article that I'm writing for this on Substack, I spend more time on the commerce stack, understanding the difference between discovery and checkout and infrastructure, and how our agentic commerce strategies are shaped by this approach, by how we understand the semantic meanings of work. Because there's a critical semantic layer to agentic transactions that's super important. But for our purposes today, we're going to assume that you realize we have to have a semantic meaning to transactions, that transactions themselves are part of the semantic meaning of work, and that there's a whole strategy there. We're going to put a pin in that and look at something that is more tangible and easy to understand in a quick video. And that's Perplexity. Perplexity's strategy is super interesting here. If you think about it from a move-to-the-semantic-meaning-of-work perspective, a lot more makes sense. This is why Perplexity has to move toward products like Comet, and toward the computer and personal computer, long term. It needs to get away from search per se and closer to the browser, the desktop, the files, the apps, the workflows where research becomes action. That move makes sense. The browser is where a huge amount of work already happens. Email, documents, dashboards, SaaS apps, analytics, shopping, calendar, support tools, customer systems, internal tools: they all collapse into tabs. An agent inside the browser can see context between web apps and compare pages and take multi-step actions. And it just becomes legible because it sees your work. And this is why browsers in AI are interesting, and why one of the things that is really undecided in 2026 is who is going to have the AI browser.
If Perplexity becomes an AI browser for someone else's tools and other tools plug into it, it gets durable control here, because it manages the browser that can see your calendar, while the calendar system owns your recurrence and your attendees and your notifications and your meeting state. It can see your GitHub. It can see whatever you're logging into. But the browser war is not just about which company gets closest to the user. It's about whether the browser can assemble cross-domain meaning for you. If Perplexity owns the browser in Comet, can it build a durable work graph above the underlying apps? Can it turn search results into structured actions with permissions and validation and review? Can it remember the user's projects and policies in a way that makes work easier, or does it remain just an operator of interfaces? And that is the trap for any kind of search-native or browser-native agent. And that is why, even though the browser is a play for Perplexity, Perplexity also has to move to the computer. Because if they're not on the computer, if they're not handling those computer files I talked about that are close to semantic meaning, where it basically has an open claw in your computer, Perplexity personal computer, and it touches files, it touches those compute primitives I was talking about earlier in this video, it still has kind of a shaky hold on the semantic meaning of work. Basically, there are two big plays going on right now to figure out how agents will do meaningful work in the world. Play one is to start from the semantic meaning of work out here in the real world where we do work, and work back to the agents. It's the only play a lot of people who aren't hyperscalers have. That's why I chose the Perplexity example. The other play is the play that's available to the hyperscalers, and that is to start from the models themselves and their ability to understand and use code, and move out through computing primitives to figure out how to do work from there.
And that is why people have made a lot of hay out of the fact that Claude and Codex have not too many tools but use those tools super, super well to do tasks. They are close to the computer. They use the tools that make sense for them. They're allowed to compose tools to accomplish complex workflows, and their ability to understand the semantic meaning of code turns out to be a good general unlock for a lot of other work. But the thing is, the bridge between those two approaches has some holes in it. If you're just coming from the computer side, as I've been sharing many specific examples of, your computer may not fully understand the purpose of the work it's doing. Your agent may not fully understand what it's doing. The calendar example is a good one. If it moves the calendar invite, does it really realize it's inconveniencing two or three other people you don't want to mess with? Probably not. On the other hand, if you're coming from the semantic meaning of work, if you're coming from making sure that you understand how to bundle that together and make it useful, sort of like Perplexity is doing, you have to think about it and say, am I ready to make this bridge into the hyperscalers, and where do I plug in? And what Perplexity has basically decided to do is to say: we welcome all models. We're going to be the shop where you have all models. And our focus is going to be making these semantic units of work very, very legible and easy. And that's why the personal computer is full of specific workflows for knowledge work like finance. They're so far in on finance. And so you kind of have to pick a lane, one approach or the other. And if you're not a hyperscaler, the lane's been picked for you, because you don't have a gigantic model that you can use to do code with that doesn't belong to someone else that you're renting. And so when you think about it that way, the world becomes simpler. Humans need clear interfaces. Agents need clear semantics. The best software will provide both.
It is going to stay simple for people while making the underlying objects and operations really legible to agents. And that is going to generate software where AI and humans can coexist together. And that's what this video is really about: software that is ready for AI, that tells the agent what exists, what can be done, what each action means, what permission is required, how the result should be checked, and what happens next. That is a way, way higher bar than I see for most software today. It is the future of software in 2026. That is your roadmap if you are not doing that today. So the coming platform fight is not going to look like one company simply winning AI. It's going to look like a negotiation across the whole stack. Model companies want broad agents that can operate across domains. Browser companies want to orchestrate work across applications. SaaS companies want to preserve authority over domain semantics. Identity providers want to govern authorization. They all have their interests, right? The question is going to be: which layer owns the meaning of work? Which layer owns the meaning of work that the agent can read? And every software company is going to have to decide how much semantic access to expose and to whom. If you expose too little, generic agents will operate clumsily through the UI. If you expose too much, the product risks becoming back-end infrastructure for someone else's agentic interface. That is the tension that anyone in software is facing today. This is a tension exemplified by Salesforce 360 versus how SAP is handling agents. SAP is locking off agents right now. They don't want agents to use their products. Salesforce is going the other way. They're leaning into agents and saying, "Let agents operate across our substrate and grab MCPs and grab APIs, and we're going to be headless from the get-go, because we know that's the future."
I think Salesforce is more correct here, especially from their perspective as a system of record. They want to be a system of record that's sticky, and so they want to be legible, semantically, to agents and humans. And I think that's a good example. I think SAP is not going to last with that approach. SAP deciding, eh, we're going to say no, no, no to agents is like sticking your head in the sand when the tidal wave is coming. Pardon my mixed metaphors. It's going to be a disaster. And under this deeper test for semantic meaning, a lot of flashy products start to look thin. A model clicking through a website is great today. It does do work. I'm glad it works, but it's not the end result when we think about the kinds of work we want to do with agents long term that are durable and repeatable. And that's the question I ask every time I see a new AI product. Does this give the model access, or does it give the model a meaningful set of levers it can really use to drive the product? I love raw computer access. I love that we're getting the agent closer to file primitives, closer to the work. I love these MCP servers. I love that we're talking about access in 2026. I want to talk about semantic control and semantic meaning. I want to talk about an AI understanding the implications of my calendar and how messy it is. And that is going to require a new set of rethought software that is designed to be agent-readable from the get-go. Semantically readable to the agent, not just technically legible. Not just that the agent can use edit-calendar and move the date, but that the agent understands the semantic context of this particular invite with these people. We don't have software for that yet. We need a lot of software like that. This is part of why I think software isn't dead.
And it's part of why Perplexity moving toward the computer is strategically necessary but maybe not complete, because Perplexity has to move into a world where it is able to deliver a lot more workflows, like the finance workflow it's talking about, to become truly sticky. Because the future is not an AI that clicks every button for you. That's the bridge we have today. The future is software where the button is no longer the primitive. The primitive is the action behind it. It's described. It's permissioned. It's reviewable. It's reversible where possible. It's composable. So computer use and tools like that give agents hands. MCP gives agents hands. Semantic controls tell the agent what it's touching. And that is the deeper moat. Now, if you want to dive deeper on this, I'm going to go into memory ownership, enterprise permissions, browser strategy, and agent commerce on the Substack. But the core lens here is the same one I would use for every AI product over the next year. Do not ask only whether the agent can act. Ask whether the product knows what that action means. That is your key takeaway. All right, I'll see you next time.

Get daily recaps from
AI News & Strategy Daily | Nate B Jones
