I Tested Every AI Agent. They All Fail the Same Way.

The speaker argues that current AI progress has not reduced the burden of managing multiple AI agents and tasks, and emphasizes the need for AI that can do useful work without creating a new management layer. He suggests that consumer AI still lacks a simple, integrated experience and highlights Symphony as a partial step toward better automation without overwhelming users.

We’re at a tipping point where AI agents must stop demanding our micromanagement and start delivering truly proactive help in everyday life, or they’ll stay a shiny but unusable novelty.

Summary

Nate B. Jones is blunt: AI agents are everywhere, yet they still force us into management roles. He argues that the real frontier isn't "can AI answer" or "can AI act," but "can AI do useful work without pulling me into a new management layer." OpenAI's Symphony and enterprise patterns are moving us toward shared workspaces, but consumer life remains chaotic, with messy calendars and conflicting obligations. The video surveys current products (Clickie, Cluely, Poke, and others) and highlights their strengths and crucial gaps, especially around anticipation, memory, and appropriate permissioning. Jones emphasizes that the bar for consumer-grade proactive agents is not just technical capability but trustworthy, context-aware proactivity that intervenes at the right moment with the right level of commitment. He introduces a ladder of trust (read -> suggest -> draft -> act with confirmation -> act autonomously) to frame how close we are to truly autonomous assistants. The talk also bridges work tools and personal use, noting that many consumer patterns evolve from workplace products (Slack, Notion, Cowork) and may eventually migrate into daily life. Finally, he offers concrete signals to watch for in the next six to twelve months: key hires, breakthrough moments, and model release notes that point toward real consumer proactivity. The overall call to action is clear: labs and startups must build "actual proactive agents" that lighten our load, not just repackage busywork as AI-powered automation. This video is a candid forecast and a product critique aimed at investors, developers, and everyday users frustrated by half-measures.

Key Takeaways

  • Consumer AI is ready for proactive assistance, but today’s agents still require too much human prompting and oversight.
  • A true proactive agent should interrupt, understand context, and act within guardrails, reducing the user’s cognitive load rather than adding another system to manage.
  • The permission ladder (read, suggest, draft, act with confirmation, act autonomously) helps product teams design safer, more trustworthy agents for real life.
  • Current consumer agents are mostly reactive; the leap to useful proactivity hinges on better memory, personalization, and timing that feels natural—without breaking trust.
  • Enterprise patterns (Codex-style memory features like Chronicle and controlled permissioning) hint at how to scale consumer proactivity, but consumer life demands simpler, safer UX and clearer boundaries.
  • Platform bets like Poke (messaging-based) and Clickie (cursor-aware) show promising directions, though they still struggle with salience and control.
  • Model release notes signaling long-horizon intent with memory often precede consumer-ready proactive experiences by several quarters, making them a useful early-warning signal for savvy watchers.

Who Is This For?

This is essential viewing for product managers, AI researchers, and consumer UX designers who want to understand why proactive AI feels tantalizingly close but still falls short for everyday users. It’s also valuable for engineers evaluating how to build memory and permissioning into consumer agents without breaking trust.

Notable Quotes

"The frontier where we need to go next is can AI do useful work without pulling me into a new management layer."
Summary of the central problem: real proactivity vs. managing agents.
"I want an AI that does the draft for me because it knows I need it."
Emphasizes desire for proactive drafting rather than prompting.
"The bar is not be proactive. The bar is real lived proactivity."
Defines the quality standard for consumer agents.
"An assistant reduces the number of things you have to remember."
Key distinction between a tool and a true assistant.
"The breakaway consumer agent has to figure out how to appear in the situation when they're needed without being asked."
Anticipation as a core UX goal for proactive agents.

Questions This Video Answers

  • How can AI agents become truly proactive in daily life without overwhelming users?
  • What is the ladder of trust for AI agents, and why does it matter?
  • Which AI products are closest to providing useful proactive assistance for consumers?
  • How do memory features like Chronicle influence AI agent usefulness?
  • What signals indicate a genuine breakthrough in consumer AI agents over the next six to twelve months?
Tags: AI Agents, Proactive AI, Symphony, OpenAI, Codex/Chronicle, Poke AI, Clickie, Cluely, Enterprise AI, Memory in AI UX
Full Transcript
The main problem in 2026 in AI is that software is finally capable enough to help, and somehow it has become one more thing to manage. I don't need that. I don't need another chatbot. I don't need another blank box waiting for me to invent the perfect prompt. I don't need another agent that says it can do anything and then sits there waiting for me to assign it work. And I definitely don't need my attention sucked into managing a fleet of agents. There are more tabs, more sessions, more partial tasks, more notifications, more things I have to check and steer and approve and restart and clean up. That's not what an assistant does. That's a new inbox. And that is why this feels different now. A year or two ago, saying "I don't need another chatbot" still had some edge on it, right? In 2026, I think most people who use AI would agree that the sharper question is: what's next? What's after the chatbot? Because the answer can't be "a bunch of agents, and I get to be the project manager." That is already the wall that frontier products are hitting. Our attention is running out. OpenAI's workspace agents are built for long-running work across tools and teams. Sure. Right. They run in the cloud. They operate in Slack. I made a video about them, right? They work on schedules. AWS is now talking about managed agents with identities and logs and steering and production controls. And there's Symphony, an open-source protocol that the developers at OpenAI launched for everybody. Symphony exists because engineers at OpenAI hit a human attention bottleneck. They had fast coding agents, but people were still opening up their sessions, assigning their tasks, checking progress, nudging agents, restarting stalled work, and keeping all of this in their heads. Agents were capable, but the humans had become stressed managers. So Symphony moved that work into a place where it was easier to manage, right?
The issue tracker became the source of truth. Agents pick up the work. Humans review outcomes. That's kind of where we are right now. And I don't think it's good enough, but it's progress. The frontier is no longer just "can AI answer." It's not even "can AI act." The frontier where we need to go next is: can AI do useful work without pulling me into a new management layer? And we don't have a great answer for that yet. Symphony is probably the closest we've got. And for consumer AI, the question is way harder, because my mom isn't going to Symphony, because my mom doesn't know what GitHub is. My life does not have a clean Linear board. I've got news for you: I've got multiple messy calendars. I have a house calendar. I have a work calendar. This and that. I have multiple inboxes. I have commitments that I know I'm probably not going to be able to keep that are still on the calendar somehow; I haven't canceled them yet. I have family logistics. I have stable recurring events. I have text threads. I have school emails. I have travel changes. I have bills. I have reminders. I have work conversations. And no app has ever been able to understand all of that. What I want is the opposite of agent management. I want the thing that notices my flight got delayed before I do. And by the way, my flight got delayed and I had to deal with it today. I want the agent that sees the school email and says, "This permission slip needs a signature by Friday," then looks at my messy calendar and my half-finished grocery list and a work thread that's starting to get tense, and quietly asks, "I can handle the next step. Want me to?" I want an AI that does that. I don't want an AI that just answers questions. I don't want an AI that writes a draft only when I remember to ask. I want the AI that does the draft for me because it knows I need it. I don't want an AI that makes me become the manager of my AI.
I want an AI that catches the small problems before they turn into work, that acts inside guardrails, that asks for approval when the decision matters, and that otherwise gets out of the way. And yes, I'm fully aware that people who are developers, or who are technically adjacent, can start to build out something in the OpenClaw family and get there. I know it. I can do that. I have done it. Other people can do it too. And thousands of us have. That doesn't mean it meets the bar for a great consumer application, and I don't think anyone would aggressively argue with me there. But I need to be honest here, because as much as we are using coding tools to pull the future forward and basically make proactivity exist for us the way it should, we deserve better. We deserve proactive agents that actually deliver for us. And so this is absolutely a call to the labs. If you are building at a lab, we've got to have that. If you are building on top of lab primitives: like, I was playing with clicky.so, right? Clicky.so is absolutely a consumer application that is building on top of a Codex primitive. It's building on top of computer use. Great. I love that idea. It's not proactive yet, but it is one of the better UX experiences I've seen for consumer agents, because you just talk to it and this little tiny blue cursor instantiates (and that's a big word, so, as my mom would say, it creates a little guy) that sits in the corner of your screen and does the little task you ask it to do. And you can ask it whatever you want in plain English, and it will create a little guy. And if you want to create 10 little guys in the corner of your screen, you can create 10 little guys in 30 seconds. They'll all do little jobs for you. By the way, that will drain your battery, because laptops are not ready for how powerful agents are, but it's still really cool. We need actual proactive agents that are as simple and easy to use as "there's a little guy in the corner of my screen."
I believe we'll get them. We may not get them from the labs; this may be a building opportunity for someone who wants to build a beautiful proactive experience. I would love to see that. Put it in the comments. But we aren't there yet. And this video is about three things. One, it's about why people are gaslighting us into believing we're already there, and about the stuff people are calling proactive that really isn't. We've got to be honest about the difference, because I've seen a lot of applications that claim to be proactive, including AI agent applications, and they're not. Not really. Not in a way that's useful. Two, it's about making sure that we understand how we can pull forward some of the patterns we see in enterprise and use them in our own personal lives if we want to. And three, it's about understanding, once we've done all of that, how we can actually think about the future and what agents can unlock for us as far as use cases go. All right, let's get into it. I don't want fake proactivity. I don't want an app that looks at my calendar, assumes every event is real, and starts bothering me because a model saw a timestamp. I have seen cases where people have put together agents, and I'm not going to name names here, that claim to be proactive. Part of why I'm not naming names is that I know they're aggressively fixing it, so I want to give them a chance. But I've seen cases where people onboard me onto a text-based agent that claims to be proactive, and it just relies on bad data, because my life is messy. Everybody's life is messy, and these agents can't tell the difference. They think all the data is real, and they send me proactive nudges I don't need, about meetings that I don't have. So the bar is not "be proactive." The bar is real lived proactivity. You have to understand enough context to know what actually matters.
You have to interrupt when you need to. You have to act within your guardrails. You have to make my life feel lighter instead of giving me another system to operate. No agent has met that bar yet. And that is the product I think everybody actually wants. And the strange part is that there just isn't any meaningful movement there that I've been able to see. Right? We have AI that can write code and browse websites and make slides and compare flights and read documents and summarize meetings and run multi-step tasks. Those are all true things. We have almost a billion people using chatbots, maybe more. Now we have enterprises building control planes for agents. So why don't we have the normal-person proactive assistant yet? And the answer is not that the agent can't do it. The agent can. The answer is that most agents don't have the intuition yet to know what to ask, and to know how to be proactive without being annoying. And I get that intuition is a terribly hard thing to create in a product. But the breakthrough product will know when to show up, when to ask, and when to shut up. And that is what I call the anticipation gap. The reason this is strange is that both halves of our story are true today. Right? So, consumer demand: it's out there. It's enormous. I have seen firsthand the number of people trying to install OpenClaw for lots of applications when they're not developers, and, you know, Lord help you. I hope that your gateway is secure and you're not just letting anybody onto your network while you put lots of data, maybe including your kids' data, all over your OpenClaw. I've seen the household OpenClaw installs. It scares me. I'm not using OpenClaw with my kids' data. I have kids. Even if I think I've got it all secured, I don't want the claw to have the risk of carrying my kids' data right now.
I need a little bit more assurance and security, and I am a fairly technical person, and I feel like I have no issue: I have an installed, secure OpenClaw instance. I feel good about that level of security, but I think real hard about what goes on that OpenClaw and what doesn't, because I'm reasonably paranoid and I've seen enough instances where claws do exciting things like deleting production emails. So, the consumer demand is there. ChatGPT made that really obvious. Claude, for prosumers in the workplace, is now a very normal thing. Gemini is on lots of surfaces everywhere and is a big consumer application in its own right. You shouldn't sleep on it. And there are lots and lots of other players in the game. This is not a demand problem. Now, flip it over. Look at the capability side. Agents are real now. Coding agents went from curiosity to default workflow really since December, right? Cursor, Claude Code, Codex-style tools. The nerds were playing with those in 2025, but then we hit a tipping point when models got good enough around December, January, and now everybody's after it, right? And you can literally see it in the data. Stripe was showing their data on agent-driven starts for businesses and agent-driven starts for accounts. It's gone exponential. And theirs is not the only chart doing that. You wonder why GitHub has had issues. It's had issues because there are so many more agents working on GitHub. They are planning for a more than 10x increase, a 30x increase, in GitHub repos, which is completely insane. So the agent economy is there. It's especially there around code. And because code is really a shorthand for managing compute, we even have good computer use, right? Like, Codex dropped, and we have excellent computer use. That is becoming a solved problem. So it's not a capability problem. And that's what makes this gap really interesting. If people want AI, and agents can act, where is the agent my mom can use?
Where is the product a normal person uses to delegate the messy parts of daily life? And the polite answer is, it's coming, right? The better answer is that most consumer agent products are still reactive. You open them, you tell them what you want, they try to do it. That sounds like agency. Sometimes it is agency, but it still puts the hardest job on my shoulders, because I have to figure out what the agent is doing, and I can't fully trust the agent. ChatGPT worked because the mental model was already there. For 20 years, users learned to type a query into a box. Google trained that behavior. You have a question, you can put it into words, you hit enter, and something comes back. Chat was a huge capability shift but a tiny behavioral shift, because instead of typing into Google, you typed into ChatGPT. And by the way, if you think that's not true, how many searches are really just "give me an answer back"? There is a reason that casual users think of ChatGPT as a knowledge engine. On the phone, they use it to get answers the way they use Google to get answers. And that shows up in the data. And I've got to tell you, agents don't get that benefit. There's no cheap UX trick there. Most people don't wake up thinking, well, which of my life admin tasks should I assign to an autonomous system today? If you ask a normal person what they want an AI agent to do, a lot of them honestly don't know what to say. The honest answer is usually, "I don't know. What can it do?" That is the most common question after you install OpenClaw: what do I do with it? That is why there were lines in China to uninstall OpenClaw after there were lines to install it. That is not a small UX problem. That's the ceiling we're at right now. And it's not as simple as saying, well, people delegate to people, so just do the same with agents. When you delegate to a person, there's a social model underneath that. You can say, "Can you handle the dinner reservation?"
And the other person knows the city, the vibe, the number of people, the rough budget, the fact that you hate places that are too loud, the fact that your spouse is tired so you shouldn't pick a place across town. There's shared history and taste and relationship and judgment in the imaginary world where I would delegate that reservation, because honestly, that's also hard to do, right? We need to be honest that a lot of us aren't delegating tasks like that. One, because we don't have someone to delegate to, and two, because that requires everything I just described to be there. You have to have the shared taste and history and judgment. But at least we can imagine it, right? We can imagine a world where maybe you have an EA, and you tell your EA to book dinner, and the EA has known you for 10 years. We've known people like that. I had lots of bosses in my life who had EAs who could do that stuff. But software does not get that for free. So when a consumer agent says, "Tell me what you want and I'll do it," the pitch sounds really great. And part of why I'm making this video is that I've seen these pitches, so many of them, right now, and the burden is still on the user. The user has to notice the task. The user has to remember the agent exists. The user has to translate the task into a prompt. The user has to decide how much permission to grant. The user has to supervise the result. And for a two-minute task, that's way more work than doing the thing yourself. And that's how long it takes to book a reservation, right? This is why consumer agents can feel amazing in demos and then disappear in your life. The demo has a prepared user. Real life doesn't. In real life, I'm moving between my email and my calendar and my texts and my tabs and my work and my groceries and my travel and my Slack and my school forms and my doctor's appointments and whatever else is happening that day.
You're not sitting there thinking, "This would be a great moment to invoke my agent." The product that requires you to remember to use it is still at that reactive ceiling, right? The breakaway consumer agent has to figure out how to appear in the situation when it's needed, without being asked. I fully admit that's a hard problem; as a product person, wearing my product hat, it's a hard problem. And I know, I told you I'd get to this: we have coding with enterprise applications that is proactive. But coding has conditions that consumer life doesn't have. Coding has clean verification. I can write an eval for it. The code runs or it doesn't. Tests pass or they fail. The compiler tells you when something is broken. Consumer life doesn't have any of that. Did the agent book the right flight? I don't know. Maybe it got the right city but the wrong time. Did it choose the right restaurant? How do you define right? Did it write the right email? How do you define right? Did it summarize the meeting correctly? There's no compiler for taste. There's no test suite for life admin yet, although I think there needs to be, and that's something we could work on. Coding also has bounded scope. You can say "fix this bug," and the agent has a repo, an error, a task, and a target. Now take a consumer task like "book a trip." And I don't know why we're always demoing book a trip. How many trips did you take last year? Even if you're traveling a lot, is booking a trip a daily thing? I hope it's not. But even then, it sounds like one task, with budget and timing and taste and all of that. When you dig into it, you have family preferences. You have calendar constraints. You have your tolerance for cancellation. How do you handle changes? What do you do when you get there? Do you need a hotel? Do you need a car? There's a reason why Expedia exists.
It exists because the entire thing is complicated, and they have thousands of developers figuring out how to make that work. It is not an easy problem. And consumer life has surprising amounts of detail inside it. Consumer is a category, and us, in our daily lives, we have surprising amounts of detail, and we're individuals, so we're unique. That is part of why the consumer app space is so challenging, but also, if you get it right, so sticky and lucrative. Consumer agents have to break through in an environment where delegation is not something we naturally do, success is subjective, errors are expensive, and maybe the user can't even name the task in the first place. And that is why the next consumer agent isn't just a chatbot with buttons, right? It has to actually have that natural anticipation. And when I say anticipation, I don't mean magic. I don't mean the agent guesses everything correctly and starts running your life. I mean the product moves from "you ask me to do X" to "this is the moment when X matters, right? Do you want me to handle that?" The flight gets delayed, and the agent says, "There's a later flight that still gets you there tonight. Want me to switch for you?" Your kid's school sends an email, and the agent says, "Okay, you know what? We do need permission for that field trip today. I pulled it up for you." The work thread gets really tense, and the agent says, "You know what? This looks like it needs a careful reply. I'm going to draft something with a very careful, neutral tone to defuse this." The grocery list gets really long, and the agent says, "I can turn this into a Wednesday delivery. Want me to?" Notice what changes. It's not that the user is remembering the agent and calling it. The situation is calling the agent into existence. That's the difference between a tool and an assistant. A tool waits for you to remember it. An assistant reduces the number of things you have to remember.
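One way to picture "the situation calling the agent into existence" is an event-driven trigger table: incoming life events are matched against salience rules, and only an event that clears a rule produces a proposal at a bounded level of commitment. This is a hypothetical sketch, not the design of any product mentioned here; the `Event`, `Proposal`, and rule names are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Event:
    kind: str                           # e.g. "flight_delayed", "school_email"
    payload: dict = field(default_factory=dict)

@dataclass
class Proposal:
    message: str                        # what the agent offers to do
    commitment: str                     # "suggest" | "draft" | "act_with_confirmation"

# Hypothetical salience rules: each decides whether an event is worth an
# interruption, and at what level of commitment.
def flight_rule(e: Event) -> Optional[Proposal]:
    if e.kind == "flight_delayed" and e.payload.get("later_option"):
        return Proposal("There's a later flight that still gets you there tonight. Switch?",
                        "act_with_confirmation")
    return None

def school_rule(e: Event) -> Optional[Proposal]:
    if e.kind == "school_email" and e.payload.get("needs_signature"):
        return Proposal("This permission slip needs a signature by Friday. I pulled it up.",
                        "draft")
    return None

RULES: list[Callable[[Event], Optional[Proposal]]] = [flight_rule, school_rule]

def propose(event: Event) -> Optional[Proposal]:
    """Return a proposal only when a rule fires; otherwise stay quiet."""
    for rule in RULES:
        p = rule(event)
        if p is not None:
            return p
    return None  # no salient trigger, no interruption
```

The point of the sketch is the default: `propose` returns `None` for anything that doesn't clear a salience rule, so a routine calendar timestamp never becomes a nudge, which is exactly the fake-proactivity failure described above.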
Consumer software has crossed smaller versions of this threshold before. Push notifications did it for messages. You did not have to open the app to find someone texted you. Recommendation feeds did it for content. You did not have to know what you wanted to watch before the product started showing you options. Maybe YouTube did that for this video for you. Autocomplete did it for search. You did not have to finish the query. Smart replies did it for email. You did not have to compose from scratch. The next move just appeared. But those features worked because they were narrow and bounded and reversible. You could ignore the notification. You could scroll past the recommendation. You could skip the autocomplete. You could write your own reply. Nobody asked you to hand over your credit card and let the feed magically book a vacation. Again, why do they demo that all the time? I don't get it. Agents are trying to do the same basic job. They're trying to surface the right thing at the right time, but across many domains with real world actions and with much higher error costs. And that's why the bar is so high. It's one thing for Gmail to say, "Hey, it sounds good. Thanks." It's another thing for an agent to figure out how to buy something with your card and sign you up for a service. And by the way, if you think that's pie in the sky, Stripe launched agent wallets. It is a real thing. You can get an agent to buy you something now. And I believe that's the future. I see it. The convenience is there. But we have got to get agents to the point where that feels natural. And that's a product problem because it has to act at the right moment at the right level of commitment with the right amount of permissions. The rails are starting to get there for agents to do this cool stuff, but we still have an anticipation gap. 
And so if you want to look at some of the active products out there, and we will look at them here, there are different bets on how to cross that threshold, right? Poke is betting that the interface should be messaging. And that's a strong bet, because messaging has almost no cognitive cost. People already text all the time. The interface does not feel like software. It feels like a connection, right? Poke lives in iMessage and SMS and Telegram. It connects to your email and your calendar and your search and all of that. So the posture feels like it should be proactive, and it nudges you a bit, right? It can remind you sometimes about emails and calendar items and reminders. It feels like it's trying to get there. The risk is that the messaging rails are not fully under Poke's control. Apple, Meta, SMS costs: those are all part of the product. And the other risk is salience. How does Poke know what matters to me? And I found through using Poke that it's not quite there yet. I can see the vision. If you're pitching me this startup's pitch deck, I can see where you're going, and I can see that you're betting on the models getting better so that you can make this more useful. I get it. I'm excited. I want it to work. It's not quite there yet. And by the way, getting it there is not just a model phenomenon. This is also part of what makes it a really fun product problem. Getting it there is also a function of understanding how you work with memory and personalization and salience for facts. I'll give you a great example. Say I tell you that I have the goal of losing weight so I look great in my swimsuit in Hawaii (by the way, I'm not going to Hawaii, but the point stands). If I tell you I have a goal like that, you have to understand that for one consumer, that may be a very serious goal, and the agent has to take it super seriously. The agent has to get the swimsuit together. It has to get the beach-body workout plan together.
It has to get the diet together, this and that and the other thing, right? There are all kinds of things the human implicitly expects it to do to feel like it lifts weight from their shoulders. But for a lot of people who are not quite as serious about working out, and I count myself among them, that is not what you mean. What you mean is that you saw a TikTok about Hawaii and you're like, "Wow, that sounds nice. I should probably get in shape for Hawaii if I ever go," or when you go in September, or whatever it is. You decide, you pick your trip, and then you tell the agent, because you want the agent to help you. But the problem is the habit formation and how much you want it. What you really want the agent to do is recognize that you're kind of halfway serious about it. Maybe it should book two workouts a week, but not five high-intensity interval training workouts a week. If it tried to do the latter because that was the most efficient way to meet the goal, you would be kind of disappointed, because you would be dying on the treadmill, dying in your high-intensity interval training, unable to actually keep it up, and you'd end up depressed and sad, which, by the way, is a real workout loop. I've been the guy who said, "I'm going to run," and then I ran and it did not work, and I had to find an exercise that worked for me. You have all of that complexity. The agent can't just assume you're the incredibly type-A, driven person who wants the most efficient path to the goal and simply give you that when you express a goal. That's why this is hard, and that's why this is not just a model problem. I have a lot of sympathy for folks who are trying to solve this. This is not easy. Let's look at another agent. Clickie is betting on the cursor, right? I told you about Clickie a little bit. It sits beside the cursor on a Mac. It sees your screen when you ask for help. You can speak to it. It can point at things.
And that matters because so much user pain happens inside software we don't fully understand. We don't fully understand all the settings in Figma or Photoshop or Premiere or DaVinci Resolve or Excel or whatever. The user is staring at the answer, but they don't know what it's called or where to click. Or maybe they have three things to do and can only do one at a time. And so there's a sense of, wow, the cursor is something we should pay attention to on the computer, because it tells us where the eyes are, where our attention is. And if we start to get there, we can start to get to anticipation eventually. I would not be surprised if Clickie goes there eventually. It's not there now. It's reactive. It's a lovely user experience, but it's reactive, and that's where they're at right now. We'll see where they go. Cluely (yes, they're still here) is a different bet on presence, right? It started with the famous "cheat on everything" framing. They raised a bunch, especially around interviews and exams, and that was great marketing and also a giant warning label, right? A lot of people were like, "Oh my gosh, either I can't use it or I'm going to invent a detector for it." But underneath the controversy is a very real behavior. Visible AI use is socially costly, so invisible AI use feels like an advantage. People want help without feeling judged. The demand is real. The risk is real. And so Cluely basically has to be in a position where, if it's going to really be helpful to you, the perspective it provides cannot feel canned, has to feel personalized, and has to not be visually distracting, which actually is one of the most painful things for me about Cluely. I have messed with Cluely. I've put Cluely on my screen. I've seen Cluely pop up proactively in conversations. And the two biggest issues I have with it are that the answers feel canned, so that I think if I express them as mine I will sound dumb, and that it's slow.
And if it's slow and your answers feel canned, you're not going to stand out. That's why, when interviewers talk about how they detect AI use in interviews, part of it is the pause: the pause, and then suddenly there's an answer, and it's a kind of generic answer, or it's suddenly way out of depth with the character of the rest of the interview, because you asked a deep technical question, Cluely or another AI tool took a minute and came back with something, and it's like, "Wow, this is really in-depth. This is really good," except the whole rest of the interview wasn't like that. It was generic. So you have to have agents that enhance what you already are and already know without making you feel unnatural. That's the larger lesson there.

Now, you might not think Cowork counts here, but Cowork is interesting because it takes the thing that made Claude Code valuable, multi-step work toward an outcome, and points it at non-technical knowledge work. And you can see the pieces there: pretty soon Claude is going to have enough data that it will suggest to you, "Hey, do you want to work on this?" And if you think that's just a fantasy, the memory feature that Codex launched, called Chronicle, does exactly that. I can enable Chronicle on my laptop, and I have, and I can say, "Hey, how can you help me? You've seen what I've worked on this morning." And Chronicle will tell me. Chronicle will say, "You know what? I've noticed you're working on a lot of process. Can we do some SOP writing? I think I can do it." And I was working on process, and it did write it. It was 80-85% good as a first draft. I would never have thought of assigning it to Codex, and it did a great job. So the memory piece is part of this, and I think the way Chronicle is handled is a clue toward the future. Okay, I promised to talk about how we solve it.
Now, I think the first way to solve it is to think about our permissioning. If we're going to build this, the cleanest way to think about permissioning is as a ladder of trust. Step one is allowing an agent to read: the agent can see something. It can read your file, your email, your screen, your calendar. It's the lowest-trust step. Step two is to suggest: the agent surfaces something proactively. "This email matters." "You discussed this last time; you said you would follow up." The agent makes a proposal, but the user remains in charge. Step three is to draft: the agent prepares the action. It writes the email. It builds the schedule. It fills the cart but doesn't check out. The work is done, but the user approves. Step four is to act with confirmation: the agent can go out into the world and do things, but it asks before consequential moments. It can navigate, fill forms, assemble options, maybe even prepare the booking, but the user signs off. And finally, step five is to act autonomously. A lot of people want to jump straight there, and that's why I'm giving you the ladder. When the agent buys and books and sends and signs without you, you are credibly crossing the line with consumers, but only if you get it right, because the downstream consequences of breaking trust are really high. One of the reasons trust is hard to recover, by the way, is that we humans are risk-avoidant. So when you're building a complicated software product, you have to remember that the user is taking a risk on you. And if you get it wrong, the user is not going to try it again, because they're risk-avoidant. So we have to clear that high bar before we even get to step five.
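To make the ladder concrete, here is a minimal sketch of how a product might encode the five rungs and gate actions against per-domain grants. Everything here, the level names, the example domains, the `allowed` function, is my own illustration, not the API of any product mentioned in the video:

```python
from enum import IntEnum

class Trust(IntEnum):
    """The five rungs of the trust ladder, lowest to highest."""
    READ = 1                   # agent may see: files, email, screen, calendar
    SUGGEST = 2                # agent may surface proposals; user stays in charge
    DRAFT = 3                  # agent prepares the action; user approves it
    ACT_WITH_CONFIRMATION = 4  # agent acts, but pauses at consequential moments
    AUTONOMOUS = 5             # agent buys, books, sends, signs without you

# Hypothetical per-domain grants instead of one vague "manage my life" permission.
GRANTS = {
    "calendar_scheduling": Trust.DRAFT,
    "email_drafting": Trust.DRAFT,
    "meeting_follow_up": Trust.SUGGEST,
    "shopping_replenishment": Trust.ACT_WITH_CONFIRMATION,
}

def allowed(domain: str, requested: Trust, confirmed: bool = False) -> bool:
    """Gate a requested action level against the domain's grant.

    Unknown domains default to read-only, and step-four actions still
    require an explicit user confirmation at the consequential moment.
    """
    granted = GRANTS.get(domain, Trust.READ)
    if requested > granted:
        return False
    if requested == Trust.ACT_WITH_CONFIRMATION and granted < Trust.AUTONOMOUS:
        return confirmed
    return True
```

With these grants, `allowed("meeting_follow_up", Trust.DRAFT)` comes back `False`: the agent can propose a follow-up but can't write and queue one until you deliberately raise that rung.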
And that's why, if you're trying to pull the future forward, if you're trying to use my guide to build an agent that works for you, like a chief of staff or something like that, you have got to figure out what level on the permission ladder you want to reach, and be intentional about it. And yes, I'm writing all of that up, because you need to get to a point where you're making intentional choices about calendar scheduling, email drafting, meeting follow-up, shopping replenishment, all the stuff you want to tackle. And make sure you're not giving it one vague, overall assignment. That's one of the things I'm going to encourage you to do: don't make "agent, manage my life" your goal. Think of it as a few domains where the product has enough context, enough reliability, enough permission, and enough restraint to feel like an assistant instead of a chatbot. That's the bridge to the future.

There is one more pattern I don't think we're talking about enough, and that's the prosumer bridge into agents. A lot of consumer software doesn't start as pure consumer software. It starts with knowledge workers. Slack entered our lives through the idea that we work in teams. Notion entered our lives through the idea that software needs PMs and designers to collaborate. Superhuman entered our lives through executives. Those are all tools I have seen people use in their personal lives, not just at work, but they started as work tools. That, by the way, is why I mentioned Cowork as a knowledge-work agent: the boundary between work files and personal files is not as clean as we like to pretend, and something that is useful for work can end up being useful for us personally. So I do wonder if one of the first places we see proactivity is at work. And that's why, when I'm writing up this guide, I'm thinking about work for you as a prosumer, right?
How to instantiate proactive agents in the workplace is a legitimate consumer question, because we often start by bringing our best intention, our most thoughtful work, to our own space, at our own desk. We don't always think about the team as a whole. Leaders will think about their teams, and that's great, but leaders are often following trends they see in their own workers. That, by the way, is how Slack got started. People installed Slack for themselves and their buddies at work, just a small group, and Slack spread like wildfire until eventually CTOs had to get involved and consolidate everything into one Slack instance, and that's the Slack we know today. But it started as a wildfire of bottom-up adoption, and we may see the same with agents, because we care about our own work paths just like we care about our own consumer paths.

Okay. We've talked about why this problem is hard, about some of the agents tackling it, about some of the key use cases, and about how you build it and pull the future forward. The last thing I want to do is talk about why the window to this next part of the agent ecosystem is short but indeterminate, because we need to talk about timeline. One of the things that's hard in AI is that everything is right around the corner. Everything is near. So in this moment with agents, I want to give you a few early signs that this is going to go from being something only the nerds build, like the OpenClaw nerd who builds the proactive agent for the house, a very tech-heavy thing, into something broader. My mom is not installing OpenClaw. It's just not happening, and nor should she, but it's coming soon. I want to tell you how you can tell, and what your early warning signs are. I think we've already seen one of the first.
It was a key hire: OpenAI hired Peter Steinberger. That's an example of the kind of thing you have to watch and say, "This is coming," because Steinberger is known for OpenClaw. That's what he built. That's what he did. I know he's done amazing things for 20 years before that, but that's going to be his career legacy, and I think he knows it. And since he wears lobsters, I think he's excited about it. OpenAI is clearly working on agents. They are working on this problem. And if they are working on this problem, there are two dozen other companies working to beat them. So key hires are something to watch for. Most of us do not spend enough time looking at hiring pages. By the way, do you know how I know that Anthropic is going after HR tech? Their hiring page. They're hiring people to go after it with AI. It's public information. It's on the hiring page. You can just see it. So look at the hiring. And here's a key tip: if you're interviewing, don't just look at the jobs you're interested in. You can infer an entire company strategy from the hiring page of the company you're targeting. And yes, you can use Codex, you can use Claude, you can use a tool that browses and does that analysis for you.

The second thing to watch for: look for breakthrough moments when a particular agent feels like it lifts a load. So yes, I am saying try agents. I've mentioned several that I've tried. None of them meet my bar for proactive, though I see bits of it in all of them. I am looking for an increased cadence of load lifting off my shoulders. If I see less of it over time, that product is probably not going where it needs to go. If I see more of it, more moments when it feels reasonably proactive, then I see progress in that direction from that product.
And that's why I'm deliberately running three or four different agents at a time as a test over multiple months, so I can check back in a bit and see how they're doing. And I add new agents all the time. So my encouragement to you: you don't have to have a dozen agents running on your computer like I do, but try something you think is interesting, stick around, put a reminder on your calendar, and try it again once a month, after it has had a chance to update. That is not too much to ask. It's a chance to keep an eye on how the category is evolving, and to look especially for ways in which it tangibly lifts the load for you. You probably have to set up the connectors so it actually feels connected enough to do that, but you can do that.

The third and last thing I'll call out, if you're looking for signs that this is right around the corner: read the model release notes when the big models come out. The new frontier models are usually about six months ahead of the open-source models. When the release notes start to talk not just about long-running agentic tasks for coding but about long-running agentic intent with memory for consumers, when they aspire to that and talk about it, that is when you can start to treat proactivity as an almost-solved problem. Because, as I hope you've learned from this video, actual proactivity is a lot more complicated than just long-running intent. That is one of the key building blocks we're not quite there on, which is why it's an early warning sign but also not enough. That's why I shared my Hawaii example: if a human can communicate the exact same goal in the exact same wording and mean something different, we have a complicated problem to solve on proactivity. You have to be very aware of the user's other behavior.
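One toy way to picture "aware of the user's other behavior" for the workout case: cap the requested plan by what the health data actually shows. The function, thresholds, and data shape below are my own illustrative assumptions, a sketch, not anything from the products discussed:

```python
def suggested_weekly_workouts(observed_sessions_per_week: list[float],
                              requested_per_week: int) -> int:
    """Cap a requested workout cadence by observed behavior.

    A once-a-week-at-best user who asks to "get in shape for Hawaii"
    gets a gentle plan (about two sessions), not five HIIT days,
    because the history says they're only halfway serious.
    """
    if not observed_sessions_per_week:
        return min(requested_per_week, 2)  # no history: start gently
    avg = sum(observed_sessions_per_week) / len(observed_sessions_per_week)
    stretch_cap = round(avg) + 1           # allow a modest stretch beyond habit
    return max(1, min(requested_per_week, stretch_cap))
```

So a user averaging one session a week who asks for five gets two, while a user already averaging three or four gets close to what they asked for. The real problem, as the talk notes, is doing this across every domain, not just fitness.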
In that case, the behavioral way to test it is probably to look at your health data and say, "Wow, this guy goes to workouts maybe, you know, once a week at best. He's probably not asking for five days a week of high-intensity interval training." Let's be honest here. So there are data-driven ways to tell, but it's a hard problem, and you have to multiply it across all of the domains. Those are some ways you can tell in advance.

I am not here to say you cannot build proactive agents today. With this video, on the Substack, I am literally giving you options to put some meaningful, useful proactivity into your life. Fantastic. I'm so glad we have a way to do that, and it really is because of all the work that's been done so far. We are able to make small proactive agents because of the work of folks like Steinberger and of many other developers who are open-sourcing different projects, like I open-sourced the open brain project. We're all trying to work together as a community of AI-interested people to build cool stuff. But again, I keep thinking of my mom. My mom is not going to read my Substack guide and install a proactive agent, nor should she. That is what I keep waiting for, and I am excited, because I see enough of these pieces that there's a chance we get it this year. And then a lot of the admin mess we deal with becomes something where I can say, "You know what? You don't even have to be a developer. You can just install this proactive agent and it will work." I am very much looking forward to making that video. That'll be fun, but it's not that day. Subscribe and we'll have more stories on agents, or the lack thereof, shortly. Cheers.

Get daily recaps from
AI News & Strategy Daily | Nate B Jones
