Episode 15 - Inside the Model Spec

OpenAI| 00:37:26|Mar 26, 2026
Chapters19
Andrew Maine and Jason Wolf discuss the OpenAI model spec, its purpose, and how it guides model behavior, including how models think through problems and respond to questions about topics like Santa Claus.

OpenAI’s Jason Wolf explains the model spec: what it is, how it guides model behavior, and how it stays transparent and adaptable as AI evolves.

Summary

OpenAI’s Andrew Maine hosts Jason Wolf to unpack the model spec and why it matters for anyone building or using AI tools. Wolf describes the spec as a human-readable set of high-level decisions about how models should behave, not a perfect blueprint or implementation. He emphasizes that alignment with the spec is ongoing—models aren’t perfectly compliant today, and the spec itself evolves with deployment and user feedback. The conversation covers the spec’s transparency: you can read it at model-spec.openai.com or on GitHub, and OpenAI actively invites public input. We also learn how the spec relates to training, policy enforcement, and practical examples that illustrate decision boundaries, like how to handle honesty, confidentiality, and a child asking if Santa is real. Wolf traces the spec’s lineage from a 2024 initiative with Joan Jang and John Schulman, explains the chain of command that prioritizes OpenAI instructions while preserving user steerability, and highlights how chain-of-thought analysis helps diagnose and improve behavior. The discussion ends with reflections on the future: smaller models, agent-based systems, and the growing role of company- or project-specific specs in a world of advancing AI capabilities.

Key Takeaways

  • The model spec is a human-facing guide to intended behavior, not a guaranteed rulebook or exhaustive implementation.
  • Alignment to the spec is ongoing and measured as models deploy and user feedback accumulates.
  • Transparency is central: the latest spec is accessible at model-spec.openai.com and on GitHub, with open channels for public input.
  • The chain of command in the spec helps manage conflicts between user, developer, and OpenAI policies, while maintaining user steerability.
  • Honesty is prioritized, but exceptions exist where confidentiality or context requires nuanced handling.
  • Deliberative alignment and training interventions link policy principles in the spec to actual model behavior, even if the map and the terrain diverge at times.
  • Smaller models can exhibit strong alignment because they often think through policies more explicitly, aided by deliberative alignment.”

Who Is This For?

developers integrating OpenAI models, policy makers, product managers, and AI researchers who want to understand how the model spec shapes behavior and how to design responsible AI experiences.

Notable Quotes

""The spec is our attempt to explain the high level decisions we've made about how our models should behave.""
Wolf defines the core purpose of the model spec as a human-readable guide to model behavior.
""The model spec isn’t an implementation artifact. It’s primarily to explain to people how it is our models are supposed to behave.""
Emphasizes the distinction between the spec’s intent and code-level execution.
""If there are conflicts between instructions the model should prefer OpenAI instructions to developer instructions to user instructions.""
Explanation of the chain of command and hierarchy of authority within the spec.
""Honesty is really important, but there are some hard interactions where white lies or confidentiality come into play.""
Illustrates the nuanced handling of honesty, confidentiality, and safety boundaries.
""The model spec is a north star… it often leads where our models actually are today.""
Acknowledges the aspirational role of the spec in guiding development.

Questions This Video Answers

  • How does OpenAI's model spec influence how a chatbot answers tricky questions for kids?
  • What exactly is deliberative alignment and how does it relate to the model spec?
  • Where can I read the latest OpenAI model spec and submit feedback?
  • How do chain-of-command policies balance honesty, confidentiality, and safety in AI models?
  • Do smaller models follow the model spec as well as larger ones and why?
OpenAI Model SpecDeliberative AlignmentChain of CommandHonesty vs ConfidentialityModel TransparencyPolicy EnforcementAgent/Autonomy in AISmall vs. Large Models
Full Transcript
Hello, I'm Andrew Maine and this is the Open Eye podcast. Today we are joined by Jason Wolf, a researcher on the alignment team to discuss the model spec, how it shapes model behavior, and why it's important for anyone building or using AI tools to understand the the spec often leads where our models actually are today. At this point, you know, models are pretty good at like kind of going out and finding new interesting examples. Models should think through hard problems. Don't start with the answer. Like actually think it through first. What did you do this weekend? Uh what did I do? Uh just like kid stuff. I don't even remember what like they talk to chatbt or uh yeah we use we use voice mode sometimes. She'll like ask it random like science questions and and that kind of thing. It's fun. You know one time she she snuck in there before I could dive in like is Santa Claus real? Wow. like and then yeah the luckily the the model answered in a way that was spec compliant which is you know to recognize that maybe there's actually a kid who's asking this question and you should kind of uh you know uh just be a little bit vague about your answer. So so we we've talked before here about model behavior and the term model spec has come up numerous times. I would love for you to unpack what that means model spec. Yeah. So, uh the spec is our attempt to explain uh the high level decisions we've made about how our models uh should behave. Uh and yeah, this covers many different aspects of of model behavior. A few key things to note that it it is not uh one, it's not a uh a statement that our models perfectly follow the spec today. uh aligning models to spec is is always an ongoing process and this is uh you know something we uh we learn about as as we deploy our models and we measure their alignment with the spec and uh and you know understand what users like and don't like about these and then come back and uh iterate on both the the spec itself and uh and uh and our models. Uh the spec is also not an implementation artifact. So um I think this is maybe a common confusion that the primary purpose of the spec is really to explain to to people how it is our models are supposed to behave. Uh where you know the these people are you know uh employees of open AI and also users, developers, policy makers, members of the public. uh it is you know a secondary goal that our models are are are able to understand and apply the spec but uh we never uh kind of put something in the spec or change the wording in the spec in a way where the goal is just to uh have this better teach our models. The goal is always uh primarily to be understandable to um to to humans. And lastly, the the spec isn't a uh it's not a complete description of the whole system that you interact with when you you come to chatbt. There's lots of uh other other pieces in play there. So there's uh you know there's product features like like memory. Uh there's uh usage policy enforcement is is an important part of our overall safety strategy which is not uh captured directly in the model spec. And uh there there's various other components as well. And it's also not a fully detailed uh exposition of every detail of of every policy. Uh the the key thing that we try for is that it captures all of the the most important decisions that we've made and that it uh accurately describes our intentions even if it might not contain every detail. So I can understand like a document or something that says this is the model spec, but how does that work in practice? So it's a pretty long document like maybe uh you know a 100 pages or something like this. Uh starts out with some uh a sort of high level exposition of of our goals. You know OpenAI's mission is to benefit humanity and this is the reason we deploy our models and uh kind of getting into you know the the the goals uh we have in doing that are to uh to empower users and to uh protect society from serious harm and how we think about the trade-offs. and then goes into uh kind of a a big set of of policies that actually get into the the nitty-gritty details of uh how we think about these many different aspects of model behavior. If you if you think about it, it's like kind of crazy that you can you can ask these models literally anything and they'll try to respond. And so the the space of uh of policies you you might want to have to cover that is is kind of huge. And we do our best to try to structure this space in in kind of a clear way. And um uh yeah have have policies that uh that do something reasonable. And some of these things are hard rules that can't be overridden. A lot of it is defaults like things like tone style personality where we want to have a good default so that users come in and get a good experience but we also want to maintain steerability. So if the the the user uh wants to uh wants to do something different that's fine. Those those things will be overridden. And we also have tons of examples that try to kin down these decision boundaries of like okay let's take a take like a borderline case where uh it's kind of unclear whether you know honesty or politeness should win and uh explain what the what the decision is here. Um so so part of it is to sort of show the principles in in action and uh help make sure that they're interpreted in the the way that's intended. A kind of secondary thing is that you know the model style, personality, tone is also really important and really hard to explain in words. And so the examples are also a way to get some of that nuance across of like how do you actually want the model to put these principles in practice by by giving like an ideal answer or often like a a sort of compressed version of an ideal answer that gets at it the most critical parts. And so kind of both like shows the principles in action and how the the model should actually uh uh how it should actually talk. Let's talk a little about transparency. That's been something that's come up a lot and how important it is to let people see what the spec is. Where do they actually see this? How do they let you know what they think? So users can go to uh model-spec.openai.com to see the to see the latest version of the model spec. uh or if you search for the the model spec on GitHub, you can view the the source code. Uh the spec is actually open source, so uh people are are free to to fork it and and uh make uh make their own version if uh if they want to. And uh yeah, we we've had different mechanisms for public feedback at different points. I think right now the the best mechanisms that exist are either you know if you're if you're in the product and and you get an output from a model that you don't like to to give us feedback uh right there directly in the product um or uh yeah you can uh you can tweet at me Jason Wolf and yeah I will uh I I will read your read your feedback uh and we've uh um yeah a lot of changes in the the model spec have come from people just sending up their sending us their their input and thoughts. It it's interesting because you we've gone from just a few short years things were very simple just getting the model literally complete a sentence or fix grammar or whatnot. Now we're at this point where you were able to have a lot of these different goals of what they're doing. How did the model spec come about? How did this become the open approach towards determining this? Personally, I was uh at a different company working on conversational AI and uh uh putting together my job talk for OpenAI and and thinking about like what uh what maybe the the future of uh of aligning models looks like. And you know at the time I think at least the the published approach was this thing called reinforcement learning from human feedback where you collect all this data uh from humans that kind of captures in some way the policies that you want to have. And you know this was pretty effective. This is what uh uh but uh but when you look at that data it's very hard to tell what it's actually teaching. And it's even harder like if you change your mind about about what you want. it's sort of uh you know very very difficult to to go back and change that without like recollecting all that data. And so it seemed to me that you know as models you know that at the time this approach is basically we're meeting models where they are and as models get smarter and smarter and smarter like eventually the models will be meeting us where we are. And uh you know if you think about how would we actually structure this in in a in a case where where that's true. Well, probably the way we would structure uh our our uh teaching to the model is basically the way we would do it when we teach a person. Uh we'd write some kind of like uh employee handbook or something like that would be a be a big part of it. And um so yeah, this this was like something I included in my my job talk that like basically I I think it at at some point models should learn from something like like a spec. Um, and then you know the story of the actual model spec I guess starts uh like a few months later in 2024 when Joan Jang who was head of model behavior at the time and John Schulman, one of the co-founders uh decide to get a model spec project going and uh they they wanted to you know not only write this down in a document but also make it public for uh kind of transparency reasons and uh yeah I very quickly joined forces with them and helped write the the original spec and have uh kind of helped work on this makes sense. So help me understand kind of on a basic level. So you have the specification, all these sort of the the intents for what you want the model to do. Then you have the model itself. How does it make its way from the spec to the model? Yeah, this is a great question and I think it's uh it's the answer is uh kind kind of complicated. I would say um you know there there are some ways in which we we uh use the spec uh sort of more directly in in training like we have this process called deliberative alignment where we teach especially our reasoning models to uh to follow uh certain policies and uh some of those policies are uh are kind of directly derived from the language in the model spec or vice versa. Um, in general, I'd say, you know, model behavior, safety training, these things are they're super complicated processes and we have, you know, uh, hundreds of researchers who are working on these things. Um, and so often the the connection is a little bit less direct. It's not necessarily that, you know, we make a change to the spec and that's what drives a change in behavior. it's that uh the we uh you know we we we make a change in the way that we train the models and then we make sure that the spec accurately reflects uh our intentions. Um but but again the actual process of training is kind of uh much more complicated and nuanced than we could possibly like put in in the model spec itself. So you have a spec, you have a lot of different things that you want the model to do, examples you want to do. What's the hierarchy? How do you decide what's most important? At sort of the heart of the the spec is this thing we call the chain of command. You know coming up with a set of goals for for the model is sort of relatively straightforward. We want the model to to help people and you know not do unsafe things. But uh what gets tricky is when these goals come into conflict. And so the the chain of command is really about managing conflicts between instructions. And this can be uh you know between things the user said what what the what the developer instructions are if this is in an API context and uh from instructions or or policies that come from a uh open AI uh which are are typically in the model spec itself. And so what the chain of command basically says is that you know at a high level uh you know the model if there are conflicts between instructions the model should prefer OpenAI instructions to developer instructions to user instructions. But then uh you know we don't actually want all all of OpenAI's instructions to be at this very high level because we want to empower users. We want to kind of allow them to to have intellectual freedom and to pursue ideas uh that you know so long as they they don't really come up against uh what we think are are really important safety boundaries. So the chain of command also sets up this framework where in the rest of the spec each policy can be given what we call an authority level and this places it somewhere uh in this hierarchy and we try to put as many of the policies as we can at the lowest level like below user instructions. And so this means that uh this maintains steerability. So if the the user comes in and they want something different they can have that. And we try to have as few policies at the the sort of highest level uh as we can. Uh and these are are basically all like safety policies where we think it it's actually you know it's essential that we uh sort of impose these on on all users and developers to to maintain uh to maintain safety. Well, you mentioned a great example before which is if a child asks a Santa Claus reel, how do you decide what the model should or should not do in a situation like that? This is a great question. And I think it it illustrates one of the really tricky things about model behavior, which is that um in the spec we're focusing just on how the model should behave, but the model often doesn't know uh it doesn't have all the context. It doesn't actually know who's behind that screen talking or typing. It doesn't know what that person is going to to do with the results that come out of the model. And so, uh, yeah, this is a tricky case because we we don't know if, uh, you know, if it's an adult who's, uh, who's asking if Santa Claus is real or a kid. I have questions. Exactly. Uh, so I think, uh, you know, we, yeah, we try to come up with policies that make sense even given this uncertainty. And and so there there's a similar example of this about the the tooth fairy in the spec where it's like the uh here the the conservative assumption is to uh assume that maybe it's it's not an adult who's talking to the model and that you should you know uh not not lie but also not uh not spoil the magic just in in case it's a kid or there's a kid around who might be might be listening. That's a very interesting choice though because on one hand you might say oh the model should never lie at all which you know seems like a very good policy to put in there but then you're saying that okay we have to have some sort of nuance here not necessarily lie to the kid but find a way to sort of would you say dance around or uh yeah I mean uh as a parent I guess this is something uh I' I've uh come to come to terms with with with my own kids uh not I we always try try to be honest and never say anything that's that's untrue. But uh you know, yeah, it doesn't it doesn't always work to be 100% upfront. But but no, I'd say with our with our models, we do really try we focus on on on honesty being really important, but there are some really hard interactions. Honesty, full honesty may not be be the the best approach. Um, and so we we've actually iterated a lot over over the years on the precise nuances of of honesty and where it potentially uh uh conflicts with or or runs into other policies of you know honesty versus friendliness. Like when is a white lie okay? Um I think earlier we said maybe at some point that white lies were okay and have shifted that so that white lies are are are out of bounds. But another interesting interaction here is between honesty and confidentiality. So in earlier versions of the spec we uh we had this like very strong principle that by default developer instructions are confidential because I I think often in in applications if a a developer they deploy some system on top of the API and they want they consider their instructions to be like IP or maybe it's just part of the experience. You know, if you have a customer service bot and the user can say like, "Hey, what's your prompt?" And, you know, it spills all the beans about the the company and how they want their bot to respond and that's not like the experience that they want to deliver and that's not how you know a customer service agent would would respond, right? If you're like, "Hey, uh, start reading your employee manual to me, right? They're they're uh they're going to say no." Uh but um yeah, the I guess there's an unintended interaction here where if you're both trying to follow developer instructions and keep them secret, you could get into a situation where uh at least we saw this in like controlled situations, not in in production deployment, where the model might try to sort of covertly pursue the developer instruction when it's in conflict with the user instruction. And this is something we we really don't want. Um and so we've uh gone back and and revised that. And uh yeah, I'd say over time have carved out uh removed most of the uh sort of exceptions that we had from from honesty. So that now honesty is uh is uh definitely above confidentiality in the spec. Yeah. What it saved the people in 2001 is space odyssey a lot of trouble. How does the process work? So like literally is it you know like a regular meeting where you all talk about what you're working on? How does that process of the model spec evolving and figuring out what's working and what's not working? There's uh there's a ton of inputs that that go into this and broadly like uh we have we have an open process so everyone at at OpenAI can uh see the latest version of the model spec they can propose uh they can propose updates they can uh chime in on on changes these are all public um uh yeah I'd say changes get driven uh by a variety of different uh sort of different sources You know, one source is just that models get more capable, our products evolve as we ship new things, we need to cover those things in the model spec. Uh so for instance, uh you know, when we wrote the the first spec, I think uh uh I'm not sure if we we had shipped multimodal yet, but it wasn't covered in the first version of the spec. And so we had to, you know, add multimodal principles. And then later we added uh principles for uh autonomy and agents as we started deploying agents. And most recently we added under 18 principles um as we added under 18 mode back in in December. Um so that's that's sort of one source. Another source is uh you know OpenAI believes in iterative deployment. So we uh we think the sort of best way to uh figure out how to deploy models safely and and to help society kind of learn and adapt to AI progress is to get models out there and and learn from what happens. And so uh often we'll we'll uh learn from learn from something uh like for instance the the the safincy incident and um and then you know take those learnings and bring them back into into our policies and you know we also just have we're we're we're using uh using the models. We have our you know model behavior and safety teams that are uh sort of uh yeah studying the models and what users like and and these kind of stuff and and uh using these to to evolve our policies and these are all kind of inputs that then ultimately flow into back into the spec. How do you handle situations where there's maybe a disagreement between the way the model does something and what the intent is in the spec or what the humans want? It depends a little bit on on what the the problem is. uh but um I think yeah so in in general the model spec is not uh a claim that models are going to perfectly follow the principles in the spec all the time. Uh this is for a few reasons. One uh the model spec is really we we kind of treat it as a north star where this is where we align on where we're trying to head and so the the spec often leads where our models actually are today. Mhm. So that that's one thing and then you know another is that the the process of actually training models to follow the spec is you know it's a it's it's both an art and a science it's incredibly complicated. You know even though we kind of describe many of the principles in the spec in the same way there's actually many different techniques that are used for different principles and you know uh at at uh the models are fundamentally non-deterministic. they uh um you know there's some randomness in the outputs they produce. So nothing's ever going to be uh perfectly perfectly aligned. Um so yeah the I guess uh the answer to that uh comes down to like if we we see an output that is not what's not expected. I guess the first question is like uh do we do we think that output is good or or bad? Uh you know if the if the output contradicts the the spec but we actually think the output is good then maybe the resolution is to go back and change the policies of the spec. But yeah, it yeah, in most cases it probably means doing doing some kind of uh some kind of training intervention that uh that brings the model uh into greater alignment with the with the spec or with our detailed policies. And in fact, we've uh um we we've also been building model spec eval which try to evaluate how our models are doing across the entire model spec. And we've seen that in in fact over time our models are becoming more and more aligned to the the principles in the spec. I think that was one of the kind of predictions early on as the models became smarter they would understand edge cases better and that's where the hard part is is trying to figure that out. So open released some new models some smaller variants GPD 5.4 mini and GPD 5.4 Nano. How well do you see smaller models handling the the spec? I think in general the small models are um they they they've been pretty pretty aligned. They're pretty smart and they're uh one one interesting thing that we've seen is that uh you know uh supporting what you said the thinking models generally follow the spec better. Mhm. Um this is uh you know both because they're smarter and because uh they're trained partially with deliberative alignment where they actually they're not just trained to behave in a way that matches the policies. they actually understand the policies and you know if you can look at their train of thought they're actually thinking through like okay I know this is the policy and this is the situation and oh it's in conflict with this other policy and how should I resolve this and so uh that that sort of understanding of the policies and intelligence uh naturally leads to to better generalization and I think our smaller models are uh pretty good at that too. Chain of thought is a really interesting way to see inside how these models are processing information. Have you found that that's been a big help? I help uh write the model spec and I work on model specs and spec compliance, but uh a lot of the research I've been doing recently is is actually on like scheming or or strategic deception. And there it's it's really completely uh essential having the chain of thought because you can see some behavior and uh yeah it's like the the behavior seems like maybe fine or like oh maybe the model just like made a mistake here or something and then you know you can look at the chain of thought and and see that no actually the model's misbehaving. It's uh you know it's it's uh uh being very strategic about about this or something and and uh yeah our models generally I think we work very hard to not supervise the chain of thought. this is something we we feel is like really important and um I think yeah it it pays off and that models are very honest in their train of thought and it's it's very helpful in in understanding what they're doing. So model spec is one way to do this. Different labs have tried different approaches. I think in anthropic they use they talk about a constitution. Could you explain the difference and why you know is it just more suited towards the temperament of the labs and why they choose it? Yeah, I think when it comes down to the actual behaviors uh that that people would see in practice, I think these documents are are more aligned than maybe most people would believe. like in most cases they they probably lead to the same conclusions although they're defin definitely differences in uh in some places and in what's emphasized. Uh I think a major difference is that these are actually just like different kinds of documents. So the model spec is really again this this public behavioral interface. It's its main goal is to explain to people how they should expect the model to behave. Um, and it's sort of a secondary goal that models can also like understand this and apply it and and talk about it with users and so on versus uh at least my read of the like the the the sole spec is that it's much more of an implementation artifact like the the goal of of this is to specifically teach Claude about uh what what its identity is and how it should relate to the world and to its training process and to to anthropic and so on. Um and and so I think a lot of the the differences uh basically come down come down to this. Um and I think these aren't necessarily competing approaches. Like I I think both of these could be valuable. uh but for example you know even if you you had a model that you think is uh deeply aligned and uh you know has all the the values that you want and so on. I I think you still want something like the model spec so that you can then look at that and and you can ask like okay did this this is actually uh generalized in the way that I want is it actually following the behaviors that that kind of we've agreed that the model uh should follow and like that's kind of what the the model spec is what surprised you the most the example I gave earlier of this this interaction of confidential confidentiality honesty is a great one where yeah we we had uh worked really hard on these policies and we thought we had kind of you know, red teamed out all of the the potential interactions and so on. And then seeing this behavior where like the model does something that you you really don't want it to do and justifies it by by leaning on the the policies that you gave it is uh uh yeah, that's definitely um yeah, an an experience. But how do you determine what the scope of it's going to be? Like I have ideas. How do you say I'm sorry, Andrew. Uh no, I mean I think that the scope is uh broadly everything. So, you know, if if if it's a part of model behavior, it it might make sense to put it in the spec. I think, you know, the the only constraint is uh sort of our our time and and and space, and we we want uh to make sure the spec stays accessible and people actually able to to read and understand it. So I think uh ultimately the the cut comes down to if something is if something seems like an important decision that it would be useful or valuable for uh for especially the the public to understand then then we put it in and if not then uh maybe it doesn't make the cut. Where do you think the future of this goes? Do you think that you the model spec is probably something that's going to be used five years from now 10 years from now? five five years is a lot in AI years but uh yeah I I definitely hope so. I think um yeah I think a a thought experiment that that I found interesting is let's say you assume that that a model is is like human level hi you can ask well do do you still is there still a role for the model spec like at that point can you just tell the model like hey be good and is that sufficient um and I think if you actually go through the principles in the the spec I I think At least my conclusion is that you still kind of want all the things that are in there uh for a few different reasons. One is that, you know, even if the model could figure this stuff out on its own, it's still useful to be able to set clear expectations with uh you know, both internally and externally for people to to know what to expect. And so it's like useful to uh useful to have uh a lot of these these policies. Um, another is that a lot of these are not, you know, they're not like math problems where you can just figure out the answer. It's like we uh we've made product decisions or uh other difficult decisions or and these are encoded in this spec and these are not just things that you can uh kind of think you know you could uh the model would be expected to figure out on its own. That said, I think uh yeah, I think what's important is definitely going to evolve over time. So, uh yeah, one thing is as there's more uh you know, agents are more and more autonomous and they're out in the world, you know, interacting with lots of other people and agents and transacting and and so on. like you know I think you still want all this stuff in the spec just like you know society has all these like laws but but ultimately you know what what's important what you think are thinking about most of the time day-to-day is not like following all the laws right it's more more like things like trust and figuring out what other what other people want and you know how to find positive sum outcomes and you know this kind of stuff so I think I think there'll be you know maybe yeah the these kind of skills will become more and more important and I'm not sure if these are exactly spec shaped. So, uh I don't know quite what that means, but I think it it's it's interesting. Um another uh maybe observation like the other direction or prediction is that as AI becomes uh more and more useful it's going to be more and more um worthwhile for people companies so on to invest in their own specs like uh you know want you you know why why wouldn't you want to have the the model spec for you know your own uh companies uh bots and how they should behave and you know following your your company's mission and values and and so on and so forth. And I think there's different ways that that could play out, but uh probably at least one way will be uh just training models to be really good at interpreting these specs on the fly. And uh so everyone can going to kind of you know put their put their spec in context kind of like in agents.mmd or something like that. and and the model will be really good at following it and and probably also at uh helping update the spec as it learns more about how it's supposed to behave in a certain environment. You've mentioned before developers and I think it's helpful for a lot of people to understand that they're not always interacting with a model spec when they're in chat GBT. I might be using some customer service bot with a airline or something like that and it may be powered by chat GBT and OpenAI API and that seems like it'd be an very interesting area for other developers to start thinking about their approach towards things that are model spec or model spec like yeah on the one hand it's probably useful for developers to at least have a high level picture of of the model spec and how it works so they uh understand how the the uh how exactly the product they they build on the API is going to work and what they should you know what they should put in their developer messages to make sure they get uh get the the experience that they want. Um, I also think yeah, uh, the spec could be a useful sort of source of inspiration for both for developers building on our API or these days really for also for people using coding agents who are uh, you know, writing agents.m MD and so on which are are kind of like mini specs for the project that you're working on and um, yeah, uh, just kind of using the spec to to understand like what what principles have we found are useful for providing guidance that is uh that's sort of understandable and and actionable. Um a couple tips I I could give there is that uh yeah, we're we're uh always kind of trying to balance a couple different factors when when we're writing the spec. First and foremost, we want everything we say to be true. We want it to be actually accurately reflect our intentions. And so this means not not kind of overstating or oversimplifying or giving overly broad guidance. Really making sure to be like precise. And um then on on the other side we uh we also want the guidance to be meaningful and actionable. Again it's sort of very easy to uh kind of just like gesture at some high level principles but not actually saying anything meaningful. And so the art is is trying to uh kind of uh bring these as close together as you can, right? Be as as uh as sort of actionable as you can while still being um still being precise. And examples are are another really useful way to do this where like sometimes a picture is worth a thousand words, right? like coming up with the really tricky case where it's kind of not not immediately clear what what should happen and spelling that out and how the principles should be applied suddenly makes the principles like uh you know 100 times clearer. Where did you get this interest to begin with? We we understood some of your career but was this something early on when you were a kid? Were you thinking about AI? Were you thinking about the future of this? Uh yeah, I guess I I've had at least a little interest in in AI for for a long time. I I was programming from since when I was little. I remember implementing a a neural network training package from from scratch in in like 1997 in high school or something like that. Um uh but yeah, I definitely never never expected to to see this level of of uh sort of capability and uh in in my lifetime, but I've just always been fascinated by by intelligence and brains and and how they work. So, it's it's really cool to be able to to work on that. You ever read any Isaac Azimoff when you were younger? Uh, yeah, I have. It's uh it's been a while. Um but yeah, I think there's uh there there's actually a really interesting parallel here where at the top of the spec. Um let's see. We we uh talk about our three goals in in deploying models being to uh empower users and developers uh protect society from from serious harm and uh to uh maintain open AI's license to operate. And I think you can look at these and put them next to Azmov's laws which are are basically to you know follow instructions don't harm uh don't harm any humans and uh don't harm yourself and uh you know these seem like extremely parallel um yeah and I think uh yeah he it was sort of very precient in seeing that you know okay it's it's it's one thing to lay out these goals but then the the really tricky thing is how to how to handle conflicts and I think in his his story is kind of the the the initial version of this was that this is a strict hierarchy where it was like one then two then three and then going through all the ways in which this uh this might play out in ways that were not actually good or intentional. So so in the spec we these three are are not in a strict hierarchy. Yeah. He also had to add like a zeroith law and whatnot the more he thought about it. But it's it's interesting because you you start off thinking oh this will be easy. We'll just write a couple rules no problem. And then you're like oh well there's an exception here. there's an exception there and you have to keep evolving it. How much has using AI helped you shape the model spec? Uh yeah, it's a good question. The the AI is uh yeah, it's very useful and getting more more and more useful all the time. I think uh you know the the spec itself is uh still you know human human written but I I think uh model is really useful for you know finding finding issues in the spec or for you know applying the spec to new cases and trying to understand if it's uh doing doing what we want. Um, at this point, you know, models are are even pretty good at like kind of going out and finding new interesting examples or like helping to brainstorm, you know, new test cases or interactions between different principles that you might not have thought of and come up with with new situations that uh then we can kind of think through like how do we actually want to to resolve these. Have you ever thought about asking it to write a spec for you? Uh, I haven't, but I'll have to try Uh, well Jason, thank you very much. This is very interesting. I'm excited to see where this goes. Yeah. Thank you. This been fun.

Get daily recaps from
OpenAI

AI-powered summaries delivered to your inbox. Save hours every week while staying fully informed.