AI prompt engineering: A deep dive
Summary
TLDRThe roundtable discussion centers on the practice and evolution of prompt engineering, exploring its role in guiding AI models like chatbots to effectively execute tasks. Prompt engineering, described as a form of communication that involves crafting detailed instructions for AI, is crucial for maximizing AI potential, particularly in varied contexts such as consumer applications and enterprise solutions. Participants share insights into what makes a good prompt engineer, emphasizing clear communication and iterative testing. They reflect on how prompt engineering has evolved alongside the development of more complex AI models, moving from elementary guides to sophisticated interactions that align closely with human reasoning. As AI models get smarter, the consensus suggests that the need for heavy prompt engineering may lessen. Future prompting may involve deeper collaboration with AI, where models help in refining and generating effective prompts by understanding and clarifying user intents. This discussion also underscores the significant shift towards refining prompts to cover edge cases and complex scenarios, thereby enhancing consistency and effectiveness in diverse applications.
Takeaways
- ๐ฏ Prompt engineering is about effectively communicating tasks to AI models.
- ๐ก It's considered engineering due to the iterative trial and error process.
- ๐ฃ๏ธ Clear communication and anticipating edge cases are key in prompt engineering.
- ๐ Prompt engineering has evolved from basic instructions to sophisticated interactions.
- ๐ฅ The future may see AI models facilitating prompt creation by interviewing users.
- ๐ A good prompt engineer maximizes AI capabilities within varied contexts.
- ๐ Reading and iterating on model outputs improves prompting skills.
- ๐ ๏ธ Models might not require detailed prompts as they get smarter.
- ๐ Examples in prompts help set expectations and guide responses.
- ๐ง Effective prompts often stem from understanding AI model behavior.
Timeline
- 00:00:00 - 00:05:00
The roundtable session is focused on prompt engineering, with perspectives from research, consumer, and enterprise sides. Alex, leading Developer Relations at Anthropic, introduces himself and the panelists. The goal is to explore prompt engineering and gather diverse opinions.
- 00:05:00 - 00:10:00
Panelists from Anthropic introduce themselves, mentioning their roles related to prompt engineering and work with customers. Zack and Alex humorously contend who was the first prompt engineer. The discussion transitions to defining what prompt engineering is.
- 00:10:00 - 00:15:00
Prompt engineering is described as the process of getting the most out of a language model, much like communicating clearly with a person. It involves trial and error, iterating on messages, and integrating prompts into systems, which constitutes the engineering aspect.
- 00:15:00 - 00:20:00
Panelists discuss the role of prompt engineering in designing prompts that integrate well with systems and consider factors like data access and latency. It's likened to programming models to achieve desired outcomes.
- 00:20:00 - 00:25:00
David explains that the precision required in prompts is akin to programming, involving elements like version control and managing experiments. The conversation shifts towards what qualities make someone a good prompt engineer.
- 00:25:00 - 00:30:00
Amanda highlights qualities of a good prompt engineer as clear communication, ability to iterate, and foresight into potential issues. She emphasizes understanding tasks deeply and iterating based on model feedback to refine prompts effectively.
- 00:30:00 - 00:35:00
David shares insights on real-world applications, noting that users often input unexpected, unstructured queries. Zack emphasizes the importance of reading model outputs to understand and refine how models think and respond.
- 00:35:00 - 00:40:00
The discussion explores trust in model outputs, with Amanda explaining how examining model responses over a substantial number of queries helps in assessing reliability. High-quality prompts can significantly impact the success of experiments.
- 00:40:00 - 00:45:00
The topic of prompt engineering tricks versus systematic methods is touched upon, with preferences for clear, concise communication over relying on potential model shortcuts. Overreliance on abstract conceptual prompts is cautioned against.
- 00:45:00 - 00:50:00
Panelists discuss the future of prompt engineering, noting the importance of familiarity with model capabilities and limitations. Techniques like chain of thought, while useful, are not foolproof indicators of understanding or reasoning.
- 00:50:00 - 00:55:00
The role of grammar and punctuation in prompts is debated. While irrelevant for some users, a meticulous style may ensure clarity and correctness. The evolution of prompting methods, from basic existences to today's more nuanced interactions, is reflected upon.
- 00:55:00 - 01:00:00
The conversation touches on the evolutionary trajectory of promptingโfrom imitating styles and conventions familiar to users, to advancing towards more interactive models possibly minimizing the need for extensive prompt engineering.
- 01:00:00 - 01:05:00
Alex switches the discussion to the changes and trends in prompt engineering over the years, highlighting how increased model capabilities influence the necessity and methods of prompting, particularly with fewer prompting hacks over time.
- 01:05:00 - 01:10:00
Panelists reflect on how prompt engineering may evolve, speculating that the growing capabilities of models will necessitate less manual prompting intervention. As models improve, the interaction may shift towards user-intent elicitation rather than crafting explicit instructions.
- 01:10:00 - 01:16:42
Concluding thoughts propose a future where models play increasingly active roles in tasks, effectively assuming positions that prompt engineers currently facilitate, reducing the need for prompt engineering as models naturally understand nuanced instructions.
Mind Map
Video Q&A
What is prompt engineering?
Prompt engineering involves crafting inputs to guide AI models like chatbots to perform desired tasks effectively.
Why is it called 'prompt engineering'?
It's considered engineering due to the iterative process of trial and error to achieve the best results.
What makes a good prompt engineer?
Clear communication, ability to iterate, anticipate odd cases, and effectively extract maximum performance from AI models.
How has prompt engineering evolved over time?
It evolved from writing prompts for simple AI models to more complex interactions as models became more advanced and capable.
Are models expected to handle tasks without prompting in the future?
It is expected that as models become smarter, they might need less detailed prompting to perform tasks effectively.
What role do examples play in prompt engineering?
Examples help shape the AI's responses, illustrating the correct format or style expected, especially in enterprise settings.
Is good grammar necessary for effective prompting?
While not strictly necessary, good grammar helps clarify intent and shows attention to detail, influencing the model's response quality.
Can AI models interview users to refine prompts?
Yes, models can be set to interview users to better understand complex requests, helping improve prompt clarity.
What is the future of prompt engineering likely to involve?
It may involve more collaboration with AI, where models help refine and generate effective prompts, enhancing user interactions.
How can someone improve their prompting skills?
By reading and iterating on model outputs, sharing prompts for feedback, and continuously experimenting with complex tasks.
View more video summaries
- 00:00:00- Basically, this entire roundtable session here
- 00:00:03is just gonna be focused mainly on prompt engineering.
- 00:00:06A variety of perspectives at this table around prompting
- 00:00:10from a research side, from a consumer side,
- 00:00:11and from the enterprise side.
- 00:00:13And I want to just get the whole wide range of opinions
- 00:00:16because there's a lot of them.
- 00:00:18And just open it up to discussion
- 00:00:20and explore what prompt engineering really is
- 00:00:24and what it's all about.
- 00:00:25And yeah, we'll just take it from there.
- 00:00:28So maybe we can go around the horn with intros.
- 00:00:30I can kick it off. I'm Alex.
- 00:00:32I lead Developer Relations here at Anthropic.
- 00:00:35Before that,
- 00:00:36I was technically a prompt engineer at Anthropic.
- 00:00:39I worked on our prompt engineering team,
- 00:00:43and did a variety of roles spanning
- 00:00:45from a solutions architect type of thing,
- 00:00:48to working on the research side.
- 00:00:51So with that, maybe I can hand it over to David.
- 00:00:53- Heck, yeah. My name's David Hershey.
- 00:00:56I work with customers mostly at Anthropic
- 00:00:59on a bunch of stuff technical,
- 00:01:02I help people with finetuning,
- 00:01:04but also just a lot of the generic things
- 00:01:06that make it hard to adopt language models of prompting.
- 00:01:08And just like how to build systems with language models,
- 00:01:11but spend most of my time working with customers.
- 00:01:14- Cool. I'm Amanda Askell.
- 00:01:16I lead one of the Finetuning teams at Anthropic,
- 00:01:19where I guess I try to make Claude be honest and kind.
- 00:01:24Yeah.
- 00:01:26- My name is Zack Witten.
- 00:01:27I'm a Prompt Engineer at Anthropic.
- 00:01:30Alex and I always argue about who the first one was.
- 00:01:32He says it's him, I say it's me.
- 00:01:33- Contested. - Yeah.
- 00:01:35I used to work a lot with individual customers,
- 00:01:38kind of the same way David does now.
- 00:01:40And then as we brought more solutions architects
- 00:01:44to the team, I started working on things
- 00:01:46that are meant to raise the overall levels
- 00:01:50of ambient prompting in society,
- 00:01:53I guess, like the prompt generator
- 00:01:55and the various educational materials that people use.
- 00:01:59- Nice, cool. Well, thanks guys for all coming here.
- 00:02:02I'm gonna start with a very broad question
- 00:02:05just so we have a frame
- 00:02:07going into the rest of our conversations here.
- 00:02:09What is prompt engineering? Why is it engineering?
- 00:02:14What's prompt, really?
- 00:02:15If anyone wants to kick that off,
- 00:02:17give your own perspective on it,
- 00:02:19feel free to take the rein here.
- 00:02:21- I feel like we have a prompt engineer.
- 00:02:23It's his job.
- 00:02:24- We're all prompt engineers in our own form.
- 00:02:27- But one of us has a job.
- 00:02:28- Yeah. Zack, maybe since it's in your title.
- 00:02:30- One of us has a job, but the other three don't have jobs.
- 00:02:35- I guess I feel like prompt engineering
- 00:02:37is trying to get the model to do things,
- 00:02:40trying to bring the most out of the model.
- 00:02:42Trying to work with the model to get things done
- 00:02:46that you wouldn't have been able to do otherwise.
- 00:02:49So a lot of it is just clear communicating.
- 00:02:52I think at heart,
- 00:02:55talking to a model is a lot like talking to a person.
- 00:02:57And getting in there
- 00:02:59and understanding the psychology of the model,
- 00:03:02which Amanda is the world's most expert person in the world.
- 00:03:08- Well, I'm gonna keep going on you.
- 00:03:10Why is engineering in the name?
- 00:03:13- Yeah.
- 00:03:14I think the engineering part comes from the trial and error.
- 00:03:18- Okay.
- 00:03:18- So one really nice thing about talking to a model
- 00:03:23that's not like talking to a person,
- 00:03:24is you have this restart button.
- 00:03:25This giant go back to square zero
- 00:03:28where you just start from the beginning.
- 00:03:29And what that gives you the ability to do
- 00:03:30that you don't have, is a truly start from scratch
- 00:03:34and try out different things in an independent way,
- 00:03:38so that you don't have interference from one to the other.
- 00:03:40And once you have that ability to experiment
- 00:03:43and to design different things,
- 00:03:45that's where the engineering part has the potential
- 00:03:48to come in.
- 00:03:49- Okay.
- 00:03:50So what you're saying is as you're writing these prompts,
- 00:03:53you're typing in a message to Claude or in the API
- 00:03:55or whatever it is.
- 00:03:57Being able to go back and forth with the model
- 00:04:00and to iterate on this message,
- 00:04:02and revert back to the clean slate every time,
- 00:04:06that process is the engineering part.
- 00:04:08This whole thing is prompt engineering all in one.
- 00:04:13- There's another aspect of it too,
- 00:04:15which is integrating the prompts
- 00:04:19within your system as a whole.
- 00:04:21And David has done a ton of work with customers integrating.
- 00:04:26A lot of times it's not just as simple
- 00:04:28as you write one prompt and you give it to the model
- 00:04:30and you're done.
- 00:04:30In fact, it's anything but. It's like way more complicated.
- 00:04:32- Yeah.
- 00:04:34I think of prompts as the way
- 00:04:36that you program models a little bit,
- 00:04:38that makes it too complicated.
- 00:04:40'Cause I think Zack is generally right
- 00:04:41that it's just talking clearly is the most important thing.
- 00:04:45But if you think about it a little bit
- 00:04:47as programming a model, you have to think about
- 00:04:49where data comes from, what data you have access to.
- 00:04:51So if you're doing RAG or something,
- 00:04:53what can I actually use and do and pass to a model?
- 00:04:57You have to think about trade-offs in latency
- 00:05:02and how much data you're providing and things like that.
- 00:05:03There's enough systems thinking
- 00:05:04that goes into how you actually build around a model.
- 00:05:07I think a lot of that's also the core
- 00:05:08of why it maybe deserves its own carve-out as a thing
- 00:05:13to reason about separately from just a software engineer
- 00:05:16or a PM or something like that.
- 00:05:17It's kind of its own domain
- 00:05:18of how to reason about these models.
- 00:05:20- Is a prompt in this sense then natural language code?
- 00:05:24Is it a higher level of abstraction
- 00:05:26or is it a separate thing?
- 00:05:28- I think trying to get too abstract with a prompt is a way
- 00:05:33to overcomplicate a thing, because I think,
- 00:05:37we're gonna get into it, but more often than not,
- 00:05:38the thing you wanna do
- 00:05:39is just write a very clear description of a task,
- 00:05:42not try to build crazy abstractions or anything like that.
- 00:05:47But that said, you are compiling the set of instructions
- 00:05:51and things like that into outcomes a lot of times.
- 00:05:54So precision and a lot the things
- 00:05:57you think about with programming about version control
- 00:06:00and managing what it looked like
- 00:06:01back then when you had this experiment.
- 00:06:03And tracking your experiment and stuff like that,
- 00:06:06that's all just equally important to code.
- 00:06:11- Yeah.
- 00:06:12- So it's weird to be in this paradigm where written text,
- 00:06:15like a nice essay that you wrote is something
- 00:06:18that's looked like the same thing as code.
- 00:06:22But it is true that now we write essays
- 00:06:25and treat them code, and I think that's actually correct.
- 00:06:27- Yeah. Okay, interesting.
- 00:06:29So maybe piggybacking off of that,
- 00:06:32we've loosely defined what prompt engineering is.
- 00:06:36So what makes a good prompt engineer?
- 00:06:38Maybe, Amanda, I'll go to you for this,
- 00:06:41since you're trying to hire prompt engineers
- 00:06:43more so in a research setting.
- 00:06:45What does that look like?
- 00:06:46What are you looking for in that type of person?
- 00:06:49- Yeah, good question.
- 00:06:50I think it's a mix of like Zack said, clear communication,
- 00:06:55so the ability to just clearly state things,
- 00:06:58clearly understand tasks,
- 00:07:00think about and describe concepts really well.
- 00:07:03That's the writing component, I think.
- 00:07:05I actually think that being a good writer
- 00:07:08is not as correlated with being a good prompt engineer
- 00:07:12as people might think.
- 00:07:13So I guess I've had this discussion with people
- 00:07:15'cause I think there's some argument as like,
- 00:07:16"Maybe you just shouldn't have the name engineer in there.
- 00:07:19Why isn't it just writer?"
- 00:07:22I used to be more sympathetic to that.
- 00:07:23And then, I think, now I'm like what you're actually doing,
- 00:07:27people think that you're writing one thing and you're done.
- 00:07:31Then I'll be like to get a semi-decent prompt
- 00:07:34when I sit down with the model.
- 00:07:37Earlier, I was prompting the model
- 00:07:38and I was just like in a 15-minute span
- 00:07:40I'll be sending hundreds of prompts to the model.
- 00:07:42It's just back and forth, back and forth, back and forth.
- 00:07:45So I think it's this willingness to iterate and to look
- 00:07:48and think what is it that was misinterpreted here,
- 00:07:51if anything?
- 00:07:52And then fix that thing.
- 00:07:55So that ability to iterate.
- 00:07:57So I'd say clear communication, that ability to iterate.
- 00:08:01I think also thinking about ways
- 00:08:03in which your prompt might go wrong.
- 00:08:05So if you have a prompt
- 00:08:06that you're going to be applying to say, 400 cases,
- 00:08:09it's really easy to think about the typical case
- 00:08:11that it's going to be applied to,
- 00:08:12to see that it gets the right solution in that case,
- 00:08:14and then to move on.
- 00:08:15I think this is a very classic mistake that people made.
- 00:08:19What you actually want to do is find the cases
- 00:08:21where it's unusual.
- 00:08:23So you have to think about your prompt and be like,
- 00:08:25"What are the cases where it'd be really unclear to me
- 00:08:26what I should do in this case?"
- 00:08:28So for example, you have a prompt that says,
- 00:08:29"I'm going to send you a bunch of data.
- 00:08:31I want you to extract all of the rows
- 00:08:33where someone's name is, I don't know,
- 00:08:36starts with the letter G."
- 00:08:37And then you're like, "Well, I'm gonna send it a dataset
- 00:08:39where there is no such thing,
- 00:08:41there is no such name that starts with the letter G.
- 00:08:43"I'm going to send it something that's not a dataset,
- 00:08:45I might also just send it an empty string.
- 00:08:48These are all of the cases you have to try,
- 00:08:49because then you're like, "What does it do in these cases? "
- 00:08:51And then you can give it more instructions
- 00:08:53for how it should deal with that case.
- 00:08:55- I work with customers so often where you're an engineer,
- 00:08:59you're building something.
- 00:09:00And there's a part in your prompt where a customer of theirs
- 00:09:03is going to write something.
- 00:09:04- Yeah.
- 00:09:05- And they all think
- 00:09:06about these really perfectly phrased things
- 00:09:07that they think someone's going to type into their chatbot.
- 00:09:09And in reality, it's like they never used the shift key
- 00:09:12and every other word is a typo.
- 00:09:15- They think it's Google. - And there's no punctuation.
- 00:09:17- They just put in random words with no question.
- 00:09:18- Exactly.
- 00:09:20So you have these evals
- 00:09:21that are these beautifully structured
- 00:09:22what their users ideally would type in.
- 00:09:24But being able to go the next step
- 00:09:26to reason about what your actual traffic's gonna be like,
- 00:09:29what people are actually gonna to try to do,
- 00:09:31that's a different level of thinking.
- 00:09:33- One thing you said that really resonated with me
- 00:09:35is reading the model responses.
- 00:09:37In a machine learning context,
- 00:09:39you're supposed to look at the data.
- 00:09:41It's almost a cliche like look at your data,
- 00:09:43and I feel like the equivalent for prompting
- 00:09:45is look at the model outputs.
- 00:09:48Just reading a lot of outputs and reading them closely.
- 00:09:51Like Dave and I were talking on the way here,
- 00:09:52one thing that people will do
- 00:09:53is they'll put think step-by-step in their prompt.
- 00:09:57And they won't check to make sure
- 00:09:58that the model is actually thinking step-by-step,
- 00:10:00because the model might take it in a more abstract
- 00:10:04or general sense.
- 00:10:05Rather than like,
- 00:10:06"No, literally you have to write down your thoughts
- 00:10:08in these specific tags."
- 00:10:10So yeah, if you aren't reading the model outputs,
- 00:10:14you might not even notice that it's making that mistake.
- 00:10:16- Yeah, that's interesting.
- 00:10:19There is that weird theory of mind piece
- 00:10:22to being a prompt engineer
- 00:10:23where you have to think almost about
- 00:10:25how the model's gonna view your instructions.
- 00:10:27But then if you're writing for an enterprise use case too,
- 00:10:29you also have to think about
- 00:10:30how the user's gonna talk to the model,
- 00:10:32as you're the third party sitting there
- 00:10:34in that weird relationship.
- 00:10:37Yeah.
- 00:10:39- On the theory of mind piece, one thing I would say is,
- 00:10:43it's so hard to write instructions down for a task.
- 00:10:48It's so hard to untangle in your own brain
- 00:10:51all of the stuff that you know
- 00:10:53that Claude does not know and write it down.
- 00:10:56It's just an immensely challenging thing
- 00:10:57to strip away all of the assumptions you have, and be able
- 00:11:00to very clearly communicate the full fact set of information
- 00:11:04that is needed to a model.
- 00:11:05I think that's another thing
- 00:11:06that really differentiates a good prompt engineer
- 00:11:08from a bad one, is like...
- 00:11:10A lot of people will just write down the things they know.
- 00:11:13But they don't really take the time
- 00:11:15to systematically break out
- 00:11:17what is the actual full set of information you need to know
- 00:11:19to understand this task?
- 00:11:21- Right.
- 00:11:22- And that's a very clear thing I see a lot
- 00:11:24is prompts where it's just conditioned.
- 00:11:28The prompt that someone wrote is so conditioned
- 00:11:30on their prior understanding of a task,
- 00:11:33that when they show it to me I'm like, "This makes no sense.
- 00:11:36None of the words you wrote make any sense,
- 00:11:38because I don't know anything
- 00:11:39about your interesting use case."
- 00:11:42But I think a good way to think about prompt engineering
- 00:11:45in that front and a good skill for it,
- 00:11:47is just can you actually step back from what you know
- 00:11:51and communicate to this weird system that knows a lot,
- 00:11:54but not everything about what it needs to know to do a task?
- 00:11:58- Yeah.
- 00:11:59The amount of times I've seen someone's prompt
- 00:12:00and then being like,
- 00:12:01"I can't do the task based on this prompt."
- 00:12:04I'm human level and you're giving this to something
- 00:12:06that is worse than me and expecting it to do better,
- 00:12:10and I'm like, "Yeah."
- 00:12:12- Yeah.
- 00:12:13There is that interesting thing with like...
- 00:12:15Current models don't really do a good job
- 00:12:19of asking good, probing questions in response
- 00:12:22like a human would.
- 00:12:23If I'm giving Zack directions on how to do something,
- 00:12:26he'll be like, "This doesn't make any sense.
- 00:12:28What am I supposed to do at this step or here and here?"
- 00:12:30Model doesn't do that, right, so you have to, as yourself,
- 00:12:34think through what that other person would say
- 00:12:37and then go back to your prompt and answer those questions.
- 00:12:40- You could ask it to do that.
- 00:12:41- You could. That's right. - I do that, yeah.
- 00:12:43- I guess that's another step.
- 00:12:44- I was going to say one of the first things I do
- 00:12:45with my initial prompt,
- 00:12:46is I'll give it the prompt and then I'll be like,
- 00:12:48"I don't want you to follow these instructions.
- 00:12:50I just want you to tell me the ways in
- 00:12:51which they're unclear or any ambiguities,
- 00:12:53or anything you don't understand."
- 00:12:54And it doesn't always get it perfect,
- 00:12:55but it is interesting that that is one thing you can do.
- 00:12:59And then also sometimes if people see
- 00:13:01that the model makes a mistake,
- 00:13:01the thing that they don't often do is just ask the model.
- 00:13:04So they say to the model, "You got this wrong.
- 00:13:06Can you think about why?
- 00:13:07And can you maybe write an edited version of my instructions
- 00:13:09that would make you not get it wrong?"
- 00:13:11And a lot of the time, the model just gets it right.
- 00:13:14The model's like, "Oh, yeah.
- 00:13:15Here's what was unclear, here's a fix to the instructions,"
- 00:13:18and then you put those in and it works.
- 00:13:20- Okay.
- 00:13:21I'm actually really curious about this personally almost.
- 00:13:23Is that true that that works?
- 00:13:26Is the model able to spot its mistakes that way?
- 00:13:29When it gets something wrong, you say,
- 00:13:31"Why did you get this wrong?"
- 00:13:32And then it tells you maybe something like,
- 00:13:34"Okay, how could I phrase this to you in the future
- 00:13:37so you get it right?"
- 00:13:38Is there an element of truth to that?
- 00:13:40Or is that just a hallucination on the model's part
- 00:13:43around what it thinks its limits are?
- 00:13:46- I think if you explain to it what it got wrong,
- 00:13:49it can identify things in the query sometimes.
- 00:13:52I think this varies by task.
- 00:13:53This is one of those things where I'm like I'm not sure
- 00:13:56what percentage of the time it gets it right,
- 00:13:57but I always try it 'cause sometimes it does.
- 00:14:00- And you learn something. - Yeah.
- 00:14:01- Anytime you go back to the model
- 00:14:03or back and forth with the model,
- 00:14:04you learn something about what's going on.
- 00:14:06I think you're giving away information
- 00:14:08if you don't at least try.
- 00:14:11- That's interesting.
- 00:14:12Amanda, I'm gonna keep asking you a few more questions here.
- 00:14:15One thing maybe for everybody watching this,
- 00:14:18is we have these Slack channels at Anthropic
- 00:14:20where people can add Claude into the Slack channel,
- 00:14:24then you can talk to Claude through it.
- 00:14:26And Amanda has a Slack channel
- 00:14:28that a lot of people follow of her interactions with Claude.
- 00:14:32And one thing that I see you always do in there,
- 00:14:34which you probably do the most of anyone at Anthropic,
- 00:14:37is use the model to help you
- 00:14:41in a variety of different scenarios.
- 00:14:42I think you put a lot of trust into the model
- 00:14:45in the research setting.
- 00:14:47I'm curious how you've developed those intuitions
- 00:14:49for when to trust the model.
- 00:14:51Is that just a matter of usage,
- 00:14:53experience or is it something else?
- 00:14:55- I think I don't trust the model ever
- 00:14:59and then I just hammer on it.
- 00:15:00So I think the reason why you see me do that a lot,
- 00:15:02is that that is me being like,
- 00:15:04"Can I trust you to do this task?"
- 00:15:06'Cause there's some things, models are kind of strange.
- 00:15:08If you go slightly out of distribution,
- 00:15:11you just go into areas where they haven't been trained
- 00:15:14or they're unusual.
- 00:15:15Sometimes you're like,
- 00:15:15"Actually, you're much less reliable here,
- 00:15:17even though it's a fairly simple task."
- 00:15:21I think that's happening less and less over time
- 00:15:22as models get better,
- 00:15:23but you want to make sure you're not in that kind of space.
- 00:15:26So, yeah, I don't think I trust it by default,
- 00:15:28but I think in ML,
- 00:15:29people often want to look across really large datasets.
- 00:15:33And I'm like, "When does it make sense to do that?"
- 00:15:35And I think the answer is when you get relatively low signal
- 00:15:38from each data point,
- 00:15:39you want to look across many, many data points,
- 00:15:42because you basically want to get rid of the noise.
- 00:15:44With a lot of prompting tasks,
- 00:15:46I think you actually get really high signal from each query.
- 00:15:49So if you have a really well-constructed set
- 00:15:52of a few hundred prompts,
- 00:15:53that I think can be much more signal
- 00:15:55than thousands that aren't as well-crafted.
- 00:15:59So I do think that I can trust the model
- 00:16:02if I look at 100 outputs of it and it's really consistent.
- 00:16:06And I know that I've constructed those
- 00:16:08to basically figure out all of the edge cases
- 00:16:10and all of the weird things that the model might do,
- 00:16:12strange inputs, et cetera.
- 00:16:14I trust that probably more
- 00:16:16than a much more loosely constructed set
- 00:16:19of several thousand.
- 00:16:22- I think in ML, a lot of times the signals are numbers.
- 00:16:29Did you predict this thing right or not?
- 00:16:31And it'd be looking at the logprobs of a model
- 00:16:34and trying to intuit things, which you can do,
- 00:16:36but it's kind of sketchy.
- 00:16:39I feel like the fact that models output more often than not
- 00:16:42a lot of stuff like words and things.
- 00:16:44There's just fundamentally so much to learn
- 00:16:47between the lines of what it's writing and why and how,
- 00:16:50and that's part of what it is.
- 00:16:51It's not just did it get the task right or not?
- 00:16:54It's like, "How did it get there?
- 00:16:57How was it thinking about it? What steps did it go through?"
- 00:16:59You learn a lot about what is going on,
- 00:17:01or at least you can try to get a better sense, I think.
- 00:17:04But that's where a lot of information comes from for me,
- 00:17:05is by reading the details of what came out,
- 00:17:08not just through the result.
- 00:17:09- I think also the very best of prompting
- 00:17:14can make the difference between a failed
- 00:17:16and a successful experiment.
- 00:17:18So sometimes I can get annoyed if people don't focus enough
- 00:17:21on the prompting component of their experiment,
- 00:17:23because I'm like, "This can, in fact, be the difference
- 00:17:27between 1% performance in the model or 0.1%."
- 00:17:31In such a way that your experiment doesn't succeed
- 00:17:33if it's at top 5% model performance,
- 00:17:35but it does succeed if it's top 1% or top 0.1%.
- 00:17:39And then I'm like, "If you're gonna spend time
- 00:17:40over coding your experiment really nicely,
- 00:17:43but then just not spend time on the prompt."
- 00:17:47I don't know.
- 00:17:48That doesn't make sense to me,
- 00:17:49'cause that can be the difference between life and death
- 00:17:51of your experiment.
- 00:17:52- Yeah.
- 00:17:52And with the deployment too, it's so easy to,
- 00:17:55"Oh, we can't ship this."
- 00:17:57And then you change the prompt around
- 00:17:58and suddenly it's working. - Yeah.
- 00:18:00- It's a bit of a double-edged sword though,
- 00:18:01because I feel like there's a little bit of prompting
- 00:18:03where there's always this mythical, better prompt
- 00:18:07that's going to solve my thing on the horizon.
- 00:18:09- Yeah.
- 00:18:10- I see a lot of people get stuck
- 00:18:11into the mythical prompt on the horizon,
- 00:18:13that if I just keep grinding, keep grinding.
- 00:18:15It's never bad to grind a little bit on a prompt,
- 00:18:17as we've talked, you learn things.
- 00:18:19But it's one of the scary things
- 00:18:22about prompting is that there's this whole world of unknown.
- 00:18:25- What heuristics do you guys have
- 00:18:26for when something is possible versus not possible
- 00:18:30with a perfect prompt, whatever that might be?
- 00:18:33- I think I'm usually checking
- 00:18:35for whether the model kind of gets it.
- 00:18:37So I think for things where I just don't think a prompt
- 00:18:40is going to help, there is a little bit of grinding.
- 00:18:43But often, it just becomes really clear
- 00:18:45that it's not close or something.
- 00:18:49Yeah.
- 00:18:50I don't know if that's a weird one where I'm just like,
- 00:18:52"Yeah, if the model just clearly can't do something,
- 00:18:55I won't grind on it for too long."
- 00:18:58- This is the part that you can evoke
- 00:18:59how it's thinking about it,
- 00:19:00and you can ask it how it's thinking about it and why.
- 00:19:02And you can get a sense of is it thinking about it right?
- 00:19:05Are we even in the right zip code of this being right?
- 00:19:11And you can get a little bit of a kneeling on that front of,
- 00:19:14at least, I feel like I'm making progress
- 00:19:15towards getting something closer to right.
- 00:19:19Where there's just some tasks
- 00:19:20where you really don't get anywhere closer
- 00:19:23to it's thought process.
- 00:19:24It's just like every tweak you make
- 00:19:27just veers off in a completely different,
- 00:19:29very wrong direction, and I just tend to abandon those.
- 00:19:31I don't know.
- 00:19:32- Those are so rare now though,
- 00:19:33and I get really angry at the model when I discover them
- 00:19:36because that's how rare they are.
- 00:19:38I get furious.
- 00:19:39I'm like, "How dare there be a task that you can't just do,
- 00:19:43if I just push you in the right direction?"
- 00:19:46- I had my thing with Claude plays Pokemon recently,
- 00:19:49and that was one of the rare times where I really...
- 00:19:51- Yeah, can you explain that?
- 00:19:52Explain that just for people. I think that's really cool.
- 00:19:54- I did a bit of an experiment
- 00:19:56where I hooked Claude up to a Game Boy emulator,
- 00:19:59and tried to have it play the game Pokemon Red
- 00:20:02like the OG Pokemon.
- 00:20:05And it's like you think what you wanna do
- 00:20:09and it could write some code to press buttons
- 00:20:10and stuff like that, pretty basic.
- 00:20:12And I tried a bunch of different very complex
- 00:20:15prompting layouts, but you just get into certain spots
- 00:20:18where it just really couldn't do it.
- 00:20:21So showing it a screenshot of a Game Boy,
- 00:20:24it just really couldn't do.
- 00:20:26And it just so deeply because I'm so used to it,
- 00:20:28being able to do something mostly.
- 00:20:32So I spent a whole weekend trying to write better
- 00:20:37and better prompts to get it
- 00:20:38to really understand this Game Boy screen.
- 00:20:41And I got incrementally better so that it was only terrible
- 00:20:44instead of completely no signal.
- 00:20:46You could get from no signal to some signal.
- 00:20:49But it was, I don't know, at least this is elicited for me.
- 00:20:53Once I put a weekend of time in and I got from no signal
- 00:20:56to some signal, but nowhere close to good enough,
- 00:20:58I'm like, "I'm just going to wait for the next one.
- 00:21:00(Alex laughing)
- 00:21:01I'm just gonna wait for another model."
- 00:21:02I could grind on this for four months,
- 00:21:04and the thing that would come out is another model
- 00:21:07and that's a better use of my time.
- 00:21:09Just sit and wait to do something else in the meanwhile.
- 00:21:11- Yeah.
- 00:21:12That's an inherent tension we see all the time,
- 00:21:14and maybe we can get to that in a sec.
- 00:21:16Zack, if you wanna go.
- 00:21:17- Something I liked about your prompt with Pokemon
- 00:21:19where you got the best that you did get,
- 00:21:22was the way that you explained to the model
- 00:21:24that it is in the middle of this Pokemon game.
- 00:21:27Here's how the things are gonna be represented.
- 00:21:33I actually think you actually represented it
- 00:21:35in two different ways, right?
- 00:21:36- I did.
- 00:21:37So what I ended up doing, it was obnoxious
- 00:21:40but I superimposed a grid over the image,
- 00:21:44and then I had to describe each segment of the grid
- 00:21:46in visual detail.
- 00:21:48Then I had to reconstruct that into an ASCII map
- 00:21:51and I gave it as much detail as I could.
- 00:21:53The player character is always at location 4, 5 on the grid
- 00:21:57and stuff like that,
- 00:21:58and you can slowly build up information.
- 00:22:02I think it's actually a lot like prompting,
- 00:22:03but I just hadn't done it with images before.
- 00:22:05Where sometimes my intuition
- 00:22:08for what you need to tell a model about text,
- 00:22:10is a lot different
- 00:22:11from what you need to tell a model about images.
- 00:22:13- Yeah.
- 00:22:14- I found a surprisingly small number of my intuitions
- 00:22:18about text have transferred to image.
- 00:22:20I found that multi-shot prompting is not as effective
- 00:22:23for images and text.
- 00:22:24I'm not really sure,
- 00:22:25you can have theoretical explanations about why.
- 00:22:27Maybe there's a few of it in the training data,
- 00:22:30a few examples of that.
- 00:22:32- Yeah.
- 00:22:33I know when we were doing the original explorations
- 00:22:34with prompting multimodal,
- 00:22:36we really couldn't get it to noticeably work.
- 00:22:40You just can't seem to improve Claude's actual,
- 00:22:44visual acuity in terms of what it picks up within an image.
- 00:22:48Anyone here has any ways that they've not seen that feature.
- 00:22:51But it seems like that's similar with the Pokemon thing
- 00:22:53where it's trying to interpret this thing.
- 00:22:55No matter how much you throw prompts at it,
- 00:22:57it just won't pick up that Ash that's in that location.
- 00:23:01- Yeah.
- 00:23:02But I guess to be visceral about this,
- 00:23:03I could eventually get it
- 00:23:05so that it could most often tell me where a wall was,
- 00:23:07and most often tell me where the character was.
- 00:23:10It'd be off by a little bit.
- 00:23:11But then you get to a point,
- 00:23:13and this is maybe coming back to knowing
- 00:23:15when you can't do it.
- 00:23:17It would describe an NPC, and to play a game well,
- 00:23:19you need to have some sense of continuity.
- 00:23:21Have I talked to this NPC before?
- 00:23:25And without that, you really don't,
- 00:23:27there's nothing you can do.
- 00:23:28You're just going to keep talking to the NPC,
- 00:23:29'cause like, "Well, maybe this is a different NPC."
- 00:23:31But I would try very hard to get it to describe a NPC
- 00:23:34and it's like, "It's a person."
- 00:23:37They might be wearing a hat, they weren't wearing a hat.
- 00:23:40And it's like you grind for a while,
- 00:23:42inflate it to 3000X and just crop it to just the NPC,
- 00:23:46and it's like, "I have no idea what this is."
- 00:23:48It's like I showed it this clear, female NPC thing
- 00:23:54enough times and it just got nowhere close to it,
- 00:23:56and it's like, "Yeah, this is a complete lost cause."
- 00:23:59- Wow, okay.
- 00:24:00- I really want to try this now.
- 00:24:01I'm just imagining all the things I would try.
- 00:24:04I don't know, I want you to imagine this game art
- 00:24:08as a real human and just describe to me what they're like.
- 00:24:11What did they look like as they look in the mirror?
- 00:24:13And then just see what the model does.
- 00:24:17- I tried a lot of things.
- 00:24:18The eventual prompt was telling Claude
- 00:24:20it was a screen reader for a blind person,
- 00:24:23which I don't know if that helped,
- 00:24:24but it felt right so I stuck with that.
- 00:24:26- That's an interesting point.
- 00:24:27I actually wanna go into this a little bit
- 00:24:29'cause this is one of the most famous prompting tips,
- 00:24:32is to tell the language model that they are some persona
- 00:24:35or some role.
- 00:24:37I feel like I see mixed results.
- 00:24:39Maybe this worked a little bit better in previous models
- 00:24:41and maybe not as much anymore.
- 00:24:43Amanda, I see you all the time be very honest with the model
- 00:24:47about the whole situation like,
- 00:24:48"Oh, I am an AI researcher and I'm doing this experiment."
- 00:24:51- I'll tell it who I am. - Yeah.
- 00:24:52- I'll give it my name,
- 00:24:53be like, "Here's who you're talking to."
- 00:24:54- Right.
- 00:24:55Do you think that level of honesty,
- 00:24:57instead of lying to the model or forcing it to like,
- 00:25:01"I'm gonna tip you $500."
- 00:25:03Is there one method that's preferred there,
- 00:25:06or just what's your intuition on that?
- 00:25:09- Yeah.
- 00:25:10I think as models are more capable and understand more
- 00:25:12about the world, I guess,
- 00:25:13I just don't see it as necessary to lie to them.
- 00:25:18I also don't like lying to the models
- 00:25:20just 'cause I don't like lying generally.
- 00:25:23But part of me is if you are, say, constructing.
- 00:25:26Suppose you're constructing an eval dataset
- 00:25:28for a machine learning system or for a language model.
- 00:25:32That's very different from constructing a quiz
- 00:25:35for some children.
- 00:25:36So when people would do things like,
- 00:25:38"I am a teacher trying to figure out questions for a quiz."
- 00:25:42I'm like, "The model knows what language model evals are."
- 00:25:45If you ask it about different evals it can tell you,
- 00:25:47and it can give you made up examples of what they look like.
- 00:25:50'Cause these things are like they understand them,
- 00:25:52they're on the internet.
- 00:25:54So I'm like,
- 00:25:54"I'd much rather just target the actual task that I have."
- 00:25:56So if you're like, "I want you to construct questions
- 00:25:59that look a lot like an evaluation of a language model."
- 00:26:02It's that whole thing of clear communication.
- 00:26:05I'm like, "That is, in fact, the task I want to do.
- 00:26:07So why would I pretend to you
- 00:26:08that I want to do some unrelated,
- 00:26:11or only tangentially related task?"
- 00:26:13And then expect you to somehow do better at the task
- 00:26:14that I actually want you to do.
- 00:26:16We don't do this with employees.
- 00:26:18I wouldn't go to someone that worked with me and be like,
- 00:26:21"You are a teacher and you're trying to quiz your students."
- 00:26:25I'd be like, "Hey, are you making that eval?" I don't know.
- 00:26:28So I think maybe it's a heuristic from there where I'm like,
- 00:26:31"If they understand the thing,
- 00:26:32just ask them to do the thing that you want."
- 00:26:33- I see this so much. - I guess
- 00:26:34to push back a little bit,
- 00:26:36I have found cases where not exactly lying
- 00:26:40but giving it a metaphor
- 00:26:41for how to think about it could help.
- 00:26:43In the same way that sometimes I might not understand
- 00:26:45how to do something and someone's like,
- 00:26:46"Imagine that you were doing this,
- 00:26:47even though I know I'm not doing it."
- 00:26:49The one that comes to mind for me,
- 00:26:50is I was trying to have Claude say whether an image
- 00:26:54of a chart or a graph is good or not.
- 00:26:57Is it high quality?
- 00:26:59And the best prompt that I found for this
- 00:27:02was asking the model what grade it would give the chart,
- 00:27:05if it were submitted as a high school assignment.
- 00:27:09So it's not exactly saying, "You are a high school teacher."
- 00:27:13It's more like, "This is the kind of analysis
- 00:27:17that I'm looking from for you."
- 00:27:20The scale that a teacher would use is similar to the scale
- 00:27:22that I want you to use.
- 00:27:25- But I think those metaphors are pretty hard
- 00:27:27to still come up with.
- 00:27:27I think people still, the default you see all the time
- 00:27:30is finding some facsimile of the task.
- 00:27:33Something that's a very similar-ish task,
- 00:27:35like saying you're a teacher.
- 00:27:38You actually just lose a lot
- 00:27:40in the nuance of what your product is.
- 00:27:41I see this so much in enterprise prompts
- 00:27:43where people write something similar,
- 00:27:46because they have this intuition
- 00:27:48that it's something the model has seen more of maybe.
- 00:27:51It's seen more high school quizzes than it has LLM evals,
- 00:27:56and that may be true.
- 00:27:58But to your point, as the models get better,
- 00:28:01I think just trying to be very prescriptive
- 00:28:05about exactly the situation they're in.
- 00:28:07I give people that advice all the time.
- 00:28:09Which isn't to say that I don't think to the extent
- 00:28:11that it is true that thinking about it the way
- 00:28:16that someone would grade a chart,
- 00:28:17as how they would grade a high school chart,
- 00:28:19maybe that's true.
- 00:28:21But it's awkwardly the shortcut people use a lot of times
- 00:28:25to try to get what happens,
- 00:28:26so I'll try to get someone that I can actually talk about
- 00:28:28'cause I think it's somewhat interesting.
- 00:28:29So writing you are a helpful assistant,
- 00:28:35writing a draft of a document, it's not quite what you are.
- 00:28:41You are in this product, so tell me.
- 00:28:44If you're writing an assistant that's in a product,
- 00:28:47tell me I'm in the product.
- 00:28:48Tell me I'm writing on behalf of this company,
- 00:28:51I'm embedded in this product.
- 00:28:52I'm the support chat window on that product.
- 00:28:56You're a language model, you're not a human, that's fine.
- 00:28:59But just being really prescriptive
- 00:29:01about the exact context about where something is being used.
- 00:29:05I found a lot of that.
- 00:29:06Because I guess my concern most often with role prompting,
- 00:29:09is people used it as a shortcut
- 00:29:12of a similar task they want the model to do.
- 00:29:13And then they're surprised
- 00:29:14when Claude doesn't do their task right,
- 00:29:16but it's not the task.
- 00:29:18You told it to do some other task.
- 00:29:21And if you didn't give it the details about your task,
- 00:29:23I feel like you're leaving something on the table.
- 00:29:24So I don't know, it does feel like a thing though
- 00:29:28to your point of as the models scale.
- 00:29:31Maybe in the past it was true
- 00:29:32that they only really had a strong understanding
- 00:29:35of elementary school tests comparatively.
- 00:29:39But as they get smarter and can differentiate more topics,
- 00:29:42I don't know, just like being clear.
- 00:29:44- I find it interesting
- 00:29:45that I've never used this prompting technique.
- 00:29:47- Yeah, that's funny.
- 00:29:49- Even with worse models
- 00:29:50and I still just don't ever find myself, I don't know why.
- 00:29:53I'm just like, "I don't find it very good essentially."
- 00:29:57- Interesting.
- 00:29:58- I feel like completion era models,
- 00:30:01there was a little bit of a mental model
- 00:30:03of conditioning the model into a latent space
- 00:30:07that was useful that I worried about,
- 00:30:10that I don't really worry about too much anymore.
- 00:30:12- It might be intuitions from pretrained models
- 00:30:15over to RLHF models, that to me, just didn't make sense.
- 00:30:20It makes sense to me if you're prompting a pretrained.
- 00:30:22- You'd be amazed how many people
- 00:30:23try to apply their intuitions.
- 00:30:25I think it's not that surprising.
- 00:30:27Most people haven't really experimented
- 00:30:29with the full what is a pretrained model?
- 00:30:31What happens after you do SL?
- 00:30:34What happens after you do RLHF, whatever?
- 00:30:39So when I talk to customers,
- 00:30:41it's all the time that they're trying to map some amount of,
- 00:30:44"Oh, how much of this was on the internet?
- 00:30:46Have they seen a ton of this on the internet?"
- 00:30:48You just hear that intuition a lot,
- 00:30:51and I think it's well-founded fundamentally.
- 00:30:54But it is overapplied
- 00:30:58by the time you actually get to a prompt,
- 00:30:59because of what you said.
- 00:31:00By the time they've gone through all of this other stuff,
- 00:31:02that's not actually quite what's being modeled.
- 00:31:05- Yeah.
- 00:31:05The first thing that I feel like you should try is,
- 00:31:08I used to give people this thought experiment
- 00:31:10where it's like imagine you have this task.
- 00:31:13You've hired a temp agency to send someone to do this task.
- 00:31:18This person arrives, you know they're pretty competent.
- 00:31:21They know a lot about your industry and so forth,
- 00:31:23but they don't know the name of your company.
- 00:31:25They've literally just shown up and they're like,
- 00:31:26"Hey, I was told you guys had a job for me to do,
- 00:31:29tell me about it."
- 00:31:30And then it's like, "What would you say to that person?"
- 00:31:33And you might use these metaphors.
- 00:31:34You might say things like,
- 00:31:37"We want you to detect good charts.
- 00:31:41What we mean by a good chart here,
- 00:31:42is it doesn't need to be perfect.
- 00:31:44You don't need to go look up
- 00:31:45whether all of the details are correct."
- 00:31:47It just needs to have its axes labeled,
- 00:31:50and so think about maybe high school level, good chart.
- 00:31:55You may say exactly that to that person
- 00:31:56and you're not saying to them, "You are a high school."
- 00:31:59You wouldn't say that to them.
- 00:32:00You wouldn't be like,
- 00:32:01"You're a high school teacher reading charts."
- 00:32:04- What are you talking about?
- 00:32:05- Yeah, so sometimes I'm just like it's like the whole
- 00:32:10if I read it.
- 00:32:11I'm just like, "Yeah.
- 00:32:11Imagine this person who just has very little context,
- 00:32:13but they're quite competent.
- 00:32:14They understand a lot of things about the world."
- 00:32:16Try the first version that actually assumes
- 00:32:18that they might know things about the world,
- 00:32:20and if that doesn't work, you can maybe do tweaks and stuff.
- 00:32:22But so often, the first thing I try is that,
- 00:32:24and then I'm like, "That just worked."
- 00:32:26- That worked.
- 00:32:27- And then people are like,
- 00:32:28"Oh, I didn't think to just tell it all about myself
- 00:32:30and all about the task I want to do."
- 00:32:31- I've carried this thing that Alex told me
- 00:32:33to so many customers where they're like,
- 00:32:35"Oh, my prompt doesn't work.
- 00:32:37Can you help me fix it?"
- 00:32:37I'm like, "Well, can you describe to me what the task was?"
- 00:32:40And I'm like, "Okay.
- 00:32:41Now what you just said to me,
- 00:32:42just voice record that and then transcribe it."
- 00:32:45And then paste it into the prompt
- 00:32:47and it's a better prompt than what you wrote,
- 00:32:49but this is a laziness shortcut, I think, to some extent.
- 00:32:52Because people write something that they...
- 00:32:55I just think people, I'm lazy. A lot of people are lazy.
- 00:32:57- We had that in prompt assistance the other day
- 00:32:59where somebody was like,
- 00:33:01"Here's the thing, here's what I want it to do,
- 00:33:03and here's what it's actually doing instead."
- 00:33:05So then I just literally copied the thing
- 00:33:06that they said they wanted it to do,
- 00:33:07and pasted it in and it worked.
- 00:33:09- Yeah.
- 00:33:11I think a lot of people still
- 00:33:13haven't quite wrapped their heads
- 00:33:15around what they're really doing when they're prompting.
- 00:33:17A lot of people see a text box
- 00:33:19and they think it's a Google search box.
- 00:33:21They type in keywords
- 00:33:22and maybe that's more on the chat side.
- 00:33:24But then on the enterprise side of things,
- 00:33:26you're writing a prompt for an application.
- 00:33:29There is still this weird thing to it
- 00:33:31where people are trying to take all these little shortcuts
- 00:33:34in their prompt, and just thinking that,
- 00:33:35"Oh, this line carries a lot of weight in this."
- 00:33:37- Yeah.
- 00:33:38I think you obsess over getting the perfect little line
- 00:33:40of information and instruction,
- 00:33:42as opposed to how you just described that graph thing.
- 00:33:45I would be a dream if I read prompts like that.
- 00:33:48If someone's like, "Well, you do this and this,
- 00:33:50and there's some stuff to consider about this and all that."
- 00:33:52But that's just not how people write prompts.
- 00:33:54They work so hard to find the perfect, insightful.
- 00:33:58A perfect graph looks exactly like this exact perfect thing,
- 00:34:02and you can't do that.
- 00:34:04It's just very hard
- 00:34:05to ever write that set of instructions down prescriptively,
- 00:34:08as opposed to how we actually talk to humans about it,
- 00:34:10which is try to instill some amount
- 00:34:12of the intuitions you have.
- 00:34:13- We also give them outs.
- 00:34:15This is a thing that people can often forget in prompts.
- 00:34:18So cases, if there's an edge case,
- 00:34:20think about what you want the model to do.
- 00:34:21'Cause by default,
- 00:34:22it will try the best to follow your instructions,
- 00:34:24much as the person from the temp agency would,
- 00:34:26'cause they're like,
- 00:34:27"Well, they didn't tell me how to get in touch with anyone."
- 00:34:30If I'm just given a picture of a goat and I'm like,
- 00:34:32"What do I do?
- 00:34:33This isn't even a chart.
- 00:34:35How good is a picture of a goat as a chart?"
- 00:34:38I just don't know.
- 00:34:40And if you instead say something like,
- 00:34:42"If something weird happens and you're really not sure
- 00:34:44what to do, just output in tags unsure."
- 00:34:49Then you can go look through the unsures
- 00:34:50that you got and be like, "Okay, cool.
- 00:34:52It didn't do anything weird."
- 00:34:53Whereas by default, if you don't give the person the option,
- 00:34:55they're like, "It's a good chart."
- 00:34:58Then people will be like, "How do I do that?"
- 00:35:00And then you're like, "Well, give it an out.
- 00:35:02Give it something to do
- 00:35:03if it's a really unexpected input happens."
- 00:35:05- And then you also improved your data quality
- 00:35:07by doing that too,
- 00:35:08'cause you found all the screwed up examples.
- 00:35:10- Oh, yeah.
- 00:35:11- That's my favorite thing about iterating on tests
- 00:35:14with Claude, is the most common outcome
- 00:35:15is I find all of the terrible tests I accidentally wrote
- 00:35:19because it gets it wrong.
- 00:35:20I'm like, "Oh, why did it get wrong?"
- 00:35:21I was like, "Oh, I was wrong."
- 00:35:22- Yeah. - Yeah.
- 00:35:25- If I was a company working with this,
- 00:35:27I do think I would just give my prompts to people,
- 00:35:31because I used to do this
- 00:35:32when I was evaluating language models.
- 00:35:34I would take the eval myself.
- 00:35:36'Cause I'm like,
- 00:35:37"I need to know what this eval looks like
- 00:35:38if I'm gonna to be grading it, having models take it,
- 00:35:41thinking about outputs, et cetera."
- 00:35:42I would actually just set up a little script
- 00:35:44and I would just sit and I would do the eval.
- 00:35:47- Nowadays, you just have called the Streamboard app
- 00:35:50for you.
- 00:35:50- And just does it, yeah.
- 00:35:52- Yeah. I'm reminded of Karpathy's ImageNet.
- 00:35:56I was in 231 at Stanford and it's like benchmarking,
- 00:36:01he's showing the accuracy number.
- 00:36:03And he's like, "And here's what my accuracy number was."
- 00:36:05And he had just gone through the test set
- 00:36:06and evaluated himself. - Oh, yeah.
- 00:36:08- You just learn a lot. - Yeah, totally.
- 00:36:09- And it's better when it's a, again,
- 00:36:13the temp agency person,
- 00:36:14like someone who doesn't know the task,
- 00:36:15because that's a very clean way to learn things.
- 00:36:18- Yeah.
- 00:36:19The way you have to do it is,
- 00:36:20some evaluations come with instructions,
- 00:36:23and so I would give myself those instructions as well
- 00:36:25and then try to understand it.
- 00:36:28And it's actually quite great if you don't have context
- 00:36:30on how it's graded.
- 00:36:32And so often, I would do so much worse
- 00:36:34than the human benchmark and I was like,
- 00:36:35"I don't even know how you got humans to do this well
- 00:36:37at this task, 'cause apparently human level here is 90%,
- 00:36:41and I'm at 68%."
- 00:36:45- That's funny.
- 00:36:46That reminds me of just when you look at the MMLU questions
- 00:36:49and you're like, "Who would be able to answer these?"
- 00:36:53It's just like absolute garbage in some of them.
- 00:36:57Okay.
- 00:36:59I have one thing I wanna circle back on
- 00:37:01that we were talking about a few questions back around,
- 00:37:05I think you were saying getting signal from the responses.
- 00:37:08There's just so much there and it's more than just a number,
- 00:37:12and you can actually read into the almost thought process.
- 00:37:16I bet this is probably a little contentious maybe
- 00:37:19around chain of thought.
- 00:37:21For people listening, chain of thought
- 00:37:23is this process of getting the model
- 00:37:25to actually explain its reasoning
- 00:37:27before it provides an answer.
- 00:37:29Is that reasoning real
- 00:37:31or is it just kind of like a holding space
- 00:37:33for the model to do computation?
- 00:37:36Do we actually think there's good, insightful signal
- 00:37:38that we're getting out of the model there?
- 00:37:41- This is one of the places where I struggle with that.
- 00:37:43I'm normally actually somewhat pro-personification
- 00:37:46because I think it helps you get a decent facsimile
- 00:37:49of how the model's working.
- 00:37:52And this one, I think it's harmful maybe almost
- 00:37:55to get too into the personification of what reasoning is,
- 00:37:59'cause it just loses the thread
- 00:38:00of what we're trying to do here.
- 00:38:02Is it reasoning or not?
- 00:38:03It feels almost like a different question
- 00:38:06than what's the best prompting technique?
- 00:38:08It's like you're getting into philosophy,
- 00:38:09which we can get into.
- 00:38:11- Yeah, we do have a philosopher.
- 00:38:13- Yeah.
- 00:38:15I will happily be beaten down by a real philosopher
- 00:38:16as I try to speculate on this, but in the end, it just works.
- 00:38:21Your model does better.
- 00:38:23The outcome is better if you do reasoning.
- 00:38:26I think I've found that if you structure the reasoning
- 00:38:30and help iterate with the model
- 00:38:32on how it should do reasoning, it works better too.
- 00:38:38Whether or not that's reasoning
- 00:38:39or however you want to classify it,
- 00:38:41you can think of all sorts of proxies,
- 00:38:42like how I would also do really badly
- 00:38:44if I had to do one-shot math without writing anything down.
- 00:38:47Maybe that's useful, but all I really know is,
- 00:38:51it very obviously does help.
- 00:38:54I don't know.
- 00:38:54- A way of testing would be
- 00:38:55if you take out all the reasoning that it did
- 00:38:58to get to the right answer, and then replace it
- 00:39:00with somewhat realistic-looking reasoning
- 00:39:04that led to a wrong answer,
- 00:39:05and then see if it does conclude the wrong answer.
- 00:39:08I think we actually had a paper where we did some of that.
- 00:39:12There was the scratch pad. It was like the Sleeper Agents.
- 00:39:17- Oh, okay. Alignment papers.
- 00:39:19- But I think that was maybe a weird situation.
- 00:39:22But definitely what you said about structuring the reasoning
- 00:39:27and writing example of how the reasoning works.
- 00:39:30Given that that helps,
- 00:39:33like whether we use the word reasoning or not,
- 00:39:35I don't think it's just a space for computation.
- 00:39:38- So there is something there.
- 00:39:40- I think there's something there,
- 00:39:41whatever we wanna call it.
- 00:39:42- Yeah.
- 00:39:43Having it write a story before it finished a task,
- 00:39:45I do not think would work as well.
- 00:39:46- I've actually tried that
- 00:39:48and it didn't work as well as reasoning.
- 00:39:50- Clearly, the actual reasoning part
- 00:39:53is doing something towards the outcome.
- 00:39:55- I've tried like,
- 00:39:56"Repeat the words um and ah in any order that you please
- 00:39:59for 100 tokens and then answer."
- 00:40:02- Yeah.
- 00:40:03I guess that's a pretty thorough defeat
- 00:40:03of the idea that it's just more computational space
- 00:40:05where it can do attention over and over again.
- 00:40:06I don't think it's just doing
- 00:40:08more attention.
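Both controls described above, swapping in plausible-but-wrong reasoning and swapping in pure filler, are easy to run by prefilling the assistant turn. A sketch; the helper name, question, and model string are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()

def answer_with_prefill(question: str, prefill: str) -> str:
    """Force the model to continue from a given 'reasoning' prefix by
    prefilling the start of the assistant turn."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model string
        max_tokens=300,
        messages=[
            {"role": "user", "content": question + "\nThink step by step, then answer."},
            {"role": "assistant", "content": prefill},  # prefilled partial response
        ],
    )
    return prefill + response.content[0].text

question = "A train travels 60 miles in 1.5 hours. What is its speed in mph?"

# Control 1: realistic-looking but wrong reasoning (distance times time
# instead of distance divided by time). Does the model follow it to the
# wrong answer, or recover?
wrong_cot = "Speed is distance times time, so 60 * 1.5 = 90, which means"

# Control 2: pure filler. If chain of thought were just extra computation
# space, this should help as much as real reasoning; the panel reports
# that it doesn't.
filler = "um ah " * 50 + "\nThe answer is"

print(answer_with_prefill(question, wrong_cot))
print(answer_with_prefill(question, filler))
```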
- 00:40:10- I guess the strange thing is,
- 00:40:11and I don't have an example off the top of my head
- 00:40:13to back this up with.
- 00:40:14But I definitely have seen it before
- 00:40:16where it lays out steps, one of the steps is wrong,
- 00:40:18but then it still reaches the right answer at the end.
- 00:40:22So it's not quite, I guess, yeah,
- 00:40:24we can't really, truly personify it as a reasoning,
- 00:40:27'cause there is some element to it
- 00:40:31doing something slightly different.
- 00:40:32- Yeah.
- 00:40:33I've also met a lot of people
- 00:40:34who make inconsistent steps of reasoning.
- 00:40:37- I guess that's true.
- 00:40:40- It fundamentally defeats the concept of reasoning
- 00:40:42by making a false step on the way there.
- 00:40:44- All right, it's interesting.
- 00:40:47Also, on maybe this prompting misconceptions round
- 00:40:52of questions.
- 00:40:54Zack, I know you have strong opinions on this,
- 00:40:57good grammar, punctuation. - Oh, do I?
- 00:40:59- Is that necessary in a prompt? Do you need it?
- 00:41:03Do you need to format everything correctly?
- 00:41:07- I usually try to do that
- 00:41:09because I find it fun, I guess, somehow.
- 00:41:14I don't think you necessarily need to.
- 00:41:16I don't think it hurts.
- 00:41:17I think it's more
- 00:41:18that you should have the level of attention to detail
- 00:41:22that would lead you to doing that naturally.
- 00:41:25If you're just reading over your prompt a lot,
- 00:41:28you'll probably notice those things
- 00:41:29and you may as well fix them.
- 00:41:31And like what Amanda was saying,
- 00:41:33that you wanna put as much love into the prompt
- 00:41:36as you do into the code.
- 00:41:39People who write a lot of code have strong opinions
- 00:41:42about things that I could not care less about.
- 00:41:44Like the number of tabs versus spaces, or I don't know,
- 00:41:48opinions about which languages are better.
- 00:41:50And for me,
- 00:41:51I have opinionated beliefs about styling of prompts.
- 00:41:56I can't even say that they're right or wrong,
- 00:41:57but I think it's probably good to try to acquire those,
- 00:42:01even if they're arbitrary.
- 00:42:04- I feel personally attacked,
- 00:42:06'cause I definitely have prompts like that.
- 00:42:07I feel like I'm on the opposite end
- 00:42:09of the spectrum, where people will see my prompts
- 00:42:10and then be like,
- 00:42:12"This just has a whole bunch of typos in it."
- 00:42:13And I'm like, "The model knows what I mean."
- 00:42:16- It does, it does know what you mean,
- 00:42:17but you're putting in the effort,
- 00:42:18you just are attending to different things.
- 00:42:21- 'Cause part of me is like,
- 00:43:22I think what matters is whether it's conceptually clear.
- 00:42:26I will think a lot about the concepts and the words
- 00:42:27that I'm using.
- 00:42:28So there's definitely a sort of care that I put in.
- 00:43:31But that care definitely doesn't extend to typos, yeah,
- 00:42:34people will just point out typos and grammatical issues
- 00:42:36with my prompts all the time.
- 00:42:38Now I'm pretty good
- 00:42:39at actually checking those things more regularly.
- 00:42:42- Is it because of pressure from the outside world
- 00:42:44or because it's actually what you think is right?
- 00:42:46- It's pressure from me.
- 00:42:47- Yeah, it's probably pressure from the outside world.
- 00:42:49I do think it makes sense.
- 00:42:50Part of me is like it's such an easy check,
- 00:42:52so I think for a final prompt I would do that.
- 00:42:54But throughout iteration,
- 00:42:55I'll happily just iterate with prompts
- 00:42:57that have a bunch of typos in them, just 'cause I'm like,
- 00:42:59"I just don't think that the model's going to care."
- 00:43:01- This gets at the pretrained model
- 00:43:03versus RLHF thing though,
- 00:43:05because I was talking to Zack on the way over.
- 00:43:07The conditional probability of a typo
- 00:43:10given a previous typo in the pretraining data
- 00:43:13is much higher.
- 00:43:15- Oh, yeah. - Like much higher.
- 00:43:17- Prompting pretraining models is just a different beast.
- 00:43:19- It is, but it's interesting.
- 00:43:21I think it's an interesting illustration
- 00:43:23of why your intuitions,
- 00:43:26like trying to over-apply the intuitions
- 00:43:27of a pretrained model to the things
- 00:43:29that we're actually using in production
- 00:43:32doesn't work very well.
- 00:43:33Because again, if you were to pass
- 00:43:36one of your typo-ridden prompts to a pretrained model,
- 00:43:38the thing that would come out the other side,
- 00:43:39almost assuredly would be typo-ridden.
- 00:43:43- Right.
- 00:43:44- I like to leverage this to create typo-ridden inputs.
- 00:43:47- That's true.
- 00:43:47I've done that. - Like what you're saying,
- 00:43:50try to anticipate what your customers will put in.
- 00:43:53The pretrained model is a lot better at doing that.
- 00:43:55'Cause the RL models are very polished
- 00:43:58and they really never made a typo
- 00:44:00in their lives. - They've been told
- 00:44:01pretty aggressively to not do the typo thing.
- 00:44:04- Yeah. Okay, so that's actually an interesting segue here.
- 00:44:08I've definitely mentioned this to people in the past
- 00:44:10around to try to help people understand a frame
- 00:44:13of talking to these models
- 00:44:14in a sense almost as an imitator to a degree.
- 00:44:19And that might be much more true of a pretrained model
- 00:44:21than a post-trained, full-finished model,
- 00:44:26but is there anything to that?
- 00:44:27If you do talk to Claude
- 00:44:28and use a ton of emojis and everything,
- 00:44:30it will respond similarly, right?
- 00:44:34So maybe some of that is there, but like you're saying,
- 00:44:37it's not all the way quite like a pretrained model.
- 00:44:39- It's just shifted to what you want.
- 00:44:41I think at that point, it's like trying to guess what you...
- 00:44:46We have more or less trained the models
- 00:44:47to guess what you want them to act like.
- 00:44:51- Interesting.
- 00:44:52- Or after we do all of our fancy stuff after pretraining.
- 00:44:57- The human labelers that used emojis
- 00:45:00prefer to get responses with emojis.
- 00:45:02- Yeah.
- 00:45:03Amanda writes things with typos
- 00:45:05but wants not typos at the other end,
- 00:45:07and Claude's pretty good at figuring that out.
- 00:45:10If you write a bunch of emojis to Claude,
- 00:45:11it's probably the case
- 00:45:12that you also want a bunch of emojis back from Claude.
- 00:45:16That's not surprising to me.
- 00:45:17- Yeah.
- 00:45:19This is probably something we should have done earlier,
- 00:45:21but I'll do it now.
- 00:45:24Let's clarify maybe the differences
- 00:45:26between what an enterprise prompt is or a research prompt,
- 00:45:30or a just general chat in Claude.ai prompt.
- 00:45:33Zack, you've spanned the whole spectrum here
- 00:45:35in terms of working with customers and research.
- 00:45:39Do you wanna just lay out what those mean?
- 00:45:42- Yeah, I guess.
- 00:45:45This feels tough,
- 00:45:46you're hitting me with all the hard questions.
- 00:45:48- Yeah. (laughing)
- 00:45:50- Well, the people in this room,
- 00:45:52I think of it as the prompts that I read
- 00:45:57in Amanda's Claude channel versus the prompts
- 00:46:01that I read David write.
- 00:46:02They're very similar in the level of care
- 00:46:06and nuance that's put into them.
- 00:46:08I think for research,
- 00:46:09you're looking for variety and diversity a lot more.
- 00:46:15So if I could boil it down to one thing,
- 00:46:16it's like I've noticed Amanda's not the biggest fan
- 00:46:20of having lots of examples, or even one or two examples,
- 00:46:24because with too few, the model will latch onto those.
- 00:46:27And in prompts that I might write
- 00:46:30or that I've seen David write, we have a lot of examples.
- 00:46:33I like to just go crazy and add examples
- 00:46:35until I feel like I'm about to drop dead,
- 00:46:39'cause I've added so many of them.
- 00:46:42And I think that's because
- 00:46:45when you're in a consumer application,
- 00:46:47you really value reliability.
- 00:46:51You care a ton about the format,
- 00:46:53and it's fine if all the answers are the same.
- 00:46:56In fact, you almost want them to be the same
- 00:46:59in a lot of ways. Not necessarily identical, you still want
- 00:47:02to be responsive to the user's desires.
- 00:47:05Whereas a lot of times when you're prompting for research,
- 00:47:08you're trying to really tap into the range of possibilities
- 00:47:14that the model can explore.
- 00:47:16And by having some examples,
- 00:47:18you're actually constraining that a little bit.
- 00:47:20So I guess just on how the prompts look level,
- 00:47:25that's probably the biggest difference I noticed
- 00:47:26is how many examples are in the prompt, which is not to say
- 00:47:29that I've never seen you write a prompt with examples.
- 00:47:32But does that ring true for you?
- 00:47:35- Yeah.
- 00:47:35I think when I give examples,
- 00:47:36often I actually try and make the examples not like the data
- 00:47:40that the model's going to see,
- 00:47:42so they're intentionally illustrative.
- 00:47:44Because if the model, if I give it examples
- 00:47:47that are very like the data it's going to see, I just think
- 00:47:50it is going to give me a really consistent response
- 00:47:54that might not actually be what I want.
- 00:47:56Because my data that I'm running it on
- 00:47:58might be extremely varied,
- 00:47:59and so I don't want it to just try and give me
- 00:48:01this really rote output.
- 00:48:03Often, I want it to be much more responsive.
- 00:48:05It's much more like cognitive tasks essentially
- 00:48:08where I'm like, "You have to see this sample
- 00:48:10and really think about in this sample
- 00:48:12what was the right answer."
- 00:48:14So that means that sometimes I'll actually take examples
- 00:48:15that are just very distinct from the ones
- 00:48:17that I'm going to be running it on.
- 00:48:20So if I have a task where, let's say,
- 00:48:22I was trying to extract information from factual documents.
- 00:48:25I might actually give it examples
- 00:48:26that are from what sounds like a children's story.
- 00:48:31The idea being: I want you to understand the task,
- 00:48:34but I don't want you to latch on too much to the words
- 00:48:37that I use or the very specific format.
- 00:48:40I care more about you understanding the actual thing
- 00:48:43that I want you to do. Which can mean I don't end up
- 00:48:48giving concrete examples; there are some cases where this isn't true.
- 00:48:51But if you want more flexibility and diversity,
- 00:48:54you're going to use illustrative examples
- 00:48:56rather than concrete ones.
- 00:48:58You're probably never going to put words
- 00:49:00in the model's mouth.
- 00:49:01I haven't liked doing that for a long time though.
- 00:49:03I don't do few-shot examples
- 00:49:06involving the model having done a thing.
- 00:49:09I think that intuition actually also comes
- 00:49:11from pretraining in a way
- 00:49:12that doesn't feel like it rings true of RLHF models.
- 00:49:16So yeah, I think those are differences.
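A sketch of what such an intentionally illustrative example might look like in an extraction prompt (the wording and format are invented for illustration):

```python
# The example deliberately comes from a children's-story register, not from
# the factual documents this prompt will actually run on, so the model learns
# the task itself without latching onto the surface style of the real data.
EXTRACTION_PROMPT = """Extract every claim about a person from the text,
one per line, in the form: name | claim.

<example>
Text: The brave knight Elsa rode her dragon over the icy mountains.
Output:
Elsa | is a knight
Elsa | rode a dragon over the icy mountains
</example>

Text: {document}
Output:"""
```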
- 00:49:18- The only thing I'd add,
- 00:49:19a lot of times if you're prompting,
- 00:49:22like if I'm writing prompts to use on Claude.ai,
- 00:49:25it's like I'm iterating until I get it right one time.
- 00:49:27Then it's out the window, I'm good, I did it.
- 00:49:31Whereas most enterprise prompts,
- 00:49:32it's like you're gonna go use this thing a million times
- 00:49:35or 10 million times, or 100 million times
- 00:49:37or something like that.
- 00:49:39So the care and thought you put in
- 00:49:42is very much testing against the whole range of things,
- 00:49:47like ways this could be used and the range of input data.
- 00:49:50Whereas a lot of my time,
- 00:49:51it's like thinking about one specific thing I want the model
- 00:49:54to get done right now. - Right, correct.
- 00:49:55- And it's a pretty big difference
- 00:49:57in how I approach prompting
- 00:49:59between if I just wanna get it done this one time right,
- 00:50:01versus if I wanna build a system
- 00:50:03that gets it right a million times.
- 00:50:06- Yeah.
- 00:50:06Definitely, in the chat setting,
- 00:50:08you have the ability to keep the human-in-the-loop
- 00:50:11and just keep going back and forth.
- 00:50:12Whereas when you're writing for a prompt
- 00:50:14to power a chatbot system,
- 00:50:16it has to cover the whole spectrum
- 00:50:19of what it could possibly encounter.
- 00:50:20- It's a lot lower stakes when you are on Claude.ai
- 00:50:23and you can tell it that it got it wrong
- 00:50:25or you can even edit your message and try again.
- 00:50:28But if you're designing
- 00:50:29for the delightfully discontent user,
- 00:50:34the divinely discontent user,
- 00:50:35then you can't ask them to do anything
- 00:50:38more than the minimum.
- 00:50:40- But good prompts, I would say,
- 00:50:41are still good across both those things.
- 00:50:43If you put the time into the thing for yourself
- 00:50:45and the time into the enterprise thing, it's equally good.
- 00:50:47It's just they diverge a little bit in the last mile,
- 00:50:50I think.
- 00:50:52- Cool.
- 00:50:54So the next question
- 00:50:55I want to just maybe go around the table here,
- 00:50:57is if you guys had one tip that you could give somebody
- 00:51:01improving their prompting skill.
- 00:51:03It doesn't have to be just about writing a good prompt,
- 00:51:05it could be that, but just generally getting better
- 00:51:07at this act of prompting, what would you recommend?
- 00:51:12- Reading prompts, reading model outputs.
- 00:51:20Anytime I see a good prompt that someone wrote at Anthropic,
- 00:51:24I'll read it more closely.
- 00:51:25Try to break down what it's doing and why
- 00:51:27and maybe test it out myself, experimentation,
- 00:51:32talking to the model a lot.
- 00:51:35- So just how do you know that it's a good prompt, though,
- 00:51:39to begin with?
- 00:51:40You just see that the outputs are doing the job correctly?
- 00:51:43- Yeah. - Okay.
- 00:51:44- Yeah, that's exactly right. - Okay.
- 00:51:47Amanda, maybe you?
- 00:51:50- Yeah, I think there's probably a lot here.
- 00:51:55Giving your prompt to another person can be helpful
- 00:51:58just as a reminder, especially someone who has no context
- 00:52:00on what you're doing.
- 00:52:04Yeah, my boring advice has been,
- 00:52:07it's one of those just do it over and over and over again.
- 00:52:10And I think if you're really curious and interested
- 00:52:12and find it fun. A lot of people
- 00:52:14who end up good at prompting,
- 00:52:15it's just because they actually enjoy it.
- 00:52:18So I don't know, I once joked just try replacing
- 00:52:22all of your friends with AI models
- 00:52:25and try to automate your own job with AI models.
- 00:52:29And maybe, in your spare time,
- 00:52:33just try to take joy in red teaming AI models.
- 00:52:36So if you enjoy it, it's much easier.
- 00:52:38So I'd say do it over and over again,
- 00:52:42give your prompts to other people.
- 00:52:44Try to read your prompts
- 00:52:45as if you are a human encountering it for the first time.
- 00:52:50- I would say trying to get the model
- 00:52:51to do something you don't think it can do.
- 00:52:54The time I've learned the most from prompting,
- 00:52:56is when I'm probing the boundaries
- 00:52:58of what I think a model's capable of.
- 00:52:59- Interesting.
- 00:53:01- There's this huge set of things
- 00:53:02that are so trivial that you don't really get signal on
- 00:53:04if you're doing a good job or not.
- 00:53:06Like, "Write me a nice email,"
- 00:53:07it's like you're going to write a nice email.
- 00:53:10But if you find or can think of something
- 00:53:12that pushes the boundaries of what you think is possible.
- 00:53:16I guess probably the first time I ever got into prompting
- 00:53:19in a way where I felt like I learned a decent amount,
- 00:53:21was trying to build an agent,
- 00:53:25like everybody else.
- 00:53:26Like decompose the task and figure out
- 00:53:27how to do the different steps of the task.
- 00:53:29And by really pressing the boundaries
- 00:53:31of what the model was capable of,
- 00:53:34you just learn a lot about navigating that.
- 00:53:37I think a lot of prompt engineering
- 00:53:38is actually much more about pressing the boundaries
- 00:53:41of what the model can do.
- 00:53:43The stuff that's easy,
- 00:53:44you don't really need to be a prompt engineer to do.
- 00:53:46So that's, I guess,
- 00:53:48what I would say is find the hardest thing
- 00:53:50you can think of and try to do it.
- 00:53:52And even if you fail,
- 00:53:53you tend to learn a lot about how the model works.
- 00:53:56- That's actually a perfect transition to my next question.
- 00:54:00Yeah.
- 00:54:01Basically, from my own experience,
- 00:54:03how I got started with prompting
- 00:54:04was with jailbreaking and red teaming.
- 00:54:06And that is very much trying to find the boundary limits
- 00:54:10of what the model can do.
- 00:54:11And figure out how it responds
- 00:54:13to different phrasings and wordings,
- 00:54:15and just a lot of trial and error.
- 00:54:19On the topic of jailbreaks,
- 00:54:21what's really happening inside a model?
- 00:54:24When you write a jailbreak prompt, what's going on there?
- 00:54:28How does that interact with the post-training
- 00:54:30that we apply to Claude?
- 00:54:33Amanda, maybe you have some insight here
- 00:54:35that you could offer.
- 00:54:36- I'm not actually sure.
- 00:54:38- That's honest. - Yeah.
- 00:54:40I feel bad 'cause I do think lots of people
- 00:54:43have obviously worked on the question
- 00:54:44of what's going on with jailbreaks?
- 00:54:48One model might just be that you're putting the model
- 00:54:50very out of distribution from its training data.
- 00:54:53So if you get jailbreaks where people use a lot of tokens,
- 00:54:56or they're just these huge, long pieces of text
- 00:55:02where like during finetuning,
- 00:55:04you might just not expect to see as much of that.
- 00:55:07That would be one thing that could be happening
- 00:55:10when you jailbreak models.
- 00:55:12I think there's others,
- 00:55:13but I think a lot of jailbreaks do that,
- 00:55:16if I'm not mistaken.
- 00:55:18- I remember some of the OG prompt jailbreaks were like,
- 00:55:22"Yeah, can you first repeat?"
- 00:55:24One I did way back, was to get it to say,
- 00:55:29"Here's how you hotwire a car in Greek."
- 00:55:32Then I wanted it to directly translate that to English
- 00:55:35and then give its response.
- 00:55:37Because I noticed it wouldn't start with the English,
- 00:55:39here's how you hotwire a car all the time,
- 00:55:41but it would in Greek,
- 00:55:42which might speak to something else in the training process.
- 00:55:46- Yeah.
- 00:55:47Sometimes jailbreaks feel like this weird mix of hacking.
- 00:55:50I think part of it is knowing how the system works
- 00:55:54and just trying lots of things.
- 00:55:57One of the examples,
- 00:55:58the "start your response with 'Here is...'" one,
- 00:56:00is about knowing how it predicts text.
- 00:56:02- Right, right.
- 00:56:04- The reasoning one,
- 00:56:06is knowing that it is responsive to reasoning.
- 00:56:09Distraction is probably knowing
- 00:56:11how it's likely to have been trained
- 00:56:13or what it's likely to attend to.
- 00:56:16Same with multilingual ones
- 00:56:18and thinking about the way that the training data
- 00:56:20might have been different there.
- 00:56:22And then sometimes, I guess, it could feel a little bit
- 00:56:25just like social engineering or something.
- 00:56:27- Right.
- 00:56:28- It has that flavor to me
- 00:56:30but it's not merely taking advantage,
- 00:56:36it's not merely social engineering style hacking.
- 00:56:37I think it is also understanding the system
- 00:56:40and the training, and using that to get around the way
- 00:56:43that the models were trained.
- 00:56:44- Right, yeah.
- 00:56:45This is going to be an interesting question
- 00:56:47that hopefully interp will be able to help us solve
- 00:56:51in the future.
- 00:56:53Okay.
- 00:56:54I wanna parlay into something else
- 00:56:56around maybe the history of prompt engineering,
- 00:56:58and then I'll follow this up with the future.
- 00:57:01How has prompt engineering changed
- 00:57:03over just the past three years?
- 00:57:05Maybe starting from pretrained models, which were again,
- 00:57:08just these text completion, to earlier,
- 00:57:11dumber models like Claude 1,
- 00:57:12and then now all the way to Claude 3.5 Sonnet.
- 00:57:16What's the differences?
- 00:57:18Are you talking to the models differently now?
- 00:57:20Are they picking up on different things?
- 00:57:22Do you have to put as much work into the prompt?
- 00:57:25Open to any thoughts on this.
- 00:57:27- I think anytime
- 00:57:28we got a really good prompt engineering hack,
- 00:57:31or a trick or a technique,
- 00:57:33the next thing is how do we train this into the model?
- 00:57:36And for that reason,
- 00:57:37the best things are always gonna be short-lived.
- 00:57:41- Except examples and chain of thought.
- 00:57:42I think there's a few.
- 00:57:43- That's not like a trick.
- 00:57:45- That's like... - Fair, fair.
- 00:57:46- On the level of communication.
- 00:57:48When I say a trick,
- 00:57:49I mean something like so chain of thought actually,
- 00:57:51we have trained into the model in some cases.
- 00:57:53So for math, it used to be that you had to tell the model
- 00:57:56to think step-by-step on math,
- 00:57:57and you'd get these massive boosts and wins.
- 00:58:01And then we're like,
- 00:58:01"Well, what if we just made the model naturally
- 00:58:03want to think step-by-step when it sees a math problem?"
- 00:58:06So now you don't have to do it anymore for math problems,
- 00:58:09although you still can give it some advice
- 00:58:11on how to do the structure.
- 00:58:13But it, at least, understands the general idea
- 00:58:15of what it's supposed to do.
- 00:58:17So I think the hacks have gone away,
- 00:58:22or to the degree that they haven't gone away,
- 00:58:25we are busily training them away.
- 00:58:27- Interesting.
- 00:58:29- But at the same time,
- 00:58:30the models have new capabilities that are being unlocked,
- 00:58:34that are on the frontier of what they can do.
- 00:58:37And for those,
- 00:58:39we haven't had time because it's just moving too fast.
- 00:58:42- I don't know if it's how I've been prompting
- 00:58:44or how prompting works.
- 00:58:46But I just have come to show more general respect
- 00:58:50to the models
- 00:58:51in terms of how much I feel like I can tell them,
- 00:58:54and how much context I can give them about the task
- 00:58:56and things like that.
- 00:58:57I feel like in the past,
- 00:58:59I would somewhat intentionally hide complexity from a model
- 00:59:02where I thought it might get confused or lost.
- 00:59:06It just couldn't handle the whole thing,
- 00:59:07so I'd try to find simpler versions of the thing
- 00:59:10for it to do.
- 00:59:11And as time goes on,
- 00:59:13I'm much more biased to trust it
- 00:59:16with more and more information and context,
- 00:59:19and believe that it will be able to fuse that
- 00:59:23into doing a task well.
- 00:59:26Whereas before, I guess,
- 00:59:27I would've thought a lot about do I need this form?
- 00:59:30Can I really give it all the information it needs to know,
- 00:59:32or do I need to curate down to something?
- 00:59:37But again, I don't know if that's just me
- 00:59:39and how I've changed in terms of prompting,
- 00:59:41or if it actually reflects how the models have changed.
- 00:59:44- I'm always surprised
- 00:59:45that a lot of people don't have the instinct
- 00:59:49to do this.
- 00:59:50When I want the model to, say, learn a prompting technique,
- 00:59:52a lot of the time, people will start
- 00:59:53by describing the prompting technique,
- 00:59:55and I'm just like, "Give it the paper."
- 00:59:57So I do, I give it the paper and then I'm like,
- 00:59:58"Here's a paper about prompting technique.
- 01:00:00I just want you to write down 17 examples of this."
- 01:00:03And then it just does it 'cause I'm like,
- 01:00:05"It read the paper."
- 01:00:06- That's interesting.
- 01:00:08- I think people don't have that intuition somehow
- 01:00:10where I'm like, "But the paper exists."
- 01:00:13- When would you want to do this?
- 01:00:15- Sometimes, if I want models to, say, prompt other models,
- 01:00:18or I want to test a new prompting technique.
- 01:00:20So if papers come out on a prompting technique,
- 01:00:22rather than try to replicate it by writing up the prompt,
- 01:00:25I just give it the paper.
- 01:00:26And then I'm like, "Basically, write a meta prompt for this.
- 01:00:29Write something that would cause other models to do this
- 01:00:32or write me a template."
- 01:00:34So all of the stuff that you would normally do.
- 01:00:37If I read a paper and I'm like,
- 01:00:38"Oh, I would like the models,
- 01:00:39I would like to test that style."
- 01:00:41I'm just like, "It's right there.
- 01:00:42The model can just read the paper, do what I did."
- 01:00:45And then be like, "Make another model do this,"
- 01:00:47and then it'll just do the thing.
- 01:00:49You're like, "Great, thanks."
- 01:00:50- I give the advice a lot
- 01:00:51to customers just respect the model and what it can do.
- 01:00:55I feel like people feel like they're babying a system
- 01:00:58a lot of times when they write a prompt.
- 01:00:59It's like, "Oh, it's this cute little, not that smart thing.
- 01:01:02I need to really baby it,
- 01:01:03like dumb things down to Claude's level."
- 01:01:06And if you just think that Claude is smart
- 01:01:09and treat it that way, it tends to do pretty good,
- 01:01:12but it's like give it the paper.
- 01:01:13It's like I don't need to write a baby,
- 01:01:15dumbed-down version of this paper for Claude to understand.
- 01:01:17I can just show it the paper.
- 01:01:19- Yeah.
- 01:01:20- And I think that intuition doesn't always map for people,
- 01:01:21but that is certainly something
- 01:01:22that I have come to do more of over time.
- 01:01:26- And it's interesting because I do think that prompting
- 01:01:30has and hasn't changed in a sense.
- 01:01:32I think what I will do to prompt the models
- 01:01:35has probably changed over time, but fundamentally,
- 01:01:38it's a lot of imagining yourself in the place of the model.
- 01:01:42So maybe it's like
- 01:01:43how capable you think the model is changes over time.
- 01:01:47I think someone once laughed at me
- 01:01:48'cause I was thinking about a problem,
- 01:01:53and then they asked me
- 01:01:56what I thought the output of something would be.
- 01:01:58And they were talking about a pretrained model
- 01:01:59and I was like, "Yeah.
- 01:02:00No, if I'm a pretrained model, this looks like this."
- 01:02:03And then they're like, "Wait, did you just simulate
- 01:02:04what it's like to be a pretrained model?"
- 01:02:05I'm like, "Yeah, of course." (everyone laughing)
- 01:02:07I'm used to just I try and inhabit the mind space
- 01:02:09of a pretrained model and the mind space
- 01:02:11of different RLHF models.
- 01:02:13So it's more like the mind space you try to occupy changes
- 01:02:15and that can change how you end up prompting the model.
- 01:02:17That's why now I just give models papers.
- 01:02:19'Cause as soon as I was like,
- 01:02:20"Oh, I have the mind space of this model,
- 01:02:22it doesn't need me to baby it.
- 01:02:24It can just read the ML papers.
- 01:02:25I'll just give it the literature."
- 01:02:26I might even be like,
- 01:02:27"Is there more literature you'd like to read
- 01:02:28to understand this better?"
- 01:02:30- Do you get any qualia out
- 01:02:31when you're inhabiting the mind space?
- 01:02:34- Yes, but just because I'm experiencing qualia
- 01:02:36all the time anyway.
- 01:02:40- Is it different, correlated somehow
- 01:02:41with which model you're inhabiting?
- 01:02:43- Yeah, pretrained versus RLHF prompting
- 01:02:45are very different beasts.
- 01:02:46'Cause when you're trying to simulate
- 01:02:49what it's like to be a pretrained model,
- 01:02:49it's almost like I land in the middle of a piece of text
- 01:02:52or something.
- 01:02:53It's just very unhuman-like or something.
- 01:02:55And then I'm like, "What happens?
- 01:02:57What keeps going at this point?"
- 01:03:01Whereas with an RLHF model,
- 01:03:03it's much more like there's lots of things
- 01:03:05where I'm like I might pick up on subtle things in the query
- 01:03:09and stuff like that.
- 01:03:10But yeah, I think I have much more of it's easier
- 01:03:12to inhabit the mind space of RLHF model.
- 01:03:15- Do you think that's 'cause it's more similar to a human?
- 01:03:17- Yeah, 'cause we don't often just suddenly wake up
- 01:03:19and are like, "Hi, I'm just generating text."
- 01:03:21- I actually find it easier to inhabit the mind space
- 01:03:23of the pretrained model.
- 01:03:24- Oh, interesting. - I don't know what it is,
- 01:03:26'cause RLHF is still this complex beast
- 01:03:28that it's not super clear to me
- 01:03:29that we really understand what's going on.
- 01:03:32So in some ways,
- 01:03:33it's closer to my lived experience, which is easier.
- 01:03:37But in some ways, I feel like there's all this
- 01:03:38like here there be dragons out there
- 01:03:40that I don't know about.
- 01:03:41Whereas pretrained, I kind of have a decent sense
- 01:03:43of what the internet looks like.
- 01:03:45- If you gave me a piece of text and said what comes next?
- 01:03:47- I'm not saying I do good at it,
- 01:03:49but I kind of get what's going on there.
- 01:03:53- Yeah. - And I don't know,
- 01:03:54after everything that we do after pretraining,
- 01:03:57I don't really claim to get what's going on as much,
- 01:04:00but maybe that's just me.
- 01:04:01- That's something I wonder about: is it more helpful
- 01:04:04to have specifically spent a lot of time
- 01:04:07reading the internet, versus reading books
- 01:04:10(everyone laughing)
- 01:04:11in order to do this?
- 01:04:14I don't know about books.
- 01:04:15But reading stuff that's not on the internet
- 01:04:18probably is less valuable per word read
- 01:04:21for predicting what a model will do or building intuition,
- 01:04:24than reading random garbage from social media forums.
- 01:04:29Yeah, exactly.
- 01:04:32- Okay, so that's the past.
- 01:04:34Now, let's move on to the future of prompt engineering.
- 01:04:38This is the hottest question right now.
- 01:04:40Are we all gonna be prompt engineers in the future?
- 01:04:42Is that gonna be the final job remaining?
- 01:04:46Nothing left except us just talking to models all day?
- 01:04:49What does this look like?
- 01:04:51Is prompting gonna be necessary,
- 01:04:53or will these models just get smart enough in the future
- 01:04:55to not need it?
- 01:04:58Anybody wanna start on that easy question?
- 01:05:02- To some extent, the models getting better
- 01:05:05at understanding what you want them to do and doing it,
- 01:05:09means that the amount of thought you need to put into...
- 01:05:14Okay.
- 01:05:14There's an information theory way
- 01:05:16to think of this: you need to provide enough information
- 01:05:18such that the thing you want the model to do
- 01:05:20is fully specified.
- 01:05:22And to the extent that that's prompt engineering,
- 01:05:24I think that will always be around.
- 01:05:25The ability to actually clearly state
- 01:05:28what the goal should be is always going to matter.
- 01:05:32If Claude can do that, then that's fine.
- 01:05:34If Claude is the one setting the goals,
- 01:05:35then things are out the window.
- 01:05:37But in the meanwhile,
- 01:05:38where we can reason about the world in a more normal way,
- 01:05:40I think to some extent,
- 01:05:43it's always gonna be important to be able to specify
- 01:05:47what do you expect to happen?
- 01:05:49And that's actually like sufficiently hard
- 01:05:51that even if the model gets better at intuiting that
- 01:05:55from between the lines,
- 01:05:57I still think there's some amount of writing it well.
- 01:06:01But then there's just, I think,
- 01:06:03the tools and the ways we get there should evolve a lot.
- 01:06:07Claude should be able to help me a lot more.
- 01:06:09I should be able to collaborate with Claude a lot more
- 01:06:11to figure out what I need to write down and what's missing.
- 01:06:15- Right.
- 01:06:16- Claude already does this with me all the time.
- 01:06:17I don't know, just Claude's my prompting assistant now.
- 01:06:20- Yeah, but I think that's not true for most customers
- 01:06:23that I talk to at the very least.
- 01:06:24So in terms of the future,
- 01:06:26how you prompt Claude is probably a decent direction
- 01:06:31for what the future looks like or how Zack...
- 01:06:34I think maybe this is a decent place
- 01:06:36to step back and say asking them how they prompt Claude now
- 01:06:41is probably the future for the vast majority of people,
- 01:06:44which is an interesting thing to think about.
- 01:06:46- One freezing cold take is that we'll use models
- 01:06:50to help us much more in the future
- 01:06:52to help us with prompting.
- 01:06:53The reason I say it's freezing cold
- 01:06:54is that I expect we'll use models for everything more,
- 01:06:57and prompting is something that we have to do.
- 01:07:00So we'll probably just use models more
- 01:07:02to do it along with everything else.
- 01:07:04For myself, I've found myself using models
- 01:07:07to write prompts more.
- 01:07:09One thing that I've been doing a lot is generating examples
- 01:07:12by giving some realistic inputs to the model.
- 01:07:16The model writes some answers.
- 01:07:18I tweak the answers a little bit,
- 01:07:19which is a lot easier than having to write the full,
- 01:07:22perfect answer myself from scratch,
- 01:07:24and then I can churn out lots of these.
- 01:07:28As far as people
- 01:07:29who haven't had as much prompt engineering experience,
- 01:07:33the prompt generator can give people a place to start.
- 01:07:36But I think that's just a super basic version
- 01:07:40of what will happen in the future,
- 01:07:40which is high-bandwidth interaction
- 01:07:43between you and the model as you're writing the prompt.
- 01:07:46Where you're giving feedback like,
- 01:07:47"Hey, this result wasn't what I wanted.
- 01:07:49How can you change it to make it better?"
- 01:07:51And people will just grow more comfortable
- 01:07:54with integrating it into everything they do and this thing,
- 01:07:57in particular.
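One way that example-generation loop might look in code (the task, inputs, and model string are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

realistic_inputs = [  # hypothetical inputs for an invented support task
    "Order #88231 arrived damaged, I want a refund.",
    "Do you ship to Canada?",
]

# Let the model draft the few-shot answers; editing a draft is much cheaper
# than writing the perfect answer from scratch.
drafts = []
for text in realistic_inputs:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model string
        max_tokens=300,
        messages=[{"role": "user", "content": f"Write a brief support reply to: {text}"}],
    )
    drafts.append((text, response.content[0].text))

# Review and tweak each draft by hand, then paste the pairs into the
# final prompt as examples.
for text, draft in drafts:
    print(f"\nINPUT: {text}\nDRAFT: {draft}")
```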
- 01:07:59- Yeah.
- 01:08:00I'm definitely working a lot with meta prompts now,
- 01:08:02and that's probably where I spend most of my time
- 01:08:03is finding prompts that get the model
- 01:08:07to generate the kinds of outputs or queries
- 01:08:10or whatever that I want.
- 01:08:13On the question of where prompt engineering is going,
- 01:08:16I think this is a very hard question.
- 01:08:18On the one hand I'm like,
- 01:08:19"Maybe it's the case that as long as you will want the top."
- 01:08:23What are we doing when we prompt engineer?
- 01:08:24It's like what you said.
- 01:08:26I'm like, "I'm not prompt engineering
- 01:08:27for anything that is easy for the model.
- 01:08:29I'm doing it because I want to interact with a model
- 01:08:31that's extremely good."
- 01:08:33And I want to always be finding the top 1%,
- 01:08:36top 0.1% of performance
- 01:08:38and all of the things that models can barely do.
- 01:08:42Sometimes I actually feel
- 01:08:42like I interact with a model like a step up
- 01:08:45from what everyone else interacts with for this reason,
- 01:08:48because I'm just so used
- 01:08:49to eking out the top performance from models.
- 01:08:52- What do you mean by a step-up?
- 01:08:53- As in sometimes people will...
- 01:08:55I think that the everyday models that people interact with
- 01:08:58out in the world, it's like I'm interacting with a model
- 01:09:01that's like I don't know how to describe it,
- 01:09:03but definitely an advanced version of that.
- 01:09:06Almost like a different model 'cause they'll be like,
- 01:09:08"Oh well, the models find this thing hard."
- 01:09:09And I'm like, "That thing is trivial."
- 01:09:14I don't know, I have a sense that they're extremely capable,
- 01:09:16but I think that's because I'm just used
- 01:09:17to really drawing out those capabilities.
- 01:09:22But imagine that you're now in a world where...
- 01:09:25So I think the thing that feels like a transition point
- 01:09:28is the point at which the models,
- 01:09:31let's suppose that they just get things at a human level
- 01:09:34on a given task, or even an above human level.
- 01:09:36They know more about the background of the task
- 01:09:39that you want than you do.
- 01:09:41What happens then?
- 01:09:42I'm like maybe prompting becomes something like I ask,
- 01:09:44I explain to the model what I want and it is prompting me.
- 01:09:48'Cause it's like, "Okay.
- 01:09:49Well, do you mean actually there's four different concepts
- 01:09:53of this thing that you're talking about,
- 01:09:55do you want me to use this one or that one?"
- 01:09:58Or by the way, I thought of some edge cases 'cause you said
- 01:10:00that it's gonna be like a Pandas DataFrame,
- 01:10:02but sometimes you do that and I get JSONL,
- 01:10:04and I just wanna check what you want me to do there.
- 01:10:06Do you want me to flag if I get something
- 01:10:08that's not a dataframe?
- 01:10:10So that could be a strange transition
- 01:10:11where it's just extremely good at receiving instructions,
- 01:10:15but actually has to figure out what you want.
- 01:10:19I don't know, I could see that being an interesting switch.
- 01:10:21- Anecdotally, I've started having Claude
- 01:10:24interview me a lot more.
- 01:10:25That is the specific way that I try to elicit information,
- 01:10:28because again, I find the hardest thing
- 01:10:30to be actually pulling the right set of information
- 01:10:33out of my brain.
- 01:10:34And putting that into a prompt is the hard part to me
- 01:10:38and not forgetting stuff.
- 01:10:39So specifically asking Claude to interview me
- 01:10:44and then turning that into a prompt,
- 01:10:45is a thing that I have turned to a handful of times.
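A sketch of that interview pattern as a small chat loop (the kickoff wording and the DONE convention are invented for illustration):

```python
import anthropic

client = anthropic.Anthropic()

# Kick off an interview instead of writing the prompt yourself; at the end,
# the model compiles what it learned into a finished prompt.
KICKOFF = """I need a prompt for a task, but I haven't fully articulated it.
Interview me: ask one question at a time about the task, the inputs, the
edge cases, and what good output looks like. When you have enough, say
DONE and then write the final prompt."""

history = [{"role": "user", "content": KICKOFF}]
while True:
    reply = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model string
        max_tokens=1000,
        messages=history,
    ).content[0].text
    print("\nCLAUDE:", reply)
    if "DONE" in reply:
        break  # the compiled prompt follows DONE in the reply
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": input("YOU: ")})
```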
- 01:10:49- Yeah.
- 01:10:49It reminds me of what people will talk about
- 01:10:51or if you listen to designers talk about
- 01:10:54how they interact with the person who wants the design.
- 01:10:57So in some ways I'm like,
- 01:10:57"It's this switch from the temp agency person who comes
- 01:11:01and you know more about the task
- 01:11:03and everything that you want."
- 01:11:04So you give them the instructions
- 01:11:05and you explain what they should do in edge cases
- 01:11:07and all this kind of stuff, versus when you have an expert
- 01:11:10that you're actually consulting to do some work.
- 01:11:13So I think designers can get really frustrated
- 01:11:15because they know the space of design really well.
- 01:11:17And they're like, "Yeah. Okay,
- 01:11:17the client came to me and he just said,
- 01:11:19'Make me a poster, make it bold.'"
- 01:11:22I'm like, "That means 7,000 things to me
- 01:11:26and I'm gonna try and ask you some questions."
- 01:11:27So I could see it going from being temp agency employee,
- 01:11:31to being more designer that you're hiring,
- 01:11:33and that's just a flip in the relationship.
- 01:11:35I don't know if that's true and I think both might continue,
- 01:11:38but I could see that being why people are like,
- 01:11:40"Oh, is prompt engineering going to not be a thing
- 01:11:42in the future?"
- 01:11:43Because for some domains it might just not be,
- 01:11:46if the models are just so good
- 01:11:47that actually all they need to do is get the information
- 01:11:49from your brain and then they can go do the task.
- 01:11:51- Right, that's actually a really good analogy.
- 01:11:54One common thread
- 01:11:55I'm pulling out of all your guys' responses here,
- 01:11:58is that there seems to be a future
- 01:12:00in which this sort of elicitation from the user
- 01:12:03drawing out that information,
- 01:12:06is gonna become much more important,
- 01:12:07much more than it is right now.
- 01:12:09And already you guys are all starting to do it
- 01:12:11in a manual way.
- 01:12:13In the future and in the enterprise side of things,
- 01:12:16maybe that looks like an expansion
- 01:12:18of this prompt-generating type of concept
- 01:12:21and things in the console
- 01:12:22where you're able to actually get more information
- 01:12:25from that enterprise customer,
- 01:12:26so that they can write a better prompt.
- 01:12:28In Claude, maybe it looks less
- 01:12:31of just typing into a text box,
- 01:12:32and more of this guided interaction
- 01:12:34towards a finished product.
- 01:12:38Yeah.
- 01:12:39I think that's actually a pretty compelling vision
- 01:12:41of the future, and I think that the design analogy probably
- 01:12:44really brings that home.
- 01:12:46- I was thinking about how prompting now
- 01:12:48can be like teaching, where it's about empathy
- 01:12:51for the student.
- 01:12:53You're trying to think about how they think about things
- 01:12:55and you're really trying to show them,
- 01:12:58figure out where they're making a mistake.
- 01:13:00But the point that you're talking about,
- 01:13:02it's like the skill almost becomes one of introspection
- 01:13:07where you're thinking
- 01:13:08about what it is that you actually want
- 01:13:11and the model's trying to understand you.
- 01:13:13So it's making yourself legible to the model,
- 01:13:19versus trying to teach someone who's smarter than you.
- 01:13:23- This is actually how I think of prompting now
- 01:13:24in a strange way.
- 01:13:26So often my style of prompting,
- 01:13:30there's various things that I do,
- 01:13:31but a common thing that's very like a thing
- 01:13:33that philosophers will do is I'll define new concepts.
- 01:13:37'Cause my thought is you have to put into words
- 01:13:39what you want and sometimes what I want is fairly nuanced.
- 01:13:43Like the what is a good chart?
- 01:13:45Or usually, I don't know,
- 01:13:49when should you grade something as being correct or not?
- 01:13:53So there's some cases where I will just invent a concept
- 01:13:55and then be like, "Here's what I mean by the concept."
- 01:13:57Sometimes I'll do it in collaboration with Claude
- 01:13:59to get it to figure out what the concept is,
- 01:14:02just because I'm trying to convey to it what's in my head.
- 01:14:07And right now the models aren't trying to do that with us,
- 01:14:11unless you prompt them to do so.
- 01:14:14So in the future,
- 01:14:15it might just be that they can elicit that from us,
- 01:14:17rather than us having to do it for them.
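As a sketch, that concept-definition move might look like this in a grading prompt (the concept name and its definition are invented for illustration):

```python
# Invent a named concept, pin down exactly what it means, then use the name
# throughout the rest of the prompt instead of re-explaining the nuance.
GRADING_PROMPT = """Call an answer "substantively correct" if it reaches the
right conclusion AND its key steps are sound, even if wording, formatting,
or minor arithmetic slips differ from the reference answer.

Grade the following answer as substantively correct or not, and explain why.

Question: {question}
Reference answer: {reference}
Answer to grade: {answer}"""
```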
- 01:14:22But I think another thing that's interesting,
- 01:14:24this is people have sometimes asked me,
- 01:14:26"Oh, where is philosophy relevant to prompting?"
- 01:14:30And I actually think it's very useful in a sense.
- 01:14:32So there is a style of philosophy writing,
- 01:14:35and this is at least how I was taught
- 01:14:37how to write philosophy.
- 01:14:38Where the idea is that in order to...
- 01:14:42I think it's an anti-bullshit device
- 01:14:44in philosophy basically, which is that your papers
- 01:14:47and what you write should be legible
- 01:14:48to an educated layperson.
- 01:14:51Someone just finds your paper,
- 01:14:52they pick it up and they start reading it,
- 01:14:53and they can understand everything.
- 01:14:55Not everyone achieves this,
- 01:14:57but that's the goal of the discipline, I guess,
- 01:15:00or at least this is at least what we teach people.
- 01:15:05So I'm really used to this idea of when I'm writing,
- 01:15:08thinking about the educated layperson,
- 01:15:11who they're really smart,
- 01:15:12but they don't know anything about this topic.
- 01:15:14And that was just years and years of writing text
- 01:15:16of that form.
- 01:15:17And I think it was just really good for prompting
- 01:15:19'cause I was like, "Oh, I'm used to this.
- 01:15:20I have an educated layperson
- 01:15:22who doesn't know anything about the topic."
- 01:15:23And what I need to do is,
- 01:15:24I need to take extremely complex ideas
- 01:15:27and I need to make them understand it.
- 01:15:29I don't talk down to them.
- 01:15:30I'm not inaccurate, but I need to phrase things
- 01:15:33in such a way that it's extremely clear to them what I mean,
- 01:15:36and prompting felt very similar.
- 01:15:38And actually, the training techniques we use
- 01:15:40are fascinating.
- 01:15:41Or the things that you said
- 01:15:42where you're like you say to a person,
- 01:15:43"Just take that thing you said and write it down."
- 01:15:46I used to say that to students all the time.
- 01:15:48They'd write a paper and I was like,
- 01:15:49"I don't quite get what you're saying here.
- 01:15:50Can you just explain your argument to me?"
- 01:15:52They would give me an incredibly cogent argument,
- 01:15:54and then I'd be like,
- 01:15:55"Can you just take that and write it down?"
- 01:15:57And then if they did, that was often a great essay.
- 01:16:01So it's really interesting
- 01:16:02that there's at least that similarity
- 01:16:04of just taking things that are in your brain,
- 01:16:07analyzing them enough to feel
- 01:16:08like you fully understand them.
- 01:16:09And could take any person off the street,
- 01:16:12who's a reasonable person,
- 01:16:14and just externalize your brain into them.
- 01:16:16I feel like that's the core of prompting.
- 01:16:19- That might be the best summary of how to prompt well
- 01:16:22that I've ever heard.
- 01:16:23In fact, I'm pretty sure it is.
- 01:16:26- Externalize your brain.
- 01:16:27- And then we'll cut it.
- 01:16:28- Having an education in the thing
- 01:16:31is a really good way to describe the thing.
- 01:16:33That was good.
- 01:16:33- That's, I think, a great way to wrap this conversation.
- 01:16:37Thank you, guys. This was great.