2 Years of LLM Advice in 35 Minutes (Sully Omar Interview)
Summary
TLDR: In a detailed interview with the CEO of Cognosys, the use and categorization of AI language models are explored. The speaker, Sully Omar, explains how he tiers language models by intelligence and cost, offering insight into how to choose the right model for specific tasks such as coding, summarizing documents, and generating prompts. He discusses the process and benefits of combining multiple AI models to leverage their distinct strengths and weaknesses. Sully also covers model distillation, where the behavior of a larger model is transferred to a smaller, faster, more cost-efficient one. He shares hands-on experience with model evaluation, stressing the need to understand the nuanced differences between models and to test them carefully across applications. He predicts that AI-generated prompts could soon replace manually written ones, part of ongoing advances that make AI tools more intuitive and accessible. The interview shows how AI models can be incorporated into everyday use, along with the current potential and the challenges in maximizing their efficiency.
Takeaways
- 🤖 AI models can be used in every aspect of daily life.
- 📊 Different AI models have distinct strengths and weaknesses.
- 🚀 Model distillation enhances task execution by refining large models.
- 🛠️ Prompt engineering can be optimized with AI-generated meta prompts.
- 🌐 Model routing means selecting the best AI model for each task.
- 💡 Understanding model capabilities is crucial for maximizing potential use.
- 🔄 Iteration and testing improve AI model usage.
- 📈 Combining multiple AI models can improve productivity.
- 📜 Future developments might replace traditional prompt writing.
- ⚖️ Tiered categorization helps in selecting the right AI model.
Timeline
- 00:00:00 - 00:05:00
AI can enhance everyday tasks but has its limitations. The conversation discusses large language models (LLMs) and their nuanced differences, highlighting the difficulty in perfecting AI performance.
- 00:05:00 - 00:10:00
An interview with Sully Omar reveals insights into AI model ranking systems and prompt development. Omar uses meta prompts to create production-ready prompts and discusses distilling performance into smaller AI models.
- 00:10:00 - 00:15:00
Omar's AI model framework categorizes models based on intelligence and cost, with tier three being cost-effective and used frequently. He gives examples like GPT-4o mini and Gemini Flash.
- 00:15:00 - 00:20:00
Tiered AI models serve different applications, with Omar using tier two models for tasks not requiring the highest intelligence. He often pairs models to optimize task performance.
- 00:20:00 - 00:25:00
AI models have specific strengths and weaknesses. Omar shares an example of using Gemini to pinpoint details in long text, while GPT-4o mini reasons over that context better, showcasing complementary model usage.
- 00:25:00 - 00:30:00
The future of AI models involves complex routing systems, though winning the last few percentage points of reliability remains hard. Current practice involves sophisticated combinations of models.
- 00:30:00 - 00:35:00
Model distillation is powerful yet complex. Good results require robust data evaluation to avoid regressions when simplifying models for efficiency. The future will see better distillation tools and practices.
- 00:35:00 - 00:40:00
Omar demonstrates his prompt optimization process using various AI models, leveraging voice interaction for natural input. He iterates across models to refine prompts before applying them for specific tasks.
- 00:40:00 - 00:49:04
Test-driven development with AI involves using language models to write tests before code, providing checks for accuracy and reliability. Omar adapts this method to improve coding processes.
Video Q&A
What is AI model distillation, as discussed in the interview?
AI model distillation involves refining a larger model to perform specific tasks better with smaller models, improving speed and cost efficiency.
How are AI models categorized into tiers in this interview?
Different AI models are categorized by tiers based on intelligence and cost: Tier 1 models are more intelligent but costly, Tier 2 are balanced, and Tier 3 are less intelligent but cheaper.
Does the speaker use various AI models for different tasks?
Yes, the speaker uses different AI models for different tasks, evaluating their strengths and weaknesses for specific use cases like coding, summarizing documents, and generating prompts.
How does the speaker recommend improving the use of AI models?
The speaker emphasizes building a deep understanding of model capabilities and continuously testing them under different conditions to maximize their potential use.
What is model routing, and what does the speaker think about it?
Model routing involves automatically selecting the best model for a given task. It's seen as a future direction for optimizing AI model use but is currently complex to implement effectively.
Does combining multiple AI models improve productivity as suggested in the video?
The speaker finds using different models for their strengths beneficial, for instance, using one model's structured output capabilities while leveraging another's reasoning skills.
What prediction about the future of prompt engineering is mentioned?
The speaker predicts that traditional prompt writing will be replaced by AI-generated prompts, making the process more efficient and refined.
How does the speaker approach prompt generation and optimization?
He uses an iterative, comparison-based approach to generate optimized prompts, often using multiple AI models to refine a prompt for the best outcome.
- 00:00:00 "It lets you use AI in basically every nook and cranny of your day-to-day." "When that model came out, it actually opened up a lot of things that you could do." "We use a lot of different providers, and that's because what we've seen with our internal evals is that they're all so nuanced and different in a variety of ways. But you also start to see where they lack." "You'll get an AI product to 90%, even 95%, but that last 5% is nearly impossible." "How do you think about model distillation?" "It's very powerful, but you have to be very [Laughter] [Music] careful."
- 00:00:42 Host: I just had an amazing conversation with Sully Omar, the CEO of Cognosys, the company behind Otto. Not only is he one of the best LLM practitioners I've met, but you can tell he has a really deep feel for how these models actually work; he speaks from experience. In this interview we go through his three-tier system for ranking language models; he shows how he uses meta prompts to develop the real prompts he uses in production; he shows his Cursor development flow, where he has the language model write the tests first and then the actual code; and finally he walks us through distilling performance from large language models down to small ones without losing quality. Let's jump in and see what wisdom our friend Sully has to share.
- 00:01:28 Host: The reason we're doing this interview is because I see all the cool stuff you're sharing on Twitter, and I'm like: this guy clearly has not just a checklist-learned ability to manipulate these models — I can tell you feel them, you really feel how these things are going, the personalities and the nuances. So I want to dig into that today.
- 00:01:48 Sully: Well, thank you. I think it just comes from playing with these things every day, day in, day out, using them and pushing them to their limit. As cliche as it is, sometimes you've got to use them to vibe with them, you know?
- 00:02:04 Host: It's so true. I'll tell you what I want to start off with: one framework I saw you document recently, which was your three-tier model of language models, tier one through tier three. Starting at tier three, what are those, and how do you work your way up?
- 00:02:21 Sully: I don't even know if you want to call it a framework, but I like to categorize them based on intelligence and price, which are correlated: the less intelligent models are going to be your tier-three models, and the more expensive, slower ones are going to be your more intelligent models. The reason I thought of it in three tiers was the application purposes: the way you use something like o1 — that would be tier one — is different from the way you use something like Gemini Flash, which is tier three, because they serve different purposes. One is super cheap and super fast; the other is really smart and really slow. The third tier is basically what I like to call the workhorse — the ones you're constantly using 24/7. Within that category I think there were three main models, but it's kind of come down to two for me personally. The first is the one people are probably more familiar with, which is GPT-4o mini. I really, really like that model because it lets you use AI in a way you previously couldn't. If you go back, say, six months, when we had no cheap models, you had GPT-4 and maybe Claude 3.5, and there were a lot of scenarios where you couldn't just throw that at random problems. You couldn't say, "hey, I have this 20-page document, I want you to go paragraph by paragraph and extract the details," because realistically you'd be paying a lot of money. When that model came out, it opened up a lot of things you could do. The other one I'm starting to really like is Flash — Gemini Flash is actually half the price of GPT-4o mini. Those are the tier three, because like I said, they give you a lot of optionality, things you couldn't do before; they let you use AI in basically every nook and cranny of your day-to-day. Whether you're coding and want to look at, say, 50 different files to summarize them to help another model, or you want to take a podcast and find when someone said a specific word in it — you're not going to go to a bigger model for that. Then the second tier I have is the middle, and this is where I like to slot in GPT-4o, Claude 3.5, and Gemini Pro. This is where I think the majority of people use these models and kind of get the maximum usage out of them. And the last tier is obviously o1 and o1-preview — what I like to classify as thinking models.
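The three tiers Sully describes can be sketched as a lookup plus a task-based chooser. This is an illustrative sketch, not his actual setup: the tier assignments follow the interview, while the keyword heuristic in `pick_tier` and the exact model-name strings are my assumptions.

```python
# A minimal sketch of the three-tier categorization described above.
# Tier membership follows the interview; the task -> tier heuristic
# is an illustrative assumption, not Sully's actual rules.

TIERS = {
    3: ["gpt-4o-mini", "gemini-flash"],                 # cheap, fast workhorses, used 24/7
    2: ["gpt-4o", "claude-3.5-sonnet", "gemini-pro"],   # balanced: coding, tool calling, writing
    1: ["o1", "o1-preview"],                            # slow, expensive "thinking" models
}

def pick_tier(task: str) -> int:
    """Choose a tier for a task description (hypothetical heuristic)."""
    task = task.lower()
    if any(k in task for k in ("plan", "deep analysis", "gigantic")):
        return 1    # hard multi-step reasoning -> thinking models
    if any(k in task for k in ("extract", "summarize", "label", "format")):
        return 3    # high-volume, low-difficulty -> cheap tier
    return 2        # default workhorse tier

def pick_model(task: str) -> str:
    """Return the first (preferred) model in the chosen tier."""
    return TIERS[pick_tier(task)][0]

print(pick_model("extract details from a 20-page document"))  # gpt-4o-mini
print(pick_model("help me edit an email"))                    # gpt-4o
```

The point of the sketch is only the shape of the decision: route by what the task needs, not by which model is "best" overall.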
- 00:05:09 Host: That's so cool. I want to dig more into the use-case side. Which tasks are you doing with tier two? And I know that o1 in tier one isn't just "oh, I need it smarter" — it's almost a different type of task you're going to ask it to do. So how do you differentiate between those two?
- 00:05:24 Sully: The way I like to differentiate is I pair them. I use o1 in my day-to-day: I'll go to ChatGPT, and if you just say, "hey o1, can you do this task for me" — one, it's going to take a little bit of time, and you're probably going to hit some rate limits, because it's heavily limited; and realistically you're not using the model the way I think it was intended to be used. If you say, "hey, how's it going" — sure, you could use it like that, but realistically you're better off with tier two. Tier two is actually what I use the most. Obviously everyone uses it for coding, whether it's Claude 3.5 or GPT-4o, and for function calling or tool calling — it's a good balance between intelligence and price. Whether I'm writing, or asking it to help me edit an email, I'm using those middle-tier models. Now, how I use that in tandem with o1: one of the cases I have is I'll come to ChatGPT or Claude and just create a giant conversation about a specific topic. Say I'm deep-diving into a research topic I want to learn more about. I'm not going straight into o1, because I feel like it's a bit slow. Instead I'll start the topic with GPT-4o or Claude and add files — because right now o1 doesn't support files or web search, so there are a lot of capabilities o1 doesn't support — and do what I like to call context building. I'll build as much context in that chat as I possibly can, and I'll sit there and iterate. I'll actually use voice mode as well to give context — it's a lot quicker, and that's another workflow. As soon as I have, say, two to three pages' worth of documents, I'll take that, paste it into a chat with o1 or o1-preview, and say, "hey, do this gigantic task for me." For example, I was using it to generate use cases for my product: I wanted to generate use cases and understand potential customer segments and ICPs. That's a pretty technical question, and if I just went to o1 and asked it that, it would have no context — it doesn't know what my product is, it has no clue what my product does or who my customers are — and if I sat there chatting with it, I'd hit that limit. But if I go to Claude or ChatGPT, I can upload documents, basically create a PDF, copy-paste it into o1, and then say "generate me personas and ICPs," and it does a lot better. That's the workflow and use case I currently have running with the tier-two and tier-one models.
- 00:08:13 Host: One of the ways I've found o1 works really well for me is deduplication. If I have a long list of items — say I've processed five different chunks with the same type of workflow for each chunk — I'll have a list of duplicated items. I give that whole thing to o1, which is actually really good at deduplicating, and then I'll use one of the tier-two models to do the structured output after that, since o1 doesn't yet support structured output, and go from there.
- 00:08:36 Sully: That's a good one. Another thing I do as well: I'll have o1 give me a long, verbose output and then turn it into structured datasets with tier two — and sometimes you can even get away with tier three, because you don't even need to worry about the output; you're just saying, "hey, I want this nicely formatted in whatever shape."
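The hand-off described here — a slow reasoning model does the hard part, a cheap model only reshapes the result — can be sketched as a two-stage pipeline. Both "models" below are plain Python stubs: this is a sketch of the pattern, not of any provider's API, and the deduplication logic merely stands in for what o1 would do.

```python
import json

# Sketch of the two-stage hand-off described above: a tier-1 model does
# the hard reasoning (here, deduplication) and a tier-3 model only
# reshapes the verbose result into structured output. Both are stubs.

def o1_deduplicate(items: list[str]) -> str:
    """Stand-in for the slow reasoning model: returns verbose free text."""
    seen, unique = set(), []
    for item in items:
        key = item.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(item.strip())
    return "Deduplicated items:\n" + "\n".join(f"- {u}" for u in unique)

def mini_format(verbose: str) -> str:
    """Stand-in for the cheap model: 'nicely format in whatever shape'."""
    items = [line[2:] for line in verbose.splitlines() if line.startswith("- ")]
    return json.dumps({"items": items})

chunks = ["Paris", "paris ", "Berlin", "Paris", "Tokyo"]
print(mini_format(o1_deduplicate(chunks)))
# {"items": ["Paris", "Berlin", "Tokyo"]}
```

The shape matters more than the stubs: the expensive call happens once over messy input, and the cheap call never has to reason, only format.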
- 00:08:57 Host: For sure. So it sounds like you're using different models across different providers for different use cases — or do you stick with one all the time?
- 00:09:06 Sully: Yes, we use a lot of different providers, and that's because what we've seen with our internal evals is that they're all so nuanced and different in a variety of ways. Obviously the big one is Gemini for multimodal: right off the bat, anything to do with videos or audio, I'll dive straight into Gemini. But you also start to see where they lack. A really interesting one: Gemini models are really good at needle-in-the-haystack, so if you say, "I want you to find one or two pieces of information in this giant long piece of text or video," it's actually really good. But then I started to notice that something like GPT-4o mini is a little bit better at reasoning over that: if I give it a long piece of context and say, "I want you to understand the context of it," I found GPT-4o mini a little bit better. So you start to see where one model does better than another in a specific area. Another example is Claude 3.5 and GPT-4o. Everyone loves Claude — it's a really good model — but one thing it's absolutely horrible at is tool use with structured outputs. You'll start to see this with a very complex tool: "I want you to create a very deeply nested JSON, a very long structured output." A very large amount of the time it fails, gives you XML, and breaks all your parsers, whereas GPT-4o mini does a much better job. But then the caveat is that GPT-4o mini is not as good at actually thinking through the problem and acting as an assistant. So there are always these tiny trade-offs that you don't really notice. One thing we did to get around that: we set up Claude and GPT-4o mini to work together, where the tool use for Claude would be to call GPT-4o mini. We basically built a system where Claude could orchestrate GPT-4o mini to create the structured output. The user would say, "I want this task"; all Claude would do is relay that information to GPT-4o mini; GPT-4o mini creates the structured output; and then it gets returned. That was another example of how we mix and match so many models across different use cases.
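The Claude/GPT-4o mini pairing can be sketched as an orchestrator whose only "tool" is a call to a formatter model. Real provider SDKs are deliberately left out — both roles are stubbed so only the shape of the relay is visible; the function names and the trivial JSON payload are my own illustration, not the system Sully built.

```python
import json

# Sketch of the orchestration pattern described above: the orchestrator
# (Claude's role) never emits structured output itself — its tool call
# relays the request to a formatter model (GPT-4o mini's role), which
# returns the JSON. Both models are stubs; no real APIs are called.

def formatter_model(request: str) -> str:
    """Stand-in for GPT-4o mini: reliably returns structured JSON."""
    return json.dumps({"task": request, "status": "structured"})

def orchestrator(user_request: str) -> dict:
    """Stand-in for Claude: handles the conversation, delegates formatting."""
    refined = user_request.strip()      # 1. the orchestrator "thinks" (trivially, here)
    raw = formatter_model(refined)      # 2. its tool use simply relays to the formatter
    return json.loads(raw)              # 3. parsed structured output goes back to the user

result = orchestrator("  extract ICPs from my product doc ")
print(result["status"])  # structured
```

The design point is the division of labor: the conversational model is never trusted to produce the parseable payload, so its structured-output weakness stops mattering.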
- 00:11:35 Host: Isn't it wild, all these little mini vibe tricks we have to hack together in the early days of LLMs? I think back to how far we've already come: in January of '23 we were dealing with 4,000-token context limits and GPT-3.5, and all the hacks we had then. We've upgraded past those, but we still have a bunch of hacks like the ones you're talking about. It makes me think we're never going to get rid of the hacks — they're going to be there for a long time.
- 00:11:59 Sully: I would say so too, because you're right — it's funny looking back at it. The hacks you used in 2023 were so different: you were hacking around the context window, and now you're hacking around tool use, which didn't even exist a year, year and a half ago. So I agree that we're always going to be min-maxing. As a user of multiple models, you're going to be min-maxing, trying to figure out — for your use case, your product, your company — where you can mash these together so you get the best possible outcome for your users. And I'm curious what you think: a lot of people have spoken about model routers, and how at the end of the day a model is just going to pick. My personal opinion is that it's going to cause a lot of unintended side effects, but I'm curious what you think about this whole idea of model routing — because what we're basically doing internally, with code, is model routing.
- 00:12:57 Host: Whenever I get asked a question like this, I ask: is there behavior in practice that tells me what the prediction should be? And you just described doing model routing on your own, in and of itself. So that tells me: yes, model routing will be a thing. I also still think that fine-tuning models and having bespoke small models is too much overhead — it's really hard to do that and manage them all right now — but all of that is going to get so much easier. So I'd imagine we'll have model routing not only against some of the big models, where you have vibe-based feels about structured output or tool use or whatever it may be, but also for task-specific things. I will absolutely do model routing. I'm a fan. I think it's hard, and I think it will be the future — we're not quite there yet, though, that's for sure.
- 00:13:45 Sully: Gotcha. My sentiment there — and it could just be because of where the models are right now — is that, as I'm sure you've seen, you'll get an AI product to 90%, even 95%, but that last 5-10% is nearly impossible. You can run all the evals you want, all the benchmarks — getting that last 10% is hard. My thought process is that if you have a model choosing other models, that adds to the variance, so it causes a lot more potential issues. And that could just be because we're early — realistically we're so early, and models have multiple generations to get better. So my thought was: maybe in the future, but right now probably not, because it's so hard to get a product — specifically LLMs — into production while handling every potential edge case in a manner that gives you as high an accuracy as you can. Adding models that you might not have an eval for could give you an output you didn't expect.
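One way to reconcile the two views above — routing is useful, but un-evaluated models add variance — is to route only among models you have eval coverage for. The sketch below is my own illustration of that idea, not a system either speaker describes; the task types, scores, and model names are made up.

```python
# Sketch of eval-gated routing: pick the best model *per task type*
# according to your own eval scores, and refuse to route to any model
# you have no eval coverage for. All numbers here are hypothetical.

EVAL_SCORES = {
    # (task_type, model) -> accuracy on your internal eval set
    ("structured_output", "gpt-4o-mini"): 0.97,
    ("structured_output", "claude-3.5-sonnet"): 0.81,
    ("long_context_reasoning", "gpt-4o-mini"): 0.88,
}

DEFAULT_MODEL = "gpt-4o"  # known-safe fallback

def route(task_type: str, candidates: list[str]) -> str:
    """Pick the best *evaluated* model for a task type; else fall back."""
    scored = [(EVAL_SCORES[(task_type, m)], m)
              for m in candidates if (task_type, m) in EVAL_SCORES]
    if not scored:
        return DEFAULT_MODEL  # no eval coverage -> don't gamble on variance
    return max(scored)[1]

print(route("structured_output", ["claude-3.5-sonnet", "gpt-4o-mini"]))
# gpt-4o-mini
print(route("video_understanding", ["claude-3.5-sonnet"]))
# gpt-4o  (fallback: no evals recorded for this task type)
```

The fallback branch is the whole point: it encodes Sully's worry that "adding models you might not have an eval for could give you an output you didn't expect."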
- 00:14:59 Host: Yeah, totally. I'll tell you what — one of the other interesting things that came up during research was your opinion on what's becoming known as model distillation: you have a really, really good model, you perfect the output from there, but then you realize you can come up with a slightly better prompt and give it to a smaller model, so it's faster and cheaper. Can you walk me through how you think about model distillation in your own workflow?
- 00:15:21 Sully: That's something I think about a lot, and it's one of those things where you need to be very careful — it's very powerful, but it requires a lot of work. The reason it needs a lot of work is that you need a good data pipeline, and you need to understand what you're distilling. One of the mistakes I made previously with the product: we had GPT-4 Turbo — this was actually before GPT-4o — and we used it, and it was slow, so we said, "hey, let's distill that to 3.5." OpenAI has a really nice way to do it. So we did, and the problem was that we didn't have good enough evals and we didn't have a good enough dataset. As the areas people could use the product in grew, we'd notice: okay, we have to revert back to GPT-4, because 3.5 at that time was not good enough. Now, where I do see distillation in our workflow is when you have a defined eval set, you have all your benchmarks, and you have a very good data pipeline where you can say: "on this 500-example set, I'm using Claude 3.5 Sonnet, or o1 for example, and I have my dataset." There are a lot of different companies that give you ways to manage your prompts and evals, whether it's Braintrust or LangSmith, and then you can very accurately detect and determine the accuracy of the distilled model. In that case, ten out of ten times I would use it. And it's actually really easy to distill the model down — it's like a single API call. The challenging part is making sure you don't regress your product when you do the distillation. But I think it's one of those things that's going to become more and more apparent as the tooling around distillation gets better. I know a couple of companies are working on it — OpenPipe is one of them — and I know OpenAI straight up offers it. So as the tooling gets better, you're going to see this pattern in production: companies launch with the biggest, best model; they collect a bunch of data; they have a good eval set and an engineering team to support it; and then they distill it down to GPT-4o mini or an open-source model.
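The "don't regress your product" step can be sketched as a simple gate: score the distilled candidate on the same eval set as the teacher, and only swap it in if it stays within a tolerance. This is an illustrative sketch — the models are stubbed as dictionaries and the scorer is toy exact-match; a real setup would use a harness like the Braintrust or LangSmith tools mentioned above.

```python
# Sketch of the eval gate described above: before replacing the big
# model with a distilled one, compare both on the same eval set and
# only switch if the accuracy drop stays within a tolerance.
# Eval examples and "model" behaviors are stubs.

def score(model_fn, eval_set) -> float:
    """Fraction of eval examples the model answers exactly right."""
    hits = sum(model_fn(inp) == expected for inp, expected in eval_set)
    return hits / len(eval_set)

def safe_to_distill(teacher_fn, student_fn, eval_set, max_drop=0.02) -> bool:
    """True if the student's score is within max_drop of the teacher's."""
    return score(teacher_fn, eval_set) - score(student_fn, eval_set) <= max_drop

# Stubbed "models": the distilled student fails one tricky input.
eval_set = [("2+2", "4"), ("capital of France", "Paris"), ("tricky", "yes")]
teacher = {"2+2": "4", "capital of France": "Paris", "tricky": "yes"}.get
student = {"2+2": "4", "capital of France": "Paris", "tricky": "no"}.get

print(safe_to_distill(teacher, student, eval_set))  # False: ~33% drop, reject
```

With a representative eval set, this gate is what turns distillation from "single API call, hope for the best" into a safe swap.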
- 00:17:42that's beautiful my favorite line with
- 00:17:43that is the whole make it work make it
- 00:17:45right make it fast and so it's like look
- 00:17:47you're going to use the biggest one to
- 00:17:48start us off but then you're going to
- 00:17:49make it fast eventually and go from
- 00:17:51there um this is awesome I tell you what
- 00:17:54though so I know you're a practical
- 00:17:56person I would love to jump into like
- 00:17:58you actually showing us some of the ways
- 00:17:59that you use these tools and I think a
- 00:18:01really cool starting off point would be
- 00:18:03I know that you're a fan of prompt
- 00:18:05optimizers or like meta prompt writing
- 00:18:08and so yes because you had you had a
- 00:18:11tweet and literally said pretty good
- 00:18:13chance you won't be prompting from
- 00:18:14scratch in two to three months so I
- 00:18:17would love to see the way you kind of
- 00:18:18prompt engineer your way from like an
- 00:18:20idea to like I'm going to go use this
- 00:18:23thing okay yeah hopefully my prediction
- 00:18:26uh ages well because I feel like it's
- 00:18:28been a month since I said that and I
- 00:18:29don't know if we're two to three months
- 00:18:31away from it but okay let me yeah I just
- 00:18:35to add some context I do a lot of this
- 00:18:36sort of meta prompting where I'll come
- 00:18:39in with a problem what is what is meta
- 00:18:40prompting let's start there you come in
- 00:18:42 Sully: You come in with a general idea of what you're trying to do; you have a problem you're trying to solve. Realistically, if you come in not knowing what problem you want an AI to solve, it's sort of useless. An example: the other day I was trying to get one of the models to write like me, which to this day I cannot manage, for whatever reason. I came into ChatGPT with all my examples and thought, okay, what do I write? Normally you'd write a basic prompt structure, and the reality is that prompt is probably not that good. So the way I think about meta prompting is: you come in with an idea, "Hey, I want an AI to write like me, here are my examples," and then I just give that to o1 or Claude and say, "Please create the prompt for me." I come in with a rough idea of what I'm trying to do, I don't really know how to optimize it specifically, so I go to these models and say, "Actually, give me the prompt structure," and they do a pretty good job. That's the rough idea of how it works. Should we just hop in?
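At its core, Sully's first step is just asking one model to write the prompt for another. As a rough sketch of that scaffolding (the function name and template below are illustrative assumptions, not anything shown on screen):

```python
def build_meta_prompt(task: str, examples: list[str]) -> str:
    """Assemble a request that asks a strong model to *write the prompt*
    for a task, rather than perform the task itself."""
    numbered = "\n".join(f"{i + 1}. {ex}" for i, ex in enumerate(examples))
    return (
        "I need help creating a prompt (a system prompt) for this use case:\n"
        f"{task}\n\n"
        "Here are examples of the inputs it will receive:\n"
        f"{numbered}\n\n"
        "Please write the prompt itself; I will paste the actual data in later."
    )

meta = build_meta_prompt(
    "Extract the key moments and insights from a podcast transcript "
    "as structured output.",
    ["<transcript of episode 1>", "<transcript of episode 2>"],
)
print(meta)
```

You would paste the returned string into ChatGPT, Claude, or o1, exactly as Sully later does by voice, and compare the prompts each model generates.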
- 00:19:47 Host: Yeah, I'd love to jump into it, if you could share your screen. Are you using just a regular chat interface, or are you going to Anthropic's Workbench and using their prompt optimizer?
- 00:19:56 Sully: I just use the chat interface. Some people do use the Workbench, and I think you can start with it, but I find the chat easier because I can iterate a lot better; I can say, "hey, start like this and do that." So let's actually do it. But first, do you have some sort of task we could demo? We should start with a rough idea.
- 00:20:21 Host: Let's do a straightforward one.
- 00:20:24 Sully: I'll give you a few options and you tell me which you think is best. We could do classification, which is very standard: "hey, I have some data sources, can you please label them for me." We could do unstructured-to-structured extraction, like extracting insights from a piece of text. Or we could do idea generation; that's always a fun one too.
- 00:20:45 Host: Let's do the text-extraction one.
- 00:20:53 Sully: That's a good one. I like to always preface it with the problem, what we're trying to do. I come in thinking: I have a problem, I'm trying to do a specific task, and usually that is my blank-slate starting point. Let's say the task is: I have a large piece of text and I want to turn it into something else, some sort of structured output. It's funny, because a lot of people ask, "Isn't it complicated?" Really, I just come to ChatGPT or Claude and basically say that. I haven't found which one is truly better. Going back to my original workflow: I'll start with GPT-4 or Claude, get a rough idea for a prompt, copy it, give it to o1, and then compare across all three to see which makes the most sense. So in this example, I'm grabbing transcripts from podcasts, and I want a nice structured output of all the key, exciting moments; that's the problem space. Now, you could come in and craft a prompt yourself, "given this video, I want you to do this," but the other workflow I wish I could demo is that I use voice a lot. I don't know if you use voice much?
- 00:22:16 Host: I don't use it a ton; it hasn't entered my workflow yet. But I'm voice-curious, so I want to see this.
- 00:22:23 Sully: Okay, I have something here; I want to show you the whole workflow that I use. Let me know if you need a transcript.
- 00:22:31 Host: I have one handy for us, actually.
- 00:22:34 Sully: Could you toss it over? I'll copy-paste it. Let me know when you have it.
- 00:22:43 Host: Small plug: this is MFM Vault, a website I put together that does insight extraction from My First Million. There you go.
- 00:22:48 Sully: Okay, cool. So our goal is to extract insights. My workflow is that I have a tool that transcribes my voice.
- 00:22:56 Sully: I think it works, so I'll show you exactly how I do it. (Dictating by voice:) "Hey, I need a bit of help creating a prompt for a use case. What we're doing right now is taking podcast transcripts and trying to extract all of the key moments and key insights. So I need you to create a nice prompt that will help us do that. I'm going to put the actual transcript into the prompt later on, but I need you to create the prompt slash system prompt." Boom. That's actually how I do it; there's no science to it. I'll copy this, go into ChatGPT, paste it, and also paste it into Claude, and it's going to give me a starting point. Right off the bat, if you're maybe not as good at prompting, or you're new to prompting, you can read this. Obviously, if you're more experienced and you know what you're doing, these kinds of prompts are pretty obvious, but for a lot of people this gives a good starting point.
- 00:24:01 Sully: Then all I'll do is look at this one, "okay, the following is a podcast transcript, identify...", and compare it to the other. I don't know which one you think is better, but right off the bat I like the Claude output better; it has a little more clear direction. So I'll copy that. We have a rough outline, and I liked the first pass from Claude. I'll take it, go back to ChatGPT, open a new tab, and switch to o1-preview. Then I'll do the same thing, but give it more context. Back to voice mode: "Hey, you're going to help me optimize a prompt. I already got another AI model to give me a rough idea for this prompt. I want you to look at it and tell me if there are any areas in the prompt we could improve. I'll give you the prompt, and I'll also give you the prompt I gave to the AI that generated it." Then I paste in the original prompt.
- 00:25:13 Host: What's amazing is how you speak to it just like a human. It's not complicated; it's literally just being clear in your directions.
- 00:25:25 Sully: It's something I recently started doing. A lot of people talk to the AI as if it's not a human, but these models perform best when you just speak naturally. I've found voice is the best modality for that, because it's very hard to sound robotic when you're talking out loud; you have to just talk naturally. It's also a lot faster: if I were to sit here and type all of that, it would take me a while.
- 00:25:51 Sully: So here I'll paste in the original prompt. Okay, cool, I like that one, and now this is the second pass. Going back to the workflow I use: I'll come in and iterate with voice on this specific subset of the problem, which is generating the prompt. We sat there with GPT-4, we sat there with Claude, iterated a bit, and now I have a rough idea; this prompt looks somewhat good. Then I come back to o1-preview and say, "Okay, cool, I want you to optimize this." I don't have a real scientific method for deciding which one is best; I just sit with it. This is where I get a good first generation of the prompt. Realistically, I'll then put it into production, write a couple of evals, ask "how does this actually perform?", and iterate back from there. But this is my starting point, so we'll let it run.
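The draft-with-GPT-4-or-Claude, then critique-and-revise-with-o1 loop Sully describes can be sketched as a three-call pipeline. `call_model` is a hypothetical stand-in for whichever API client you use; the model names and the offline stub below are assumptions for illustration, not his actual setup:

```python
from typing import Callable

def refine_prompt(task: str, call_model: Callable[[str, str], str]) -> str:
    """Three-stage meta-prompting pipeline:
    1) draft the prompt with a fast model,
    2) ask a reasoning model to critique the draft,
    3) ask for a revised prompt that applies the critique."""
    draft = call_model("draft-model", f"Write a prompt for this task: {task}")
    critique = call_model(
        "reasoning-model",
        "Another AI wrote this prompt. Point out areas to improve:\n" + draft,
    )
    return call_model(
        "reasoning-model",
        f"Rewrite the prompt applying this feedback.\nPrompt:\n{draft}\n"
        f"Feedback:\n{critique}",
    )

# Stubbed model so the pipeline runs offline; swap in real API calls.
def fake_model(name: str, message: str) -> str:
    return f"[{name}] response to: {message[:40]}..."

final = refine_prompt("extract key moments from podcast transcripts", fake_model)
```

The final prompt still goes through Sully's last step: eyeball it, then evaluate it against real data before trusting it.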
- 00:26:49 Sully: Okay, so it gives me some suggestions. "Can you please generate the new prompt now?" All right, cool, it gives me the revised prompt.
- 00:27:07 Host: So it does finally give you the answer.
- 00:27:10 Sully: Yeah, and you can see it here; this is just for the sake of the demo. Now what I'll do, and this is my full workflow, is take this and actually test it. We could use any model. Do you have a preference for which model we test the actual transcript with?
- 00:27:31 Host: I'd love to hear which one you'd pick and why, and let's just test it out.
- 00:27:36 Sully: Let's test it. Now we go to AI Studio, which is Gemini, and you see what I mean: there are all these different models. We're going to use Gemini, specifically Gemini Pro, which I've found better at these sorts of tasks. I'll take the prompt I crafted with o1, put it into Gemini Pro's system prompt, paste in the transcript, and we'll see how it goes.
- 00:28:06 Host: All right, beautiful. That sounds great.
- 00:28:10 Sully: All right, let's copy this over.
- 00:28:14 Host: Okay, this is how the sausage is made.
- 00:28:17 Sully: Yeah, this is how I like to think of the first generation of a prompt, when I'm not really sure where I'm starting from. Obviously, is this something I would use in production? Probably not, because you want to test it and have a lot of back and forth. But okay, cool. Is there a way to copy-paste the transcript?
- 00:28:34 Host: You're just going to have to select all, down at the bottom there.
- 00:28:39 Sully: It would be nice to be able to copy the transcript.
- 00:28:41 Host: Actually, I think I might add that feature.
- 00:28:43 Sully: Let me see if I can just select this.
- 00:28:53 Sully: All right, cool, now we grab this. Then I'll obviously do a second pass to make sure the prompt actually makes sense: key moments, timestamps, takeaways (extract one sentence), discussion themes, theme name. Yeah, okay, this looks pretty good. So I'll paste the transcript in and we'll let it run; I'm using Gemini Pro. All right, 177,000 tokens. For people who are curious about Gemini Pro: I talked about this recently, a lot of models can't actually reason over a large context. But with something like Gemini Pro, anything under 100K tokens, it's pretty good at synthesizing a relatively intelligent answer.
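A quick way to apply Sully's long-context rule of thumb before pasting a transcript is a rough token estimate. The ~4-characters-per-token ratio below is a common approximation for English text, not an exact tokenizer:

```python
def rough_token_count(text: str) -> int:
    """Crude token estimate: roughly 4 characters per English token.
    Use a real tokenizer (e.g. tiktoken) for billing-accurate counts."""
    return max(1, len(text) // 4)

def fits_context(text: str, budget_tokens: int = 100_000) -> bool:
    """Check a document against the ~100K-token comfort zone Sully mentions."""
    return rough_token_count(text) <= budget_tokens

transcript = "word " * 50_000  # ~250K characters of stand-in text
print(rough_token_count(transcript), fits_context(transcript))
```

If the estimate blows past the budget, you would chunk or summarize the transcript first rather than hope the model reasons well over the whole thing.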
- 00:29:45 Host: Okay, that's really cool.
- 00:29:49 Sully: And there it is: key moments, "how you leverage CrossFit." I'm actually curious how this would do against other benchmarks, because we don't really know yet whether this is a good output, and that's the whole point of evals. But there you go: that's how I went from an idea to generating a full, air-quotes, "optimized" prompt. The reason I do it this way is that for me to sit here and write this by hand would probably have taken an hour, an hour and a half, give or take depending on how good you are, and we just did it live in, what, ten minutes.
- 00:30:28 Host: That's super cool, I love that. Out of curiosity, what are you using for prompt management? I saw a tweet by Jared, the CEO of PromptLayer, and he said he sees everybody go through the same progression: first their prompts are hard-coded in their code, then their prompts are hard-coded in text files but still in the code base, and third they actually move to a prompt manager. What are you using for prompt management?
- 00:30:55 Sully: That's an interesting one. We obviously use GitHub for our prompts, and we use a couple of different things; maybe we're not prompt-managing "correctly." We store our prompts in LangSmith, and I'll have datasets and compare a prompt against a dataset. For example, we have a giant dataset of about a thousand examples that I run and test against different models and different prompts, and the prompt is stored with the dataset. Whenever I want to change the prompt, I'll duplicate the dataset and paste in the new prompt, and that's my "version," so to speak. The actual prompt stays in my codebase as the latest version, the source of truth, and all previous versions live in different datasets where I can see how they performed. So if I want to go back to a prompt from, say, a week ago, I just look at the dataset from a week ago; the prompt is there, and I can also see how it performed. That's how I manage versioning. I'm not sure it's the right approach, but that's the way I do it.
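Sully's scheme, latest prompt in the codebase and every older version frozen inside the dataset it was evaluated against, can be mimicked with a tiny registry. The class names, the stub grader, and the scoring rule below are illustrative assumptions, not LangSmith's actual API:

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

@dataclass
class PromptDataset:
    """One eval run: a dataset that also pins the prompt version it tested."""
    created: date
    prompt: str
    score: Optional[float] = None

class PromptHistory:
    def __init__(self) -> None:
        self.datasets: list[PromptDataset] = []

    def snapshot(self, prompt: str, created: date) -> PromptDataset:
        """The duplicate-the-dataset step: freeze prompt + eval run together."""
        ds = PromptDataset(created=created, prompt=prompt)
        self.datasets.append(ds)
        return ds

    def as_of(self, when: date) -> PromptDataset:
        """Recover 'the prompt from a week ago' and how it performed."""
        past = [d for d in self.datasets if d.created <= when]
        return max(past, key=lambda d: d.created)

def evaluate(ds: PromptDataset, examples: list[tuple[str, str]],
             grade: Callable[[str, str, str], bool]) -> float:
    """Score the pinned prompt against labeled examples with a grader."""
    hits = sum(grade(ds.prompt, x, want) for x, want in examples)
    ds.score = hits / len(examples)
    return ds.score

history = PromptHistory()
v1 = history.snapshot("v1: extract key moments", date(2024, 11, 1))
v2 = history.snapshot("v2: key moments + timestamps", date(2024, 11, 8))

# Stub grader: pretends longer prompts do better (a real one calls the model).
examples = [("episode text", "key moment")] * 4
evaluate(v1, examples, lambda p, x, want: len(p) > 30)
evaluate(v2, examples, lambda p, x, want: len(p) > 20)

week_ago = history.as_of(date(2024, 11, 5))
print(week_ago.prompt, week_ago.score)
```

The point of the `as_of` lookup is exactly what Sully describes: an old prompt is never retrieved alone, it comes back attached to the evidence of how it performed.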
- 00:32:07 Host: So in your code, the prompt that's being called, is it actually in your code, or are you calling out to LangSmith every single time?
- 00:32:14 Sully: It's in the code; our code is in GitHub. The nice part is that it's all version-controlled, so I can look at the git history and see that this person changed this line, which is nice. I have line-by-line version control from git, and if I want to see a full historical prompt, I can look back at the data-management tool.
- 00:32:38 Host: That's very cool. I tell you what, I had one more demo on my list that I thought would be so cool if you'd show us, and it's a Cursor one. I saw a tweet where you said you actually have the LLM write the test first, then the code, and that it helps a ton. That's a framework I don't see many people using; of course there's test-driven development, but not usually in practice. Could you walk us through how you write that test first and then ask it to write the code?
- 00:33:09 Sully: Yeah, okay. The reason I started doing this was a problem I kept facing: the model just kept messing up, every single time, inside our code base. I thought, this is a waste of my time, the model can't figure it out. How about I get it to generate the test first, and then, if the test works, it can look at the code and say where the issues are? Because guess what: if a test fails, you can grab the error output, give it back to the model, and say, "Hey, please decipher that." Let me see if I can spin up a little mini project.
- 00:33:43 Host: Yeah, let's see if you can spin something up. I actually think this is really cool, and truly not enough people are doing it, because it legitimately helps you write better code. It makes sense: you have the test that's supposed to run successfully, the model can use it as instructions, and it can use it to verify things are actually working. I'm surprised more people aren't doing this.
- 00:34:04 Sully: Right, it's just a lot easier for the LLM to work that way, and then your code is, I guess, less spaghetti, because if something changes, you started from the tests, and it's really easy for the model to generate them.
- 00:34:18 Sully: Okay, that took a little time, but I've got Cursor here. This is just a super quick, basic project; in the terminal I can run `bun index.ts` and get "Hello World." Now I like to start with Cursor, and I'll open the Composer with Cmd+I. For those who don't know, the Composer lets you coordinate and create files. And again, I actually don't know how to write tests in Bun off the top of my head, so I just say: "I'm using Bun. For now, create a test file for a method, and then make the method, which, let's say for now, reverses a string." Super simple. And, oh, I guess I'm out of slow requests, unfortunately. Okay. So what it'll first do is create the test, and this is obviously a really simple example. I'm happy with this, so I'll accept it. Right off the bat there are, what, five tests here, and obviously I have the actual function, which in this example just reverses a string. The nice part is I can go to the terminal and run the test file, `bun test reverse.test.ts`, and I can debug with the Composer again. That's the nice part; I can just go "debug with..." and, ah, I'm out of the free tier. That's how much I use Cursor; I always blow through the budget. Okay, cool, so here it passes the tests. But let's say we're using something a little more complicated than reversing a string. I can go in and break the code on purpose: instead of reversing, we'll split it like this and return the wrong thing. Now if I run the test file, all the tests fail. This is obviously a pretty simple example, and it's almost as simple as clicking the button that says "add to composer" and then saying, "Please fix the reverse method due to the errors."
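The demo is in Bun and TypeScript; translated to Python as a sketch (my translation, not Sully's actual files), the test-first shape looks like this:

```python
def reverse_string(s: str) -> str:
    """The method under test: reverse a string."""
    return s[::-1]

# The tests are written *first*; the model's job is to make them pass.
def run_tests() -> list[str]:
    failures = []
    cases = [
        ("hello", "olleh"),
        ("", ""),
        ("a", "a"),
        ("ab ba", "ab ba"),   # palindrome with a space
        ("héllo", "olléh"),   # non-ASCII survives reversal
    ]
    for given, expected in cases:
        got = reverse_string(given)
        if got != expected:
            failures.append(f"reverse_string({given!r}) = {got!r}, want {expected!r}")
    return failures

print(run_tests())  # [] when the implementation is correct
```

If the implementation is broken (say, it returns `s` unchanged), `run_tests` returns a list of human-readable failures, which is exactly the error output you would paste back to the model.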
- 00:36:53 Sully: And now the nice part: Cursor will pull in the terminal output with the errors where they happen, look at it, say "hey, I see what the issue is," and just fix it. This is what I like to call, well, I don't actually have a name for it yet, maybe "LLM test-driven development," whatever you want to call it. You come in and describe what you're trying to do, the LLM writes the tests for it, then it writes the method, and then you have it run. If the method itself is complex or confusing, it'll be able to, air quotes, "agentically" fix itself: it tests the code, sees whether it passes, and if not it updates the code and repeats until it does. All you have to do is make sure the tests it's writing are correct.
- 00:37:47 Sully: I use this a lot. For simple functions it's not that useful, but in a modern code base it's not just a single function: you have code spread across a bunch of different files, things connecting, a lot of conditionals, nothing as simple as this. That's where I found that whenever I'd try to get Cursor, or rather Sonnet, to one-shot it, it would fail every single time. But the second I said, "okay, let's write the tests for it," and I'd sit there and help it write the tests, it was able to debug itself a lot better and get through these bigger, meatier functions that even o1 and o1-mini couldn't solve. Once I applied this test-driven development, whatever you want to call it, the model was able to look at the output, see where it messes up, adjust the code, and iterate on itself.
- 00:38:41 Host: That's cool. So not only is this test-first mindset kind of a prompt-engineering technique, almost like "think out loud," or "write the goal first and then tell me what you think we should do," you also get tests out the other end, a little extra utility as a byproduct.
- 00:38:59 Sully: Exactly, it's a win-win; you get a bit of both. To me, that's the one thing I never understood why people haven't done more of, because you'd think, well, if it passes all the tests, you're happy it passed the tests. But it's something I haven't seen a lot of people do.
- 00:39:15 Host: Well, that's fabulous; thank you for showing me the Cursor example. One of the questions I love asking is: what are the smart people talking about right now in AI? As you observe on Twitter, in your circles, what are the smart people talking about?
- 00:39:31 Sully: That's a good question. Oh man. I see a lot of people talking about test-time compute, the o1-style thinking. I see a lot of people talking about having those thinking models do more agentic sorts of tasks, basically bringing what I like to think of as the agent, the for-loop, inside the model's thinking process, and training the model to innately be able to call tools. A good example of that is Computer Use from Anthropic; they obviously fine-tuned for it. What I've also started to notice is people talking about whether we've hit some variation of a wall. I don't know if you've seen it too, but I've been hearing little rumors that Claude 3.5 Opus is not up to par and that the new Gemini model is not as good. And we already spoke about some of the other things, like model distillation. The other thing I'm starting to see more of is people talking about evals. A lot of people didn't really talk about them before, and now people are saying, hey, from a product perspective, if you want your product to be good, you need to write evals, which are just a way of writing tests. That's what I'm seeing; I don't know if you've heard anything different.
- 00:41:00 Host: Yeah, let me think whether there's anything else I would add to that list.
- 00:41:04 Host: The one thing people aren't talking about, but that I think will be a big deal when it actually comes out, is the whole feature-engineering, weight-manipulation approach, like Anthropic's Golden Gate Claude. I'm still waiting for access to that, because it's going to be an alternative to prompt engineering. I have no idea how easy it will be to work with or what kind of results we'll get, but I'm excited to test it whenever it comes out.
- 00:41:29 Sully: I remember seeing that; I was blown away, and I kind of forgot about it. I'm actually interested to see whether they'll ever let you have that much interpretability-level access to those models, or whether it'll be "no, we're good, sorry, we're shelving it, you're not allowed to touch it." But that would be really interesting.
- 00:41:45 Host: For sure. Awesome, two more questions here. I love hearing about what's in people's toolkits. I've seen you use Excalidraw in your YouTube videos, I've seen you use Replit, I've heard rumblings about v0. What else is in your toolkit, in your day-to-day workflows?
- 00:42:04 Sully: Okay, so there's a lot. You got a couple: v0, and obviously Cursor. Excalidraw I like for drawing little diagrams. The other one I use a lot is the Playground from Anthropic and from OpenAI, which is different from ChatGPT; I use those to iterate on prompts. The one I use for transcribing audio is called Whisper Flow; it's the one where I press a hotkey and it takes my voice and transcribes it into the inputs you saw me use. As for other tooling, do you want to go into the technical side, or are we leaving it at the high level?
- 00:42:44 Host: Let's not go deep; I don't want your entire tech stack, just the cool AI stuff you reach for.
- 00:42:52 Sully: Then I think that's pretty much it; there aren't many other tools I honestly use. A lot of it is just writing the code. Actually, LangSmith is one; we use LangSmith a lot for evals. But yeah, that's pretty much it from me.
- 00:43:10 Host: I think you nailed it: v0, Cursor, Excalidraw, and OBS if you're recording videos.
- 00:43:16 Sully: Yeah, for sure.
- 00:43:18yeah yeah for sure um all right last
- 00:43:20question and this is kind of off topic
- 00:43:22from the AI side but I know people would
- 00:43:23be interested in it so you've had a few
- 00:43:25bangers on Twitter like just some things
- 00:43:27that just absolutely pop and as somebody
- 00:43:29who does a little bit of Twitter himself
- 00:43:30too I can look at a tweet and be like
- 00:43:32that person thought about it and they
- 00:43:33did a really good job as to how they
- 00:43:34architected and constructed it and I
- 00:43:36noticed that with yourself so what hits
- 00:43:38on Twitter and what what's your advice
- 00:43:40for people who like want to do better on
- 00:43:43it oh man okay so Twitter is just this
- 00:43:46hilarious platform that the algorithm
- 00:43:49changes a lot so it's you kind of got to
- 00:43:51get a feel for what works and what
- 00:43:53doesn't and luckily the cost so for
- 00:43:55anyone who's looking to grow the cost to
- 00:43:57post on X Twitter is zero like you don't
- 00:44:00pay anything if it doesn't do well no
- 00:44:02one cares so it's the one platform where
- 00:44:05the cost is literally zero because
- 00:44:07you're just typing so type things away
- 00:44:09how I craft a banger it's like a mixture
- 00:44:12of what I see trending so what I see
- 00:44:15what people are talking about and
- 00:44:17there's two ways to craft a banger one
- 00:44:20is you have to be controversial
- 00:44:22you're not going to craft a banger
- 00:44:23if you're not controversial now there's
- 00:44:25pros and cons if you're posting that
- 00:44:27kind of stuff all the time people will
- 00:44:28be like hey you're just posting
- 00:44:30clickbait so you got to be careful with
- 00:44:31it you can't be like this is insane and
- 00:44:34every single tweet starts with that like
- 00:44:36no one's going to believe you
- 00:44:37but start saying something controversial
- 00:44:40and the most important part of crafting
- 00:44:42a banger is your hook I can tell like
- 00:44:45honestly I'll post something and I can
- 00:44:47tell within 20 minutes if it's going to
- 00:44:49be a banger or not and it's basically
- 00:44:52how natural does it come that's one
- 00:44:54that's like how natural did this thought
- 00:44:55come to me and how well did I craft that
- 00:44:57hook everything in
- 00:44:59between like you can kind of
- 00:45:01sit there and min-max but that's
- 00:45:04how I sit there and sometimes I'll sit
- 00:45:05on something and I'll be like oh man
- 00:45:07like I just don't know the right way to
- 00:45:09say it so I won't post it but then it'll
- 00:45:11just come to me and I'll be like all
- 00:45:13right I got this all the words I'm
- 00:45:16using the right structure it's like
- 00:45:18the right timing and that's kind of
- 00:45:21what goes into crafting it so um the one
- 00:45:23piece of advice that I will give from my
- 00:45:25personal experience is don't spend too
- 00:45:27much time on a tweet because unless
- 00:45:30you're doing educational content there
- 00:45:32should be a diagram where the more time
- 00:45:34you spend thinking about a tweet the
- 00:45:36worse it does because I swear the
- 00:45:38majority of my bangers I spend like 15
- 00:45:40minutes thinking about I'm like all
- 00:45:41right I'm just going to post it you know
- 00:45:43grab a coffee I come back and it's blown
- 00:45:44up and then all of a sudden you see
- 00:45:461.4 million
- 00:45:48views oh man do I have time I
- 00:45:51have to tell you the story of
- 00:45:53how this all started do I have time for that
- 00:45:55yeah yeah let's hear it okay so because
- 00:45:57it's so relevant to the Banger tweet
- 00:45:59so my company we started like a year
- 00:46:03and a half ago and right this is around
- 00:46:05the time that agents like people were
- 00:46:07talking about them but didn't have any
- 00:46:08clue this was let's say March
- 00:46:122023 and at this time no one
- 00:46:16actually knew of my account I
- 00:46:18had been posting tweets and no one
- 00:46:21replied you know the classic zero views
- 00:46:23you know that's just what happens and
- 00:46:25then and I remember I saw someone else
- 00:46:28post something about Auto GPT and I saw
- 00:46:30it and I was like it looks pretty cool
- 00:46:32but I ignored it and then it came up
- 00:46:34again and I was like no I cannot
- 00:46:36ignore this like this seems like
- 00:46:38something very interesting and I'd been
- 00:46:39building actually like AI projects side
- 00:46:41project before this and I was like you
- 00:46:43know what let me like try this thing out
- 00:46:44and obviously I tried it and back then I
- 00:46:46was like dude this is insane agents AI
- 00:46:49is gonna be crazy so I
- 00:46:52just posted about it and like I didn't
- 00:46:54post anything crazy I was just like oh
- 00:46:56yeah this thing is kind of cool it's
- 00:46:57pretty crazy and it like got like I
- 00:46:59think that was the first post that got
- 00:47:01over a thousand likes and I was like
- 00:47:02wait a minute wow and then I was like
- 00:47:04hold on a second then I saw this
- 00:47:06trend that people wanted to do something
- 00:47:09about like AI agents and it's
- 00:47:11interestingly enough I thought back
- 00:47:13to an episode of My First
- 00:47:15Million so funny and I remember them
- 00:47:17talking about how sometimes you
- 00:47:19see this opportunity and I was like
- 00:47:20dude I got to sit here and I got to do
- 00:47:22two things first I got to craft
- 00:47:24something I got to make a product that
- 00:47:25people want to use and I got to figure
- 00:47:27out the right Twitter thread and
- 00:47:30narrative and story to craft to get
- 00:47:31people on it so that weekend I spent the
- 00:47:34whole weekend building v0 of Cognosis
- 00:47:37which was like our previous product in
- 00:47:39the meantime posting Twitter bangers and
- 00:47:43threads about how AI agents were going
- 00:47:46to change everyone's life and every
- 00:47:48single post was getting like a million
- 00:47:50views I'm not even exaggerating and I
- 00:47:52was like dude okay
- 00:47:55and all I would be posting
- 00:47:56was kind of clickbaity I was like
- 00:47:58this is going to change your life and
- 00:47:59then getting like million view million
- 00:48:01views and I posted the product I was
- 00:48:03like hey I built this thing
- 00:48:04for you people to go and try because I
- 00:48:06know from what you've been telling me um
- 00:48:09you don't want to go through GitHub and
- 00:48:10I posted it out and I literally
- 00:48:12built it in like three days and within
- 00:48:15like two days we got 50,000 users so my
- 00:48:19goodness that is so crazy the
- 00:48:22craziest two weeks and the most
- 00:48:23stressful two weeks of my life and it
- 00:48:26all started from how can I craft a
- 00:48:28banger tweet so I will say that that's
- 00:48:31why it's so relevant and so funny it
- 00:48:33just shows how powerful uh writing well
- 00:48:36and writing with the right timing and
- 00:48:38structure given what's happening can
- 00:48:40potentially you know help you start a
- 00:48:42company and with that that is an
- 00:48:44absolutely beautiful story to end on
- 00:48:46Sully thank you very much for joining us
- 00:48:48today oh dude it it was a pleasure I I
- 00:48:50enjoyed it and hopefully my workflow is
- 00:48:53applicable to other people people can
- 00:48:54look at it and see that like hey using
- 00:48:57AI is just not that hard you just got to
- 00:48:59talk to the computer and it'll do stuff
- 00:49:02for you
- AI models
- model distillation
- prompt engineering
- model routing
- AI evaluation
- task optimization
- language models
- efficient AI use