2 Years of LLM Advice in 35 Minutes (Sully Omar Interview)

00:49:04
https://www.youtube.com/watch?v=nMORNaE_qe4

Summary

TLDR: In this interview, Sully Omar, CEO of Cognosys, explores how to use and categorize AI language models. He explains a tier system for language models based on intelligence and cost, and how to choose the right model for specific tasks such as coding, summarizing documents, and generating prompts. He discusses combining multiple AI models to exploit their distinct strengths and weaknesses, and covers model distillation, where a large model's behavior is transferred to smaller, faster, cheaper models. He shares hands-on experience with model evaluation, stressing the need to understand the models' nuanced differences and to test them carefully for each application. He also predicts that AI-generated prompts will soon replace much of traditional manual prompt writing. The conversation offers a practical look at fitting AI models into everyday work and at the challenges of squeezing out the last few points of performance.

Key takeaways

  • πŸ€– AI models can be used in every aspect of daily life.
  • πŸ“Š Different AI models have distinct strengths and weaknesses.
  • πŸš€ Model distillation enhances task execution by refining large models.
  • πŸ› οΈ Prompt engineering can be optimized with AI-generated meta prompts.
  • 🌐 Model routing means selecting the best AI model for each task.
  • πŸ’‘ Understanding model capabilities is crucial for maximizing potential use.
  • πŸ”„ Iteration and testing improve AI model usage.
  • πŸ“ˆ Combining multiple AI models can improve productivity.
  • πŸ“œ Future developments might replace traditional prompt writing.
  • βš–οΈ Tiered categorization helps in selecting the right AI model.

Timeline

  • 00:00:00 - 00:05:00

    AI can enhance everyday tasks but has its limitations. The conversation discusses large language models (LLMs) and their nuanced differences, highlighting the difficulty in perfecting AI performance.

  • 00:05:00 - 00:10:00

    An interview with Sully Omar reveals insights into his system for ranking AI models and developing prompts. Omar uses meta prompts to create production-ready prompts and discusses distilling performance into smaller AI models.

  • 00:10:00 - 00:15:00

    Omar's framework categorizes models by intelligence and cost, with tier three being the cheap, frequently used workhorses. He gives examples such as GPT-4o mini and Gemini Flash.

  • 00:15:00 - 00:20:00

    Tiered AI models serve different applications, with Omar using tier two models for tasks not requiring the highest intelligence. He often pairs models to optimize task performance.

  • 00:20:00 - 00:25:00

    AI models have specific strengths and weaknesses. Omar shares an example of using Gemini for needle-in-a-haystack lookups in long text, while GPT-4o mini reasons better over the same context, showing how models can complement each other.

  • 00:25:00 - 00:30:00

    The future of AI may involve complex model-routing systems, though getting a product from roughly 90-95% reliability to the last few percentage points remains very hard. Current practice relies on hand-built combinations of models.

  • 00:30:00 - 00:35:00

    Model distillation is powerful yet demanding. Good results require solid eval sets and data pipelines to avoid regressions when moving a task to a smaller model. Distillation tooling and practices will keep improving.

  • 00:35:00 - 00:40:00

    Omar demonstrates his prompt optimization process using various AI models, leveraging voice interaction for natural input. He iterates across models to refine prompts before applying them for specific tasks.

  • 00:40:00 - 00:49:04

    Test-driven development with AI involves using language models to write tests before code, providing checks for accuracy and reliability. Omar adapts this method to improve coding processes.



Video Q&A

  • What is AI model distillation, as discussed in the interview?

    AI model distillation means taking a task a large model already performs well and transferring it to a smaller model, typically by fine-tuning the smaller model on the larger model's outputs, so the task runs faster and at lower cost. (A sketch of this loop appears after this Q&A list.)

  • How are AI models categorized into tiers in this interview?

    Different AI models are categorized by tiers based on intelligence and cost: Tier 1 models are the most intelligent but slow and costly, Tier 2 are balanced, and Tier 3 are less intelligent but cheap and fast. (A small tier map appears after this Q&A list.)

  • Does the speaker use various AI models for different tasks?

    Yes, the speaker uses different AI models for different tasks, evaluating their strengths and weaknesses for specific use cases like coding, summarizing documents, and generating prompts.

  • How does the speaker recommend improving the use of AI models?

    The speaker emphasizes building a deep understanding of model capabilities and continuously testing them under different conditions to maximize their potential use.

  • What is model routing, and what does the speaker think about it?

    Model routing involves automatically selecting the best model for a given task. It's seen as a future direction for optimizing AI model use but is currently complex to implement effectively. (A rough sketch of a router appears after this Q&A list.)

  • Does combining multiple AI models improve productivity as suggested in the video?

    The speaker finds it beneficial to use different models for their strengths, for instance leveraging one model's structured-output capabilities alongside another's reasoning skills. (A sketch of that pairing appears after this Q&A list.)

  • What prediction about the future of prompt engineering is mentioned?

    The speaker predicts that traditional prompt writing will be replaced by AI-generated prompts, making the process more efficient and refined.

  • How does the speaker approach prompt generation and optimization?

    He uses an iterative, comparison-based approach to generate optimized prompts, often passing a prompt through multiple AI models to refine it. (A sketch of the meta-prompting loop appears after this Q&A list.)
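
Sketch: model distillation (re: the distillation question above). A minimal outline of the "big model generates the training data, small model gets fine-tuned on it" loop, assuming the OpenAI Node SDK; the `examples` array, file names, and model IDs are illustrative, not the speaker's actual pipeline.

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical task inputs; in practice these come from real traffic or an eval set.
const examples = ["Summarize this support ticket: ...", "Extract the key moments from: ..."];

async function distill() {
  const lines: string[] = [];
  for (const input of examples) {
    // 1. The large "teacher" model produces the reference output.
    const teacher = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: input }],
    });
    // 2. Record input/output pairs as chat-format JSONL training examples.
    lines.push(
      JSON.stringify({
        messages: [
          { role: "user", content: input },
          { role: "assistant", content: teacher.choices[0].message.content },
        ],
      }),
    );
  }
  fs.writeFileSync("distill.jsonl", lines.join("\n"));

  // 3. Fine-tune the smaller "student" model on the teacher's outputs.
  const file = await client.files.create({
    file: fs.createReadStream("distill.jsonl"),
    purpose: "fine-tune",
  });
  await client.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-4o-mini-2024-07-18",
  });
  // 4. Before switching traffic over, run the distilled model against the same
  //    eval set as the teacher -- catching regressions is the hard part.
}

distill();
```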

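Sketch: the three tiers as a lookup table (re: the tier question above). The model names are the ones mentioned in the interview; the selection heuristic is illustrative only, not the speaker's code.

```typescript
// Tier 1 = slow "thinking" models, tier 2 = balanced workhorses for coding and
// writing, tier 3 = cheap, fast models for high-volume work.
const TIERS = {
  1: ["o1", "o1-preview"],
  2: ["gpt-4o", "claude-3-5-sonnet-latest", "gemini-1.5-pro"],
  3: ["gpt-4o-mini", "gemini-1.5-flash"],
} as const;

type Tier = keyof typeof TIERS;

// Illustrative heuristic: bulk/simple work goes to tier 3, hard multi-step
// reasoning to tier 1, everything else to the tier 2 workhorses.
function pickTier(task: { bulk?: boolean; hardReasoning?: boolean }): Tier {
  if (task.hardReasoning) return 1;
  if (task.bulk) return 3;
  return 2;
}

console.log(TIERS[pickTier({ bulk: true })]); // ["gpt-4o-mini", "gemini-1.5-flash"]
```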
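Sketch: model routing (re: the routing question above). A cheap model classifies the request and the call is dispatched to a tier. This is a rough illustration of the idea, not a production router, and the speaker's caveat applies: the classification step itself adds variance.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// A cheap tier-3 model decides which tier the request deserves.
async function routeTier(userRequest: string): Promise<"1" | "2" | "3"> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Classify the request. Reply with only '1' (hard multi-step reasoning), " +
          "'2' (normal coding or writing), or '3' (simple extraction or formatting).",
      },
      { role: "user", content: userRequest },
    ],
  });
  const tier = res.choices[0].message.content?.trim();
  return tier === "1" || tier === "3" ? tier : "2"; // default to the workhorse tier
}

async function answer(userRequest: string): Promise<string | null> {
  const tier = await routeTier(userRequest);
  const model = { "1": "o1-preview", "2": "gpt-4o", "3": "gpt-4o-mini" }[tier];
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: userRequest }],
  });
  return res.choices[0].message.content;
}
```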

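Sketch: pairing a reasoning model with a structured-output model (re: the question above and the deduplication example in the transcript). The reasoning model answers in free-form prose; a cheaper model then converts it to strict JSON. The schema and prompts are illustrative.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function dedupeAndStructure(items: string[]) {
  // Step 1: the reasoning model (tier 1) does the hard part -- deduplication --
  // and is allowed to answer in verbose prose.
  const reasoned = await client.chat.completions.create({
    model: "o1-preview",
    messages: [
      {
        role: "user",
        content: `Deduplicate this list, merging near-duplicates:\n${items.join("\n")}`,
      },
    ],
  });

  // Step 2: a cheap model turns the prose into strict JSON for downstream code.
  const structured = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: 'Return JSON of the form {"items": ["..."]} and nothing else.',
      },
      { role: "user", content: reasoned.choices[0].message.content ?? "" },
    ],
  });
  return JSON.parse(structured.choices[0].message.content ?? "{}");
}
```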
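Sketch: the meta-prompting loop (re: the prompt-generation question above). Draft a prompt with one model, then ask a reasoning model to critique and rewrite it, as in the demo later in the transcript. The wording of the meta-prompts is made up.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function metaPrompt(taskDescription: string): Promise<string> {
  // Pass 1: ask a tier-2 model for a first-draft system prompt.
  const draft = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: `Write a production-quality system prompt for this task:\n${taskDescription}`,
      },
    ],
  });

  // Pass 2: ask a reasoning model to critique the draft and emit a revised prompt.
  const revised = await client.chat.completions.create({
    model: "o1-preview",
    messages: [
      {
        role: "user",
        content:
          `I asked another AI to draft a prompt for this task:\n${taskDescription}\n\n` +
          `Draft prompt:\n${draft.choices[0].message.content}\n\n` +
          "Point out weaknesses, then output only the improved prompt.",
      },
    ],
  });
  return revised.choices[0].message.content ?? "";
}
```

The result is only a starting point; as the speaker notes, it still goes through evals before being trusted in production.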
Transcript
  • 00:00:00
    it lets you use AI in basically every
  • 00:00:02
    nook and cranny of your day-to-day with
  • 00:00:04
    that model came out it actually opened
  • 00:00:06
    up a lot of things that you could do we
  • 00:00:08
    use a lot of different providers and
  • 00:00:09
    that's because what we've seen with our
  • 00:00:11
    internal evals is that they're all so
  • 00:00:15
    nuanced and different in like a variety
  • 00:00:17
    of different ways but you also start to
  • 00:00:19
    see where they lack you'll get to an AI
  • 00:00:22
    product you get it to 90% even 95% but
  • 00:00:25
    that last 5% is nearly impossible how
  • 00:00:27
    do you think about model distillation
  • 00:00:29
    it's very powerful but you have to be
  • 00:00:30
    very
  • 00:00:33
    [Laughter]
  • 00:00:35
    [Music]
  • 00:00:42
    careful I just had an amazing
  • 00:00:45
    conversation with Sully Omar the CEO of
  • 00:00:47
    Cognosys the company behind Otto not only
  • 00:00:51
    is he one of the best llm practitioners
  • 00:00:53
    that I've met but you can tell he has a
  • 00:00:55
    really deep feeling for how these models
  • 00:00:57
    are actually working he speaks from
  • 00:00:58
    experience in this interview we go
  • 00:01:00
    through his three tier system of
  • 00:01:02
    actually ranking language models he
  • 00:01:04
    shows us how he uses meta prompts to
  • 00:01:06
    develop his real prompts that he uses in
  • 00:01:08
    production he also shows us his cursor
  • 00:01:10
    development flow where he actually has
  • 00:01:12
    the language model write the test first
  • 00:01:15
    and then write the actual code and
  • 00:01:17
    finally he walks us through distilling
  • 00:01:18
    performance from large language models
  • 00:01:20
    to small language models without losing
  • 00:01:22
    performance let's jump into it and let's
  • 00:01:24
    see what wisdom our friend Sully has to
  • 00:01:26
    share uh the reason why we're doing this
  • 00:01:28
    interview here is because I see all the
  • 00:01:30
    cool stuff you're sharing on Twitter and
  • 00:01:32
    I'm like this guy clearly has not only
  • 00:01:35
    like a checklist learned uh ability to
  • 00:01:39
    manipulate these models but I can tell
  • 00:01:41
    you you feel them like you really feel
  • 00:01:43
    how these things are actually going and
  • 00:01:44
    the personalities and the nuances and so
  • 00:01:46
    I want to dig in dig into that today
  • 00:01:48
    yeah well thank you and I think it just
  • 00:01:50
    comes from playing with these things
  • 00:01:52
    every day day in day out and using them
  • 00:01:56
    and pushing them to their limit and like
  • 00:01:58
    as cliche as it is is just like
  • 00:02:00
    sometimes you got to use them to Vibe
  • 00:02:01
    with them you know like like right so
  • 00:02:04
    yeah yeah yeah it's so true well I tell
  • 00:02:06
    you what I want to start off with um one
  • 00:02:08
    framework that I saw you document
  • 00:02:09
    recently which was your three tier model
  • 00:02:13
    of language models so tier one through
  • 00:02:15
    tier three so could you tell me like
  • 00:02:17
    starting at tier three what are those
  • 00:02:19
    and how do you work your way up yeah so
  • 00:02:21
    that's a it's a framework that I I mean
  • 00:02:23
    I don't even know if you want to call it
  • 00:02:24
    a framework but it's I like to
  • 00:02:26
    categorize it them into like based on
  • 00:02:28
    intelligence and price which is
  • 00:02:30
    correlated right like the less
  • 00:02:32
    intelligent models are going to be your
  • 00:02:33
    tier three models and then your more
  • 00:02:35
    expensive slower are going to be your uh
  • 00:02:38
    more intelligent model so the reason I I
  • 00:02:41
    thought of it in three tiers was because
  • 00:02:43
    of the application purposes so the way
  • 00:02:45
    that you use something like let's say 01
  • 00:02:47
    so that would be like a tier one is
  • 00:02:49
    different than the way that you use
  • 00:02:50
    something like Gemini flash which is
  • 00:02:52
    tier three um and that's because they
  • 00:02:54
    all provide different purposes one is
  • 00:02:56
    super cheap super fast the other one's
  • 00:02:58
    like really smart and really slow so I I
  • 00:03:00
    broke it down to those three tiers and
  • 00:03:02
    the third tier is basically what I like
  • 00:03:04
    to call just like the you know the
  • 00:03:07
    Workhorse the the ones that you're just
  • 00:03:09
    constantly using 247 and within that
  • 00:03:12
    category I think there was three main
  • 00:03:15
    models but it's kind of come down to two
  • 00:03:16
    for me personally so the first one is
  • 00:03:19
    the one that I think people are probably
  • 00:03:21
    more familiar with which is GPT 40 mini
  • 00:03:24
    now that model and is actually like I I
  • 00:03:30
    really really like it because it lets
  • 00:03:32
    you use AI in a way that previously you
  • 00:03:34
    couldn't like if you were to go back
  • 00:03:36
    let's say six months ago when we had no
  • 00:03:38
    cheap models you had let's say GPT 4 and
  • 00:03:41
    maybe even Claude
  • 00:03:43
    3.5 there was a lot of scenarios where
  • 00:03:46
    you couldn't just be like throwing that
  • 00:03:47
    at like random problems like you
  • 00:03:49
    couldn't just be like hey I have this
  • 00:03:51
    you know 20page document I want you to
  • 00:03:53
    go paragraph by paragraph and like
  • 00:03:55
    extract the details because
  • 00:03:57
    realistically like you know you're going
  • 00:03:59
    to be paying a lot of money so with that
  • 00:04:01
    model came out it actually opened up a
  • 00:04:03
    lot of like things that you could do so
  • 00:04:05
    that was the the the the first one with
  • 00:04:07
    was gp4 mini and then the other one that
  • 00:04:09
    I'm starting to really like is Flash so
  • 00:04:11
    Gemini flash is actually half the price
  • 00:04:14
    of GPT 40 mini and those are the tier
  • 00:04:17
    three because like I said they they give
  • 00:04:19
    you a lot of optionality and the
  • 00:04:21
    different things that you could do that
  • 00:04:22
    you couldn't do before it lets you use
  • 00:04:23
    AI in basically every nook and cranny of
  • 00:04:26
    your day-to-day right if whether it's
  • 00:04:29
    your coding and you wanted to look at
  • 00:04:31
    like you know 50 different files to
  • 00:04:33
    summarize to help another model for
  • 00:04:35
    example if you wanted to take a podcast
  • 00:04:38
    and you know look at you know when did
  • 00:04:41
    someone say a specific word in that
  • 00:04:42
    podcast right you're not going to go to
  • 00:04:44
    a bigger model so that was that's what I
  • 00:04:46
    call the tier three um and then the
  • 00:04:48
    second tier that I have is sort of like
  • 00:04:50
    the the middle obviously it's the middle
  • 00:04:52
    tier and this is where I like to slot in
  • 00:04:54
    the actual gp4 Cloud 3.5 Gemini Pro this
  • 00:04:57
    is where I think the majority of people
  • 00:04:59
    you use these models and and kind of get
  • 00:05:01
    the maximum usage out of them um and
  • 00:05:03
    then the last tier is obviously like the
  • 00:05:05
    01 o1 preview and then what I like to
  • 00:05:07
    classify as thinking models yeah that's
  • 00:05:09
    so cool so I want to dig in more into
  • 00:05:11
    the use case side so which use case
  • 00:05:13
    tasks are you doing tier two with and
  • 00:05:16
    then I know that o 01 in tier one is
  • 00:05:18
    going to be um it's not just oh I need
  • 00:05:20
    it smarter it's almost like a different
  • 00:05:22
    type of task you're going to ask it to
  • 00:05:23
    do so how do you differentiate between
  • 00:05:24
    those two right so the way that I like
  • 00:05:27
    to differentiate is I like like I pair
  • 00:05:31
    them so I will use 01 and I use this in
  • 00:05:33
    my dayto day it's like I'll go to Chad
  • 00:05:35
    gbt and if you just go and say hey like
  • 00:05:38
    to o1 can you do this task for me one it's
  • 00:05:41
    going to take a little bit of time
  • 00:05:42
    you're probably going to hit some rate
  • 00:05:43
    limits because it's highly limited and
  • 00:05:44
    realistically you're not going to use
  • 00:05:46
    the model the way that I think it was
  • 00:05:48
    intended somewhat to be used so if you
  • 00:05:50
    say like hey how's it going like okay
  • 00:05:53
    sure you could use it like that but
  • 00:05:54
    realistically you're better off using
  • 00:05:56
    you know the the tier two so how I use
  • 00:05:58
    the tier 2 is actually the most I use it
  • 00:06:00
    the most um obviously everyone uses it
  • 00:06:02
    for coding whether it's CLA 3.5 gp4 um
  • 00:06:06
    using it for like function calling or to
  • 00:06:08
    call tool calling like it it is
  • 00:06:11
    obviously like a good balance between
  • 00:06:13
    intelligence and price um and and that's
  • 00:06:16
    kind of like what I use it the most
  • 00:06:18
    whether I'm writing whether I'm asking
  • 00:06:20
    it to like hey help me edit an email or
  • 00:06:22
    things like that I'm using those like
  • 00:06:23
    middle tier ones now how I actually use
  • 00:06:26
    that in tandem with 01 is I'll sort of
  • 00:06:29
    one of the cases I have is I'll come to
  • 00:06:31
    Chad GPT or Claude and I'll sit there
  • 00:06:34
    and I'll just create a giant
  • 00:06:35
    conversation about a specific topic so
  • 00:06:37
    let's say for example you know I'm deep
  • 00:06:41
    diving into a research topic and I want
  • 00:06:43
    to learn more about now I'm not going to
  • 00:06:45
    actually go straight into 01 because I
  • 00:06:47
    feel like one it's a bit slow what I'll
  • 00:06:49
    what I'll do is I'll start the topic
  • 00:06:50
    with gp4 or Claude and I'll like add
  • 00:06:54
    files because obviously I think right
  • 00:06:55
    now 01 doesn't support like files and
  • 00:06:57
    web search so there's a lot of
  • 00:06:58
    capabilities that o1 doesn't support and
  • 00:07:00
    what I like to call is the context
  • 00:07:01
    building so I will just go and build as
  • 00:07:04
    much context in this chat as I possibly
  • 00:07:06
    can or it could be you know in any
  • 00:07:08
    platform and and I'll sit there and
  • 00:07:09
    iterate I'll actually use voice mode as
  • 00:07:11
    well to sort of give context it's a lot
  • 00:07:13
    quicker and that's another workflow and
  • 00:07:15
    as soon as I have like you know let's
  • 00:07:18
    say like two to three pages worth of
  • 00:07:20
    documents I'll actually take that and
  • 00:07:23
    paste it into a chat with 01 or 01
  • 00:07:26
    preview and I'll say Hey you know do
  • 00:07:28
    this gigantic task for me so for example
  • 00:07:31
    I I'll give you one thing to use it for
  • 00:07:32
    is like I was using it to generate use
  • 00:07:35
    cases for my product and I was like okay
  • 00:07:37
    I want to generate use cases and I want
  • 00:07:39
    to understand you know what are some
  • 00:07:41
    potential um customer segments and icps
  • 00:07:44
    it's is like a pretty technical question
  • 00:07:46
    and if I were to just go to 01 and ask
  • 00:07:48
    it that it would have no context it
  • 00:07:49
    doesn't know what my product is it has
  • 00:07:51
    no clue what my product does who my
  • 00:07:53
    customers are and if I were to sit there
  • 00:07:55
    chat with it well I'm going to hit that
  • 00:07:56
    limit but if I go to Claude or Chad gbt
  • 00:07:58
    I can upload documents I can create this
  • 00:08:00
    basically a PDF and copy paste it into
  • 00:08:03
    01 and then I can say generate me you
  • 00:08:06
    know personas icpas it does a lot better
  • 00:08:08
    so that's sort of the the workflow and
  • 00:08:09
    use case that I have currently running
  • 00:08:11
    with like the the tier two and the tier
  • 00:08:13
    one models yeah yeah yeah one of the
  • 00:08:15
    ways that I found 01 works for me really
  • 00:08:17
    well is around actually deduplication so
  • 00:08:19
    if I have a long list of items that say
  • 00:08:21
    I've processed five different chunks
  • 00:08:23
    with the same type of workflow for each
  • 00:08:25
    chunk well I'm going to have a list of
  • 00:08:26
    duplicated items I give that whole thing
  • 00:08:28
    to o1 it's actually really good at
  • 00:08:30
    deduplicating and then I'll use one of the
  • 00:08:31
    tier 2 models to do the structured
  • 00:08:33
    output after that since 01 doesn't yet
  • 00:08:34
    support structured output and go from
  • 00:08:36
    there yeah that actually that's a good
  • 00:08:38
    one that's another thing I do as well is
  • 00:08:40
    I'll take 01 and give me like a long
  • 00:08:42
    verbose output and then take that and
  • 00:08:45
    turn it into structured data sets with
  • 00:08:46
    the uh the tier two and even sometimes
  • 00:08:49
    you could even get away with using that
  • 00:08:50
    with the tier three because it's you
  • 00:08:52
    don't even need to worry about the
  • 00:08:53
    output you're just like hey I want this
  • 00:08:55
    nicely formatted in whatever shape yeah
  • 00:08:57
    yeah yeah for sure so it sounds like
  • 00:08:59
    you're using different models across
  • 00:09:01
    different providers too for different
  • 00:09:04
    use cases or do you stick with one all
  • 00:09:06
    the time yes so we use a lot of
  • 00:09:09
    different providers and that's because
  • 00:09:11
    what we've seen with our internal evals
  • 00:09:13
    is that they're all so nuanced and
  • 00:09:17
    different in like a variety of different
  • 00:09:19
    ways so obviously the big one Gemini
  • 00:09:22
    multimodal right off the bat like
  • 00:09:24
    anything to do with videos or audios
  • 00:09:28
    I'll go you know dive straight into that
  • 00:09:30
    and and kind of use Gemini but you also
  • 00:09:33
    start to see where they lack so for
  • 00:09:37
    example a really interesting one is
  • 00:09:39
    Gemini models are really good at needle
  • 00:09:41
    in the haystack and so if you say hey
  • 00:09:44
    I want you to find one or two pieces of
  • 00:09:46
    information in this you know giant long
  • 00:09:48
    piece of text or video it's actually
  • 00:09:50
    really good but then I started to notice
  • 00:09:52
    that something like GPT 40 mini is a
  • 00:09:55
    little bit of a little bit better
  • 00:09:57
    reasoning over that so if I give it a
  • 00:09:59
    long piece of context and I say hey I
  • 00:10:01
    want you to sort of understand the
  • 00:10:03
    context of it I saw I found the GPT 40
  • 00:10:05
    mini is a little bit better so you start
  • 00:10:07
    to see where one model does better than
  • 00:10:10
    the other model in specific area so like
  • 00:10:12
    another example is Claude 3.5 and GPT 40
  • 00:10:15
    now Claude is obviously everyone loves
  • 00:10:17
    that model it's a really good model but
  • 00:10:19
    one thing it's absolutely horrible at is
  • 00:10:21
    tool use with structured outputs and
  • 00:10:24
    you'll start to see this if you it's a
  • 00:10:26
    very complex tool like I want you to
  • 00:10:29
    create the very deep like a a nested
  • 00:10:31
    Json a very you know long structured
  • 00:10:34
    output like a very large amount of the
  • 00:10:37
    time it fails and it gives you XML and
  • 00:10:39
    it just breaks all your parsers whereas
  • 00:10:41
    GPT 40 mini does a lot better job but
  • 00:10:44
    then the caveat is that gp4 o mini is
  • 00:10:48
    not as good at actually like thinking
  • 00:10:50
    through the problem and acting as an
  • 00:10:52
    assistant so there's always these like
  • 00:10:53
    tiny trade-offs that you don't really
  • 00:10:55
    like notice one of the that we did was
  • 00:10:58
    we set up a
  • 00:10:59
    like one of the use case was to get
  • 00:11:01
    around that was we set up Claude and GPT
  • 00:11:03
    40 mini to work together where the tool
  • 00:11:06
    use for Claude would be to call GPT 40
  • 00:11:09
    mini and we basically system where
  • 00:11:13
    Claude could orchestrate GPT 4 mini to
  • 00:11:15
    create the structured output so it would
  • 00:11:17
    say please do this so the user would say
  • 00:11:19
    I want this task all GP all Claude would
  • 00:11:22
    do was relay that information to GPT 40
  • 00:11:25
    mini 40 mini creates the structured
  • 00:11:27
    output and then I guess return so that
  • 00:11:28
    was like another use of like how we mix
  • 00:11:30
    and match so many models across
  • 00:11:32
    different use cases yeah isn't it wild
  • 00:11:35
    how all these little mini Vibe tricks we
  • 00:11:38
    have to kind of like hack together in
  • 00:11:40
    the early days of llms here and then I
  • 00:11:42
    think back to how far we've already come
  • 00:11:43
    like because even like you know like
  • 00:11:45
    January of 23 we're dealing with like
  • 00:11:47
    4,000 token context limits and gbt 3.5
  • 00:11:50
    and all the hacks that we had then we've
  • 00:11:52
    upgraded from them now but we still have
  • 00:11:54
    a bunch of hacks like the ones you're
  • 00:11:55
    talking about and so it just makes me
  • 00:11:57
    think we're never going to get rid of
  • 00:11:58
    the hacks and they're always going to to
  • 00:11:59
    be there for a long time I would say so
  • 00:12:02
    too because yeah like you're right it's
  • 00:12:05
    funny looking back at it the hacks that
  • 00:12:07
    you used in 2023 were so different you
  • 00:12:09
    were hacking around context window and
  • 00:12:11
    now you're hacking around well tool use
  • 00:12:14
    which didn't even exist a year ago right
  • 00:12:16
    or like you know a year and a half ago
  • 00:12:17
    so I I agree with you that we're always
  • 00:12:19
    going to be Min maxing as a user of
  • 00:12:22
    multiple models you're going to be Min
  • 00:12:23
    maxing trying to figure out for your use
  • 00:12:25
    case for your product for your company
  • 00:12:28
    where can I you know masch these
  • 00:12:29
    together so that I get the best possible
  • 00:12:31
    outcome for my users um and I know a lot
  • 00:12:34
    of people have and I'm curious what you
  • 00:12:35
    think a lot of people have spoken about
  • 00:12:37
    like model routers and how you know at
  • 00:12:39
    the end of the day like a model is just
  • 00:12:40
    going to pick it but my my personal
  • 00:12:43
    opinion is I I think that it's going to
  • 00:12:45
    cause a lot of unintended side like you
  • 00:12:47
    know side effects but I'm curious what
  • 00:12:49
    you think on like this whole idea of
  • 00:12:50
    like model routing because you know
  • 00:12:51
    we're talking what we're basically doing
  • 00:12:53
    we're internally with code model routing
  • 00:12:55
    but I'm curious what you think so
  • 00:12:57
    whenever I get asked a question like
  • 00:12:58
    this I think is there behavior in
  • 00:12:59
    practice that tells me um what the
  • 00:13:02
    prediction should be and you just
  • 00:13:03
    describe basically you're doing model
  • 00:13:05
    routing on your own like in in in and of
  • 00:13:08
    itself so that tells me yes model
  • 00:13:10
    routing will be a thing and I do still
  • 00:13:12
    think that fine-tuning models and having
  • 00:13:14
    bespoke small models is still too much
  • 00:13:16
    overhead like it's really hard to do
  • 00:13:18
    that and manage them and do with them
  • 00:13:19
    all right now all that is going to get
  • 00:13:21
    so much easier so I would imagine that
  • 00:13:23
    not only will we have model routing for
  • 00:13:25
    task specific things against like some
  • 00:13:26
    of the big ones where you have Vibe
  • 00:13:28
    based feels whether regards to
  • 00:13:30
    structured output or tool use or
  • 00:13:31
    whatever it may be but then also um for
  • 00:13:34
    task specific things um I will
  • 00:13:36
    absolutely do model routing so um I'm a
  • 00:13:38
    fan I think it's hard I think it will be
  • 00:13:40
    the future we're not quite there yet
  • 00:13:42
    though that's for sure gotcha yeah like
  • 00:13:45
    my my my sentiment there was that there
  • 00:13:47
    and it could be just because the models
  • 00:13:49
    just where we're at right now what I've
  • 00:13:51
    noticed is and I'm sure you've seen the
  • 00:13:53
    same is where you'll get to an AI
  • 00:13:55
    product you get it to 90% even 95% but
  • 00:13:59
    that last 5 to 10% is nearly
  • 00:14:03
    impossible I find like it even you can
  • 00:14:05
    run all the evals you want you can run
  • 00:14:07
    all the benchmarks getting that last 10%
  • 00:14:10
    and I my thought process there is
  • 00:14:13
    that if you have the model sort of
  • 00:14:16
    choosing other models that adds to the
  • 00:14:19
    variance so it causes a lot more
  • 00:14:22
    potential like you know that that's kind
  • 00:14:24
    of where my thinking is and that could
  • 00:14:25
    just be because like we're early like
  • 00:14:27
    realistically we're so early models have
  • 00:14:30
    you know multiple generations to get
  • 00:14:31
    better uh so that was my thought was
  • 00:14:33
    that maybe in the future but right now
  • 00:14:36
    probably not because it's it's so hard
  • 00:14:39
    to get a product in specifically like
  • 00:14:42
    llms into production where you're
  • 00:14:44
    handling every potential Edge case uh in
  • 00:14:47
    a manner that gives you as high of an
  • 00:14:49
    accuracy as you can and adding models
  • 00:14:51
    that you might not have an eval for could
  • 00:14:55
    give you an output that you didn't
  • 00:14:56
    expect yeah yeah totally uh well well I
  • 00:14:59
    tell you what one of the other
  • 00:15:00
    interesting things that came up during
  • 00:15:01
    research was your opinion on what is
  • 00:15:04
    kind of becoming known as model
  • 00:15:05
    distillation so you have a really really
  • 00:15:07
    good model you perfect the output from
  • 00:15:09
    there but then you realize wow I can
  • 00:15:11
    actually come up with a little bit of a
  • 00:15:12
    better prompt here and give it to a
  • 00:15:14
    smaller model so that you have it's
  • 00:15:16
    faster and it's cheaper so can you talk
  • 00:15:18
    me or walk me through how do you think
  • 00:15:19
    about model distillation in your own
  • 00:15:21
    workflow yeah so that's a something I
  • 00:15:23
    think about a lot and it's one of those
  • 00:15:26
    things where you need to be very careful
  • 00:15:28
    because it's very it's very powerful but
  • 00:15:30
    you have to be very careful because it
  • 00:15:32
    requires a lot of work and the reason it
  • 00:15:35
    needs a lot of work is
  • 00:15:37
    because you need to have a a good data
  • 00:15:40
    Pipeline and understand what you're
  • 00:15:42
    distilling so one of the things and
  • 00:15:44
    mistakes I made previously with the
  • 00:15:45
    product was that we went we had GPT 40
  • 00:15:49
    this was actually before GPT 40 it was
  • 00:15:50
    gp4 turbo and we used it and it was slow
  • 00:15:54
    and we're like hey let's distill that to
  • 00:15:55
    3.5 open AI has a has a really nice um
  • 00:15:59
    way to do it so we did that and the
  • 00:16:02
    problem was that we didn't have good
  • 00:16:04
    enough evals we didn't have a good
  • 00:16:05
    enough data set so as the potential you
  • 00:16:09
    know the various areas grew that people
  • 00:16:11
    could use the product we would notice
  • 00:16:13
    okay we have to revert back to gp4
  • 00:16:15
    because 3.5 was at that time not good
  • 00:16:18
    enough now where I do see distillation
  • 00:16:20
    in our workflow is when you have a
  • 00:16:22
    defined eval set you have like all your
  • 00:16:24
    benchmarks and you have a very good data
  • 00:16:27
    pipeline where you can say okay
  • 00:16:29
    in this 500 example set I'm using Claude
  • 00:16:33
    3.5 Sonnet or you know o1 for example
  • 00:16:36
    I have my data set and you can use a
  • 00:16:39
    bunch of different there's a lot of
  • 00:16:40
    different companies that provide you
  • 00:16:41
    with like ways to manage your
  • 00:16:43
    prompts and evals whether it's Braintrust
  • 00:16:45
    or LangSmith and then you can
  • 00:16:48
    very accurately uh detect and determine
  • 00:16:51
    the accuracy of the distilled model then
  • 00:16:54
    10 out of 10 times I would use it um and
  • 00:16:57
    the easy and it's actually really easy
  • 00:16:58
    like to actually distill the model down
  • 00:17:01
    it's like it's like it's a single API
  • 00:17:03
    call the challenging part is making sure
  • 00:17:06
    that you don't regress your product when
  • 00:17:08
    you do uh the distillation but I think
  • 00:17:11
    it's one of those things that it's going
  • 00:17:13
    to become more and more apparent as the
  • 00:17:15
    tooling around distillation becomes like
  • 00:17:17
    better I know there's a couple companies
  • 00:17:19
    working on it like open pipe is one of
  • 00:17:20
    them um and I know open AI straight up
  • 00:17:23
    offers you that so I think as the
  • 00:17:25
    tooling gets better you're going to see
  • 00:17:27
    this pattern in production
  • 00:17:29
    of companies launching with the biggest
  • 00:17:31
    best model they collect a bunch of data
  • 00:17:33
    they have a good eval set and engineering
  • 00:17:35
    team to support that then they go and
  • 00:17:37
    they distill it to whether open you know
  • 00:17:39
    GPT 40 mini or an open source model yeah
  • 00:17:42
    that's beautiful my favorite line with
  • 00:17:43
    that is the whole make it work make it
  • 00:17:45
    right make it fast and so it's like look
  • 00:17:47
    you're going to use the biggest one to
  • 00:17:48
    start us off but then you're going to
  • 00:17:49
    make it fast eventually and go from
  • 00:17:51
    there um this is awesome I tell you what
  • 00:17:54
    though so I know you're a practical
  • 00:17:56
    person I would love to jump into like
  • 00:17:58
    you actually showing us some of the ways
  • 00:17:59
    that you use these tools and I think a
  • 00:18:01
    really cool starting off point would be
  • 00:18:03
    I know that you're a fan of prompt
  • 00:18:05
    optimizers or like meta prompt writing
  • 00:18:08
    and so yes because you had you had a
  • 00:18:11
    tweet and literally said pretty good
  • 00:18:13
    chance you won't be prompting from
  • 00:18:14
    scratch in two to three months so I
  • 00:18:17
    would love to see the way you kind of
  • 00:18:18
    prompt engineer your way from like an
  • 00:18:20
    idea to like I'm going to go use this
  • 00:18:23
    thing okay yeah hopefully my prediction
  • 00:18:26
    uh ages well because I feel like it's
  • 00:18:28
    been a month since I said that and I
  • 00:18:29
    don't know if we're two to three months
  • 00:18:31
    away from it but okay let me yeah I just
  • 00:18:35
    to add some context I do a lot of this
  • 00:18:36
    sort of meta prompting where I'll come
  • 00:18:39
    in with a problem what is what is meta
  • 00:18:40
    prompting let's start there you come in
  • 00:18:42
    with a general idea of what you're
  • 00:18:44
    trying to do you have a problem that
  • 00:18:46
    you're trying to solve like
  • 00:18:47
    realistically if you're coming in you
  • 00:18:48
    don't know what problem you have that
  • 00:18:49
    you're trying to solve with an AI it's
  • 00:18:51
    it's sort of useless so an example would
  • 00:18:53
    be um the other day I was trying
  • 00:18:56
    to get uh one of the models to write
  • 00:18:58
    like me which to this day I I cannot for
  • 00:19:02
    whatever reason and I was like I came
  • 00:19:05
    into it and I came into Chad GPT and I
  • 00:19:07
    had all my examples and I was like okay
  • 00:19:09
    what do I write and I normally I would
  • 00:19:11
    write something like you know you you
  • 00:19:12
    write like a basic promp structure and
  • 00:19:15
    the reality is that prompt is probably
  • 00:19:16
    not that good so what meta prompting or
  • 00:19:19
    what I like to think about this work
  • 00:19:20
    this idea is that you come in with an
  • 00:19:21
    idea hey I want to have an AI right like
  • 00:19:24
    me I have examples and then I just give
  • 00:19:27
    that to o1 or Claude and I say please
  • 00:19:30
    create the prompt for me and that's sort
  • 00:19:31
    of what I like to think of like this I
  • 00:19:34
    come in with a a rough idea of what I'm
  • 00:19:35
    trying to do I don't really know
  • 00:19:37
    specifically how to optimize it I'll go
  • 00:19:39
    to these models and say hey like
  • 00:19:40
    actually give me this promp structure
  • 00:19:42
    and it does a pretty good job so that's
  • 00:19:43
    kind of the the rough idea of how it
  • 00:19:45
    works but let's let me should we just
  • 00:19:47
    hop into like yeah I would love to jump
  • 00:19:49
    into it if you could share your screen
  • 00:19:50
    and then are you using just a regular
  • 00:19:52
    chat interface or are you going to
  • 00:19:54
    anthropics workbench and doing their
  • 00:19:56
    prompt Optimizer I I just used the chat
  • 00:19:59
    interface because I feel like the prompt
  • 00:20:01
    I mean people some people do use it I
  • 00:20:03
    and I think you can start with it um but
  • 00:20:06
    I just find it easier because I can
  • 00:20:07
    iterate a lot better I can say hey start
  • 00:20:10
    like this and do that so let's actually
  • 00:20:12
    do it but I I want to start and say do
  • 00:20:14
    you have some sort of task that like we
  • 00:20:16
    should we start we should start with
  • 00:20:17
    like a rough idea because I like do you
  • 00:20:19
    have any like what what's the task we
  • 00:20:21
    could Dem let's do a straightforward one
  • 00:20:24
    let's do what I guess I'll give you a
  • 00:20:26
    few options you tell me what you think
  • 00:20:27
    is best we could do the classification
  • 00:20:29
    one which is very standard hey I have
  • 00:20:30
    some data sources or can you please
  • 00:20:32
    label them for me um we could do either
  • 00:20:36
    like uh unstructured to structured
  • 00:20:38
    extraction so like extracting insights
  • 00:20:40
    from a piece of text or we could do uh
  • 00:20:43
    idea generation that's always a fun one
  • 00:20:45
    too okay let's do the let's do the
  • 00:20:49
    extracting text one and I think that's a
  • 00:20:51
    good one so let's say we I like to
  • 00:20:53
    always preface it with like the problem
  • 00:20:54
    or what we're trying to do so again what
  • 00:20:56
    I like to come into it is like all right
  • 00:20:57
    I have a problem I'm trying trying to do
  • 00:20:59
    a specific task and usually this is like
  • 00:21:01
    my blank slate starting point so
  • 00:21:03
    let's say the task that I'm trying to do
  • 00:21:05
    is I have a large piece of text and I
  • 00:21:08
    want to you know turn that piece of text
  • 00:21:10
    into something else some sort of
  • 00:21:11
    structured output and it's it's funny
  • 00:21:14
    because a lot of people say like oh is
  • 00:21:15
    it complicated it's really like I just
  • 00:21:18
    come to Chad GPT and I or or claw and I
  • 00:21:20
    basically say that so the way that I go
  • 00:21:22
    is I'll say you know you could use
  • 00:21:24
    Claude or or chagy PT I haven't found
  • 00:21:26
    which one is really better again and
  • 00:21:29
    this is kind of going back to my
  • 00:21:30
    original workflow is what I'll do is
  • 00:21:32
    I'll actually start with gp4 or Claude
  • 00:21:34
    and I'll get like a rough idea for a
  • 00:21:36
    prompt and I'll copy that and I'll give
  • 00:21:38
    it to 01 and then I'll start to compare
  • 00:21:40
    across all three to see which one like
  • 00:21:42
    makes the most sense so let's say for
  • 00:21:44
    example in this one I am grabbing
  • 00:21:48
    transcripts from podcasts and I want to
  • 00:21:50
    know like you know I want a nice like
  • 00:21:54
    structured output for all of the key
  • 00:21:57
    exciting moments let's say that that
  • 00:21:58
    like the problem space so now you could
  • 00:22:00
    come in and you could create a prompt
  • 00:22:01
    and says okay given this video I want
  • 00:22:04
    you to do this or I come to CL and say
  • 00:22:05
    look like and actually the other
  • 00:22:07
    workflow that I I wish I could demo is I
  • 00:22:09
    use voice a lot so I don't know if um if
  • 00:22:12
    you use voice a lot but I've notice that
  • 00:22:16
    with voice here I don't use it a ton
  • 00:22:17
    yeah it hasn't entered my workflow yet
  • 00:22:19
    but I'm I'm voice curious so I want to
  • 00:22:21
    try actually see this let's see this
  • 00:22:23
    okay so I have I have something here I
  • 00:22:25
    want to show you the whole workflow that
  • 00:22:27
    I use so that I
  • 00:22:30
    so and let me know if you need a
  • 00:22:31
    transcript I have one handy for us
  • 00:22:34
    actually yeah could you could you toss
  • 00:22:35
    me it there and then I will use it I'll
  • 00:22:38
    copy paste this okay let me know when
  • 00:22:40
    you have the transcript and then mm
  • 00:22:43
    small plug this is MFM Vault website I
  • 00:22:45
    put together that does insight
  • 00:22:47
    extraction from My First Million there we
  • 00:22:48
    go Okay cool so let's say our goal is to
  • 00:22:51
    extract insights now my workflow is I
  • 00:22:54
    have a tool that transcribes it so I
  • 00:22:56
    think it works so let's say I'll just
  • 00:22:58
    exactly show you how to do it okay hey
  • 00:23:01
    uh I need a bit of help creating a
  • 00:23:02
    prompt uh for a uh use case so what
  • 00:23:05
    we're doing right now is taking podcast
  • 00:23:08
    transcripts and trying to extract all of
  • 00:23:10
    the key moments key insights so I need
  • 00:23:13
    you to create a a nice uh prompt that
  • 00:23:15
    will you know help us do that and I'll
  • 00:23:18
    I'll give I'm going to put in the prompt
  • 00:23:19
    as well later on the actual transcript
  • 00:23:21
    but I need you to create the prompt slash
  • 00:23:22
    system
  • 00:23:23
    prompt so boom so that's that's actually
  • 00:23:26
    sort of how I do it it's there's no
  • 00:23:28
    science to it and I I'll sit there and
  • 00:23:30
    kind of like here and I'll copy this and
  • 00:23:32
    I'll actually do this I'll go into Chad
  • 00:23:33
    GPT I'll paste it and I'll actually also
  • 00:23:34
    place it into
  • 00:23:36
    Claude and it's going to go and it's going
  • 00:23:39
    to give me like a uh starting
  • 00:23:42
    point and so right off the bat like if
  • 00:23:44
    you're maybe not as good at prompting or
  • 00:23:47
    you're new to prompting like you can
  • 00:23:50
    read this like obviously if you're more
  • 00:23:51
    experienced and you kind of know like
  • 00:23:54
    what you're doing these kind of prompts
  • 00:23:55
    are like pretty obvious but for a lot of
  • 00:23:57
    people they'll come in and be like okay
  • 00:23:59
    cool I have a a good starting point so
  • 00:24:01
    then all I'll do is I'll look at this
  • 00:24:02
    say okay the following is a podcast
  • 00:24:04
    transcript identify so and I'll compare
  • 00:24:07
    it to here so right off the bat I don't
  • 00:24:09
    know if you which one you think is
  • 00:24:10
    better but I'm looking at this and I
  • 00:24:12
    like the Claude output better um little
  • 00:24:15
    bit
  • 00:24:16
    more uh what's it called clear Direction
  • 00:24:19
    so I'll actually copy this and I'll be
  • 00:24:22
    like okay we have a rough outline I
  • 00:24:24
    liked the first pass I liked the one
  • 00:24:27
    from
  • 00:24:29
    uh Claude I'll take that and I'll go back
  • 00:24:31
    to Chad GPT and I'll open up a new tab
  • 00:24:34
    and then I'll
  • 00:24:35
    say let's go to o1 preview so then I'll
  • 00:24:37
    actually do the same thing um I'll say
  • 00:24:40
    and I'll actually give it more context
  • 00:24:41
    so I'll say something along the lines of
  • 00:24:43
    and again I I'll go back to the voice
  • 00:24:45
    mode here I'll say hey um you're going
  • 00:24:47
    to help me optimize a prompt so I
  • 00:24:49
    already got another AI model to give me
  • 00:24:51
    a rough idea for this prompt I want you
  • 00:24:53
    to look at it and tell me if there's any
  • 00:24:54
    areas in the prompt that we could
  • 00:24:55
    improve um so I'll give you the prompt
  • 00:24:57
    and I'll actually give you the prompt
  • 00:24:58
    that I gave to the I AI that generated
  • 00:25:00
    this
  • 00:25:01
    prompt so it's going to go and then I'm
  • 00:25:04
    going to go like this so this is sort of
  • 00:25:06
    here you
  • 00:25:08
    know
  • 00:25:10
    original prompt to AI I'll paste that in
  • 00:25:13
    a sec um what's amazing is just how you
  • 00:25:17
    speak to it just like a human like it's
  • 00:25:19
    not complicated it's literally just
  • 00:25:20
    being clear in your
  • 00:25:22
    directions it's something
  • 00:25:25
    that I recently started to do and
  • 00:25:29
    I think it's a very a lot of people talk
  • 00:25:31
    to the AI as if it's not a human but
  • 00:25:33
    they perform the best when you just
  • 00:25:34
    speak to it naturally and I found that
  • 00:25:37
    voice is the best modality to do that in
  • 00:25:39
    because it's very hard to sound robotic
  • 00:25:42
    when you're talking to like the the chat
  • 00:25:45
    it's like you have to just talk
  • 00:25:46
    naturally um and then I found that it's
  • 00:25:48
    it's also a lot faster like if I were to
  • 00:25:50
    sit here and type that it would take me
  • 00:25:51
    a lot so here I'll go here I'll T I'll
  • 00:25:54
    paste this um original prompt you know
  • 00:25:58
    and then I'll say Okay cool so I like
  • 00:26:00
    that one and now this is the second pass
  • 00:26:02
    and now this is where again kind of
  • 00:26:04
    going back to the workflow that I use
  • 00:26:05
    right is I'll come in here and iterate
  • 00:26:07
    with voice on this specific subset of a
  • 00:26:09
    problem which is generating this kind of
  • 00:26:11
    like like a prompt we sat there with
  • 00:26:13
    GPT-4 we sat there with Claude iterated a
  • 00:26:16
    bit um and then I'm I'm like okay I have
  • 00:26:18
    a rough idea this prompt looks somewhat
  • 00:26:20
    good and then I'll come back to 01
  • 00:26:22
    preview and I'll say okay cool I want
  • 00:26:24
    you to optimize this and I haven't found
  • 00:26:27
    like I don't have a real scientific
  • 00:26:29
    method to which one is best because I
  • 00:26:31
    just kind of sit there and and this is
  • 00:26:32
    kind of where I have like a good first
  • 00:26:34
    generation of the prompt realistically
  • 00:26:36
    I'll put this into production I'll write
  • 00:26:38
    a couple of like you know uh evals I'll
  • 00:26:40
    say okay how does this actually perform
  • 00:26:42
    and then kind of iterate back but this
  • 00:26:43
    is sort of my starting point so we'll
  • 00:26:46
    let this go
  • 00:26:49
    um okay so
  • 00:26:53
    here and then it gives me some
  • 00:26:56
    things can you please generate the new
  • 00:27:00
    prompt now all right cool it gives me
  • 00:27:03
    the revised prompt so it it did gives
  • 00:27:07
    you finally the answer yeah and and sort
  • 00:27:10
    of you can see here and you can OB say
  • 00:27:13
    here this is just for the sake of this
  • 00:27:14
    and now what I'll do is I will take this
  • 00:27:17
    and then I will actually go to and this
  • 00:27:19
    is my full workflow we can use any model
  • 00:27:22
    but let's say we're going to use um you
  • 00:27:25
    have a preference of which model you
  • 00:27:26
    want to test out the actual uh
  • 00:27:29
    transcription we can actually do I'd
  • 00:27:31
    love to hear which one you think and why
  • 00:27:33
    and let's just test it out let's let's
  • 00:27:36
    test it out so now we go to studio so
  • 00:27:38
    and you see what I mean it's like
  • 00:27:40
    there's all these different models I'll
  • 00:27:43
    go to Studio which is Gemini now we're
  • 00:27:45
    going to go to Gemini which I found so
  • 00:27:48
    specifically Gemini Pro uh better at
  • 00:27:51
    sorts of these these sort of tasks um
  • 00:27:53
    and now I'm here with Gemini Pro which
  • 00:27:55
    I'm going to take and grab the prompt I
  • 00:27:58
    crafted with 01 put it into the system
  • 00:28:01
    prompt of uh what's it called Gemini Pro
  • 00:28:04
    paste in the the transcript and we'll
  • 00:28:06
    see how it goes all right beautiful yeah
  • 00:28:08
    that sounds
  • 00:28:10
    great all right let's copy this
  • 00:28:14
    here okay this is how the sausage is
  • 00:28:17
    made yeah it's it's this is how I like
  • 00:28:20
    to think of like the first generation of
  • 00:28:21
    a promp or I'm not really sure where I'm
  • 00:28:24
    starting off with obviously like is this
  • 00:28:26
    something that I would use in production
  • 00:28:27
    probably not because you want to test it
  • 00:28:29
    out and have a lot of back and forth um
  • 00:28:31
    but okay cool can I is there a way to
  • 00:28:34
    copy paste the transcript you're just
  • 00:28:36
    gonna have to select all down at the
  • 00:28:38
    bottom
  • 00:28:39
    there that would be nice to copy the
  • 00:28:41
    transcript actually I think I might add
  • 00:28:43
    that feature in there yeah it's a let me
  • 00:28:46
    see if I can just this I'm
  • 00:28:50
    on all
  • 00:28:53
    right cool now we go grab this okay and
  • 00:28:58
    then I'll obviously like do a second
  • 00:28:59
    pass to make sure that this actually
  • 00:29:01
    makes sense key moments obviously yeah
  • 00:29:05
    okay this looks pretty good time stamp
  • 00:29:08
    three to takeaways extract one sentence
  • 00:29:11
    discussion themes theme name
  • 00:29:14
    um yeah like Okay cool so here I'll
  • 00:29:17
    paste this in and we'll let it we'll let
  • 00:29:19
    it run here so I'm using Gemini
  • 00:29:21
    Pro um all right 177,000 tokens and and
  • 00:29:24
    for for people who are curious like
  • 00:29:26
    Gemini Pro
  • 00:29:28
    I I talked about this recently is that a
  • 00:29:30
    lot of models can't actually reason over
  • 00:29:32
    a large context like um but for
  • 00:29:36
    something like Gemini Pro anything under
  • 00:29:37
    100K tokens it's uh it's pretty good at
  • 00:29:40
    like being able to synthesize a a
  • 00:29:42
    relatively intelligent answer so
  • 00:29:45
    here okay that's really
  • 00:29:49
    cool and now yeah key moments how you
  • 00:29:52
    leverage CrossFit I'm actually curious
  • 00:29:55
    to like see how it this would do against
  • 00:29:57
    like you know other benchmarks because
  • 00:29:59
    we don't really know if this is a good
  • 00:30:00
    output or not and that's where the whole
  • 00:30:02
    point of evals is but there you go you
  • 00:30:05
    have how I went from an idea to
  • 00:30:10
    generating like a full I guess optim air
  • 00:30:13
    quote here optimize prompt and the
  • 00:30:16
    reason for that is just like for me to
  • 00:30:18
    sit here and write this probably would
  • 00:30:20
    have taken like an hour hour and a half
  • 00:30:22
    maybe like give or take depending on how
  • 00:30:24
    good you are but you know we just did it
  • 00:30:26
    live in whatever 10 minutes so yeah
  • 00:30:28
    that's super super cool I love that um
  • 00:30:31
    so then out of curiosity what are you
  • 00:30:33
    using for prompt management so I saw a
  • 00:30:36
    um a tweet by the CEO of prompt layer
  • 00:30:39
    Jared and he's like yeah I see everybody
  • 00:30:40
    they go through the same they go through
  • 00:30:42
    the same world first their prompts are
  • 00:30:44
    just hard-coded in their code and then
  • 00:30:46
    second their prompts are hard-coded in
  • 00:30:47
    text files but they're still in their
  • 00:30:49
    code base and then third you actually go
  • 00:30:50
    to a prompt manager what what are you
  • 00:30:52
    using for prompt management so for
  • 00:30:55
    that's an interesting one we obviously
  • 00:30:57
    we use GitHub for our our our prompts
  • 00:31:01
    yeah so we use a lot of a couple of
  • 00:31:03
    different things maybe maybe we're not
  • 00:31:05
    like we're not prompt managing correctly
  • 00:31:08
    but we just have our prompts that we
  • 00:31:11
    store in Langs Smith and sort of I'll
  • 00:31:14
    just have data sets and I'll compare
  • 00:31:18
    that prompt to that data set so for
  • 00:31:20
    example we have a giant data set of like
  • 00:31:22
    a thousand examples that I I run or test
  • 00:31:25
    against different models different
  • 00:31:26
    prompts and that prompt is just like
  • 00:31:29
    stored you know in in the data set and
  • 00:31:33
    then whenever I want to change the
  • 00:31:34
    prompt I'll actually change it and
  • 00:31:36
    duplicate data set paste in the new
  • 00:31:38
    prompt and like my version so to speak
  • 00:31:41
    so the actual prompt stays in my
  • 00:31:44
    codebase with the latest version of like
  • 00:31:46
    this is the the source of Truth and all
  • 00:31:49
    previous other versions are different
  • 00:31:51
    data sets where I can see how they
  • 00:31:53
    perform so for example if I want to go
  • 00:31:55
    back to a prompt that was like you know
  • 00:31:56
    let's say from a week ago I just look at
  • 00:31:58
    the data set that was from a week ago
  • 00:32:00
    and I can see the prompt is there and I
  • 00:32:01
    can also see how it performs so that's
  • 00:32:03
    how I manage uh like versioning I'm
  • 00:32:06
    not sure if that's the right approach but
  • 00:32:07
    that's the way I do it so in your
  • 00:32:09
    code is the prompt that's being called
  • 00:32:11
    is it actually in your code or are you
  • 00:32:13
    calling out to langub and Lang Smith
  • 00:32:14
    every single time it's in the code so
  • 00:32:17
    the the our code it's in GitHub and the
  • 00:32:19
    nice part is because it's just all
  • 00:32:21
    Version Control like I could look at the
  • 00:32:23
    git history and I can actually see okay
  • 00:32:26
    this person changed this line is as well
  • 00:32:28
    which is nice so I have the line by line
  • 00:32:30
    version controlled from git um and then
  • 00:32:32
    if I want to see the full prompt I can
  • 00:32:34
    look back at like a you know the the
  • 00:32:37
    data management tool yeah that's very
  • 00:32:38
    cool um I tell you what I had one more
  • 00:32:40
    demo on here that I was like this would
  • 00:32:42
    be so cool if Sully would show us how we
  • 00:32:44
    use this um it's a cursor one actually
  • 00:32:46
    so I saw that you tweet you you said do
  • 00:32:49
    I actually have the llm write the test
  • 00:32:52
    first then the code it helps a ton which
  • 00:32:55
    that's a framework I don't see too many
  • 00:32:57
    people doing of course there's test
  • 00:32:58
    driven development but like not in
  • 00:33:00
    practice not usually I'm not seeing a
  • 00:33:01
    lot of people do that could you walk us
  • 00:33:03
    through like how do you write that test
  • 00:33:05
    first and then how do you ask it to
  • 00:33:06
    write code right after that yeah okay
  • 00:33:09
    this is one that I the reason I started
  • 00:33:11
    to do was because the problem I was
  • 00:33:13
    facing the model just kept messing up
  • 00:33:14
    like every single time it was within our
  • 00:33:16
    code base and I was like this is this is
  • 00:33:19
    a waste of my time the model can't
  • 00:33:20
    figure it out how about I just get it to
  • 00:33:23
    generate the test first and then if the
  • 00:33:25
    test works then it can maybe look at the
  • 00:33:28
    code and say where the issues are
  • 00:33:29
    because models guess what if a test
  • 00:33:31
    fails you can grab the error output give
  • 00:33:34
    it back to the model and say hey like
  • 00:33:36
    please decipher that so let's actually
  • 00:33:38
    see if I can like I can spin up um like
  • 00:33:41
    a little mini project or something or
  • 00:33:43
    yeah yeah let's see here if I can spin
  • 00:33:45
    up something new I actually think this
  • 00:33:47
    is really cool and this is like
  • 00:33:48
    something like really truly not enough
  • 00:33:50
people are doing this and it legit
  • 00:33:52
    helps you write better code because it
  • 00:33:54
    makes sense you have the test that's
  • 00:33:56
    supposed to run successfully and it can
  • 00:33:58
    use that as instructions and it can use
  • 00:33:59
    that to like test to make sure it's
  • 00:34:00
    actually working I'm surprised not a lot
  • 00:34:02
    of people not more people are doing this
  • 00:34:04
    where it's like right that's like it's
  • 00:34:05
    just a lot easier for the llm to like do
  • 00:34:08
    that and then your code
  • 00:34:10
    is I guess like you know less spaghetti
  • 00:34:12
because you're not
  • 00:34:13
    worried about you know if something
  • 00:34:15
    changes like the model like you start
  • 00:34:17
    with the tests and it's really easy for
  • 00:34:18
    the model to generate it okay so I got I
  • 00:34:21
    that took a little time I got a uh a
  • 00:34:24
    cursor here so this is just a super
  • 00:34:26
    quick um let me just grab the screen
  • 00:34:29
    here super quick here so I have this you
  • 00:34:33
    know super basic thing we can just
  • 00:34:34
    terminal we can run it and I can go you
  • 00:34:37
know bun
  • 00:34:40
index.ts, Hello World. Um, now what I'd
  • 00:34:43
    like to do is start with Cursor and I'll just
  • 00:34:45
    say something along the lines of like
  • 00:34:47
    literally and again I actually don't
  • 00:34:49
know how to write tests in Bun so I can
  • 00:34:51
just go to Cursor, I open up Command+I
  • 00:34:53
    and for those who don't know, this is like the
  • 00:34:54
    composer it lets you uh coordinate and
  • 00:34:57
    create file so I'm going to say you know
  • 00:34:58
I'm using Bun, uh, for now create a test uh
  • 00:35:04
    file for a method and then make the
  • 00:35:08
    method uh that let's say for now
  • 00:35:11
    reverses a string super simple um and oh
  • 00:35:15
    I guess I'm out of slow request
  • 00:35:17
    unfortunately okay wow so it what it'll
  • 00:35:20
    first do is it'll create the test right
  • 00:35:22
    and this is obviously a really simple
  • 00:35:23
    example and so here I'm happy with this
  • 00:35:27
    all right I'll I'll just accept this um
  • 00:35:31
    and now right off the bat like there's
  • 00:35:34
    you know how many whatever five tests
  • 00:35:36
    here so obviously I have the actual
  • 00:35:39
    function so here in this example just
  • 00:35:40
    reversing string now the nice part is I
  • 00:35:43
    can go here I can say you know bun I
  • 00:35:45
    guess it's uh reverse
  • 00:35:49
    test.ts um and I can again debug with
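For reference, roughly what a test file and method like this could look like with Bun's built-in test runner; the file and function names are illustrative, not the exact code Composer generated:

```typescript
// reverse.ts -- the method under test
export function reverseString(input: string): string {
  return input.split("").reverse().join("");
}

// reverse.test.ts -- run with `bun test reverse.test.ts`
import { test, expect } from "bun:test";
import { reverseString } from "./reverse";

test("reverses a simple string", () => {
  expect(reverseString("hello")).toBe("olleh");
});

test("handles an empty string", () => {
  expect(reverseString("")).toBe("");
});

test("is its own inverse", () => {
  expect(reverseString(reverseString("round trip"))).toBe("round trip");
});
```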
  • 00:35:53
    composer this is a nice part I can just
  • 00:35:54
    go debug with I
  • 00:35:59
    got up I got up
  • 00:36:01
    my man I'm out of the
  • 00:36:07
free requests, that's how much I use Cursor, I
  • 00:36:10
    just always blow through the budget but
  • 00:36:11
    Okay cool so here it like you know
  • 00:36:14
    passes the test but let's actually say
  • 00:36:15
    that like we are using something a
  • 00:36:16
    little bit more complicated than
  • 00:36:18
    reversing uh a string now I can go into
  • 00:36:20
    here and I can say let's just not
  • 00:36:23
    reverse it let's just say like let's
  • 00:36:24
    just break the code let's just say here
  • 00:36:27
    we'll split it like this okay um return
  • 00:36:31
    dot okay so now if I go here I go test
  • 00:36:35
    if I go test file so all these tests
  • 00:36:36
fail right now obviously this is like a
  • 00:36:38
    pretty simple example and it's almost as
  • 00:36:41
    simple as just clicking this button that
  • 00:36:43
    says add to composer and then I say um
  • 00:36:47
    you
  • 00:36:48
    know please fix the reverse method due
  • 00:36:53
    to errors and now the nice part is here
  • 00:36:56
    cursor will pull in that terminal
  • 00:36:58
    that'll throw you know the errors where
  • 00:36:59
    it happens and what cursor will do is
  • 00:37:03
it'll look at that and it'll say hey
  • 00:37:04
    look I see what the issue is and it'll
  • 00:37:06
    just fix it so this is kind of what I
  • 00:37:08
like to call, I don't actually
  • 00:37:09
    have a name for it yet, maybe LLM test
  • 00:37:12
    driven development whatever you want to
  • 00:37:14
    call it but it's like you come in and
  • 00:37:15
    you describe what you're trying to do
  • 00:37:17
here the LLM writes the tests for it
  • 00:37:20
    and then it's going to write the method
  • 00:37:22
    and then what you can do is have it run
  • 00:37:24
    and now if the method itself like this
  • 00:37:25
    function which is reversing a string is
  • 00:37:27
complex or confusing, it will be able
  • 00:37:30
    to sort of like essentially agentically
  • 00:37:32
    air quote here fix itself if that makes
  • 00:37:34
    sense it'll test the code see if it
  • 00:37:36
    passes the tests if not it'll update the
  • 00:37:39
    code and then sort of do that until it
  • 00:37:41
    can you know pass the test and all you
  • 00:37:43
    have to do is make sure that the tests
  • 00:37:45
you're writing are correct, and I use
  • 00:37:47
    this a lot for obviously for simple
  • 00:37:49
    functions it's not that useful but when
  • 00:37:51
    you have code that is across a couple
  • 00:37:53
different files, you know, in a
  • 00:37:55
    modern code base it's not just a single
  • 00:37:57
    function it's like you have like you
  • 00:37:58
know a bunch of different files and
  • 00:38:00
    stuff connecting and ones that require a
  • 00:38:04
    lot of like conditionals or
  • 00:38:07
    like they're not as simple as this it's
  • 00:38:09
    like that's where I found that whenever
  • 00:38:11
    I would try to get like cursor or sorry
  • 00:38:13
I'd get like Sonnet to one-shot it, it would
  • 00:38:14
    fail every single time but then a second
  • 00:38:16
    that I was like okay please let's write
  • 00:38:19
    the test for it and then I would sit
  • 00:38:20
    there and kind of help it write the test
  • 00:38:21
    it was able to debug itself a lot better
  • 00:38:23
    and go through these like bigger maybe
  • 00:38:26
meatier functions that normally it wouldn't
  • 00:38:28
    be able to do, that even like o1 and o1-mini
  • 00:38:30
    couldn't solve, but the second that I would
  • 00:38:32
    apply this like test driven development
  • 00:38:34
    whatever you want to call it the model
  • 00:38:35
    was able to look at the output see where
  • 00:38:37
    it messes up adjust the code and kind of
  • 00:38:39
    iterate on itself like that that's cool
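A minimal sketch of that test-first loop in a Bun/TypeScript project; `askModelForFix` is a hypothetical stand-in for whatever model call or Composer step you use, not a real API:

```typescript
// tdd-loop.ts -- illustrative only: run the tests, and while they fail, hand
// the error output back to a model and apply the fix it suggests.
import { spawnSync } from "node:child_process";

// Hypothetical stand-in: send the current code plus the failing test output to
// an LLM and get back a revised version of the file. Replace with a real call.
async function askModelForFix(code: string, testOutput: string): Promise<string> {
  return code; // placeholder: returns the code unchanged
}

export async function fixUntilTestsPass(file: string, maxAttempts = 5): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = spawnSync("bun", ["test"], { encoding: "utf8" });
    if (run.status === 0) return true; // all tests pass, we're done

    // Tests failed: grab the error output and ask the model to decipher it.
    const errors = `${run.stdout}\n${run.stderr}`;
    const currentCode = await Bun.file(file).text();
    const fixedCode = await askModelForFix(currentCode, errors);
    await Bun.write(file, fixedCode);
  }
  return false; // give up; a human should look at the code and the tests
}
```

The only part you have to verify by hand is the tests themselves; as long as they encode what "correct" means, the loop can keep retrying until they pass.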
  • 00:38:41
    so not only does this test first mindset
  • 00:38:45
    um it's kind of like a prompt
  • 00:38:46
    engineering technique it's almost like
  • 00:38:47
    think out loud but it's almost like
  • 00:38:49
    write the goal first and then tell me
  • 00:38:50
    what you think we should do for it but
  • 00:38:52
    you also get tests out the other end and
  • 00:38:54
    so you get a little bit of extra utility
  • 00:38:55
    as a byproduct
  • 00:38:59
    exactly it's a it's a win-win you get a
  • 00:39:01
    little bit of both and to me that was
  • 00:39:03
    the one thing I never understood why
  • 00:39:05
    people haven't done more of because you
  • 00:39:07
would think, well, if it passes all the
  • 00:39:08
tests, the code is like you know
  • 00:39:11
    you're happy that it passed the test but
  • 00:39:12
    it's something that I haven't seen a lot
  • 00:39:13
    of people do yeah yeah for sure well
  • 00:39:15
    that's awesome well that's fabulous
  • 00:39:17
    thank you for showing me the cursor
  • 00:39:18
    example one of the questions I love
  • 00:39:20
    asking is I want to know what the smart
  • 00:39:22
    people are talking about right now like
  • 00:39:24
    in AI so like as You observe on Twitter
  • 00:39:27
    in your circles what are the smart
  • 00:39:29
    people talking about that's a good
  • 00:39:31
    question oh man I
  • 00:39:33
    think what I see a lot of people talking
  • 00:39:36
    about is sort of the you know what's it
  • 00:39:39
called like test-time compute, like o1
  • 00:39:41
    thinking I see a lot of people talking
  • 00:39:42
    about those I see a lot of people
  • 00:39:44
talking about having those
  • 00:39:47
    thinking models do more agentic sorts of
  • 00:39:50
    tasks um and basically bringing this
  • 00:39:54
    what I like to think of as an agent as a
  • 00:39:56
for loop inside the model's, uh, thinking
  • 00:39:59
    process, and training the
  • 00:40:02
    model to just innately be able to call
  • 00:40:04
    tools like and we saw that I think a
  • 00:40:07
good example of that is, uh, computer use
  • 00:40:09
    right, from Anthropic, right, they
  • 00:40:11
    obviously fine-tuned in on that so I see
  • 00:40:13
    a lot of people talking about that um I
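A minimal sketch of that "agent as a for loop" framing; `callModel` and `runTool` are hypothetical placeholders, not any particular provider's API:

```typescript
// agent-loop.ts -- illustrative: the "agent" is just a loop that calls the
// model, executes whatever tool it asks for, feeds the result back in, and
// stops when the model returns a final answer.
type ModelStep =
  | { type: "final"; answer: string }
  | { type: "tool"; name: string; args: string };

// Hypothetical placeholders for the model call and the tool executor.
async function callModel(history: string[]): Promise<ModelStep> {
  return { type: "final", answer: "done" }; // stub
}
async function runTool(name: string, args: string): Promise<string> {
  return `result of ${name}(${args})`; // stub
}

export async function runAgent(task: string, maxSteps = 10): Promise<string> {
  const history = [task];
  for (let step = 0; step < maxSteps; step++) {
    const next = await callModel(history);
    if (next.type === "final") return next.answer;
    // The model asked for a tool: run it and append the observation.
    history.push(`tool ${next.name} returned: ${await runTool(next.name, next.args)}`);
  }
  return "stopped: step limit reached";
}
```

The trend being described is pushing that loop into the model's own thinking and training, so tool calls happen "innately" rather than in wrapper code like this.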
  • 00:40:15
    do see what I started to notice is
  • 00:40:17
    people starting to talk about whether
  • 00:40:19
    we've hit some variation of a wall I
  • 00:40:21
    don't know if you've seen it too and
  • 00:40:22
I've been hearing little rumors that you
  • 00:40:24
know Claude 3.5 Opus is not up to par and
  • 00:40:28
    like the the new Gemini model is not as
  • 00:40:31
good, so I've been hearing that as well um
  • 00:40:35
    and what else are people really talking
  • 00:40:37
    about and I think I think we spoke a lot
  • 00:40:39
    about the other things model
  • 00:40:40
    distillation um and the other thing I'm
  • 00:40:42
    starting to see more of is people being
  • 00:40:44
    a little bit not I guess talking more
  • 00:40:47
    about evals like I I think a lot of
  • 00:40:48
    people didn't really talk about it and
  • 00:40:50
    people are saying hey like from a
  • 00:40:51
    product perspective if you want your
  • 00:40:53
    product to be good you need to write
  • 00:40:54
    evals which are just a way of writing
  • 00:40:56
tests, so that's kind of what I'm seeing and
  • 00:40:57
    I don't know if you've seen anything
  • 00:40:58
    different but just from what I've heard
  • 00:41:00
    from people talking yeah let me think is
  • 00:41:02
    there any anything else I would
  • 00:41:04
    add to that list um the one thing people
  • 00:41:07
    aren't talking about it but I think it
  • 00:41:09
    will be a big deal when it actually
  • 00:41:10
    comes out is the whole feature
  • 00:41:11
    engineering um weight manipulation uh
  • 00:41:14
    like the Golden Gate uh Claude
  • 00:41:17
from Anthropic. I'm still waiting for access
  • 00:41:20
    to that because that is going to be an
  • 00:41:21
    alternative to prompt engineering and I
  • 00:41:22
    have no idea like how easy it's going to
  • 00:41:24
    be to work with what kind of results
  • 00:41:26
we're going to get but I'm excited to test
  • 00:41:27
    that whenever it comes out yeah I I I
  • 00:41:29
    remember seeing that I was like I was
  • 00:41:31
    blown away and I kind of forgot about it
  • 00:41:32
    so that I'm actually interested to see
  • 00:41:34
if they will ever let you have that
  • 00:41:36
    much interoperability with those models
  • 00:41:38
    like maybe there's like no no we're good
  • 00:41:40
    sorry we're shelving it like you're not
  • 00:41:41
allowed to touch it, right, but that would be
  • 00:41:43
    really interesting yeah for sure for
  • 00:41:45
    sure um awesome two more questions here
  • 00:41:48
    last one I love hearing about what is in
  • 00:41:50
    people's tool kit so I've seen you use
  • 00:41:52
Excalidraw on your YouTube
  • 00:41:56
    videos, I've seen you use Replit, I've
  • 00:41:58
    heard rumblings about v0, what else is in
  • 00:42:00
    your toolkit that is in your kind of
  • 00:42:02
    day-to-day workflows okay okay so
  • 00:42:04
    there's a lot I guess yeah you got you
  • 00:42:05
got a couple: v0, obviously there's
  • 00:42:07
    Cursor, um, Excalidraw, I like it for
  • 00:42:10
    drawing little diagrams um the other one
  • 00:42:13
    I guess that I use a lot is the
  • 00:42:15
    playground from anthropic and from open
  • 00:42:17
    AI uh which is like different than chat
  • 00:42:19
    GPT I use that to iterate on prompts
  • 00:42:22
    um I use this yeah the the one that I
  • 00:42:25
    use for transcribing uh the actual audio
  • 00:42:28
is called Whisper Flow, it's the one
  • 00:42:30
    where I like I have a hotkey that I
  • 00:42:31
    press and it takes the voice and
  • 00:42:34
    transcribes it into the inputs that you
  • 00:42:35
    saw me use um the other tooling that I
  • 00:42:38
    use I mean we can go do you want to go
  • 00:42:40
    into the technical side or are we just
  • 00:42:42
    going to leave it at like the high level
  • 00:42:44
    I let's let's not go like I don't want
  • 00:42:45
to know your entire tech stack but like
  • 00:42:47
    what is in like the cool AI stuff that
  • 00:42:49
    like you're you're you're grabbing for I
  • 00:42:52
    think that's pretty much it I think um I
  • 00:42:55
    think you got it there I I there's not
  • 00:42:57
    many other tools that I honestly use
  • 00:42:58
    like I just like I a lot of it's yeah
  • 00:43:02
like just writing the code. LangSmith
  • 00:43:03
    is one actually, I will say that we
  • 00:43:06
    use LangSmith a lot for evals, that's
  • 00:43:08
    like the other one um but yeah that's
  • 00:43:10
    pretty much it from from me I think you
  • 00:43:12
nailed it: v0, Cursor, Excalidraw, um, OBS
  • 00:43:16
    if you're recording videos yeah yeah
  • 00:43:18
    yeah yeah for sure um all right last
  • 00:43:20
    question and this is kind of off topic
  • 00:43:22
    from the AI side but I know people would
  • 00:43:23
    be interested in it so you've had a few
  • 00:43:25
    bangers on Twitter like just some things
  • 00:43:27
    that just absolutely pop and as somebody
  • 00:43:29
    who does a little bit of Twitter himself
  • 00:43:30
    too I can look at a tweet and be like
  • 00:43:32
    that person thought about it and they
  • 00:43:33
    did a really good job as to how they
  • 00:43:34
    architected and constructed it and I
  • 00:43:36
    noticed that with yourself so what hits
  • 00:43:38
    on Twitter and what what's your advice
  • 00:43:40
    for people who like want to do better on
  • 00:43:43
    it oh man okay so Twitter is just this
  • 00:43:46
    hilarious platform that the algorithm
  • 00:43:49
    changes a lot so it's you kind of got to
  • 00:43:51
    get a feel for what works and what
  • 00:43:53
    doesn't and luckily the cost so for
  • 00:43:55
    anyone's looking to grow the cost to
  • 00:43:57
    post on X Twitter is zero like you don't
  • 00:44:00
    pay anything if it doesn't do well no
  • 00:44:02
    one cares so it's the one platform where
  • 00:44:05
    the cost is literally zero because
  • 00:44:07
    you're just typing so type things away
  • 00:44:09
    how I craft a banger it's like a mixture
  • 00:44:12
    of what I see trending so what I see
  • 00:44:15
    what people are talking about and
  • 00:44:17
    there's two ways to craft a banger one
  • 00:44:20
    is you have to be controversial I'm
  • 00:44:22
you are not going to craft a banger
  • 00:44:23
    if you're not controversial now there's
  • 00:44:25
    pros and cons if you're posting that
  • 00:44:27
    kind of stuff all the time people will
  • 00:44:28
    be like hey you're just posting
  • 00:44:30
    clickbait so you got to be careful with
  • 00:44:31
    it you can't be like this is insane and
  • 00:44:34
    every single tweet starts with that like
  • 00:44:36
    no one and no one's going to believe you
  • 00:44:37
    but start saying something controversial
  • 00:44:40
    and the most important part of crafting
  • 00:44:42
a banger is your hook. I can tell, like,
  • 00:44:45
    honestly I'll post something and I can
  • 00:44:47
    tell within 20 minutes if it's going to
  • 00:44:49
    be a banger or not and it's basically
  • 00:44:52
    how natural does it come that's one
  • 00:44:54
    that's like how natural did this thought
  • 00:44:55
    come to me and how well did I craft that
  • 00:44:57
    hook everything in
  • 00:44:59
    between like you could you can kind of
  • 00:45:01
sit there and min-max, but that's
  • 00:45:04
    how I sit there and sometimes I'll sit
  • 00:45:05
    on something and I'll be like oh man
  • 00:45:07
    like I just don't know the right way to
  • 00:45:09
    say it so I won't post it but then it'll
  • 00:45:11
    just come to me and I'll be like all
  • 00:45:13
    right I got this I all the words I'm
  • 00:45:16
    using the right structure it's like the
  • 00:45:18
    the right timing and and that's kind of
  • 00:45:21
    what goes into crafting it so um the one
  • 00:45:23
    piece of advice that I will give from my
  • 00:45:25
    personal experience is don't spend too
  • 00:45:27
    much time on a tweet because I unless
  • 00:45:30
you're doing educational content, there
  • 00:45:32
    should be a diagram where the more time
  • 00:45:34
    you spend thinking about a tweet the
  • 00:45:36
worse it does because I swear the
  • 00:45:38
    majority of my bangers I spend like 15
  • 00:45:40
    minutes thinking about I'm like all
  • 00:45:41
    right I'm just going to post it you know
  • 00:45:43
grab a coffee, I come back and it blew
  • 00:45:44
    up and then all of a sudden you see
  • 00:45:46
    1.4 million
  • 00:45:48
    views oh man do I have time I have I
  • 00:45:51
    have to I have to tell you the story of
  • 00:45:53
how it started, do I have time for that
  • 00:45:55
    yeah yeah let's hear it okay so because
  • 00:45:57
    it's so relevant to the Banger tweet
  • 00:45:59
    so my company we we started like a year
  • 00:46:03
    and a half ago and right this is around
  • 00:46:05
    the time that agents like people were
  • 00:46:07
    talking about them but didn't have any
  • 00:46:08
    clue this was let's say March
  • 00:46:12
2023, and at this time no one
  • 00:46:16
    actually knew of my account I literally
  • 00:46:18
    had I had been posting tweets and no one
  • 00:46:21
    replied you know the classic zero views
  • 00:46:23
    you know that's just what happens and
  • 00:46:25
    then and I remember I saw someone else
  • 00:46:28
    post something about Auto GPT and I saw
  • 00:46:30
    it and I was like it looks pretty cool
  • 00:46:32
    but I ignored it and then it came up
  • 00:46:34
    again and I was like no I can't I cannot
  • 00:46:36
    not ignore this like this seems
  • 00:46:38
something very interesting and I'd been
  • 00:46:39
building, actually, like AI side
  • 00:46:41
    projects before this and I was like you
  • 00:46:43
    know what let me like try this thing out
  • 00:46:44
    and obviously I tried it and back then I
  • 00:46:46
    was like dude this is insane agents AI
  • 00:46:49
    is gonna be crazy so when I was like I
  • 00:46:52
    just posted about it and like I didn't
  • 00:46:54
    post anything crazy and I was like oh
  • 00:46:56
yeah this thing is kind of cool it's
  • 00:46:57
    pretty crazy and it like got like I
  • 00:46:59
    think that was the first post that got
  • 00:47:01
    over a thousand likes and I was like
  • 00:47:02
    wait a minute wow and then I was like
  • 00:47:04
    hold up hold a second then I saw this
  • 00:47:06
    trend that people wanted to do something
  • 00:47:09
    about like AI agents and it's
  • 00:47:11
    interestingly enough I like thought back
  • 00:47:13
to an episode of, um, like, My First
  • 00:47:15
    Million, so funny, and I remember them
  • 00:47:17
    talking about like there's sometimes you
  • 00:47:19
    see like this opportunity and I was like
  • 00:47:20
    dude I got to sit here and I got to do
  • 00:47:22
    two things first I got to craft
  • 00:47:24
    something I got to make a product that
  • 00:47:25
    people want to use and I got to figure
  • 00:47:27
    out the right Twitter thread and
  • 00:47:30
    narrative and story to craft to get
  • 00:47:31
    people on it so that weekend I spent the
  • 00:47:34
whole weekend building v0 of Cognosys
  • 00:47:37
    which was like our previous product in
  • 00:47:39
    the meantime posting Twitter bangers and
  • 00:47:43
    threads about how AI agents were going
  • 00:47:46
    to change everyone's life and every
  • 00:47:48
    single post was getting like a million
  • 00:47:50
    views I'm not even exaggerating oh and I
  • 00:47:52
    was like dude and and I was like okay
  • 00:47:55
    and all I would be posting I was like it
  • 00:47:56
was kind of clickbaity, I was like
  • 00:47:58
    this is going to change your life and
  • 00:47:59
    then getting like million view million
  • 00:48:01
views, and then I posted the product, like I was
  • 00:48:03
    like Hey like here I built this thing
  • 00:48:04
    for you people to go and try because I
  • 00:48:06
    know from what you've been telling me um
  • 00:48:09
    you don't want to go through GitHub and
  • 00:48:10
and I posted it out and I literally
  • 00:48:12
    built it in like three days and within
  • 00:48:15
    like two days we got 50,000 users so my
  • 00:48:19
goodness, that is so crazy. The
  • 00:48:22
    craziest two weeks and the most
  • 00:48:23
    stressful two weeks of my life and it
  • 00:48:26
started all from "how can I craft a
  • 00:48:28
    banger tweet," so I will say that that
  • 00:48:31
    was why it's so relevant and so funny it
  • 00:48:33
    just shows how powerful uh writing well
  • 00:48:36
    and writing with the right timing and
  • 00:48:38
    structure given what's happening can
  • 00:48:40
    potentially you know help you start a
  • 00:48:42
    company so and with that that is an
  • 00:48:44
    absolutely beautiful story to end on
  • 00:48:46
Sully, thank you very much for joining us
  • 00:48:48
    today oh dude it it was a pleasure I I
  • 00:48:50
    enjoyed it and hopefully my workflow is
  • 00:48:53
    applicable to other people people can
  • 00:48:54
    look at it and see that like hey using
  • 00:48:57
    AI is just not that hard you just got to
  • 00:48:59
    talk to the computer and it'll do stuff
  • 00:49:02
    for you
Tags
  • AI models
  • model distillation
  • prompt engineering
  • model routing
  • AI evaluation
  • task optimization
  • language models
  • efficient AI use