A conversation with OpenAI's CPO Kevin Weil, Anthropic's CPO Mike Krieger, and Sarah Guo

00:40:58
https://www.youtube.com/watch?v=IxkvVZua28k

Summary

TL;DR: In a panel discussion, Kevin and Mike, leaders in AI product development, share their insights on the rapid transformation of the AI landscape and its implications for product management. They articulate their excitement about AI's evolving capabilities and underscore the challenge of building AI products that meet user needs. They highlight the importance of user feedback in shaping product experiences and the evolving role of product managers in an AI-dominated environment. Key themes include the need for product managers to master evaluation skills, the opportunity to create proactive AI interactions, and the way users form emotional bonds with AI entities. The discussion reflects on real-world applications of AI, the adaptability of users, and forecasts for the future of AI integration in everyday tasks.

Takeaways

  • 👑 Kevin and Mike are excited about their new roles in AI product development.
  • 💡 AI capabilities are evolving rapidly, changing product experiences.
  • 🤝 User feedback is crucial for refining and enhancing AI products.
  • 🚀 Future interactions with AI may become more proactive and personalized.
  • 🛠️ Product managers need to develop new skills for working with AI technologies.
  • 📈 Observing how users adapt can inform better design strategies.
  • 🍕 A memorable anecdote involved an AI ordering pizza during internal testing.
  • 🧠 Users form emotional connections with AI, seeing them as entities with personality.
  • ⚙️ Effective evaluation methods can significantly enhance AI product quality.
  • 🌍 Future AI may facilitate international communication through real-time translation.
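The evaluation theme above can be made concrete with a small sketch. This is an illustrative harness, not code from either company: a list of test cases with success criteria, a pluggable grader (which, per Mike's remark later in the talk, could itself be a model call), and a pass rate that tells you whether a capability is at "60%" or "85%". All names and the stub model here are hypothetical.

```python
# Minimal eval-harness sketch: grade a model's answers against per-case
# success criteria and report the pass rate. The grader is a stand-in
# keyword check; in practice it could be another model acting as judge.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    must_contain: list[str]  # success criteria for this case

def keyword_grader(answer: str, case: EvalCase) -> bool:
    """Pass if the answer mentions every required keyword."""
    return all(kw.lower() in answer.lower() for kw in case.must_contain)

def run_eval(model: Callable[[str], str],
             cases: list[EvalCase],
             grader: Callable[[str, EvalCase], bool] = keyword_grader) -> float:
    """Return the fraction of cases the model passes."""
    passed = sum(grader(model(c.prompt), c) for c in cases)
    return passed / len(cases)

# Stub "model" so the harness runs without any API access.
def stub_model(prompt: str) -> str:
    return "Paris is the capital of France." if "France" in prompt else "I don't know."

cases = [
    EvalCase("What is the capital of France?", ["Paris"]),
    EvalCase("What is the capital of Peru?", ["Lima"]),
]
print(run_eval(stub_model, cases))  # 0.5: one of two cases passes
```

The point of the pluggable grader is the one the panel keeps returning to: the quality of the feature ends up gated on how well the cases and grading criteria are written, not just on the model.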

Timeline

  • 00:00:00 - 00:05:00

    The discussion opened with Sarah expressing excitement to be with Kevin and Mike, both known for their expertise in AI and previous roles at Instagram. She proposed discussing new product ideas but settled for a casual exchange of insights.

  • 00:05:00 - 00:10:00

    Kevin shared that he finds his new role in AI product management both challenging and fascinating, as it involves constantly adapting to new technological capabilities. He described the experience as sleepless but rewarding and highlighted the rapid evolution of AI technology.

  • 00:10:00 - 00:15:00

    Mike reflected on the various reactions he received from peers upon joining the AI team. While some were supportive, others questioned his choice given his previous semi-retirement. He emphasized that his passion for innovation drove him back into the tech world.

  • 00:15:00 - 00:20:00

    Both Kevin and Mike discussed their transition to enterprise roles, noting the differences from their previous consumer-focused experiences. They expressed excitement about engagement with enterprise clients and receiving direct feedback on how products are used, which differs from consumer feedback.

  • 00:20:00 - 00:25:00

    Kevin pointed out the unique challenges in enterprise product management, such as aligning with buyer goals and predicting market needs. He emphasized that understanding the audience and specific use cases is critical for successful AI product development.

  • 00:25:00 - 00:30:00

    The conversation turned toward the unpredictable nature of AI products, with Mike noting that product managers must remain flexible as they await the outcomes of model training and research insights. They both acknowledged the challenge of navigating evolving AI capabilities.

  • 00:30:00 - 00:35:00

    Mike shared insights on the importance of evaluation skills for product managers, stressing that a deeper understanding of AI models and effective eval writing will be essential for successful project outcomes moving forward.

  • 00:35:00 - 00:40:58

    The discussion concluded with a glimpse into the future of AI products, focusing on more personalized and proactive interactions, improving user experience through intuitive features, and how innovations will rapidly reshape both consumer and enterprise landscapes.


Video Q&A

  • What is one major topic discussed in the panel?

    The challenges and excitement of working in AI product development.

  • How do Kevin and Mike view the evolution of AI capabilities?

    They see it as rapidly advancing, with AI models becoming more sophisticated and useful.

  • What is one way AI is expected to change user interactions?

    Through more proactive and personalized communication with users.

  • Why is user feedback important in AI product development?

    It helps improve the AI's understanding and capability to meet user needs.

  • What kind of skills should product managers develop according to the discussion?

    Skills in writing evaluations and prototyping with AI models.

  • How do Kevin and Mike suggest educating users about AI?

    By leveraging power users in organizations to teach and share their experiences.

  • What interesting anecdote is mentioned regarding internal AI usage?

    An AI model successfully ordered pizza for the office during beta testing.

  • What do the speakers say about user adaptation to AI?

    They express amazement at how quickly users adapt to new technologies like AI.

  • What key aspect of future AI interactions is discussed?

    The idea of AI being proactive and able to handle complex tasks asynchronously.

  • What reflections do Mike and Kevin have on the emotional relationship users form with AI?

    They discuss users developing empathy towards the AI and how they respond to its personality.
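Kevin's point about a model that "can come back to you and say I'm not sure about this" can be sketched as a confidence-gated loop. This is a hypothetical illustration, not any product's real API — the threshold, function names, and stubs are invented: run each step through the model, and escalate to a human whenever the model's self-reported confidence falls below a cutoff, so the combined human-plus-model success rate can beat the model alone.

```python
# Human-in-the-loop sketch: escalate low-confidence steps to a person
# instead of letting the model guess.
from typing import Callable

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, not a real product setting

def complete_task(steps: list[str],
                  model: Callable[[str], tuple[str, float]],
                  ask_human: Callable[[str], str]) -> list[str]:
    """Run each step with the model; fall back to a human when unsure."""
    results = []
    for step in steps:
        answer, confidence = model(step)
        if confidence < CONFIDENCE_THRESHOLD:
            # "I'm not sure about this -- can you actually help me with this?"
            answer = ask_human(step)
        results.append(answer)
    return results

# Stubs standing in for a real model call and a real escalation channel.
def stub_model(step: str) -> tuple[str, float]:
    return (f"model answer for {step!r}", 0.9 if "easy" in step else 0.4)

def stub_human(step: str) -> str:
    return f"human answer for {step!r}"

out = complete_task(["easy formatting", "tricky edge case"], stub_model, stub_human)
print(out)
```

Even with a model that only succeeds on a fraction of steps, this shape of design keeps the product useful — the user edits or answers only where the model flags uncertainty, echoing the GitHub Copilot example from the discussion.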

Subtitles

  • 00:00:18
    all right hello
  • 00:00:20
    everyone okay okay Sarah you're the
  • 00:00:23
    queen of AI investing this is a phrase
  • 00:00:25
    never ever to be used again but it's
  • 00:00:27
    great to be here with both of you um so
  • 00:00:30
    I had two different ideas for our final
  • 00:00:33
    discussion the first was a product off
  • 00:00:37
    because these two men have the
  • 00:00:40
    merge-to-prod button both of them and I was
  • 00:00:42
    like oh please just release everything
  • 00:00:43
    we know is coming over the next six or
  • 00:00:45
    12 months ignore all internal guidelines
  • 00:00:48
    um the second was we just redesigned
  • 00:00:50
    Instagram together since they've both
  • 00:00:52
    actually run Instagram
  • 00:00:54
    before uh both of these got fully shot
  • 00:00:56
    down and so instead I think we'll just
  • 00:00:59
    like trade notes among friends like lame
  • 00:01:01
    I know but uh but really excited to hear
  • 00:01:03
    from you both anyway uh so this is
  • 00:01:06
    actually a relatively new role for both
  • 00:01:09
    of you Kevin let's start with you like
  • 00:01:12
    you've done a bunch of really different
  • 00:01:13
    interesting things like what was the
  • 00:01:14
    reaction you got when you took the job
  • 00:01:16
    from friends and the
  • 00:01:18
    team uh generally excitement I mean uh
  • 00:01:21
    it's I think it's one of the most
  • 00:01:23
    interesting and impactful roles there's
  • 00:01:24
    so much to figure out um I've never had
  • 00:01:29
    such a challenging
  • 00:01:32
    interesting uh Sleepless product role in
  • 00:01:35
    my life uh it's it's got all the
  • 00:01:38
    challenges of a normal product role
  • 00:01:40
    where you're trying to figure out who
  • 00:01:41
    you're building for and what problems
  • 00:01:43
    you can solve and things like that but
  • 00:01:46
    normally when you're Building Product
  • 00:01:47
    You're Building off of kind of a fixed
  • 00:01:49
    technology base right you know what you
  • 00:01:50
    have to work with and uh and you're
  • 00:01:53
    trying to build the best product you can
  • 00:01:56
    here it's like every two months
  • 00:01:58
    computers can do something computers
  • 00:01:59
    have never been able to do before in the
  • 00:02:01
    history of the world and you're trying
  • 00:02:02
    to figure out how that changes your
  • 00:02:05
    product and the answer should probably
  • 00:02:07
    be a fair amount um and so it just it's
  • 00:02:11
    so
  • 00:02:12
    interesting uh and fascinating to see on
  • 00:02:15
    the inside as AI gets developed but uh
  • 00:02:17
    I've been having a
  • 00:02:18
    blast Mike what about you I I remember
  • 00:02:21
    hearing the news I was like oh I didn't
  • 00:02:23
    know you could convince the founder of
  • 00:02:24
    Instagram to go work on something that
  • 00:02:26
    existed already yeah my favorite three
  • 00:02:28
    reactions like people who know me were
  • 00:02:29
    like that makes sense like you're going
  • 00:02:31
    to like have fun there uh the middle
  • 00:02:33
    people were like why like you don't have
  • 00:02:35
    to work like why are you doing this and
  • 00:02:37
    like then if you knew me you know me
  • 00:02:38
    like I can't not and I think that like I
  • 00:02:40
    couldn't stop myself and the third was
  • 00:02:42
    like oh you could hire the founder of
  • 00:02:44
    Instagram which was also fun and it's
  • 00:02:45
    like I mean not many people could but
  • 00:02:47
    like there's like probably a list of
  • 00:02:49
    three companies that would have been
  • 00:02:50
    interesting um and so yeah there's like
  • 00:02:52
    a range of reactions depending on how
  • 00:02:54
    well you knew me and how like you've
  • 00:02:56
    seen me in my like uh semi-retired state
  • 00:02:59
    which lasted like six weeks and I was
  • 00:03:00
    like all right what are we doing
  • 00:03:02
    next um so we had dinner together with a
  • 00:03:05
    bunch of friends recently and you I was
  • 00:03:08
    like impressed by the childish Delight
  • 00:03:11
    that you had around like yeah I'm
  • 00:03:12
    learning about all this Enterprise stuff
  • 00:03:15
    like tell me if it's about serving
  • 00:03:18
    customers that are not all of us with
  • 00:03:19
    Instagram or uh just you know working in
  • 00:03:22
    a a an organization that's research
  • 00:03:24
    driven like what's the biggest surprise
  • 00:03:26
    so far I those are two I think both very
  • 00:03:29
    worthwhile pieces of this role that are
  • 00:03:31
    like very new to me as well like I um
  • 00:03:33
    when I was 18 I made this like very
  • 00:03:34
    18-year-old vow which was like every
  • 00:03:36
    year of my life I wanted to be different
  • 00:03:37
    like I don't have the same year twice
  • 00:03:39
    and I was like it's why like you know I
  • 00:03:41
    didn't there's been times where I was
  • 00:03:42
    like oh like another social product I'm
  • 00:03:44
    like doing that again first of all like
  • 00:03:46
    your bar is like really distorted and
  • 00:03:48
    second of all like just would feel like
  • 00:03:49
    too much of the same thing so yeah
  • 00:03:51
    Enterprise has been wild I'm really
  • 00:03:53
    curious about your experience with that
  • 00:03:54
    as well like you're like you know uh
  • 00:03:57
    your your feedback GL I actually imagine
  • 00:03:59
    it's a lot more like investing is far
  • 00:04:00
    longer right you're like you have that
  • 00:04:02
    initial convo and you're like I think
  • 00:04:03
    they like me and then you're like oh no
  • 00:04:05
    it's now in like some requisition State
  • 00:04:07
    and it's going to take like six months
  • 00:04:09
    before they like even get to deployment
  • 00:04:11
    before you know whether it's right and
  • 00:04:12
    so like getting used to that pace we're
  • 00:04:14
    like what why hasn't this shipped yet
  • 00:04:16
    they're like Mike you've been here two
  • 00:04:17
    months like this is like it's making its
  • 00:04:19
    way through the VPS like it's going to
  • 00:04:21
    get there eventually so like getting
  • 00:04:22
    used to different timelines for sure but
  • 00:04:24
    like the part that is fun is actually
  • 00:04:26
    getting the feedback in the kind of
  • 00:04:28
    Engagement where you're like once it
  • 00:04:29
    gets the deployed you have somebody that
  • 00:04:31
    can call you and you can call them and
  • 00:04:32
    be like how's it working for you like is
  • 00:04:34
    this good like whereas users like you're
  • 00:04:36
    doing like data science in aggregate
  • 00:04:37
    and like sure you can bring in like one
  • 00:04:39
    or two people but it's like they won't
  • 00:04:41
    they don't have enough financial
  • 00:04:42
    incentive riding on telling you where
  • 00:04:44
    you suck and where you're doing well and
  • 00:04:45
    like that's been a different but also
  • 00:04:47
    like rewarding side of that for sure ke
  • 00:04:50
    Kevin you've worked on such a wide range
  • 00:04:52
    of products before how much do your
  • 00:04:54
    instincts like apply yeah I was going to
  • 00:04:57
    add on to the Enterprise Point too um
  • 00:04:58
    and then I'll get to like the other
  • 00:05:01
    interesting thing about Enterprise is
  • 00:05:02
    it's not necessarily about the product
  • 00:05:04
    right there's a buyer and they have
  • 00:05:07
    goals and you could build the best
  • 00:05:09
    product in the world that all the people
  • 00:05:10
    at the company might be happy to use and
  • 00:05:12
    it still doesn't necessarily matter
  • 00:05:14
    exactly um I was in a a meeting with one
  • 00:05:18
    of our big Enterprise customers and they
  • 00:05:20
    were like um this is great we're really
  • 00:05:23
    happy da D you know the one thing we
  • 00:05:25
    need is we really need you to tell us 60
  • 00:05:29
    days before you launch
  • 00:05:31
    anything and I was like I also would
  • 00:05:34
    like to know 60
  • 00:05:37
    days so very very different actually and
  • 00:05:40
    it's interesting right because at open
  • 00:05:41
    AI we have a consumer product and we
  • 00:05:44
    have an Enterprise product and we have a
  • 00:05:46
    developer product so we're kind of doing
  • 00:05:48
    all at once um Instinct
  • 00:05:53
    wise in I'd say in like half the job it
  • 00:05:56
    works you know when you have when you
  • 00:05:59
    have a sense of the product you're
  • 00:06:00
    trying to build you know we're getting
  • 00:06:03
    towards uh you know the end of shipping
  • 00:06:06
    uh Advanced speech mode or something or
  • 00:06:08
    you're getting towards shipping canvas
  • 00:06:10
    and you're making final touches trying
  • 00:06:12
    to understand who you're building for
  • 00:06:14
    and like what you know exactly what
  • 00:06:15
    problems you're trying to solve it it
  • 00:06:18
    works then because it's that's a little
  • 00:06:19
    bit more like the tail end of it is
  • 00:06:21
    shipping a normal product but the
  • 00:06:23
    beginning of these things is nothing
  • 00:06:25
    like that
  • 00:06:27
    um so uh
  • 00:06:31
    like there will just be these
  • 00:06:33
    capabilities that we don't
  • 00:06:35
    know you you have some sense as you're
  • 00:06:38
    training some new model that it might
  • 00:06:41
    have capability X you don't really know
  • 00:06:44
    nor does the research team nor does
  • 00:06:46
    anybody right you're like I think this
  • 00:06:48
    might be possible and it's kind of like
  • 00:06:50
    peering through the mist but it's
  • 00:06:52
    this emergent property of a model and
  • 00:06:55
    you know so you don't know whether it's
  • 00:06:56
    going to really work and you don't know
  • 00:06:58
    whether it's going to be like 60% good
  • 00:07:01
    or 90% good or 99% good and the product
  • 00:07:06
    that you would build that would make
  • 00:07:08
    sense with something that works 60% of
  • 00:07:10
    the time is Super different than 90 or
  • 00:07:12
    99% of the time right so you're kind of
  • 00:07:14
    just waiting and you're you know at
  • 00:07:16
    least like I don't know if you feel this
  • 00:07:18
    checking in with the research team from
  • 00:07:20
    time to time like hey guys how's it
  • 00:07:22
    going how's that model training uh any
  • 00:07:25
    any insight on this and they're like
  • 00:07:27
    it's research we're working on it you
  • 00:07:29
    know it's uh we don't know either we're
  • 00:07:31
    we're we're we're working through this
  • 00:07:32
    at the same time and it's I mean it
  • 00:07:35
    makes it super fun because you're kind
  • 00:07:36
    of like discovering things together but
  • 00:07:39
    very sort of stochastic too it's the
  • 00:07:41
    thing it most reminds me of like from
  • 00:07:43
    the Instagram days where like apple like
  • 00:07:45
    wwc announcements you're like this could
  • 00:07:47
    either be awesome for us or could like
  • 00:07:49
    absolutely like cause chaos for it it's
  • 00:07:51
    like that but your own company is the
  • 00:07:53
    one kind of disrupting you from within
  • 00:07:55
    which is like a very like it's very cool
  • 00:07:57
    but also like oh this might totally upend
  • 00:07:59
    my product road map now
  • 00:08:02
    yeah uh what does that cycle look like
  • 00:08:05
    for both of you you described it as um
  • 00:08:08
    you know like peering through the Mist
  • 00:08:09
    trying to look at the next set of
  • 00:08:11
    capabilities I mean can you can you plan
  • 00:08:14
    if you don't know exactly what is coming
  • 00:08:16
    and what is the iteration cycle to
  • 00:08:17
    discover new things that should belong
  • 00:08:18
    in your product I think like on the
  • 00:08:20
    intelligence side you can sort of squint
  • 00:08:22
    and see like all right it's advancing
  • 00:08:24
    this way and so the kinds of things that
  • 00:08:26
    you'll want to do with the model and
  • 00:08:28
    start building the product around that
  • 00:08:29
    so I I there's three ways right
  • 00:08:31
    intelligence feels not predictable but
  • 00:08:33
    at least on like a slope that you can
  • 00:08:35
    kind of watch there's the capabilities
  • 00:08:37
    you decide to invest in from the product
  • 00:08:38
    side and then do fine-tuning with the
  • 00:08:40
    actual research teams um so something
  • 00:08:42
    like artifacts we spend a lot of time
  • 00:08:44
    between research I think the same was
  • 00:08:45
    true with canvas right like you're doing
  • 00:08:47
    a like co-design co- research co-
  • 00:08:49
    finetune and that's like I think a real
  • 00:08:51
    privilege of getting to work at this
  • 00:08:52
    company and getting to do design there
  • 00:08:53
    and then there's the capability front so
  • 00:08:55
    maybe speech mode for open a for us it's
  • 00:08:57
    the computer use uh work that we we
  • 00:08:59
    released this week you're like all right
  • 00:09:01
    60% all right got good yes all right
  • 00:09:04
    yeah and like so what we try to do is
  • 00:09:06
    embed designers early in the process but
  • 00:09:08
    knowing that like you're not placing a b
  • 00:09:10
    like the the experimentation talk was
  • 00:09:12
    saying like your um your output for
  • 00:09:14
    experiment should be learning not
  • 00:09:15
    necessarily like perfect products you're
  • 00:09:16
    going to ship every time I think the
  • 00:09:17
    same is true when you partner with
  • 00:09:18
    research like your outcome is hopefully
  • 00:09:20
    demos or informative things that like
  • 00:09:22
    could spark product ideas not like a
  • 00:09:24
    predictable product process where you're
  • 00:09:26
    like well it's this de-risked by now
  • 00:09:28
    which means it's going to look that way
  • 00:09:29
    when research comes along I've also one
  • 00:09:32
    thing that I've really enjoyed because
  • 00:09:34
    research is at least parts parts of
  • 00:09:36
    research are very product oriented
  • 00:09:38
    especially on the post-training side
  • 00:09:39
    like Mike was saying and then parts of
  • 00:09:41
    it are really like academic research at
  • 00:09:43
    some level and so you we'll just also
  • 00:09:47
    occasionally hear about some capability
  • 00:09:49
    and we'll be in a meeting and you'll be
  • 00:09:51
    like oh I really wish we could do this
  • 00:09:52
    thing and a researcher on the team will
  • 00:09:55
    be like oh no we can do that we've had
  • 00:09:57
    that for three months and we're like
  • 00:09:59
    really what does that like okay where do
  • 00:10:02
    I learn more and they're like oh well we
  • 00:10:03
    didn't think we didn't we didn't know it
  • 00:10:05
    was important so you know I'm working on
  • 00:10:07
    this other thing now um but you do just
  • 00:10:10
    get like magic happening sometimes
  • 00:10:13
    too uh one thing like we think a lot
  • 00:10:16
    about when we're investing is actually
  • 00:10:18
    like can you do anything with a model if
  • 00:10:21
    it is 60% successful at a task instead
  • 00:10:23
    of 99% and unlike lots of tasks that's
  • 00:10:26
    closer to 60 right but the task is
  • 00:10:28
    really important valuable like how do
  • 00:10:30
    you how do you think about that
  • 00:10:32
    internally in terms of evaluating
  • 00:10:34
    progression on a task and then what what
  • 00:10:37
    types of things you like put in sort of
  • 00:10:40
    the burden of product to make it
  • 00:10:42
    graceful failure or to like sort of
  • 00:10:44
    cross the last mile for the user versus you know
  • 00:10:47
    we just need to wait for the models to
  • 00:10:48
    get better I'd argue there are a lot of
  • 00:10:50
    things that you can actually do when
  • 00:10:52
    something is 60% right you just you just
  • 00:10:54
    need to really design for it um you have
  • 00:10:56
    to expect that there's a human in the
  • 00:10:58
    loop a lot more than there would be
  • 00:10:59
    otherwise like if you look at uh take
  • 00:11:02
    like GitHub co-pilot right that was kind
  • 00:11:04
    of the first AI product that really open
  • 00:11:07
    people's eyes to like this thing can be
  • 00:11:09
    useful not just as you know Q&A but for
  • 00:11:11
    really economically valuable work and
  • 00:11:15
    that launched I don't know exactly which
  • 00:11:16
    model that was built off of but I mean
  • 00:11:18
    it was multiple Generations ago so I
  • 00:11:20
    guarantee you that model wasn't perfect
  • 00:11:22
    at anything related to coding I think
  • 00:11:24
    it's gpt2 which is like pretty small so
  • 00:11:27
    yeah I mean and so but the fact that it
  • 00:11:29
    was still valuable for you cuz if it got
  • 00:11:31
    the code you know some significant
  • 00:11:34
    fraction of the way there that was still
  • 00:11:36
    stuff you didn't have to type yourself
  • 00:11:37
    and you could edit it and so there are
  • 00:11:39
    experiences like that that I think
  • 00:11:40
    totally work I think we'll see the same
  • 00:11:42
    kinds of things happening with um with
  • 00:11:45
    sort of the the shift towards uh agents
  • 00:11:48
    and longer form tasks where you know it
  • 00:11:51
    may not be perfect but if it can save
  • 00:11:54
    you five or 10 minutes that's still
  • 00:11:55
    valuable and even more if the model can
  • 00:11:57
    understand where it doesn't have
  • 00:12:00
    confidence and can come back to you and
  • 00:12:01
    say I'm not sure about this can you
  • 00:12:02
    actually help me with this then you know
  • 00:12:05
    the the combination of human and model
  • 00:12:07
    together can be much higher than 60% I
  • 00:12:09
    also find that 60% this magic 60% number
  • 00:12:12
    like it's kind of lumpy I made it up five
  • 00:12:14
    minutes ago that was a takeaway 60% that
  • 00:12:17
    is our new that's the Mendoza Line of uh
  • 00:12:19
    of AI like I think it's often very lumpy
  • 00:12:22
    where like it'll do very well on some
  • 00:12:23
    tasks and not well on others and I think
  • 00:12:25
    that also helps like uh whenever we run
  • 00:12:27
    like pilot programs with customers is
  • 00:12:29
    really interesting when we'll get like
  • 00:12:30
    the same day feedback from two different
  • 00:12:32
    companies one will be like it's solved our
  • 00:12:33
    whole problem like we've been trying to
  • 00:12:35
    do this for three months thank you the other will
  • 00:12:36
    be like it was way off it's like worse
  • 00:12:38
    than the other model and so like uh it's
  • 00:12:40
    also humbling to know that you have your
  • 00:12:42
    own internal evals but like the rubber
  • 00:12:45
    hitting the road and actually seeing the
  • 00:12:46
    model out in the world is where it's
  • 00:12:48
    kind of the equivalent of like you do all
  • 00:12:49
    this design and then like you put it in
  • 00:12:50
    front of one user and you're like oh wow
  • 00:12:52
    I was wrong uh the model has that
  • 00:12:54
    feeling as well we like we try as hard
  • 00:12:56
    as we can to like have a good sense but
  • 00:12:58
    then people have their own custom data
  • 00:13:00
    sets they have their own internal use
  • 00:13:01
    they've prompted it a certain way and
  • 00:13:03
    like uh so that belies that sort of
  • 00:13:05
    almost like bimodal nature of when you
  • 00:13:07
    actually put it out in the world I'm
  • 00:13:09
    curious if you feel
  • 00:13:10
    this I think there's a very real sense
  • 00:13:13
    in which models today are not
  • 00:13:15
    intelligence limited they're eval
  • 00:13:17
    limited yeah they can actually do much
  • 00:13:19
    more and be much more correct on a wider
  • 00:13:21
    range of things than they are today and
  • 00:13:23
    it's really about sort of teaching them
  • 00:13:25
    they have the intelligence you need to
  • 00:13:27
    teach them certain specific topics that
  • 00:13:29
    you know maybe weren't in their original
  • 00:13:31
    training set but they can do it if you
  • 00:13:32
    do it right yeah we've seen that all the
  • 00:13:33
    time where like um uh there was a lot of
  • 00:13:36
    like exciting AI deployments that
  • 00:13:38
    happened in like you know maybe three
  • 00:13:40
    years ago and now they're like we think
  • 00:13:41
    the new models are better but we never
  • 00:13:43
    did evals because all we were doing was
  • 00:13:44
    just shipping cool AI features three
  • 00:13:45
    years ago and like the hardest hump to
  • 00:13:47
    get people over is like let's step back
  • 00:13:49
    and like what does success actually look
  • 00:13:51
    like for you like what problem are you
  • 00:13:52
    solving like often the pm has rotated so
  • 00:13:55
    it's like somebody's inherited it and
  • 00:13:56
    then be like all right what does that
  • 00:13:58
    look like all right let's some
  • 00:13:59
    evaluations what we've learned is like
  • 00:14:01
    Claude is actually good at writing
  • 00:14:02
    evaluations and also grading them so
  • 00:14:04
    like we can automate a lot of this for
  • 00:14:05
    you but you have to tell us what success
  • 00:14:07
    looks like and then let's go and
  • 00:14:09
    actually iteratively improve our way
  • 00:14:11
    over there and like that is often like
  • 00:14:13
    the difference between like 60% of a
  • 00:14:15
    task and like 85% of task if you come
  • 00:14:17
    interview at anthropic which maybe you
  • 00:14:18
    should uh at some point maybe you're
  • 00:14:20
    happy in your role maybe not um uh you'll
  • 00:14:22
    see one of the things we do in our
  • 00:14:23
    interview process actually like make you
  • 00:14:25
    get uh a prompt from like crappy eval to
  • 00:14:28
    good and like just we want to see you
  • 00:14:30
    think but like not enough of that Talent
  • 00:14:31
    exists elsewhere so we're trying to get
  • 00:14:33
    that like if there's one thing we can
  • 00:14:35
    teach people that's probably the most
  • 00:14:36
    important thing writing evals I mean it's
  • 00:14:38
    I actually think it's going to become a
  • 00:14:40
    core skill for PMS we actually had this
  • 00:14:43
    and maybe this is like a little inside
  • 00:14:44
    baseball but I thought this was
  • 00:14:45
    interesting like internally we had our
  • 00:14:47
    research PMS who like work a lot on
  • 00:14:49
    model capabilities and model development
  • 00:14:51
    and then we had our like more like
  • 00:14:52
    product surface PMS or API PMs and we
  • 00:14:55
    ended up realizing that like the job of
  • 00:14:57
    a PM in 2024 2025 building AI powered
  • 00:15:00
    features looks is looking more and more
  • 00:15:02
    like the former than the latter in a lot
  • 00:15:04
    of cases like uh we launched uh like our
  • 00:15:07
    uh code analysis and like basically
  • 00:15:09
    Claude can analyze CSVs and write code
  • 00:15:10
    for you now and the PM there was like
  • 00:15:13
    getting it 80% of the way there and then
  • 00:15:14
    having to hand it over to the PM that
  • 00:15:15
    could write the evals then go to like
  • 00:15:17
    fine tune and like prompt I was like
  • 00:15:19
    that's actually the same role like the
  • 00:15:20
    quality of your feature is now gated on
  • 00:15:22
    how well you have done the evals and the
  • 00:15:24
    prompts and so like that PM like
  • 00:15:27
    definition is definitely just merged now
  • 00:15:29
    yeah absolutely I we we set up a boot
  • 00:15:31
    camp and like took every PM through uh
  • 00:15:35
    writing evals and like what it was like
  • 00:15:37
    difference between good and bad evals
  • 00:15:39
    and you know we're we're definitely not
  • 00:15:41
    done there we've got to keep iterating
  • 00:15:42
    and getting better on it but it is such
  • 00:15:44
    a critical part of of making a good
  • 00:15:46
    product with AI yeah as part of this
  • 00:15:49
    recruiting call for any of the people
  • 00:15:51
    who want to be good at building AI
  • 00:15:53
    product or research product in the
  • 00:15:55
    future um we can't come to your boot
  • 00:15:58
    camp Kevin so how do we develop some
  • 00:16:00
    intuition for getting good at this eval
  • 00:16:03
    and iteration loop I actually think it's
  • 00:16:06
    something you can you can use the models
  • 00:16:07
    themselves for like you were talking
  • 00:16:09
    about you can ask the models at this
  • 00:16:10
    point what makes a good eval give me you
  • 00:16:12
    know I want to do this can you write me
  • 00:16:13
    a sample eval and it will it will be
  • 00:16:16
    pretty good yeah I think that's like
  • 00:16:18
    that goes a long way I think there's also
  • 00:16:20
    this question of like it's and I if you
  • 00:16:23
    listen to like everybody from like
  • 00:16:25
    Andrej Karpathy to others who have like spent a
  • 00:16:27
    lot of time in the field like nothing
  • 00:16:28
    beats looking at at data and so like
  • 00:16:30
    people often get caught up um being like
  • 00:16:32
    well we already have these evaluations
  • 00:16:33
    and the new model is like 80% there
  • 00:16:35
    rather than 78% we can't or like
  • 00:16:37
    you know it's worse and I was like have
  • 00:16:38
    we looked at the cases where it fails
  • 00:16:40
    and you're like oh actually this was
  • 00:16:41
    better it's just our grader is not as
  • 00:16:43
    good you know or um it's funny like
  • 00:16:46
    again a little inside baseball you know
  • 00:16:47
    like every model release has the model
  • 00:16:49
    card and some of these model uh these
  • 00:16:51
    evals we've seen like even the golden
  • 00:16:53
    answer I'm like I'm not sure a human
  • 00:16:55
    would say it or like I think that math
  • 00:16:56
    is actually a little wrong like getting
  • 00:16:58
    100% is going to be really hard cuz even
  • 00:16:59
    just grading them is very challenging so
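Mike's point that the grader itself can be the thing that's wrong is easy to make concrete. A toy sketch (the cases and graders here are invented for illustration, not anyone's real eval harness): a strict exact-match grader fails a correct answer over formatting, while a normalizing grader accepts it.

```python
def exact_match(pred: str, gold: str) -> bool:
    # Strict grader: any formatting difference counts as a failure.
    return pred == gold

def normalized_match(pred: str, gold: str) -> bool:
    # Looser grader: compare only alphanumeric characters, case-insensitively.
    canon = lambda s: "".join(ch for ch in s.lower() if ch.isalnum())
    return canon(pred) == canon(gold)

gold = "56"
model_answer = "56."  # right answer, stray trailing period

print(exact_match(model_answer, gold))       # False: the eval says the model failed
print(normalized_match(model_answer, gold))  # True: the answer was fine, the grader was strict
```

Looking at the failing cases by hand, as Mike suggests, is what surfaces this kind of grader artifact before you conclude the new model got worse.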
  • 00:17:01
    like I'd encourage you to like the way
  • 00:17:03
    you build the intuition is go look at
  • 00:17:04
    the actual answers even to sample them
  • 00:17:07
    be like all right yeah maybe we should
  • 00:17:09
    evolve the evals or maybe like the vibes
  • 00:17:11
    are good even if the eval is like tied
  • 00:17:13
    so like getting real and getting like
  • 00:17:15
    deep on the data I think matters I also
  • 00:17:18
    think it'll be really interesting to see
  • 00:17:19
    how this evolves as we go towards longer
  • 00:17:21
    form more agentic tasks because it's one
  • 00:17:24
    thing when your evals are like I gave
  • 00:17:26
    you this math thing and you were able to
  • 00:17:28
    like add four-digit numbers and get to
  • 00:17:30
    the right answer you know it's easy to
  • 00:17:32
    know what what good looks like there as
  • 00:17:34
    the models start to do more uh long form
  • 00:17:38
    more ambiguous things go get me a hotel
  • 00:17:41
    in New York City you know what's what's
  • 00:17:44
    right there a lot of it will be about
  • 00:17:46
    personalization uh you know if you ask
  • 00:17:48
    any two humans who are perfectly
  • 00:17:50
    competent they're going to do two
  • 00:17:52
    different things so your grading becomes
  • 00:17:55
    much softer and it you know it'll just
  • 00:17:57
    be interesting we I think we'll have to
  • 00:17:58
    to evolve yet again like speaking of
  • 00:18:00
    having to reinvent stuff over and over
  • 00:18:02
    again I think a lot like when you think
  • 00:18:04
    about and I think both Labs have some
  • 00:18:06
    concept of like this is what
  • 00:18:08
    capabilities look like as things evolve
  • 00:18:09
    like it looks a little bit like a career
  • 00:18:11
    ladder like what bigger and longer
  • 00:18:12
    Horizon tasks are you taking and maybe
  • 00:18:14
    like eval start looking more like
  • 00:18:16
    performance review I'm in performance
  • 00:18:17
    review season so this is the metaphor
  • 00:18:18
    that's in my head sorry but it's like
  • 00:18:20
    you know like did the model like meet
  • 00:18:22
    your expectation of like what a
  • 00:18:23
    competent human have done did it exceed
  • 00:18:25
    it because it did it twice as fast or
  • 00:18:26
    like discovered some restaurant you
  • 00:18:28
    wouldn't have known it greatly exceed
  • 00:18:29
    meets most like it starts being like
  • 00:18:31
    more nuanced than just like right or
  • 00:18:33
    wrong let alone you have humans writing
  • 00:18:36
    these evals and the models are getting to
  • 00:18:38
    the point where they can often beat
  • 00:18:39
    humans at certain tasks like people
  • 00:18:41
    prefer the model's answers to a human's
  • 00:18:43
    answers and so if your humans writing
  • 00:18:44
    your evals like yeah you know so what
  • 00:18:47
    does that
  • 00:18:48
    mean uh what okay evals are clearly the
  • 00:18:52
    key um we're going to go spend a bunch
  • 00:18:54
    of time with these models teaching
  • 00:18:56
    ourselves to write evals what other
  • 00:18:58
    skills should product people be
  • 00:19:00
    learning now you you're both on that
  • 00:19:01
    learning path I think uh prototyping
  • 00:19:05
    with these models is a thing that is
  • 00:19:07
    underused like our best PMS do this
  • 00:19:09
    where we'll get into some long
  • 00:19:10
    conversation about like should the UI be
  • 00:19:12
    this or that and before our designers
  • 00:19:14
    have even like picked up their Figma
  • 00:19:16
    like our our often our PMS or sometimes
  • 00:19:19
    our Engineers will be like great I
  • 00:19:21
    prompted Claude I did like an AB
  • 00:19:22
    comparison of what these two UI could
  • 00:19:24
    look like let's try them and I'm like oh
  • 00:19:25
    this is really cool I'm like play that
  • 00:19:27
    out and like we'll be able to
  • 00:19:28
    prototype a far greater variety
  • 00:19:31
    and evaluate um like on a much faster
  • 00:19:34
    scale than before so like that skill of
  • 00:19:36
    like using these tools to actually be in
  • 00:19:39
    prototyping mode I think is a really
  • 00:19:41
    really useful one that's a good one I
  • 00:19:43
    would also I you you sort of said this
  • 00:19:46
    but I think it's also going to push PMS
  • 00:19:49
    to go deeper into the tech stack yeah um
  • 00:19:51
    because it's and maybe that changes over
  • 00:19:53
    the years like that if you were doing
  • 00:19:55
    like database Tech in I don't know 2005
  • 00:19:59
    maybe it required you to be able to go
  • 00:20:01
    really deep in a different way than it
  • 00:20:02
    would if you were doing database Tech
  • 00:20:04
    now like layers of abstraction get built
  • 00:20:06
    and you maybe don't need to know all the
  • 00:20:08
    fundamentals but it's not like every PM
  • 00:20:11
    needs to be a researcher by any means
  • 00:20:12
    but I think having an appreciation for
  • 00:20:14
    it spending time and learning the
  • 00:20:17
    language and gaining intuition for how
  • 00:20:19
    this stuff works a little bit I think
  • 00:20:21
    will go a long way I think the other
  • 00:20:22
    piece like you're dealing with this like
  • 00:20:24
    stochastic non-deterministic system
  • 00:20:26
    which like evals are our best attempt to
  • 00:20:28
    do it but like product design in a world
  • 00:20:30
    where like you're not in control of what
  • 00:20:33
    the model is going to say you can try
  • 00:20:35
    and so like what are the feedback
  • 00:20:36
    mechanisms that you need to close that
  • 00:20:38
    Loop like how do you decide when like
  • 00:20:40
    the model's gone astray how do you
  • 00:20:41
    collect that feedback in a in a rapid
  • 00:20:43
    way you know like what are the guard
  • 00:20:44
    rails you want to put in like how do you
  • 00:20:47
    even know what it's doing in aggregate
  • 00:20:48
    like it's a much more like you're
  • 00:20:51
    understanding like the output of this
  • 00:20:53
    intelligence across a lot of outputs
  • 00:20:55
    over a lot of people every single day it
  • 00:20:57
    just requires a very different skill set than
  • 00:20:59
    like oh the bug report is you clicked on
  • 00:21:01
    the button and didn't follow the user
  • 00:21:02
    it's like that's a pretty knowable kind
  • 00:21:04
    of problem right and and maybe this will
  • 00:21:05
    change you know 5 years from now when
  • 00:21:08
    people are used to it but I think we're
  • 00:21:09
    all still in the mode of adapting to
  • 00:21:11
    this sort of
  • 00:21:13
    non-deterministic user interface
  • 00:21:15
    ourselves uh and certainly people who
  • 00:21:17
    are not you know tech people here in
  • 00:21:19
    this room working on Tech products who
  • 00:21:21
    are using AI are are definitely not used
  • 00:21:24
    to it like it goes against all of the
  • 00:21:25
    intuition that we've built up for the
  • 00:21:27
    last like 25 years of using
  • 00:21:29
    computers uh and so like the idea that
  • 00:21:32
    you're going to put in the exact same
  • 00:21:33
    things normally if you put in the exact
  • 00:21:35
    same inputs computers give you the exact
  • 00:21:37
    same outputs and that is no longer true
  • 00:21:40
    uh and it it it's not just that we have
  • 00:21:43
    to adapt to it Building Products we have
  • 00:21:44
    to also put ourselves in the shoes of the
  • 00:21:47
    people who are using our products and
  • 00:21:49
    think about what this means for them and
  • 00:21:50
    there's like I mean there are downsides
  • 00:21:52
    to it there also really cool upsides and
  • 00:21:54
    so it's fun to kind of think about how
  • 00:21:56
    how you can use that to your advantage
  • 00:21:58
    in different ways I remember like we we
  • 00:22:00
    did like a lot of like rolling user
  • 00:22:02
    research at Instagram so we have like
  • 00:22:04
    the same like or researchers would bring
  • 00:22:06
    in different people every single week
  • 00:22:07
    whatever it was like prototype ready
  • 00:22:08
    would get put through it and we do the
  • 00:22:10
    same thing at anthropic but what's
  • 00:22:11
    interesting is like for those sessions
  • 00:22:13
    what would often surprise me is like how
  • 00:22:15
    users were using Instagram there's
  • 00:22:16
    something interesting about like their
  • 00:22:17
    use case or like their reaction to a new
  • 00:22:19
    feature and like now it's like half that
  • 00:22:21
    and half what the model did in that
  • 00:22:22
    situation you're like oh it did the
  • 00:22:24
    right thing this is great so there's
  • 00:22:25
    like this like very like almost like a
  • 00:22:28
    sense of Pride maybe of like when it
  • 00:22:29
    reacts well and you're like in a user
  • 00:22:31
    research environment and then like also
  • 00:22:33
    the like frustration you're like oh no
  • 00:22:35
    you misunderstood the intent and now
  • 00:22:36
    you're like 10 pages down into this
  • 00:22:38
    answer and so like it it's also like
  • 00:22:40
    maybe a little of getting Zen about like
  • 00:22:42
    letting go of control and you know
  • 00:22:44
    what's going to happen in those
  • 00:22:45
    environments yeah you have both worked
  • 00:22:48
    on these consumer experiences that
  • 00:22:49
    taught new behaviors to you know many
  • 00:22:52
    hundreds of millions of people uh
  • 00:22:55
    quickly uh these AI products are
  • 00:22:57
    happening actually faster than that
  • 00:22:59
    right and you know if if PMs and
  • 00:23:02
    Technical people don't have that much
  • 00:23:04
    intuition naturally for how to use them
  • 00:23:06
    how do you think about educating end
  • 00:23:08
    users at the scale you're both working
  • 00:23:10
    with on something that is so unintuitive
  • 00:23:13
    I mean it it is kind of amazing how fast
  • 00:23:16
    we all
  • 00:23:17
    adapt uh I was talking to somebody the
  • 00:23:20
    other day and they were telling me about
  • 00:23:21
    their first Waymo ride who's ridden in a
  • 00:23:24
    Waymo who rode one here yeah if you
  • 00:23:27
    haven't ridden in a Waymo you're in San
  • 00:23:29
    Francisco ride a Waymo to wherever you're
  • 00:23:31
    going when you leave here it's a magical
  • 00:23:34
    experience but they were like my first
  • 00:23:37
    30 seconds I was like oh my God watch
  • 00:23:40
    out for that
  • 00:23:40
    bicyclist right and then 5 minutes in it
  • 00:23:43
    was like oh my God I'm living in the
  • 00:23:46
    future and then 10 minutes later it was
  • 00:23:49
    like bored scrolling on your phone like
  • 00:23:52
    you know how quickly we become used to
  • 00:23:54
    something that is just absolute magic
  • 00:23:56
    yeah um and I think I mean ChatGPT is
  • 00:24:00
    less than 2 years old MH and it was
  • 00:24:03
    absolutely mind-blowing when it first
  • 00:24:05
    came out and now I
  • 00:24:07
    think if we had to go back and use the
  • 00:24:08
    original whatever it was GPT-3.5 I think
  • 00:24:12
    the horror yeah yeah like no everybody
  • 00:24:14
    be like ugh so dumb how could I possibly you
  • 00:24:18
    know and and you know the stuff that's
  • 00:24:20
    that's happening today that we're
  • 00:24:22
    working on that you guys are working on
  • 00:24:24
    it all feels like magic 12 months from
  • 00:24:26
    now we're going to be like can you can
  • 00:24:27
    you believe if we use that garbage
  • 00:24:30
    because it's going to I me that's how
  • 00:24:31
    fast this thing is moving but it's also
  • 00:24:33
    amazing to me how quickly people adapt
  • 00:24:35
    because I mean as much as we try and
  • 00:24:37
    bring people along like there are also
  • 00:24:39
    um there's just there's a lot of
  • 00:24:41
    excitement people understand that this
  • 00:24:43
    is like the world is moving in this
  • 00:24:45
    direction and um we've got to try and
  • 00:24:47
    make it the best possible move that we
  • 00:24:50
    can but uh it's it's happening and it's
  • 00:24:52
    happening fast one thing we're trying to
  • 00:24:53
    get better at and that's is also letting
  • 00:24:55
    the product be like educational in a
  • 00:24:57
    very literal way which is like a thing
  • 00:24:59
    we did not do early and now we're
  • 00:25:01
    changing is just tell Claude more about
  • 00:25:04
    itself which was like you know it's in
  • 00:25:05
    its training set that it's you know uh
  • 00:25:08
    artificial intelligence created by
  • 00:25:09
    anthropic whatever but now we're
  • 00:25:10
    literally like and here's how you use
  • 00:25:12
    this feature we ship because people would
  • 00:25:13
    ask and again this came from user
  • 00:25:14
    research because we'd be like they would
  • 00:25:16
    be like how do I use this thing and then
  • 00:25:18
    Claude would be like I don't know have
  • 00:25:19
    you tried like looking at it on the
  • 00:25:21
    internet you're like no that's
  • 00:25:22
    unhelpful and so like um uh like we're
  • 00:25:25
    really trying to ground it and then at
  • 00:25:26
    launch time we're like you know it's a
  • 00:25:28
    process we're improving but like it's
  • 00:25:29
    it's cool to now see like this is the
  • 00:25:31
    exact link to the documentation like
  • 00:25:32
    here's how you do it like I can help you
  • 00:25:34
    step by step oh you're stuck I can help
  • 00:25:36
    you here so uh these things are actually
  • 00:25:38
    very good at solving uh UI problems and
  • 00:25:41
    like user confusion and like we should
  • 00:25:43
    use them more for that yeah that's got
  • 00:25:45
    to be different when you are um you know
  • 00:25:47
    trying to do like change management in
  • 00:25:48
    an Enterprise though right because
  • 00:25:50
    there's a there's a status quo for how
  • 00:25:52
    are you doing things there's
  • 00:25:53
    organizational process like how do you
  • 00:25:55
    think about educating entire
  • 00:25:57
    organizations about productivity
  • 00:25:59
    improvements or whatever else can come I
  • 00:26:00
    think the Enterprise one is really
  • 00:26:01
    interesting because like even like these
  • 00:26:04
    products have like millions and millions
  • 00:26:05
    of users but like the power users are
  • 00:26:08
    very much I think still like early
  • 00:26:10
    adopters people who like technology and
  • 00:26:11
    then there's like a you know long tail
  • 00:26:13
    whereas when you go into Enterprise
  • 00:26:14
    you're deploying to like an organization
  • 00:26:16
    that is like often there's folks who are
  • 00:26:17
    like not very technical and like I
  • 00:26:19
    think that's really cool actually seeing
  • 00:26:22
    fairly non-technical users get exposed
  • 00:26:24
    to like a chat-powered LLM for the first
  • 00:26:27
    time and then getting to see it then you
  • 00:26:29
    have the luxury of like getting to run
  • 00:26:30
    like a session where you teach them
  • 00:26:32
    about it and like have educational
  • 00:26:33
    materials um and so I think we need to
  • 00:26:36
    learn from what happens in those and
  • 00:26:38
    then say like that's what we need to do
  • 00:26:39
    to teach the next 100 million people how
  • 00:26:41
    to use these UIs and they're
  • 00:26:43
    usually power users internally and
  • 00:26:45
    they're they're excited to teach the
  • 00:26:48
    rest of people and you know like with
  • 00:26:50
    OpenAI we have these custom GPTs that
  • 00:26:52
    you can make and organizations make
  • 00:26:54
    thousands of them often and it's a way
  • 00:26:56
    for the power users to make something
  • 00:26:58
    that makes AI easier and like
  • 00:27:01
    immediately valuable for the people that
  • 00:27:03
    might not know how to use it otherwise
  • 00:27:05
    um so like that's one cool thing you
  • 00:27:07
    find the pockets of power users and they
  • 00:27:09
    actually will sort of be
  • 00:27:11
    evangelists I I have to ask you then
  • 00:27:14
    because you you know your organizations
  • 00:27:16
    are both like all power users right so
  • 00:27:18
    you know you're living in your little
  • 00:27:19
    pocket of the future uh I'll ask about
  • 00:27:22
    one thing but feel free to redirect Mike
  • 00:27:24
    how am I supposed to use computer use
  • 00:27:26
    this is amazing like what are you guys
  • 00:27:27
    doing
  • 00:27:28
    yeah well internally like we're I mean
  • 00:27:31
    this to Kevin's earlier comment around
  • 00:27:32
    like when is it going to be ready all
  • 00:27:34
    right like go this like it was pretty
  • 00:27:36
    late breaking like we like had
  • 00:27:38
    conviction that it was like this is like
  • 00:27:40
    good and like we want to put this down
  • 00:27:41
    like it's early still and it's like
  • 00:27:43
    still going to make mistakes but like
  • 00:27:44
    how do we do this as well the funniest
  • 00:27:46
    use case like while we were beta testing
  • 00:27:47
    it was like somebody was like I wonder
  • 00:27:49
    if I can get it to order us a pizza and
  • 00:27:50
    like it did and they're like great
  • 00:27:51
    there's like the moment where
  • 00:27:53
    Domino's shows up at your office and it
  • 00:27:55
    was ordered entirely by AI is like a
  • 00:27:57
    was a very cool like seminal moment
  • 00:27:58
    and then we're like oh but it's Domino's
  • 00:28:00
    but like you know like but like it was
  • 00:28:02
    definitely like amazing yeah uh but it
  • 00:28:05
    was AI you know so it was all it was it
  • 00:28:06
    was it was good it also like ordered
  • 00:28:08
    quite a bit of pizza so it was like
  • 00:28:09
    maybe hungrier than intended uh some
  • 00:28:11
    early things that we're seeing that we
  • 00:28:12
    think are really interesting one is UI
  • 00:28:14
    testing which is like I was like at
  • 00:28:16
    Instagram we had basically no UI tests
  • 00:28:18
    because they're hard to write they're
  • 00:28:19
    like they're brittle um and they're like
  • 00:28:21
    often like a little bit like oh like we
  • 00:28:23
    moved this button around and like it
  • 00:28:24
    should still pass that was the point of
  • 00:28:26
    the PR but like now it's going to fail
  • 00:28:27
    we're going to have to like do this
  • 00:28:28
    whole other snapshot um and like early
  • 00:28:30
    signs are like computer use just works
  • 00:28:31
    really well for like hey does it work as
  • 00:28:33
    intended does it do the thing that you
  • 00:28:34
    want it to do and I think that's like
  • 00:28:36
    been been very very interesting and then
  • 00:28:38
    what we're starting to get into too is
  • 00:28:39
    like what are the agentic things that
  • 00:28:40
    just like involve a lot of like data
  • 00:28:43
    manipulation so we're looking at it with
  • 00:28:44
    our support teams and our finance teams
  • 00:28:46
    around like those PR forms are going to
  • 00:28:48
    fill themselves but like it's very
  • 00:28:50
    repetitive you sort of have data in one silo
  • 00:28:52
    you want to put it in a different silo
  • 00:28:53
    and it just requires like human time
  • 00:28:55
    like I keep using the word drudgery when
  • 00:28:57
    I talk about computer use like can we
  • 00:28:58
    automate the drudgery so you can focus
  • 00:29:00
    on the creative stuff and not like the
  • 00:29:02
    you know 30 clicks to do one single
  • 00:29:06
    thing Uh Kevin I I think we have a lot
  • 00:29:09
    of teams that are um experimenting with
  • 00:29:12
    o1 you can obviously do much more
  • 00:29:13
    sophisticated things you also can't use
  • 00:29:15
    it as a one-for-one replacement if you're
  • 00:29:17
    already using right one of the you know
  • 00:29:19
    GPT-4o models or whatever in uh in your
  • 00:29:22
    application like can you give us some
  • 00:29:24
    guidance what are you guys doing with it
  • 00:29:26
    internally so I think one thing
  • 00:29:29
    that people maybe don't realize that
  • 00:29:31
    actually a lot of the most sophisticated
  • 00:29:33
    customers of ours are doing and that
  • 00:29:35
    we're certainly doing internally is it's
  • 00:29:36
    not really about one model for any
  • 00:29:39
    particular thing you end up putting
  • 00:29:41
    together sort of workflows and
  • 00:29:43
    orchestration between models and so you
  • 00:29:45
    use them for what they're good at
  • 00:29:47
    o1's really good at reasoning but it
  • 00:29:48
    also takes a little bit of time to think
  • 00:29:50
    and it's not multimodal and you know has
  • 00:29:52
    other limitations you Define reasoning
  • 00:29:54
    for the group I realize it's a basic
  • 00:29:55
    question but yeah so uh we people are I
  • 00:29:59
    think pretty used to the concept the
  • 00:30:01
    like scaling pre-training concept you go
  • 00:30:04
    GPT-2, 3, 4, 5, whatever and you're
  • 00:30:07
    doing bigger and bigger runs on
  • 00:30:09
    pre-training these models are getting
  • 00:30:10
    you know smarter and smarter um like
  • 00:30:13
    they or rather maybe they know more and
  • 00:30:15
    more but they're kind of like system one
  • 00:30:18
    thinking right it's it's you ask it a
  • 00:30:20
    question you immediately get an answer
  • 00:30:22
    it's like text completion yeah sort of
  • 00:30:24
    like me asking you questions
  • 00:30:26
    right now and you just have to stream
  • 00:30:28
    one token at a time keep going don't think
  • 00:30:30
    it's amazing actually how much human
  • 00:30:33
    like your intuition about how other
  • 00:30:34
    humans work will often like help you in
  • 00:30:38
    intuiting about how these models work um
  • 00:30:40
    you know you asked me a question I got
  • 00:30:42
    off onto the wrong like sentence it's
  • 00:30:44
    hard to recover the models totally do
  • 00:30:46
    the same thing um but uh so you've got
  • 00:30:50
    that that sort of larger and larger
  • 00:30:52
    pre-training o1 is actually a different
  • 00:30:56
    way of scaling
  • 00:30:58
    intelligence by doing it at uh at query
  • 00:31:01
    time basically so instead of system one
  • 00:31:04
    thinking I ask you a question and
  • 00:31:05
    immediately tries to give you an answer
  • 00:31:07
    it'll pause same thing I would you know
  • 00:31:09
    you would do if I asked you a question I
  • 00:31:10
    said solve this Sudoku do this New York
  • 00:31:13
    Times connections puzzle you you would
  • 00:31:15
    start going okay these words how do they
  • 00:31:17
    group together okay these might be these
  • 00:31:19
    four well no I'm not sure could be you
  • 00:31:22
    know you're you're like forming
  • 00:31:23
    hypotheses using what you know to refute
  • 00:31:26
    these hypothesis or affirm them and then
  • 00:31:29
    from that continuing to reason on it's
  • 00:31:32
    how scientific breakthroughs
  • 00:31:34
    are made it's how we answer hard
  • 00:31:36
    questions um and so this is about
  • 00:31:38
    teaching the models to do it and right
  • 00:31:40
    now you know they'll think for 30 or 60
  • 00:31:43
    seconds before they answer imagine what
  • 00:31:45
    happens if they can think for five hours
  • 00:31:47
    or five days um so it's basically a new
  • 00:31:50
    way to scale intelligence and we feel
  • 00:31:53
    like we're just at the very beginning
  • 00:31:55
    you know we're at the like GPT-1 phase of
  • 00:31:58
    um of this new form of reasoning um but
  • 00:32:01
    in the same way it's not you don't use
  • 00:32:03
    it for everything right there are
  • 00:32:04
    sometimes when you ask me a question you
  • 00:32:05
    don't want me to wait 60 seconds you I
  • 00:32:07
    should just give you an answer um so we
  • 00:32:10
    end up using our models in a bunch of
  • 00:32:13
    different ways together so for example
  • 00:32:15
    like cyber security you would think not
  • 00:32:18
    really a use case for models they can
  • 00:32:20
    hallucinate that seems like a bad place
  • 00:32:21
    to hallucinate but you can like
  • 00:32:25
    fine-tune a model to be good at certain
  • 00:32:27
    tasks and then you can fine-tune models
  • 00:32:30
    to be very precise about the kinds of
  • 00:32:32
    inputs and outputs that they expect and
  • 00:32:34
    have these models start working in
  • 00:32:36
    concert together and you know models
  • 00:32:39
    that are checking the outputs of other
  • 00:32:40
    models realizing when something doesn't
  • 00:32:42
    make sense asking it to try again um and
  • 00:32:47
    uh so like that ends up being how we get
  • 00:32:50
    a ton of value out of our own models
  • 00:32:52
    internally it's like specific use cases
  • 00:32:56
    uh and or orchestrations of models
  • 00:32:59
    together designed sort of working in
  • 00:33:00
    concert to do specific tasks which again
  • 00:33:03
    going back to like reasoning about how
  • 00:33:04
    we work as humans how do we do complex
  • 00:33:07
    things as humans you have different
  • 00:33:08
    people who often have different skill
  • 00:33:10
    sets and they work together to
  • 00:33:11
    accomplish a hard
  • 00:33:13
    task I can't let you guys get away
  • 00:33:16
    without without telling us something
  • 00:33:18
    about the future and what's coming and
  • 00:33:20
    so um you don't have to give us release
  • 00:33:23
    dates I understand you don't know but uh
  • 00:33:26
    if you if you look out I I think the
  • 00:33:28
    furthest anyone can look out in AI right
  • 00:33:29
    now is like well tell me if you can see
  • 00:33:31
    the future but like let's say like 6
  • 00:33:33
    months 12 months like what's an
  • 00:33:35
    experience that you imagine is going to
  • 00:33:37
    be possible or prevalent I think a lot
  • 00:33:40
    about um well I
  • 00:33:43
    think a lot about this all the time but
  • 00:33:45
    like the um two maybe two words to be
  • 00:33:47
    like plant seeds in in everybody's mind
  • 00:33:50
    like one is proactivity like how do the
  • 00:33:51
    models become more proactive like once
  • 00:33:53
    they know about you and they're
  • 00:33:54
    monitoring like they're reading your
  • 00:33:56
    email in a good not creepy way and
  • 00:33:58
    they're like uh because you authorized
  • 00:33:59
    them to and then they like you know spot
  • 00:34:02
    an interesting Trend or you start your
  • 00:34:03
    day with something that's a like um like
  • 00:34:05
    a proactive like uh recap of what's
  • 00:34:08
    going on some conversations you're going
  • 00:34:09
    to have I prepped some research for you
  • 00:34:11
    hey your next meeting is coming up like
  • 00:34:13
    here's what you might want to talk about
  • 00:34:14
    I saw you have this like presentation
  • 00:34:16
    coming up here's the first draft that I
  • 00:34:17
    put together like that kind of
  • 00:34:18
    proactivity I think is going to be
  • 00:34:20
    really really powerful and then the
  • 00:34:21
    other part is being more asynchronous so
  • 00:34:23
    like uh I think o1 is like early UI in
  • 00:34:27
    this exploration which is like it's
  • 00:34:29
    going to do a lot and it's going to tell
  • 00:34:30
    you kind of what it's going to do along
  • 00:34:31
    the way and like you can sit there and
  • 00:34:33
    wait for it but you could also like be
  • 00:34:34
    like it's going to think for a while I'm
  • 00:34:35
    going to go like do something else maybe
  • 00:34:37
    tab back maybe it like can tell me when
  • 00:34:39
    it's done like yeah expanding the time
  • 00:34:41
    Horizon both in terms of like you didn't
  • 00:34:43
    ask a question it just told you
  • 00:34:44
    something I think that's going to be
  • 00:34:45
    interesting and then you did ask a
  • 00:34:47
    question and you're going to be like
  • 00:34:48
    great like I'm going to go reason about
  • 00:34:50
    it I'm going to go research it I might
  • 00:34:52
    have to ask another human about it like
  • 00:34:53
    and then I'm going to like maybe come up
  • 00:34:55
    with my first answer I'm going to vet
  • 00:34:56
    that answer you'll hear back from me in
  • 00:34:58
    like an hour like Breaking Free of those
  • 00:35:00
    like uh constraints of like expecting an
  • 00:35:02
    answer immediately I think will let you
  • 00:35:04
    do things like hey I have this like
  • 00:35:06
    whole like mini project plan like go
  • 00:35:08
    flesh it out or like not just like I
  • 00:35:10
    want you to like change this one thing
  • 00:35:11
    on the screen but like fix this bug for
  • 00:35:13
    me like take my PRD and like adapt it
  • 00:35:16
    for these new market conditions like
  • 00:35:18
    adapt it for these three different
  • 00:35:19
    market conditions that emerged like
  • 00:35:20
    being able to push those Dimensions I
  • 00:35:22
    think is what I'm personally most
  • 00:35:23
    excited about on the product side yeah I
  • 00:35:26
    completely agree with all of that that
  • 00:35:28
    um and it's the models are going to get
  • 00:35:31
    smarter at an accelerating rate I think
  • 00:35:33
    which is also part of how all of that uh
  • 00:35:35
    comes to pass another thing that will be
  • 00:35:38
    really exciting is seeing the models
  • 00:35:40
    able to interact in all the same ways
  • 00:35:42
    that we as humans interact you know
  • 00:35:44
    right now you mostly type to these
  • 00:35:46
    things and you know I mostly type to a
  • 00:35:48
    lot of my friends on WhatsApp and other
  • 00:35:49
    things but I also speak I also can see
  • 00:35:54
    and uh we just we launched this advanced
  • 00:35:57
    voice mode Rel relatively recently I was
  • 00:35:59
    in uh I was in Korea and
  • 00:36:02
    Japan having
  • 00:36:04
    conversations and I would just I would
  • 00:36:06
    often be with somebody with whom I had
  • 00:36:09
    no common language whatsoever before
  • 00:36:11
    this we could not have said a word to
  • 00:36:13
    each other and instead I was like Hey
  • 00:36:16
    chat gbt I want you to act as a
  • 00:36:17
    translator when I say something in
  • 00:36:19
    English I want you to say it in Korean
  • 00:36:21
    and when you hear something in Korean
  • 00:36:23
    say it back to me in English and all of
  • 00:36:24
    a sudden I had this Universal translator
  • 00:36:26
    and I was having business conversations
  • 00:36:29
    with another person uh and it was
  • 00:36:32
    magical and you think what that can do
  • 00:36:35
    like not just in a business context but
  • 00:36:36
    think about people's willingness to
  • 00:36:38
    travel to new places if you don't ever
  • 00:36:39
    have to be worried about not speaking
  • 00:36:41
    the language and you've got this like
  • 00:36:42
    Star Trek Universal translator in your
  • 00:36:44
    pocket you know and so experiences like
  • 00:36:47
    that I think it's going to become
  • 00:36:49
    commonplace fast but it's magical and
  • 00:36:51
    I'm excited about that in combination
  • 00:36:54
    with all the stuff Mike was just
  • 00:36:56
    saying oh one of my favorite pastimes
  • 00:37:00
    now just you know since uh voice mode
  • 00:37:03
    release is actually watching there's a
  • 00:37:05
    genre of Tik Tok of well this just
  • 00:37:07
    speaks to how old I am like there's a
  • 00:37:09
    genre of Tik Tok where you just like uh
  • 00:37:11
    it's just young people talking to voice
  • 00:37:13
    mode like pouring their heart out using
  • 00:37:15
    it all these ways where I'm like oh my
  • 00:37:17
    God like there's this old term being
  • 00:37:19
    like digitally native or mobile native
  • 00:37:21
    and I'm like I like pretty strongly
  • 00:37:24
    believe in this AI thing and I would not
  • 00:37:26
    think to interact in this way but people
  • 00:37:29
    who are 14 years old are like well I
  • 00:37:31
    expect the AI to be able to do that and
  • 00:37:33
    I love that have you ever given it to
  • 00:37:35
    your kids uh I haven't yet my kids are
  • 00:37:37
    like five and seven Kevin knows them so
  • 00:37:39
    we but we'll get there I mean mine are
  • 00:37:41
    eight and 10 but like on a car ride
  • 00:37:43
    they'll be like can I talk to chat GPT
  • 00:37:45
    yes and they will ask it the most
  • 00:37:47
    bizarre things they will just have
  • 00:37:49
    weirdo conversations with it but they're
  • 00:37:52
    perfectly happy talking to an AI yeah
  • 00:37:54
    actually one of my favorite experiences
  • 00:37:56
    and maybe we'll close and ask you for
  • 00:37:57
    like the most surprising Behavior kids
  • 00:37:59
    or not is uh um like when my parents
  • 00:38:04
    read to me like I got L I was lucky if I
  • 00:38:07
    got to choose the book and it wasn't my
  • 00:38:08
    dad being like we're going to read this
  • 00:38:10
    physics study I'm interested in right my
  • 00:38:13
    kids I don't know if it's just like
  • 00:38:14
    parenting in the Bay Area but my kids
  • 00:38:16
    are like okay Mom make the images right
  • 00:38:19
    I want to tell a story about the dragon
  • 00:38:22
    unicorn in this setting I'm going to
  • 00:38:23
    tell you exactly how it's going to
  • 00:38:25
    happen create it in real time and I'm
  • 00:38:27
    like like that's a big ask I'm glad you
  • 00:38:30
    believe and like know that's possible
  • 00:38:32
    but it's it's a wild way to like create
  • 00:38:34
    your own entertainment too what is the
  • 00:38:36
    um most surprising Behavior you've seen
  • 00:38:38
    in your own products
  • 00:38:41
    recently I think it's a behavior and a
  • 00:38:45
    relationship like people really start
  • 00:38:49
    understanding the Nuance of like what
  • 00:38:51
    Claud is we just have like a a new
  • 00:38:53
    revenge of the model and it's like they
  • 00:38:55
    get the Nuance like it's like I guess
  • 00:38:57
    the behavor behavior is like almost
  • 00:38:58
    befriending or like really like
  • 00:39:00
    developing a lot of like 2-way empathy
  • 00:39:02
    around what's happening and then like
  • 00:39:03
    the is like oh you know the new model
  • 00:39:05
    like felt like it was smarter but maybe
  • 00:39:07
    a little more distant but maybe you know
  • 00:39:09
    and it's like it's like that kind of
  • 00:39:10
    like Nuance which like you like I it's
  • 00:39:13
    it's given me as a product person a lot
  • 00:39:15
    more empathy around like you're not just
  • 00:39:16
    shipping a product you're shipping like
  • 00:39:19
    intelligence and intelligence and
  • 00:39:21
    empathy are like what makes like
  • 00:39:23
    interpersonal relationships important
  • 00:39:24
    and if somebody show up and they're like
  • 00:39:25
    I was upgraded like I say know I scored
  • 00:39:28
    2% higher on this math score but like
  • 00:39:30
    I'm Different in this way you'd be like
  • 00:39:31
    oh I got to adapt now and maybe you know
  • 00:39:33
    be a little worried about it so like
  • 00:39:35
    that that's been an interesting Journey
  • 00:39:37
    for me like understanding the mentality
  • 00:39:39
    for people that when they're using our
  • 00:39:40
    products yeah Model Behavior is
  • 00:39:43
    absolutely a product role like the the
  • 00:39:46
    personality of the model is is key and
  • 00:39:49
    there are interesting questions around
  • 00:39:50
    how much should it customize uh versus
  • 00:39:52
    how much should you know open AI have
  • 00:39:54
    one personality and Claude has some
  • 00:39:56
    distinct personality
  • 00:39:58
    and are people going to use one versus
  • 00:39:59
    the other because they happen to like it
  • 00:40:01
    I mean that's that's a very human thing
  • 00:40:03
    right we're friends with different
  • 00:40:04
    people because we happen to like
  • 00:40:05
    different people better than others and
  • 00:40:06
    it's um that's an interesting thing to
  • 00:40:09
    to think about we did something recently
  • 00:40:13
    um and it sort of went viral on Twitter
  • 00:40:16
    people started asking the model based on
  • 00:40:19
    everything you know about me based on
  • 00:40:20
    all of our past interactions you know
  • 00:40:22
    what what would you say about me and the
  • 00:40:25
    model will will respond and it will like
  • 00:40:27
    give you give it a description of what
  • 00:40:29
    it you know kind of thinks based on all
  • 00:40:31
    of your past
  • 00:40:32
    interactions and it is this sort of
  • 00:40:35
    you're you're starting to interact with
  • 00:40:36
    it almost like some sort of person or
  • 00:40:39
    entity in interesting ways and um
  • 00:40:42
    anyways it was fascinating to see
  • 00:40:43
    people's reaction to
  • 00:40:46
    that Kevin Mike thank you so much for
  • 00:40:48
    doing this and giving us a glimpse into
  • 00:40:50
    the future thank you so much
Tags
  • AI
  • Product Development
  • User Feedback
  • Proactivity
  • AI Capabilities
  • Product Managers
  • User Experience
  • Communication
  • Machine Learning
  • Future Technology