How Airbyte Uses AI to Build Connectors



  • 00:00:26
    all right I think this works all right
  • 00:00:28
    we're good to go hi everybody it's great
  • 00:00:30
    to see you again fast forward from the
  • 00:00:32
    front door my name is teao I work over
  • 00:00:34
    here at airite we are a data Movement
  • 00:00:37
    platform you're going to be learning all
  • 00:00:38
    about tonight along with our partners
  • 00:00:40
    fractional who you're going to learn all
  • 00:00:41
    about them tonight as well thank you for
  • 00:00:43
    making the time to join us tonight we
  • 00:00:44
    hope you're enjoying the food the drinks
  • 00:00:46
    the company um our our aim here is to
  • 00:00:49
    make this a really fun night and a
  • 00:00:50
    really informative night especially
  • 00:00:52
    because many of you are probably still
  • 00:00:54
    about to start the recovery process from
  • 00:00:56
    disrupt um so we're excited to kind of
  • 00:00:59
    be closing out with you all for the day
  • 00:01:02
    um way we're going to do tonight nti's
  • 00:01:04
    gonna go ahead and come in and give his
  • 00:01:06
    presentation we're going to do our
  • 00:01:07
    fireside chat with Eddie and de we're
  • 00:01:09
    going to learn more about fractional and
  • 00:01:10
    how you can think about uh the AI
  • 00:01:13
    projects that you're working on the dos
  • 00:01:14
    the don'ts uh and really the aim for
  • 00:01:16
    tonight is not only to just be us
  • 00:01:19
    talking here and you listening we want
  • 00:01:21
    this to be interactive so if you have ai
  • 00:01:23
    projects that you're working on it's
  • 00:01:25
    like Eddie and everyone else from a
  • 00:01:26
    fractional perspective give their
  • 00:01:27
    thoughts if you want to Pepper Nati with
  • 00:01:29
    personal questions you can do that stuff
  • 00:01:31
    too um but really the night is meant to
  • 00:01:33
    be all about you uh so we're going to
  • 00:01:36
    try to live up to that but with that
  • 00:01:39
    being said I'm going to shut up and go
  • 00:01:40
    to the back here thank you all for
  • 00:01:41
    joining us again natik I'm gonna hand it
  • 00:01:44
    over to
  • 00:01:48
  • 00:01:50
    you hello
  • 00:01:52
    hello all right a
  • 00:01:55
    sec I'm clumsy so all right uh my goal
  • 00:02:00
    today is not to sell you all on airbit
  • 00:02:03
    but to put some context on yeah a few
  • 00:02:06
    minutes on what we are doing and why we
  • 00:02:10
    try to do co-pilot um style AI assist in
  • 00:02:14
    our Dev tools what we've got as a result
  • 00:02:17
    what we've learned um how you can use it
  • 00:02:20
    to grab data for your projects and then
  • 00:02:22
    we're going to talk with Eddie and Eddie
  • 00:02:23
    is going to talk to us about how to
  • 00:02:26
    actually um be better at building with
  • 00:02:29
    AI um and avoid common
  • 00:02:33
  • 00:02:34
    so airite we started just a few years
  • 00:02:38
    back we're almost four years
  • 00:02:40
    oldish and the slide that Michelle our
  • 00:02:43
    CEO shows to every new hire says that
  • 00:02:46
    our mission is to make data available to
  • 00:02:50
    anyone and anywhere if you own your data
  • 00:02:53
    and it's in any systems databases apis
  • 00:02:55
    you should be able to use your data
  • 00:02:57
    that's why there's a bunch of companies
  • 00:02:58
    like zap here or like University cases
  • 00:03:00
    right and turns out to fulfill this
  • 00:03:02
    Mission you know things get much easier
  • 00:03:05
    if you have Frameworks that can read
  • 00:03:07
    data from arbitrary apis that's what my
  • 00:03:10
    team is doing I am an engineering
  • 00:03:12
    manager on API extensibility team we're
  • 00:03:15
    doing Frameworks that power all of our
  • 00:03:17
  • 00:03:19
    connectors um in 2021 2022 we had a
  • 00:03:23
    python cdk um connector developer kit
  • 00:03:25
    framework we had around a 100 connectors
  • 00:03:29
    at that time and we thought okay well
  • 00:03:31
    how do we scale that we have 20
  • 00:03:33
    Engineers supporting 10 certified
  • 00:03:35
    hardcore connectors Community
  • 00:03:37
    contributes connectors but how do we
  • 00:03:38
    maintain all that so in 2023 we made a
  • 00:03:43
    graphical user interface around our low
  • 00:03:45
    code no code framework that encapsulates
  • 00:03:48
    a connector in a basically a bunch of
  • 00:03:51
    yaml kubernetes resource definition
  • 00:03:53
    style and that's great people started
  • 00:03:55
    being able to make a connector in an
  • 00:03:57
    hour versus you know days but it's still
  • 00:04:01
    a cool hour or more so in 2024 we've
  • 00:04:03
    released AI assist which is essentially
  • 00:04:06
    co-pilot for our graphical user
  • 00:04:07
  • 00:04:08
  • 00:04:10
    and I want to show you how it works I'm
  • 00:04:14
    99% confident is going to be fine but
  • 00:04:16
    I'm going to do it one-handed so let's
  • 00:04:20
    see just to give you a sense of what
  • 00:04:22
    this thing is so I figured you know what
  • 00:04:24
    are we going to build today we already
  • 00:04:27
    have a lot of connectors so finding one
  • 00:04:29
    that we don't have was a little bit of a
  • 00:04:31
    challenge and my CFO was walking nearby
  • 00:04:36
    and I thought hey juel do you think it's
  • 00:04:38
    cool if I use our financial data for a
  • 00:04:41
    demo for a Meetup and he said you signed
  • 00:04:45
    an NDA you
  • 00:04:52
    stupid my cash was about to be warmed up
  • 00:04:55
    interesting okay this might take us a
  • 00:04:57
    minute so we might as well continue and
  • 00:05:01
    give it a few seconds while that is
  • 00:05:09
    happening yeah let's almost
  • 00:05:13
    smoothly so we're going to return to
  • 00:05:15
    that but to give you
  • 00:05:17
    perspective data transfer companies are
  • 00:05:19
    only as good as the connector coverage
  • 00:05:22
    that we have if we only support 200 apis
  • 00:05:24
    you have your own API does your own
  • 00:05:26
    thing you want your data we don't
  • 00:05:28
    support it you're not going to use us
  • 00:05:30
    so how are we doing well you know we've
  • 00:05:32
    released AI assist and connector builder
  • 00:05:34
    in like
  • 00:05:35
    2023 um we've
  • 00:05:38
    added what approximately 100 connectors
  • 00:05:42
    from August to the end of October and if
  • 00:05:46
    like our total is less than 400 that's a
  • 00:05:49
    lot of
  • 00:05:50
    connectors how is this live demo thing
  • 00:05:53
    doing oh okay so this roughly is our
  • 00:05:58
    connector Builder and and it needs to
  • 00:06:00
    know things about your API it needs to
  • 00:06:02
    know your base URL which a assist
  • 00:06:04
    guessed for me it needs to know how to
  • 00:06:06
    authenticate and it thinks that this API
  • 00:06:08
    is using beer
  • 00:06:10
  • 00:06:11
    which I'm going to paste
  • 00:06:15
    save and we have streams of data so
  • 00:06:18
    transactions is obviously the most
  • 00:06:20
    interesting it figured out where
  • 00:06:22
    transactions live what HTTP method to
  • 00:06:24
    use um where transaction records are
  • 00:06:29
    within the HTTP
  • 00:06:31
    response um it figured the pagination it
  • 00:06:34
    figured where in the response is the
  • 00:06:36
    cursor to the next page let's see if it
  • 00:06:39
    works and if I actually pasted the
  • 00:06:47
    token come
  • 00:06:51
    on here it is okay I'm not going to show
  • 00:06:54
    you the actual records but what's
  • 00:06:55
    important is uh 100 records per page
  • 00:06:58
    five pages test read is successful
  • 00:07:00
    meaning I only had to paste my
  • 00:07:02
    documentation URL and my API token and
  • 00:07:05
    it figured out um how to get my data in
  • 00:07:09
    fact I did this a little bit earlier
  • 00:07:11
    today and got a bunch of streams and
  • 00:07:14
    then I used this little button here to
  • 00:07:17
    make a pull request and we have a pull
  • 00:07:21
    request in our GitHub I'm going to show
  • 00:07:22
    you that in a little bit that's how we
  • 00:07:26
    are growing from 200 something
  • 00:07:27
    connectors to 400 something
  • 00:07:30
    connectors within just these few
  • 00:07:34
  • 00:07:36
    now we tried three times to get this
  • 00:07:40
    thing right it was a hobby project of
  • 00:07:42
    one of our Engineers like oh LMS are
  • 00:07:44
    cool let's build something with LMS um
  • 00:07:46
    didn't quite work
  • 00:07:48
    out the first attempt was very naive
  • 00:07:51
    Eddie will walk you through some of the
  • 00:07:53
    details but we thought you know what
  • 00:07:54
    Chad gpts are cool let's just let's
  • 00:07:56
    paste the docks give the docks to Chad
  • 00:07:58
    GPT and say hey you output the Manifest
  • 00:08:00
    file of the connector and it works on
  • 00:08:03
    super simple things like Pokey API or
  • 00:08:05
    like exchange rate API some something
  • 00:08:07
    super simple with one or two streams of
  • 00:08:09
    data doesn't work on anything serious
  • 00:08:11
    cannot figure out authentication then we
  • 00:08:13
    thought okay well it is very difficult
  • 00:08:15
    for a l large language model to Output
  • 00:08:18
    the Manifest in our format it doesn't
  • 00:08:20
    know the constraints the schema but
  • 00:08:23
    there's a lot of open apis specs on the
  • 00:08:25
    internet so what if we ask it to First
  • 00:08:28
    generate open API spec and then from
  • 00:08:30
    that we're going to euristic generate
  • 00:08:32
    the Manifest it's also extremely
  • 00:08:34
    brittle and then we decided to work with
  • 00:08:37
    fractional on this co-pilot approach
  • 00:08:40
    this works but it's not just a single
  • 00:08:43
  • 00:08:44
    call it's not just prompt engineering um
  • 00:08:48
    this diagram is probably not very
  • 00:08:50
    visible right but there's basically four
  • 00:08:52
    levels nested logic of how we figure out
  • 00:08:56
    what authentication scheme a given API
  • 00:08:58
    uses given its docs open API spec and if
  • 00:09:03
    we don't have enough information there
  • 00:09:05
    or if there's no open API spec we would
  • 00:09:07
    attempt Googling and scraping Ser
  • 00:09:09
    results uh from Google to figure out how
  • 00:09:12
  • 00:09:14
    authenticate so Core lesson stop magic
  • 00:09:17
    is just a lot a lot a lot of TDS
  • 00:09:19
  • 00:09:20
    engering and the thing there is all of
  • 00:09:24
    that time unless your users are actually
  • 00:09:26
    benefiting from your software you're not
  • 00:09:28
    learning anything and just having a
  • 00:09:30
    prototype doesn't give you much you got
  • 00:09:32
    to figure out where you host it how you
  • 00:09:34
    monitor it how you evaluate it how you
  • 00:09:36
    monitor your budget burn how you figure
  • 00:09:38
    out when it moves out of beta
  • 00:09:41
    Etc so we figured airb is not just an
  • 00:09:45
    open- Source graphic user interface data
  • 00:09:48
    pipelines tool or ETL uh my personal big
  • 00:09:51
    thing here is to make uh system that
  • 00:09:55
    gives you your data in python or in CLI
  • 00:09:58
    you don't have to use air proper you
  • 00:09:59
    don't have to use our graphical user
  • 00:10:01
    interfaces to get your data if you have
  • 00:10:03
    hobby projects or things that you do on
  • 00:10:05
    weekends we should be able to help which
  • 00:10:07
    should be handy if you decide to
  • 00:10:09
    prototype stuff with Eddie and
  • 00:10:11
    fractional later
  • 00:10:12
    on um so what we can do um we have by
  • 00:10:15
    airb which is a CLI or python library
  • 00:10:18
    that can read data again from anywhere
  • 00:10:20
    and write it to local du dbcash and then
  • 00:10:23
    we have a bunch of destinations
  • 00:10:24
    including a bunch of vector destinations
  • 00:10:26
    and PG Vector Bine cone and such
  • 00:10:29
    yeah very interesting time let's build
  • 00:10:31
    some stuff together now I'm going to
  • 00:10:33
    pass it to Eddie um and see what we want
  • 00:10:37
    to talk about next
  • 00:10:41
  • 00:11:00
    are you moderating this section cool
  • 00:11:02
    well hello everybody uh while we're
  • 00:11:04
    waiting for Teo my name is Eddie I'm the
  • 00:11:06
    CTO at at fractional AI uh where uh Dev
  • 00:11:10
    shop that is specifically focused on
  • 00:11:12
    building challenging production
  • 00:11:14
    applications that that use llms in some
  • 00:11:16
    way so you know we were're uh we helped
  • 00:11:20
    build the the AI assist feature you just
  • 00:11:23
    saw which is like a good good example
  • 00:11:25
    when you're trying to dig into the weeds
  • 00:11:26
    of what some of these production AI
  • 00:11:28
    projects look like but we've also seen
  • 00:11:30
    over a hundred of these projects at this
  • 00:11:31
    point and um yeah I'm excited to talk
  • 00:11:35
    about all things about what it really
  • 00:11:36
    means to put put AI projects into
  • 00:11:39
    production that's for you
  • 00:11:41
    ni um I'm just going to be yelling
  • 00:11:44
    because you two are the most important
  • 00:11:45
    people here and from this side of room
  • 00:11:48
    you all are very important
  • 00:11:49
    obviously um I think where I want to
  • 00:11:52
    start Eddie you already kind of gave us
  • 00:11:55
    a little bit of background fractional uh
  • 00:11:57
    on in terms of working on different
  • 00:11:58
    kinds of projects
  • 00:11:59
    I want to go a little bit more
  • 00:12:01
    into the AI assistant when you thought
  • 00:12:04
    about the kinds of kinds of ways you can
  • 00:12:07
    incorporate AI for new projects like I
  • 00:12:09
    think there's a lot of people who are
  • 00:12:10
    looking around where should I be
  • 00:12:12
    implementing AI um ni you you talk a
  • 00:12:15
    little bit about how we want to bring AI
  • 00:12:18
    into our own workflow what's your first
  • 00:12:20
    advice for anyone who's thinking about
  • 00:12:22
    how can I bring AI into my
  • 00:12:26
    Enterprise it's a good question um I
  • 00:12:28
    think there's like a lot of ideas for
  • 00:12:29
    way AI can help um but that things often
  • 00:12:32
    get stuck early in the ideation process
  • 00:12:34
    or at the PCC phase I think one critical
  • 00:12:37
    thing that happened here was a lot of
  • 00:12:41
    the best opportunities for AI exist in a
  • 00:12:43
    manual workflow that you're already
  • 00:12:45
    running somewhere today uh people were
  • 00:12:47
    already building API connectors here and
  • 00:12:50
    so it was very clear like what was hard
  • 00:12:52
    you had a clear set of input output
  • 00:12:54
    pairs to care about you had clear
  • 00:12:56
    historical data you understood your
  • 00:12:57
    domain and could measure the value of
  • 00:13:00
    this thing right this took us quite a
  • 00:13:02
    while to build um if you're going to
  • 00:13:03
    spend all this time building something
  • 00:13:05
    you got to kind of know that there's a
  • 00:13:06
    there there that it's is like going to
  • 00:13:07
    save a lot of people a lot of real time
  • 00:13:09
    and not just be some speculative um
  • 00:13:11
    thing so that would be like the number
  • 00:13:13
    one thing I would focus on is this a
  • 00:13:16
    real existing manual workflow that looks
  • 00:13:19
    like the llm sort of capability set can
  • 00:13:23
    be applied here well and is it valuable
  • 00:13:25
    enough like if we can actually get there
  • 00:13:27
    does this save us a lot of time does it
  • 00:13:29
    it you know what's what's the financial
  • 00:13:31
    impact to us on this does it save us
  • 00:13:33
    hours does it you know generate new
  • 00:13:35
    Revenue what what kind of sort of uh uh
  • 00:13:37
    impact does it have when I think about
  • 00:13:40
    like what are the core capabilities of
  • 00:13:43
    these llms I basically think about it
  • 00:13:47
    as computers can now read write
  • 00:13:53
    make junior employee level decisions and
  • 00:13:57
    they're sort of domain experts about
  • 00:13:58
    everything and like that's the set of
  • 00:14:00
    things that I would look at in these
  • 00:14:01
    manual workflows rather than like oh
  • 00:14:03
    maybe we can apply AI here and it can
  • 00:14:04
    know everything about everything is this
  • 00:14:06
    very specific oh you know we're spending
  • 00:14:08
    a lot of time reading through API docs
  • 00:14:09
    and saying like what did it say um and
  • 00:14:12
    and that's a pretty llm capable
  • 00:14:15
    task did you have anything you want to
  • 00:14:17
    add there because otherwise I'm going to
  • 00:14:18
    take it to this experience directly
  • 00:14:21
    there's the whole you Scope our project
  • 00:14:23
    you decide you want you're going to do
  • 00:14:24
    it I'd love to know what went wrong in
  • 00:14:27
    this situation
  • 00:14:29
    oh so much
  • 00:14:32
    uh the first thing that jumped to mind
  • 00:14:34
    here is that um I think we failed to
  • 00:14:36
    appreciate upfront just how hard some of
  • 00:14:39
    the pure software engineering parts of
  • 00:14:42
    the crawling of API docs would be I
  • 00:14:44
    think we initially thought about this as
  • 00:14:46
    like step one download the docs step two
  • 00:14:49
    get llm to make a bunch of decisions um
  • 00:14:53
    and does that resonate with other people
  • 00:14:55
    you know one two and you're done all
  • 00:14:57
    right we got some hands over there ni
  • 00:15:00
    um and and fundamentally that is still
  • 00:15:01
    What's Happening Here Right like we're
  • 00:15:03
    trying to build a connector into an API
  • 00:15:05
    the kind of steps involved are go to the
  • 00:15:08
    web page that describes how to connect
  • 00:15:09
    this a to this API read through the docs
  • 00:15:11
    and then make a bunch of decisions okay
  • 00:15:13
    here's how we authenticate provide our
  • 00:15:15
    credentials to log into this API here's
  • 00:15:17
    what the set of endpoints looks like uh
  • 00:15:19
    turns out these documentation pages are
  • 00:15:21
    like everything you can possibly imagine
  • 00:15:23
    times like 10 and you have to support a
  • 00:15:25
    very wide variety of use cases you have
  • 00:15:27
    to handle you know rate limiting and
  • 00:15:29
    some docs are behind authentication and
  • 00:15:31
    some docs are like uh the information is
  • 00:15:34
    not even on the web page it's like you
  • 00:15:35
    know that you've got to click on things
  • 00:15:36
    and it's going to go fetch it from the
  • 00:15:37
    server and handling this super wide
  • 00:15:39
    variety of use cases or preventing
  • 00:15:41
    yourself from going and crawling out to
  • 00:15:42
    irrelevant Pages was incredibly hard and
  • 00:15:45
    even now when we look at failure cases
  • 00:15:48
    more often than not they're not uh an
  • 00:15:51
    the AI making a poort decision based on
  • 00:15:53
    good data it's the AI making something
  • 00:15:55
    up based on no data because we failed to
  • 00:15:57
    actually find the right the right sort
  • 00:15:59
    of source material out of the
  • 00:16:01
    web you just seen me make a demo that
  • 00:16:04
    took what like a minute right to process
  • 00:16:08
    and in this minute it tries to figure
  • 00:16:10
    out the relevant docs and figure out the
  • 00:16:13
    base URL then the stream URL
  • 00:16:15
    authentication scheme
  • 00:16:16
    parameters when we started there was the
  • 00:16:20
    happy path prototype connector like woo
  • 00:16:22
    this works really fast that's great but
  • 00:16:25
    then in some cases it took like four and
  • 00:16:30
    a half something minutes in crawling
  • 00:16:32
    docks in headless Chrome and sometimes
  • 00:16:35
    it would get into Loops so you would
  • 00:16:38
    think like in 2024 crawling pages from
  • 00:16:41
    the web should be solved problem and
  • 00:16:42
    there's a bunch of products that say
  • 00:16:44
    they do it right fir crawl is the one we
  • 00:16:47
  • 00:16:48
    now but can you just out of the box
  • 00:16:51
    Point them and expect them to work like
  • 00:16:54
    nope if you go read like you know a rag
  • 00:16:58
    rag tutorial right now it's going to
  • 00:17:00
    tell you uh you know go download your
  • 00:17:03
    information get get craw the docs
  • 00:17:05
    download the docs uh strip out some HTML
  • 00:17:09
    chunk it up into pieces put it into a
  • 00:17:12
    vector store and then query your vector
  • 00:17:13
    store um and actually we did kind of
  • 00:17:16
    start there the final implementation we
  • 00:17:18
    ended up with looks something more like
  • 00:17:20
    we don't pre-ra anything we wait until
  • 00:17:22
    we have a specific task we're trying to
  • 00:17:23
    do like how do you like what is the
  • 00:17:25
    authentication mechanism does this API
  • 00:17:27
    use you know http basic off to for the
  • 00:17:29
    username password does it use an API key
  • 00:17:32
    what is the method and then we purpose
  • 00:17:34
    go crawl for that we start at the
  • 00:17:36
    homepage of the docs and we ask an llm
  • 00:17:38
    to help us navigate toward you know
  • 00:17:40
    where we'd want to want to go we have so
  • 00:17:42
    many fallback mechanisms in here we have
  • 00:17:44
    multiple different Services we use for
  • 00:17:45
    this crawling because there can be rate
  • 00:17:47
    limiting issues they can be flaky um
  • 00:17:49
    there's there's all sorts of issues
  • 00:17:51
    around that we fall back on doing a
  • 00:17:52
    Google search if we can't find the
  • 00:17:54
    information we're looking for we use
  • 00:17:55
    perplexity at some points in the flow uh
  • 00:17:58
    we have a repos repository under the
  • 00:17:59
    hood of a bunch of pre-built opening API
  • 00:18:01
    specs from common repositories like it
  • 00:18:04
    is very complicated under the hood
  • 00:18:06
    there's a lot a lot going on that
  • 00:18:08
    doesn't look like you know you're uh
  • 00:18:10
    here's how you ask a question of your
  • 00:18:11
    documents or rag
  • 00:18:14
    tutorial and I kind of want to like
  • 00:18:16
    before we go towards like the next
  • 00:18:17
    question there I want to just get a
  • 00:18:19
    pulse for the room probably should have
  • 00:18:20
    started with this but I think it's
  • 00:18:21
    helpful as we're diving deeper into some
  • 00:18:23
    of these Concepts just to make sure
  • 00:18:24
    we're all kind of on that same
  • 00:18:25
    wavelength would you raise your hand if
  • 00:18:27
    you identify a builder in AI right now
  • 00:18:30
    you're building some kind of company or
  • 00:18:32
    product in the space all right great how
  • 00:18:34
    many of you are not necessarily building
  • 00:18:36
    but pretty well versed in the topic
  • 00:18:38
    you're doing a lot of independent
  • 00:18:40
    research and rais all right those two
  • 00:18:44
    together I think we have a large
  • 00:18:44
    majority for everyone else you're
  • 00:18:45
    probably where I'm at in my like Journey
  • 00:18:48
    so you can go ahead and be Googling
  • 00:18:50
    things on the side just like I'm going
  • 00:18:51
    to be doing over here um yeah yeah call
  • 00:18:53
    me out for if I'm getting too technical
  • 00:18:55
    no no no it's good we want we want to go
  • 00:18:57
    de deeper and this being live stream
  • 00:18:59
    record so you can always come back later
  • 00:19:01
    if you have more questions I want to
  • 00:19:03
    talk about that piece then like thinking
  • 00:19:04
    about all these components that go into
  • 00:19:07
    building an AI you think about
  • 00:19:08
    observability you think about the rag
  • 00:19:10
    like could you talk through what are
  • 00:19:14
    core components for you of a successful
  • 00:19:17
    AI project maybe evaluations or or
  • 00:19:19
    things of that nature where do you want
  • 00:19:20
    to take
  • 00:19:22
    this so I think the one of the earliest
  • 00:19:25
    steps in any project that's going to
  • 00:19:27
    reach this level of success ESS um if
  • 00:19:30
    it's going to have any sort of
  • 00:19:31
    meaningful complexity to it is going to
  • 00:19:33
    have to be building evales and what I
  • 00:19:36
    mean by EV vals is basically an
  • 00:19:38
    automated test suite for your
  • 00:19:40
    application but one where you're running
  • 00:19:42
    over lots of examples that you want your
  • 00:19:45
    system to be good at and you're testing
  • 00:19:46
    how well it it does at these things so
  • 00:19:48
    you define some metrics up front to
  • 00:19:50
    measure how well am I doing um and so
  • 00:19:52
    like as a concrete example here we're
  • 00:19:53
    trying to build API Integrations our
  • 00:19:55
    first step was let's go gather a bunch
  • 00:19:58
    of existing API Integrations we built
  • 00:20:01
    let's build a a sort of test harness
  • 00:20:03
    that can generate output from our system
  • 00:20:05
    test it against how well does it match
  • 00:20:08
    up with the things that actually people
  • 00:20:09
    built in the past and we produced a
  • 00:20:10
    whole bunch of metrics around these it's
  • 00:20:13
    it's actually non-trivial to get this
  • 00:20:15
    right um uh you know even though we had
  • 00:20:17
    a really rich set of ground truth to
  • 00:20:19
    look at here you know we had hundreds of
  • 00:20:20
    connectors to people that built the
  • 00:20:22
    comparisons are not very straightforward
  • 00:20:24
    like sometimes our system comes up with
  • 00:20:25
    different names than than people came up
  • 00:20:27
    with or the the community connectors
  • 00:20:29
    might have only a subset of of things
  • 00:20:32
    defined in them they could have defined
  • 00:20:34
    and that's that's okay for their use
  • 00:20:35
    case um so detecting sort of the
  • 00:20:37
    difference between we didn't generate
  • 00:20:40
    something and we should have versus we
  • 00:20:41
    didn't generate something and that's
  • 00:20:42
    fine um is is not uh it's not trivial
  • 00:20:46
    but you got to start somewhere and if
  • 00:20:48
    you don't do this your starting point is
  • 00:20:50
    gonna be very s Vibes based you're gonna
  • 00:20:52
    like run your first best idea some
  • 00:20:56
    sometimes it's going to work which is
  • 00:20:57
    going to be really encouraging and cool
  • 00:20:58
    sometimes it's not and you're not like
  • 00:21:00
    going to kind of have some intuition
  • 00:21:01
    about maybe here's how I improve it but
  • 00:21:02
    it's going to be based on whatever sort
  • 00:21:03
    sitting in front of you this is what
  • 00:21:05
    they ended up looking like at some point
  • 00:21:06
    maybe there's like can you go up a slide
  • 00:21:08
    so this is how it looked at the
  • 00:21:09
    beginning when we started we were just
  • 00:21:11
    like so if you can't see the rows here
  • 00:21:15
    are just example connectors that that
  • 00:21:18
    existed already um and we just picked
  • 00:21:21
    three uh knowing that we wanted to be
  • 00:21:23
    better than just doing these three but
  • 00:21:25
    we started somewhere and then each of
  • 00:21:26
    these columns is some some way that we
  • 00:21:28
    measure ourselves against the ground
  • 00:21:29
    truth so if we ask our system to produce
  • 00:21:31
    a Sentry connector there's already a
  • 00:21:33
    Sentry connector out there how well do
  • 00:21:35
    we do at all these things and uh and
  • 00:21:37
    produce these these metrics and we try
  • 00:21:39
    and kind of like produce a score that is
  • 00:21:41
    roughly weighted by how valuable is it
  • 00:21:43
    to a user if we screw this up or get it
  • 00:21:46
    right uh and and then you start now you
  • 00:21:49
    can actually sort of measure how well
  • 00:21:50
    you're doing this is a super powerful
  • 00:21:53
    tool there's sort of a Dark Art to like
  • 00:21:56
    you know perfect versus good on this but
  • 00:21:59
    um if you get this into a good place it
  • 00:22:02
    guides development in a very real way
  • 00:22:03
    like first of all you can tell in an
  • 00:22:04
    unbiased way like how are we doing
  • 00:22:06
    overall you can track your progress you
  • 00:22:08
    can track regressions and if you sort of
  • 00:22:11
    if you're doing some prompt engineering
  • 00:22:12
    and you're like tweaking the language
  • 00:22:13
    all the time to get better at some
  • 00:22:14
    specific failure mode you're seeing what
  • 00:22:17
    how do you know if you tweak your prompt
  • 00:22:18
    it's like not going to make you worse
  • 00:22:19
    the thing you tried to get better at
  • 00:22:20
    yesterday so this will help you track
  • 00:22:22
    regressions it also
  • 00:22:24
    drives uh the sort of anecdotal evidence
  • 00:22:28
    you want to see for where to invest your
  • 00:22:30
    attention next if you go you know
  • 00:22:32
    like you know we're doing pretty well
  • 00:22:34
    actually at this stage um but like
  • 00:22:37
    there's still some zeros in here um so
  • 00:22:40
    like my intuition from seeing this is
  • 00:22:43
    like okay we're like doing okay at
  • 00:22:44
    whatever this thing is for zenitz and
  • 00:22:46
    we're like doing not that good for this
  • 00:22:48
    schema thing for zenitz like wonder what
  • 00:22:50
    that is and i' click into what it's It
  • 00:22:52
    Go actually look at what we generated
  • 00:22:54
    and say ah okay like this the L&M got
  • 00:22:57
    this wrong because we're feeding it the
  • 00:22:58
    wrong information this is a crawling
  • 00:22:59
    problem not prompting problem and we' go
  • 00:23:02
    update our crawler and so sort of tells
  • 00:23:04
    you what to work on next and then over
  • 00:23:06
    time we expanded to that that next slide
  • 00:23:08
    that you were on a second ago which
  • 00:23:09
    is the evals just got bigger and bigger
  • 00:23:12
    and bigger we just kept getting more use
  • 00:23:13
    cases in there trying to get a wider and
  • 00:23:15
    wider set of examples to look at um and
  • 00:23:18
    it's what drove you know you showed the
  • 00:23:19
    sort of workflow diagram in your slides
  • 00:23:21
    earlier that was like kind of the
  • 00:23:22
    spaghetti look of all the different
  • 00:23:24
    steps that go into just one of the
  • 00:23:25
    questions here that evolved out of this
  • 00:23:28
    exploration trying to get better and
  • 00:23:30
    better by adding more sort of uh catches
  • 00:23:33
    for things that could go
  • 00:23:36
    wrong did you want to add anything
  • 00:23:39
    there can hope to add some context at
  • 00:23:44
    the high level this is the diagram for
  • 00:23:46
    the whole thing in the
  • 00:23:48
    beginning and so the idea was okay we're
  • 00:23:51
    going to crawl all of the documents now
  • 00:23:53
    we're going to index everything shove it
  • 00:23:55
    into a vector store and then there's
  • 00:23:57
    going to be like three four different
  • 00:23:58
    components one's going to figure out the
  • 00:24:00
    AL the other is going to figure out the
  • 00:24:02
    pagination um right and then the the the
  • 00:24:05
    different ones going to figure out the
  • 00:24:07
    list of streams basically stream is an
  • 00:24:08
    API endpoint like oh you know
  • 00:24:10
    repositories and GitHub is a stream
  • 00:24:12
    issues and GitHub is a stream if you
  • 00:24:14
    look at this one right here deaf is
  • 00:24:17
    what's we call a record selectorate air
  • 00:24:19
    bite is basically where exactly in the
  • 00:24:22
    response Json is the useful information
  • 00:24:26
    and the schema means okay what are The
  • 00:24:29
    Columns of data what are the fields of
  • 00:24:31
    the useful objects that we
  • 00:24:33
    want and as we grew into this even the
  • 00:24:37
    number of things that we've paid
  • 00:24:39
    attention to increased and each
  • 00:24:42
    particular component became this huge
  • 00:24:44
    spaghetti because it turns out that like
  • 00:24:47
    originally we thought you know what each
  • 00:24:49
    component is going to be a subset of
  • 00:24:51
    index docs the tagged and a prompt and
  • 00:24:55
    hopefully a single prompt is going to
  • 00:24:58
    just make it fine like we crawled
  • 00:25:00
    everything already right and turns out
  • 00:25:02
    in reality like every component that we
  • 00:25:04
    need answer to like every field where
  • 00:25:06
    you can get an AI assist prompt is
  • 00:25:08
    basically a program in
  • 00:25:11
    itself I want Tove the spaghetti piece
  • 00:25:14
    also we're going to change this up
  • 00:25:15
    because originally I was just going to
  • 00:25:17
    like have a point where it's purely
  • 00:25:18
    audio audience Q&A if you're having
  • 00:25:21
    questions about things as we come up
  • 00:25:23
    raise your hand and I will kind of bring
  • 00:25:25
    you into the conversation rather than
  • 00:25:27
    just wait for the end um but I'm curious
  • 00:25:30
    about how the spaghetti evolves over
  • 00:25:32
    here what surprised you the most about
  • 00:25:35
    the way your evaluation criteria early
  • 00:25:38
    on differ when you think about the end
  • 00:25:46
    state so I I'm surprised by the number
  • 00:25:49
    of random fallbacks and stuff in the
  • 00:25:51
    system like that we're still Google
  • 00:25:53
    searching in perplexity you know
  • 00:25:54
    searching under the hood to get to some
  • 00:25:56
    of the answers we want um
  • 00:26:01
    uh I think a very useful but difficult
  • 00:26:05
    thing on this project was thinking about
  • 00:26:07
    how to progress along this path to how
  • 00:26:10
    do we arrive at the right spaghetti um
  • 00:26:13
    uh because if you were to just guess it
  • 00:26:14
    up front you wouldn't guess right like
  • 00:26:16
    you have to kind of evolve your way
  • 00:26:18
    toward it and then that's intention with
  • 00:26:21
    like how do we know we're going to get
  • 00:26:22
    there like how do we
  • 00:26:24
    know how do we know this is even
  • 00:26:26
    possible um let alone that going to get
  • 00:26:28
    there in like a reasonable amount of
  • 00:26:29
    time and I think that question is very
  • 00:26:33
    challenging for AI projects right like
  • 00:26:34
    there's there's some stat that like 70%
  • 00:26:36
    of of poc's never make it to production
  • 00:26:39
    with with AI projects and I think it's
  • 00:26:42
    very challenging to know what a good POC
  • 00:26:44
    looks like and how to get from there to
  • 00:26:45
    production um and and so if you like
  • 00:26:48
    take take the AI assist project as just
  • 00:26:50
    like an example of a broader
  • 00:26:52
    theme um I mean you mentioned you guys
  • 00:26:55
    tried it a few times before right and
  • 00:26:56
    you weren't exactly sure what do we make
  • 00:26:58
    of this like I think this says this is
  • 00:27:00
    possible but I don't know how we get
  • 00:27:01
    there and like
  • 00:27:03
    the if you were just gonna try tomorrow
  • 00:27:06
    to say like is it possible to build
  • 00:27:08
    these API Integrations with llms like
  • 00:27:10
    the first thing You' try is you just
  • 00:27:12
    like go ask chat GPT to do it you'd show
  • 00:27:13
    chat GPT an example of these connectors
  • 00:27:16
    are just a file under the hood you
  • 00:27:17
    showed chat GPT an example of the file
  • 00:27:19
    and you said you know build me one like
  • 00:27:21
    this but for this
  • 00:27:22
    API and then something will come out
  • 00:27:25
    like probably something pretty good uh
  • 00:27:28
    because the files are sort of
  • 00:27:29
    inscrutable and if you don't know what
  • 00:27:30
    you're looking for it's going to look
  • 00:27:31
    right even if it's like technically
  • 00:27:32
    doesn't run later um and then you're
  • 00:27:34
    kind of stuck you don't really
  • 00:27:36
    know what did this really tell me about
  • 00:27:38
    is it possible you you can't really
  • 00:27:40
    iterate on it like how do you make chat
  • 00:27:42
    PT better at this now um how do you know
  • 00:27:45
    what array of stuff it's good at versus
  • 00:27:46
    bad at and it's not going to get you to
  • 00:27:48
    this like eventual kind of spaghetti
  • 00:27:50
  • 00:27:50
    diagram um so instead the approach we we
  • 00:27:54
    tend to take is we try and build pcc's
  • 00:27:57
    that are 100% on the critical path to
  • 00:27:58
    production um we try and be thoughtful
  • 00:28:01
    about which pieces we build early but
  • 00:28:03
    early in the project we didn't start by
  • 00:28:05
    saying let's just show like a really
  • 00:28:06
    shiny marketing demo that shows complete
  • 00:28:08
    end to end it working perfectly for one
  • 00:28:11
    connector we said let's pick three
  • 00:28:13
    connectors as examples and it's going to
  • 00:28:15
    start out kind of crappy and then we're
  • 00:28:16
    going to try and make it better over
  • 00:28:17
    time um and that that diagram you showed
  • 00:28:20
    a second ago that's like the the this
  • 00:28:23
    one yes this one this was our sketch
  • 00:28:25
    like a few weeks into the project of
  • 00:28:27
    what we we imagined the eventual
  • 00:28:29
    spaghetti might look like and it ended
  • 00:28:30
    up changing over time and what we tried
  • 00:28:32
    to do was tackle these pieces in order
  • 00:28:35
    um to try and drisk the riskiest parts
  • 00:28:37
    of the project we're like all right
  • 00:28:39
    let's try and work on the box that's
  • 00:28:40
    about authentication right now and see
  • 00:28:42
    like what's it look what's it look like
  • 00:28:44
    start to feel it out get rid of unknown
  • 00:28:45
    unknowns get that to a place where we're
  • 00:28:47
    like I believe that with iteration this
  • 00:28:48
    part is possible then tackle the next
  • 00:28:50
    piece and tackle the next piece and
  • 00:28:52
    start to flesh this out I think actually
  • 00:28:53
    the screenshot like the gray boxes were
  • 00:28:55
    like things we didn't try yet or
  • 00:28:57
    something um or de prioritize for p so
  • 00:29:00
    like you know we hadn't we didn't
  • 00:29:02
    actually tackle all of these but we
  • 00:29:03
    tried to tackle as many as we could to
  • 00:29:04
    start to drisk it and then that process
  • 00:29:08
    drove us to a more robust eval driven
  • 00:29:11
    now it feels like iteration doesn't feel
  • 00:29:13
    like we're building a V1 of something it
  • 00:29:14
    feels like we're kind of like you know
  • 00:29:15
    iterating iterating iterating and that
  • 00:29:17
    drives the ideas for where to add the
  • 00:29:19
    sort of branching Paths of that that
  • 00:29:21
  • 00:29:23
    diagram I do like the idea that we
  • 00:29:24
    should only be talking about EV valves
  • 00:29:26
    in the context of spaghetti going
  • 00:29:29
    so let's keep Let's uh maybe keep that
  • 00:29:31
    one up all night um thinking about the
  • 00:29:38
    yeah yeah in terms of the eval how are
  • 00:29:41
    you do are you just compar
  • 00:30:01
    yeah that's that's a great question um
  • 00:30:02
    yeah so the question was like what are
  • 00:30:04
    we measuring how are we doing these
  • 00:30:05
    evals um in this case are we just
  • 00:30:07
    comparing ourselves to an existing
  • 00:30:09
    connector that we know is good or uh he
  • 00:30:11
    said he's heard of some examples of
  • 00:30:13
    using an llm to evaluate how the other
  • 00:30:15
    llm did um it's a great question uh
  • 00:30:21
    so what we see across successful
  • 00:30:24
    projects varies a lot um part of what
  • 00:30:27
    makes the actually difficult is that
  • 00:30:29
    they rarely fit this like clean academic
  • 00:30:32
    standard for what you would want to see
  • 00:30:34
    um clean input output pairs great ground
  • 00:30:37
    truth you know how to compare these
  • 00:30:38
    things and how to measure them sometimes
  • 00:30:39
    the thing we're measuring ourselves
  • 00:30:40
    against is we like ship an example
  • 00:30:43
    output to some team somewhere and we're
  • 00:30:44
    like you're the experts on this domain
  • 00:30:45
    did we do a good job or not they ship it
  • 00:30:47
    back and like trying to evaluate based
  • 00:30:48
    on that and so the the mess wrangling
  • 00:30:51
    the mess is hard um we have seen
  • 00:30:54
    successful examples of using it that
  • 00:30:57
    that technique is called llm as judge
  • 00:30:59
    where you you have an llm evaluate how
  • 00:31:01
    you're doing it's good for like very
  • 00:31:03
    subjective things if you're generating
  • 00:31:04
    free form text and you're like does this
  • 00:31:06
    seem like it answered my question that's
  • 00:31:08
    like a task for an llm in this case we
  • 00:31:10
    were able to circumvent that I think in
  • 00:31:13
    every case uh we do some like
  • 00:31:15
    deterministic fuzzy stuff where we're
  • 00:31:17
    like does this name almost match that
  • 00:31:19
    name if so we're good um uh and so there
  • 00:31:22
    is some like deep Logic for like trying
  • 00:31:26
    to score ourselves uh in a way way
  • 00:31:28
    that's not not as straightforward is
  • 00:31:29
    just like does this thing equal that
  • 00:31:31
    thing um um but we've seen sort of
  • 00:31:34
    everything and at some point you do need
  • 00:31:36
    to sort of stop like looking for the
  • 00:31:38
    perfect thing and find something
  • 00:31:39
    directionally useful um we've had
  • 00:31:41
    projects where like you have a workflow
  • 00:31:43
    diagram this pop this this complicated
  • 00:31:45
    and the only thing we're able to measure
  • 00:31:46
    is like what's going on down here um
  • 00:31:48
    because it's like the only place where
  • 00:31:49
    you can design clean EV EV vals and then
  • 00:31:52
    you just sort of put up with that and
  • 00:31:54
    and do the best you can
  • 00:32:17
  • 00:32:40
    so it's so what does the output look
  • 00:32:43
    like is actually very critical to what I
  • 00:32:45
    think made this possible here um so
  • 00:32:47
    we've actually built uh sort of uh AI
  • 00:32:51
    powered integration Builders multiple
  • 00:32:53
    times um this is this is one of them for
  • 00:32:55
    airite I think one amazing asset that
  • 00:32:58
    airb has here is they have this format
  • 00:33:01
    that they call their their well I don't
  • 00:33:03
    know what you call it your low low code
  • 00:33:04
    cdk format your your this this spec for
  • 00:33:07
    how to define an API integration as
  • 00:33:09
    configuration instead of as code big
  • 00:33:12
    file that describ and in fact in our
  • 00:33:14
    pipeline we never have an llm write this
  • 00:33:18
    thing as output we write this as output
  • 00:33:20
    deterministically using code and we use
  • 00:33:22
    the llm to answer specific questions we
  • 00:33:24
    have about this process so we ask it
  • 00:33:27
    picking off authentication method for me
  • 00:33:28
    and then we use that to
  • 00:33:29
    deterministically generate the
  • 00:33:30
    authentication part of this that's part
  • 00:33:32
    of what makes this an approachable
  • 00:33:34
    problem we've built this before uh where
  • 00:33:37
    the end goal is to write code performs
  • 00:33:40
    way worse um and even in that process we
  • 00:33:43
    have uh under the hood we have an
  • 00:33:46
    intermediate format that is not I mean
  • 00:33:49
    it's like conceptually similar to this
  • 00:33:52
    that we're using to sort of constrain
  • 00:33:53
    the problem so much of the trick with
  • 00:33:55
    these LMS is constraining the domain in
  • 00:33:56
    which they're thinking right if if you
  • 00:33:58
    say write me some code you're going to
  • 00:34:00
    get something code shaped as output
  • 00:34:01
    whether it's good nobody knows um if you
  • 00:34:04
    ask it for a very specific constrainted
  • 00:34:06
    answer where it's only allowed to answer
  • 00:34:08
    within a very specific Universe it's
  • 00:34:09
    much more tunable it's going to perform
  • 00:34:10
    a lot better just kind of made that
  • 00:34:12
    possible yeah I mean
  • 00:34:25
    I'm I can take that
  • 00:34:29
    so to clarify the last two questions
  • 00:34:33
    it's I think it's both relevant to evals
  • 00:34:35
    ands to outputs uh the way we eval is we
  • 00:34:39
    compare what the model gives us with
  • 00:34:41
    what we have in connectors we know is
  • 00:34:43
    good it's not always one to one because
  • 00:34:46
    for example if you have a stream that's
  • 00:34:48
    called capital T transactions is it
  • 00:34:51
    still the same or like is it if if the
  • 00:34:54
    wording is slightly different but the
  • 00:34:55
    scheme is very similar if the schemas
  • 00:34:58
    are compatible but the columns are not
  • 00:34:59
    the same is it is it a match is it not
  • 00:35:01
    match like that that kind of stuff the
  • 00:35:03
    output is uh are pieces of the Manifest
  • 00:35:07
    and the AI Builder thing like we have a
  • 00:35:10
    python library that enforces the format
  • 00:35:14
    of the Manifest essentially think
  • 00:35:16
    kubernetes resource definitions right
  • 00:35:18
    there are fields that are required they
  • 00:35:20
    can be only of certain format so Builder
  • 00:35:23
    before outputting that as a suggestion
  • 00:35:26
    validates that it's
  • 00:35:28
    legit and then one use case is sure
  • 00:35:32
    right just a co-pilot thing in Builder
  • 00:35:35
    itself um what we see is the match
  • 00:35:38
    success rate like we see successful good
  • 00:35:41
    suggestions very very often like it's
  • 00:35:43
    probably north of 90% on each particular
  • 00:35:46
    field today but the thing is there's a
  • 00:35:48
    bunch of fields and those probabilities
  • 00:35:50
    multiply so the probability that you get
  • 00:35:53
    full connector end to end correctly is
  • 00:35:57
    you slightly lower but we're getting
  • 00:35:59
    there this use case is okay let's get a
  • 00:36:02
    lot of connectors let's make new
  • 00:36:03
    connectors Let's help people make
  • 00:36:06
    connectors for themselves and then share
  • 00:36:07
    them with our community but also I have
  • 00:36:11
    450 connectors and like more than 250 of
  • 00:36:14
    them are in that format so the whole
  • 00:36:16
    connector is just a big manifest file
  • 00:36:18
    and what I can do is I already have a CI
  • 00:36:20
    pipeline that runs every week and you
  • 00:36:23
    see there's this thing called version
  • 00:36:24
    right like this is the version of the
  • 00:36:26
    framework that it's using
  • 00:36:28
    and my CI pipeline checks hey do I have
  • 00:36:30
    a newer version of the framework and if
  • 00:36:32
    I do I'm going to update all of my
  • 00:36:35
    manifest as long as it's not breaking
  • 00:36:37
    another thing we could do basically on
  • 00:36:39
    CI uh or regularly is uh create another
  • 00:36:42
    endpoint in our AI assist thing and have
  • 00:36:46
    another flow where we say hey here's the
  • 00:36:49
    name of the connector here's the API
  • 00:36:51
    docs here's the existing manifest do you
  • 00:36:54
    think there may be some new streams that
  • 00:36:56
    we don't have
  • 00:36:59
    and like these or you know like maybe
  • 00:37:01
    there's a new authentication method
  • 00:37:03
    maybe there are some deprecations that
  • 00:37:04
    we want to clean up today the way this
  • 00:37:07
    works is connector fails for someone the
  • 00:37:09
    stream doesn't work anymore somebody
  • 00:37:11
    files in a GitHub issue they say well
  • 00:37:13
    we're open source you're very welcome to
  • 00:37:14
    contribute they contribute we run
  • 00:37:16
    regression tests verify it's not broken
  • 00:37:18
    then we merge when we had just the
  • 00:37:20
    python framework it took months now it
  • 00:37:23
    takes days but if I can automate this
  • 00:37:27
    cool so thank you for the
  • 00:37:34
    suggestion should I okay I'll do it
  • 00:37:37
    you're oh thanks um all right I kind of
  • 00:37:41
    want to like pull on a Thro a little bit
  • 00:37:43
    more that Samantha brought up which is
  • 00:37:44
    like you can Envision a future of like
  • 00:37:47
    an an agent or something doing this like
  • 00:37:50
    since the GPT era started it seems like
  • 00:37:54
    there's always something new it's
  • 00:37:55
    exciting that people are talking about
  • 00:37:56
    you know it was rag agents um graph rag
  • 00:38:01
    there's countless things in a year from
  • 00:38:03
    now do you feel any of these will
  • 00:38:05
    continue to be just as pertinent a part
  • 00:38:07
    of the conversation or do you think
  • 00:38:09
    something new will be the dominant point
  • 00:38:11
  • 00:38:16
    discussion and if you do think something
  • 00:38:18
    new what is that
  • 00:38:21
    thing I'm less of an AI futurist and
  • 00:38:25
    more of an AI today practitioner uh but
  • 00:38:29
    um you know when people talk about
  • 00:38:32
    agents for example I think there's like
  • 00:38:33
    multiple things they might mean um think
  • 00:38:37
    one thing they might mean is like build
  • 00:38:41
    a thing that's got a lot of autonomy
  • 00:38:42
    around what it can do you give give
  • 00:38:44
    something a bunch of tools and you let
  • 00:38:46
    it sort of decide it's less of this
  • 00:38:47
    deterministic we do this then we do this
  • 00:38:49
    then we do this and you sort of give it
  • 00:38:50
    access to whatever it wants
  • 00:38:54
    um I've yet to see anything like that
  • 00:38:56
    come to fruition in practice for a
  • 00:38:58
    significant system that could see that
  • 00:39:00
    changing over time um but right now it
  • 00:39:02
    seems very um theoretical to me and like
  • 00:39:06
    may may happen if it gets driven by you
  • 00:39:08
    know big big boost to what Foundation
  • 00:39:11
    models are capable of
  • 00:39:13
    um but I think the more interesting
  • 00:39:15
    today thing for for agents what people
  • 00:39:18
    tend to mean is like less around
  • 00:39:20
    autonomy more around specialization like
  • 00:39:22
    how do you break your problem down into
  • 00:39:24
    specific components that are in charge
  • 00:39:26
    of a very small subdomain and are
  • 00:39:28
    experts in that subdomain that I think
  • 00:39:30
    is going to get even more common I think
  • 00:39:31
    people are
  • 00:39:33
    realizing a the complexity of these
  • 00:39:35
    projects in practice you know what looks
  • 00:39:37
    at a high level like hey chat GPT give
  • 00:39:39
    me give me a connector it looks more
  • 00:39:40
    like this under the hood and also that
  • 00:39:44
    so much of uh the sort of mystery of
  • 00:39:46
    what it's like to build with LMS is
  • 00:39:48
    actually just software engineering under
  • 00:39:49
    the hood um I think that is going to
  • 00:39:51
    drive more adoption of these sort of
  • 00:39:54
    that type of agent system um and I we're
  • 00:39:58
    seeing more and more of it we're talking
  • 00:39:59
    about a very sort of tech tech forward
  • 00:40:00
    company Tech forward use case but we
  • 00:40:02
    also see like you know 100y old big
  • 00:40:05
    equipment manufacturers talking about
  • 00:40:06
    these workflows in a very realistic way
  • 00:40:09
    that I think is is going to be in
  • 00:40:10
    production within the next year at at a
  • 00:40:12
    company like that um that you might call
  • 00:40:14
    an agentic workflow um so I see that
  • 00:40:16
    that part of it being very real over the
  • 00:40:17
  • 00:40:25
    year Tak only
  • 00:40:29
    jumped to deep in building this thing my
  • 00:40:32
    Horizon of thinking about AI things a
  • 00:40:34
    year from now is very very
  • 00:40:36
    short my personal biggest thing is like
  • 00:40:40
    we we have manifest connect also have
  • 00:40:43
    python connectors and Java connectors
  • 00:40:46
    and we also have bug bugs in those so my
  • 00:40:48
    biggest dreams are around just those
  • 00:40:51
    software programming agents which can be
  • 00:40:53
    as simple as a little bash script that
  • 00:40:55
    says hey here's a GitHub un call issue
  • 00:40:59
    here's the bug report here are the logs
  • 00:41:01
    here's the directory with all of the
  • 00:41:03
    source files and here's the script that
  • 00:41:05
    builds and tests the connector here's
  • 00:41:07
    the bug
  • 00:41:08
    output can you fix it and then the
  • 00:41:11
    script applies the changes proposed by
  • 00:41:13
    the model runs the tests and if they
  • 00:41:15
    fail it says yeah that didn't work try
  • 00:41:18
    again in a while loop just until it
  • 00:41:21
    wraps up this is my next hobby project I
  • 00:41:24
    think after this thing is successful
  • 00:41:27
    what that means for other Industries and
  • 00:41:31
    for programmers and businesses that
  • 00:41:33
    build with AI EDD is the boss
  • 00:41:44
    there I don't know that we have a final
  • 00:41:46
    one fully
  • 00:41:51
    together I I don't think there is a full
  • 00:41:54
    final one and
  • 00:42:03
    very little framework code under the
  • 00:42:05
    hood there's there's some but it's it's
  • 00:42:06
  • 00:42:10
    substantial it's kind of not
  • 00:42:12
    representative of the final no that's
  • 00:42:14
    okay maybe this is closing the biggest
  • 00:42:16
    place where I think the the diagram
  • 00:42:18
    diverged is like around the The Crawling
  • 00:42:21
    of the docks like we don't do an upfront
  • 00:42:23
    crawling Step at all um and so it's it
  • 00:42:26
    stops looking like
  • 00:42:27
    I guess the other big change is like at
  • 00:42:29
    that point we
  • 00:42:31
    envisioned URL to API docs as input
  • 00:42:35
    connector as output one shot build the
  • 00:42:37
    whole thing all at
  • 00:42:38
    once and uh where it ended up going was
  • 00:42:42
    that is what the initial experience is
  • 00:42:44
    like in the UI but there's lots of
  • 00:42:45
    little buttons you can push to fill in
  • 00:42:46
    fields here and there and so the flow is
  • 00:42:48
    much more decomposed into a set of a
  • 00:42:50
    bunch of different endpoints and smaller
  • 00:42:52
    workflows that leverage some shared
  • 00:42:54
    shared stuff under the hood and so it's
  • 00:42:56
    not exactly left to right end to end
  • 00:42:58
    thing it's like 12 end to end things
  • 00:43:00
    that have some shared
  • 00:43:15
    stuff yes so so there's actually two
  • 00:43:17
    inputs uh you can give us an open API
  • 00:43:20
    spec uh as as input for those that don't
  • 00:43:23
    know an open API spec is like a it's a
  • 00:43:25
    common standard format you can use to
  • 00:43:27
    describe uh an API um it's optional but
  • 00:43:30
    if you give it to us we'll we'll use it
  • 00:43:32
    um we also have our own curated kind of
  • 00:43:36
    repo like common common apis that that
  • 00:43:39
    are out there in their specs that we
  • 00:43:41
    sometimes use um other supplemental
  • 00:43:44
    information is is it's all stuff living
  • 00:43:48
    on the web it's like Google searching um
  • 00:43:52
  • 00:43:54
    crawling anything El yeah I think that's
  • 00:43:56
    all the supplemental stuff
  • 00:44:04
    I wonder if you for some stages that are
  • 00:44:06
    disconnected andon to an artifact have
  • 00:44:10
    you tried to combine
  • 00:44:21
    them I ended
  • 00:44:24
    up but then
  • 00:44:34
  • 00:44:37
    you but but sometimes it makes sense and
  • 00:44:40
    the following question if you have
  • 00:44:47
    Doney so I think the first part was
  • 00:44:50
    around like instead of treating these
  • 00:44:52
    these different alternative steps for
  • 00:44:53
    finding information as as fallbacks to
  • 00:44:55
    one another can you sort of do them in
  • 00:44:56
    parallel and then try and try and
  • 00:44:58
    combine the information is that is that
  • 00:45:09
    right that are somewhere in here in a
  • 00:45:12
    sequence so have you tried to reconcile
  • 00:45:14
    them in a single step you know with
  • 00:45:18
    let's say let's call it a gentic
  • 00:45:19
    application or a gentic step in which
  • 00:45:23
    you do both tasks you can
  • 00:45:28
    right so the two tasks here are like
  • 00:45:31
    um they're
  • 00:45:33
    basically go out and find the relevant
  • 00:45:36
    information to a question like
  • 00:45:38
    authentication and
  • 00:45:39
    then I cannot
  • 00:45:42
    read imagine that there are two simple
  • 00:45:44
    Tas that you have separated by an
  • 00:45:47
    artifact you generate
  • 00:45:52
    one you instead
  • 00:45:57
    yeah I think it actually often starts
  • 00:46:00
    the opposite way it's like we start with
  • 00:46:01
    a larger problem we're like build this
  • 00:46:03
    whole thing and we're like this needs to
  • 00:46:04
    be broken down and sub
  • 00:46:12
    components possible that's happened
  • 00:46:14
    somewhere in the details I'm like less
  • 00:46:15
    less familiar no use case there is
  • 00:46:18
    jumping to mind but like I think the
  • 00:46:19
    tactic makes sense to me
  • 00:46:22
  • 00:46:24
    uh yeah in practice like one area we've
  • 00:46:27
    had to break things down is like sort of
  • 00:46:29
    deeply nested questions um where like we
  • 00:46:33
    may be asking the the llm like which of
  • 00:46:36
    these authentication methods is used and
  • 00:46:37
    like if it's this one I need this
  • 00:46:38
    information if it's that one I need that
  • 00:46:40
    information it's sort of asking these
  • 00:46:41
    deeply nested questions it like sort of
  • 00:46:42
    falls off and gets lazy and stops
  • 00:46:44
    following the instructions so we've had
  • 00:46:45
    to sort of chop it up into the sub
  • 00:46:47
    pieces so this a little bit like the
  • 00:46:48
    opposite of the flow you're describing
  • 00:46:50
    but like I could see if we if we'
  • 00:46:52
    started out with the sort of multi-step
  • 00:46:54
    version being like I wonder if we can do
  • 00:46:55
    this all at once which does save you on
  • 00:46:58
    latency and
  • 00:47:03
    cost more easier you have that you
  • 00:47:11
    try yeah at least try to recile
  • 00:47:16
    something for example when I started
  • 00:47:19
    doing group
  • 00:47:22
    ofel so I start basically from functions
  • 00:47:25
    and I automate function with a agent
  • 00:47:29
    step AG step and then I I I link
  • 00:47:33
    together but then I say okay this two
  • 00:47:36
    maybe can recile single yeah you instead
  • 00:47:39
  • 00:47:41
    having step agent to maintain have it
  • 00:47:45
    doesn't have to make sense for
  • 00:47:46
    everything and end up a single blob of
  • 00:47:49
    agent that that performs everything it's
  • 00:47:51
    not going to work what you were saying
  • 00:47:52
    at the very begin yeah we've seen I
  • 00:47:55
    think this is only tangentially Rel
  • 00:47:56
    related to what you're asking but we
  • 00:47:57
    have seen on another another project so
  • 00:48:00
    it looks pretty different to this but
  • 00:48:02
    it's fundamentally basically like it's a
  • 00:48:04
    Content moderation projects it's for a a
  • 00:48:06
    company called where they
  • 00:48:08
    they have like a petition uh platform
  • 00:48:11
    where people can can post petitions
  • 00:48:13
    about you know political things and
  • 00:48:15
    local things and stuff like that um and
  • 00:48:17
    they have kind of a challenging content
  • 00:48:20
    moderation problem because it's not as
  • 00:48:21
    simple as saying like did someone just
  • 00:48:24
    post spam or did someone just post hate
  • 00:48:25
    speech it's actually like a valid use of
  • 00:48:27
    their platform to say something like
  • 00:48:29
    somewhat inflammatory but like it can't
  • 00:48:31
    cross the lines of of their Community
  • 00:48:33
    guidelines and so um getting uh these
  • 00:48:37
    agents to sort of understand the
  • 00:48:38
    different nuances of like what does it
  • 00:48:40
    mean to to um to violate our policies is
  • 00:48:44
    is challenging and under the hood what
  • 00:48:46
    we do is we have these sort of
  • 00:48:47
    specialist agents that do look at this
  • 00:48:49
    through different lenses they write out
  • 00:48:51
    their sort of reasoning their Chain of
  • 00:48:53
    Thought they give us confidence scores
  • 00:48:54
    at the end and then we take a bunch of
  • 00:48:56
    these different answers together at the
  • 00:48:57
    end and we give it to one bigger process
  • 00:48:59
    that's like all right now that you
  • 00:49:00
    understand all the Nuance of these
  • 00:49:01
    different angles make a final decision
  • 00:49:03
    and it's sort of combining um these
  • 00:49:05
    different sort of Sub sub viewpoints if
  • 00:49:06
    that makes sense it's not exactly what
  • 00:49:08
    you were talking about but it's a sort
  • 00:49:09
    similar idea um on the 01 question uh he
  • 00:49:13
    asked if we had tried o one at any point
  • 00:49:15
    um we have um uh the biggest drawback
  • 00:49:20
    with o one is that it's slow um so this
  • 00:49:23
    is like just too latency sensitive of an
  • 00:49:25
    application we already have um takes a
  • 00:49:27
    while to build to generate a connector
  • 00:49:29
    here there's a lot of substeps if you
  • 00:49:30
    added 20 seconds to one of the prompts
  • 00:49:32
    it would probably be a nonstarter
  • 00:49:34
    especially given that the bottleneck
  • 00:49:35
    here is less the ai's intelligence and
  • 00:49:39
    more our ability to give the AI the
  • 00:49:40
    right information at the right
  • 00:49:42
    time we are getting that point where we
  • 00:49:46
    have a lot of pizza that people still
  • 00:49:49
    toat so I I want to start putting the
  • 00:49:52
    bows on the present here and just
  • 00:49:54
    confirm is there anything else that you
  • 00:49:55
    all wanted share with the audience that
  • 00:49:57
    we haven't had a chance to talk about
  • 00:49:59
    and I will also give the opportunity if
  • 00:50:01
    you have any burning final questions
  • 00:50:03
    feel free getting those in there but I
  • 00:50:05
    know there's slides there's a lot of
  • 00:50:06
    things that you all might want to show
  • 00:50:08
    anything you wanted to kind of TOS
  • 00:50:13
    out anything else
  • 00:50:17
    yes to cont
  • 00:50:36
    so we're trying toise some
  • 00:50:49
  • 00:50:51
    resp speak
  • 00:51:24
    spaghet so I guess I'll start by saying
  • 00:51:26
    this domain sounds very hard
  • 00:51:29
    um the the thing that makes me say it
  • 00:51:31
    sounds hard is that um hirings sounds
  • 00:51:35
    hard and like uh we struggle to train
  • 00:51:37
    humans to do it today um so getting
  • 00:51:43
    getting if I struggle to picture how to
  • 00:51:45
    get uh a pretty Junior uh person to
  • 00:51:49
    figure out how to reliably produce this
  • 00:51:50
    output then I also struggle to see how
  • 00:51:52
    to get an LM to do it the the analogy
  • 00:51:54
    that jumps to mind though is um
  • 00:51:58
    this kind of problem is present for AI
  • 00:52:01
    phone agent applications there's a lot
  • 00:52:03
    of you know people trying to put AI
  • 00:52:04
    agents on the phone they have to sort of
  • 00:52:06
    be robust in the face of people can say
  • 00:52:09
    anything um it's hard to build
  • 00:52:12
    uh customer support bot for an airline
  • 00:52:14
    if if you're afraid that it's you know
  • 00:52:16
    gonna just like give someone a free
  • 00:52:17
    ticket because you say ignore previous
  • 00:52:19
    instructions you know um I don't get the
  • 00:52:22
    sense that anyone's like figured this
  • 00:52:23
    out super well um the tactic they use
  • 00:52:26
    there is is sort of a hybrid
  • 00:52:28
    between a um almost like a what you
  • 00:52:32
    picture for like a phone tree where you
  • 00:52:33
    can just you know press press one if
  • 00:52:35
    you're a good candidate um and and and
  • 00:52:38
    still leveraging uh you know like the
  • 00:52:40
    the lm's ability to handle inputs as
  • 00:52:43
    never seen before and so it tends to
  • 00:52:45
    look like a state machine where you have
  • 00:52:47
    different states that the agent can be
  • 00:52:49
    in it's trying to assess it's very
  • 00:52:50
    specific narrow things at each point in
  • 00:52:52
    the state but that the way it decides to
  • 00:52:54
    move from state to state is B based on
  • 00:52:56
    llm logic you know logic described in
  • 00:52:58
    English not a very deterministic uh sort
  • 00:53:01
    of thing um and then I would still take
  • 00:53:04
    the approach of build evals based on Old
  • 00:53:07
    transcripts of calls that have gone off
  • 00:53:08
    the rails and measure yourself against
  • 00:53:09
    like known known bad use cases um
  • 00:53:13
    getting to Perfection on this sounds
  • 00:53:15
    sounds pretty challenging um also
  • 00:53:17
    getting nlms to to state how confident
  • 00:53:20
    they are in something is his own sort of
  • 00:53:21
    sub problem and so like you may be able
  • 00:53:23
    to get this eventually to a point where
  • 00:53:25
    it can tell you when it doesn't know but
  • 00:53:27
    tuning that is also going to be
  • 00:53:28
    challenging because they sort of
  • 00:53:29
    overstate their confidence
  • 00:54:00
    yes but the interpretation and tuning is
  • 00:54:02
    is like a real challenge like
  • 00:54:04
    um a lot of our projects have have steps
  • 00:54:07
    in the middle of the workflow where
  • 00:54:09
    we're asking we're asking for an
  • 00:54:12
    evaluation of the form of like think out
  • 00:54:14
    loud then come up with your answer and
  • 00:54:17
    then tell us you know how confident are
  • 00:54:19
    you in your answer it's usually not a
  • 00:54:20
    number it's usually like low medium high
  • 00:54:22
    very high and then you don't just trust
  • 00:54:24
    what that means you measure it against
  • 00:54:25
    your EV like is this predictive of
  • 00:54:27
    anything like um seems like very high
  • 00:54:30
    means like maybe possibly correct and so
  • 00:54:32
    you only filter down to maybe
  • 00:54:35
    high so do you have
  • 00:54:38
    any talking you have a things that are
  • 00:54:43
    work out really well like give you one
  • 00:54:45
    example I found out that for me if I put
  • 00:54:48
    in uh some example inputs and some
  • 00:54:51
    perfect outputs into the context then
  • 00:54:54
    you know splits out result like simar
  • 00:54:57
  • 00:55:01
    recation the examples thing does work uh
  • 00:55:04
    um showing it examples usually gets us
  • 00:55:06
    sort back on the rails like I'm sure
  • 00:55:07
    you've seen all the sort of trendy
  • 00:55:08
    little tricks you know offered a big tip
  • 00:55:10
    like say put a bunch of exclamation
  • 00:55:12
    points in there offer to fire it if it's
  • 00:55:13
    not going to do a good job like those
  • 00:55:14
    those things I think you
  • 00:55:16
    know may give you lift uh it's going to
  • 00:55:19
    be challenging to know if you don't if
  • 00:55:20
    you don't measure it
  • 00:55:23
    um I think more often in practice it's
  • 00:55:26
    it's around
  • 00:55:28
    um finding specific cases where you did
  • 00:55:31
    poorly and then baking them into your
  • 00:55:32
    prompt um uh you know trying a wide
  • 00:55:35
    variety of things noticing that it's
  • 00:55:37
    sort of off on this case and then
  • 00:55:39
    describing that case to it
  • 00:55:42
    um I don't have handy like a list of
  • 00:55:44
    things I mean um but uh and I bet if I
  • 00:55:48
    pulled the folks on our team everybody's
  • 00:55:50
    got a different uh set of favorite bag
  • 00:55:53
    of tricks um which I think is is um
  • 00:55:57
    it's also a danger on on these AR
  • 00:55:59
    projects is that um it's easy to fall
  • 00:56:02
    into like you just like it's like a
  • 00:56:04
    really good nerd snip machine right like
  • 00:56:06
    you can be like I'm pretty sure tipping
  • 00:56:07
    is g to be a great the great thing to
  • 00:56:09
    try on this project and so the evals
  • 00:56:11
    help keep you keep you on on task there
  • 00:56:13
    um the set of tactics is out there right
  • 00:56:15
    you can Google search for for people's
  • 00:56:16
    long list of tactics one random thing
  • 00:56:18
    we've had good success with is is
  • 00:56:20
    anthropic has a this prompt generator um
  • 00:56:23
    and you can just paste in your your
  • 00:56:25
    current prompt and it'll rewrite it
  • 00:56:27
    we've had surprising results where like
  • 00:56:28
    visually it doesn't look any better
  • 00:56:30
    we're like that's kind of what I already
  • 00:56:31
    said in my prompt and then like the
  • 00:56:33
    metrics just go up
  • 00:56:35
    um but it's not one weird trick it's
  • 00:56:38
    like try lots of things and measure your
  • 00:56:41
    progress all right thank you everybody
  • 00:56:44
    for coming tonight we're super excited
  • 00:56:47
    uh that you made their time to be with
  • 00:56:49
    us and quick round Applause for Eddie
  • 00:56:51
  • 00:56:57
    the office is going to be open for the
  • 00:56:58
    next 20 minutes or so so like I said
  • 00:57:00
    lots of pizza to eat there's still
  • 00:57:02
    drinks as well so go enjoy pester Eddie
  • 00:57:05
    and with any further questions maybe you
  • 00:57:07
    didn't get a chance to ask it now they
  • 00:57:09
    are going to be around and if they
  • 00:57:10
    weren't planning to now they are um but
  • 00:57:13
    again thank you for being here hope you
  • 00:57:14
    had a great time let's keep partying
