How Airbyte Uses AI to Build Connectors

00:57:22
https://www.youtube.com/watch?v=SR5Spck-IY0

Summary

TLDR: The event introduced attendees to the role of AI in modern technology projects, with a particular focus on Airbyte, a data movement platform, and its AI Assist feature, which uses AI to automate the creation of API connectors. The presenters walked through AI Assist's development cycle, its challenges and successes, and in particular the role of evaluations (evals) in continuously improving AI tooling. The event served as a forum for industry professionals to share experience, dwelling on the need to structure AI projects well in order to avoid common pitfalls and succeed in production. Finally, the emphasis was that AI integration works best when it is applied to manual workflows that are already in place.

Key Takeaways

  • 🤝 The event encourages interaction and knowledge sharing around AI integration.
  • 💡 Airbyte and its AI Assist are key tools for automated API connector creation.
  • 🛠 AI development needs evaluations (evals) to guarantee effectiveness and success.
  • ⚙️ Automating complex tasks such as connector building requires well-structured AI.
  • 🔍 Correctly understanding and extracting API data is crucial.
  • 🧩 Workflows must be well structured for AI to be integrated effectively.
  • 🎯 Understanding the real need matters before starting an AI project.
  • 🔄 The AI development process is iterative and requires constant adjustment.
  • 📊 Challenges and solutions from running AI in production were shared.
  • 🚀 Tools like AI Assist bring the goal of easy data access closer.

Timeline

  • 00:00:00 - 00:05:00

    Teao introduces Airbyte, a data movement platform, at an interactive event held with Fractional. The goal is to discuss AI projects, the dos and don'ts, and to encourage active audience participation.

  • 00:05:00 - 00:10:00

    Natik presents Airbyte's goal of making data accessible to everyone. The company focuses on frameworks that can read data from arbitrary APIs, and it launched AI Assist to make building API connectors more efficient.

  • 00:10:00 - 00:15:00

    A demo of AI Assist shows how building an API connector can be cut from several days to roughly an hour. Faster connector building is essential for expanding Airbyte's API coverage.

  • 00:15:00 - 00:20:00

    AI Assist began as a naive project built on ChatGPT. That approach was too simple and did not work well for complex tasks. The team then developed a more sophisticated approach with Fractional, combining LLMs with extensive software logic.

  • 00:20:00 - 00:25:00

    The main lesson: the "magic" of AI takes a great deal of tedious engineering work. Testing solutions outside production generates little learning until end users actually interact with the software.

  • 00:25:00 - 00:30:00

    Airbyte offers tooling to pull data from APIs and load it into a variety of databases and vector destinations, which makes life easier for developers building AI prototypes.

  • 00:30:00 - 00:35:00

    Eddie's presentation for Fractional stresses the importance of structuring AI projects well. Fractional helped design AI Assist, making sure the production system uses LLMs in a way that genuinely adds value.

  • 00:35:00 - 00:40:00

    Eddie highlights the value of existing manual workflows where AI could improve efficiency, and the advantage of automated evaluations for making sure AI solutions deliver real value to users.

  • 00:40:00 - 00:45:00

    Teao discusses how evaluation criteria for AI projects evolve, underlining the importance of robust eval systems for tracking progress and regressions so the software keeps improving.

  • 00:45:00 - 00:50:00

    Eddie discusses possible futures for AI, including the rise of autonomous agents, but argues that most successes will come from specialization and the adoption of domain-specific agentic systems.

  • 00:50:00 - 00:57:22

    The event closes with an invitation to keep the conversations going and to dig further into the opportunities AI projects offer, in a relaxed and collaborative atmosphere.

FAQ

  • What is Airbyte?

    Airbyte is a data movement platform that makes data from a wide range of systems accessible through API connectors.

  • What was the main goal of the event?

    The event aimed to share knowledge about integrating AI into technology projects and to provide an interactive space for discussing best practices and challenges, particularly for attendees coming straight from the Disrupt conference.

  • What is the AI Assist feature mentioned at the event?

    AI Assist is an AI co-pilot feature built into Airbyte's graphical Connector Builder that simplifies and automates the creation of API connectors with the help of AI.

  • What was the purpose of Nati's demo?

    It demonstrated how AI can be embedded in tooling to simplify complex processes such as building API connectors.

  • Which aspects of putting AI into practice were discussed at the event?

    The event focused on sharing knowledge about running AI applications in production, including automated evaluation tests (evals) and agentic workflows.

Transcript (en)

  • 00:00:26
    all right I think this works all right
  • 00:00:28
    we're good to go hi everybody it's great
  • 00:00:30
    to see you again fast forward from the
  • 00:00:32
    front door my name is teao I work over
  • 00:00:34
    here at Airbyte we are a data movement
  • 00:00:37
    platform you're going to be learning all
  • 00:00:38
    about tonight along with our partners
  • 00:00:40
    fractional who you're going to learn all
  • 00:00:41
    about them tonight as well thank you for
  • 00:00:43
    making the time to join us tonight we
  • 00:00:44
    hope you're enjoying the food the drinks
  • 00:00:46
    the company um our our aim here is to
  • 00:00:49
    make this a really fun night and a
  • 00:00:50
    really informative night especially
  • 00:00:52
    because many of you are probably still
  • 00:00:54
    about to start the recovery process from
  • 00:00:56
    disrupt um so we're excited to kind of
  • 00:00:59
    be closing out with you all for the day
  • 00:01:02
    um the way we're going to do tonight Nati's
  • 00:01:04
    gonna go ahead and come in and give his
  • 00:01:06
    presentation we're going to do our
  • 00:01:07
    fireside chat with Eddie and de we're
  • 00:01:09
    going to learn more about fractional and
  • 00:01:10
    how you can think about uh the AI
  • 00:01:13
    projects that you're working on the dos
  • 00:01:14
    the don'ts uh and really the aim for
  • 00:01:16
    tonight is not only to just be us
  • 00:01:19
    talking here and you listening we want
  • 00:01:21
    this to be interactive so if you have ai
  • 00:01:23
    projects that you're working on it's
  • 00:01:25
    like Eddie and everyone else from a
  • 00:01:26
    fractional perspective give their
  • 00:01:27
    thoughts if you want to Pepper Nati with
  • 00:01:29
    personal questions you can do that stuff
  • 00:01:31
    too um but really the night is meant to
  • 00:01:33
    be all about you uh so we're going to
  • 00:01:36
    try to live up to that but with that
  • 00:01:39
    being said I'm going to shut up and go
  • 00:01:40
    to the back here thank you all for
  • 00:01:41
    joining us again natik I'm gonna hand it
  • 00:01:44
    over to
  • 00:01:48
    [Applause]
  • 00:01:50
    you hello
  • 00:01:52
    hello all right a
  • 00:01:55
    sec I'm clumsy so all right uh my goal
  • 00:02:00
    today is not to sell you all on Airbyte
  • 00:02:03
    but to put some context on yeah a few
  • 00:02:06
    minutes on what we are doing and why we
  • 00:02:10
    try to do co-pilot um style AI assist in
  • 00:02:14
    our Dev tools what we've got as a result
  • 00:02:17
    what we've learned um how you can use it
  • 00:02:20
    to grab data for your projects and then
  • 00:02:22
    we're going to talk with Eddie and Eddie
  • 00:02:23
    is going to talk to us about how to
  • 00:02:26
    actually um be better at building with
  • 00:02:29
    AI um and avoid common
  • 00:02:33
    pitfalls
  • 00:02:34
    so Airbyte we started just a few years
  • 00:02:38
    back we're almost four years
  • 00:02:40
    oldish and the slide that Michelle our
  • 00:02:43
    CEO shows to every new hire says that
  • 00:02:46
    our mission is to make data available to
  • 00:02:50
    anyone and anywhere if you own your data
  • 00:02:53
    and it's in any systems databases apis
  • 00:02:55
    you should be able to use your data
  • 00:02:57
    that's why there's a bunch of companies
  • 00:02:58
    like Zapier or like University cases
  • 00:03:00
    right and turns out to fulfill this
  • 00:03:02
    Mission you know things get much easier
  • 00:03:05
    if you have Frameworks that can read
  • 00:03:07
    data from arbitrary apis that's what my
  • 00:03:10
    team is doing I am an engineering
  • 00:03:12
    manager on API extensibility team we're
  • 00:03:15
    doing Frameworks that power all of our
  • 00:03:17
    API
  • 00:03:19
    connectors um in 2021 2022 we had a
  • 00:03:23
    python cdk um connector developer kit
  • 00:03:25
    framework we had around a 100 connectors
  • 00:03:29
    at that time and we thought okay well
  • 00:03:31
    how do we scale that we have 20
  • 00:03:33
    Engineers supporting 10 certified
  • 00:03:35
    hardcore connectors Community
  • 00:03:37
    contributes connectors but how do we
  • 00:03:38
    maintain all that so in 2023 we made a
  • 00:03:43
    graphical user interface around our low
  • 00:03:45
    code no code framework that encapsulates
  • 00:03:48
    a connector in a basically a bunch of
  • 00:03:51
    yaml kubernetes resource definition
  • 00:03:53
    style and that's great people started
  • 00:03:55
    being able to make a connector in an
  • 00:03:57
    hour versus you know days but it's still
  • 00:04:01
    a cool hour or more so in 2024 we've
  • 00:04:03
    released AI assist which is essentially
  • 00:04:06
    co-pilot for our graphical user
  • 00:04:07
    interface
  • 00:04:08
    tool
  • 00:04:10
    and I want to show you how it works I'm
  • 00:04:14
    99% confident is going to be fine but
  • 00:04:16
    I'm going to do it one-handed so let's
  • 00:04:20
    see just to give you a sense of what
  • 00:04:22
    this thing is so I figured you know what
  • 00:04:24
    are we going to build today we already
  • 00:04:27
    have a lot of connectors so finding one
  • 00:04:29
    that we don't have was a little bit of a
  • 00:04:31
    challenge and my CFO was walking nearby
  • 00:04:36
    and I thought hey juel do you think it's
  • 00:04:38
    cool if I use our financial data for a
  • 00:04:41
    demo for a Meetup and he said you signed
  • 00:04:45
    an NDA you
  • 00:04:52
    stupid my cash was about to be warmed up
  • 00:04:55
    interesting okay this might take us a
  • 00:04:57
    minute so we might as well continue and
  • 00:05:01
    give it a few seconds while that is
  • 00:05:09
    happening yeah let's almost
  • 00:05:13
    smoothly so we're going to return to
  • 00:05:15
    that but to give you
  • 00:05:17
    perspective data transfer companies are
  • 00:05:19
    only as good as the connector coverage
  • 00:05:22
    that we have if we only support 200 apis
  • 00:05:24
    you have your own API does your own
  • 00:05:26
    thing you want your data we don't
  • 00:05:28
    support it you're not going to use us
  • 00:05:30
    so how are we doing well you know we've
  • 00:05:32
    released AI assist and connector builder
  • 00:05:34
    in like
  • 00:05:35
    2023 um we've
  • 00:05:38
    added what approximately 100 connectors
  • 00:05:42
    from August to the end of October and if
  • 00:05:46
    like our total is less than 400 that's a
  • 00:05:49
    lot of
  • 00:05:50
    connectors how is this live demo thing
  • 00:05:53
    doing oh okay so this roughly is our
  • 00:05:58
    connector Builder and and it needs to
  • 00:06:00
    know things about your API it needs to
  • 00:06:02
    know your base URL which AI Assist
  • 00:06:04
    guessed for me it needs to know how to
  • 00:06:06
    authenticate and it thinks that this API
  • 00:06:08
    is using bearer
  • 00:06:10
    token
  • 00:06:11
    which I'm going to paste
  • 00:06:15
    save and we have streams of data so
  • 00:06:18
    transactions is obviously the most
  • 00:06:20
    interesting it figured out where
  • 00:06:22
    transactions live what HTTP method to
  • 00:06:24
    use um where transaction records are
  • 00:06:29
    within the HTTP
  • 00:06:31
    response um it figured the pagination it
  • 00:06:34
    figured where in the response is the
  • 00:06:36
    cursor to the next page let's see if it
  • 00:06:39
    works and if I actually pasted the
  • 00:06:47
    token come
  • 00:06:51
    on here it is okay I'm not going to show
  • 00:06:54
    you the actual records but what's
  • 00:06:55
    important is uh 100 records per page
  • 00:06:58
    five pages test read is successful
  • 00:07:00
    meaning I only had to paste my
  • 00:07:02
    documentation URL and my API token and
  • 00:07:05
    it figured out um how to get my data in
  • 00:07:09
    fact I did this a little bit earlier
  • 00:07:11
    today and got a bunch of streams and
  • 00:07:14
    then I used this little button here to
  • 00:07:17
    make a pull request and we have a pull
  • 00:07:21
    request in our GitHub I'm going to show
  • 00:07:22
    you that in a little bit that's how we
  • 00:07:26
    are growing from 200 something
  • 00:07:27
    connectors to 400 something
  • 00:07:30
    connectors within just these few
  • 00:07:34
    months
  • 00:07:36
    now we tried three times to get this
  • 00:07:40
    thing right it was a hobby project of
  • 00:07:42
    one of our Engineers like oh LMS are
  • 00:07:44
    cool let's build something with LMS um
  • 00:07:46
    didn't quite work
  • 00:07:48
    out the first attempt was very naive
  • 00:07:51
    Eddie will walk you through some of the
  • 00:07:53
    details but we thought you know what
  • 00:07:54
    Chad gpts are cool let's just let's
  • 00:07:56
    paste the docks give the docks to Chad
  • 00:07:58
    GPT and say hey you output the Manifest
  • 00:08:00
    file of the connector and it works on
  • 00:08:03
    super simple things like Pokey API or
  • 00:08:05
    like exchange rate API some something
  • 00:08:07
    super simple with one or two streams of
  • 00:08:09
    data doesn't work on anything serious
  • 00:08:11
    cannot figure out authentication then we
  • 00:08:13
    thought okay well it is very difficult
  • 00:08:15
    for a l large language model to Output
  • 00:08:18
    the Manifest in our format it doesn't
  • 00:08:20
    know the constraints the schema but
  • 00:08:23
    there's a lot of open apis specs on the
  • 00:08:25
    internet so what if we ask it to First
  • 00:08:28
    generate open API spec and then from
  • 00:08:30
    that we're going to euristic generate
  • 00:08:32
    the Manifest it's also extremely
  • 00:08:34
    brittle and then we decided to work with
  • 00:08:37
    fractional on this co-pilot approach
  • 00:08:40
    this works but it's not just a single
  • 00:08:43
    llm
  • 00:08:44
    call it's not just prompt engineering um
  • 00:08:48
    this diagram is probably not very
  • 00:08:50
    visible right but there's basically four
  • 00:08:52
    levels nested logic of how we figure out
  • 00:08:56
    what authentication scheme a given API
  • 00:08:58
    uses given its docs open API spec and if
  • 00:09:03
    we don't have enough information there
  • 00:09:05
    or if there's no open API spec we would
  • 00:09:07
    attempt Googling and scraping Ser
  • 00:09:09
    results uh from Google to figure out how
  • 00:09:12
    to
  • 00:09:14
    authenticate so Core lesson stop magic
  • 00:09:17
    is just a lot a lot a lot of TDS
  • 00:09:19
    software
  • 00:09:20
    engering and the thing there is all of
  • 00:09:24
    that time unless your users are actually
  • 00:09:26
    benefiting from your software you're not
  • 00:09:28
    learning anything and just having a
  • 00:09:30
    prototype doesn't give you much you got
  • 00:09:32
    to figure out where you host it how you
  • 00:09:34
    monitor it how you evaluate it how you
  • 00:09:36
    monitor your budget burn how you figure
  • 00:09:38
    out when it moves out of beta
  • 00:09:41
    Etc so we figured airb is not just an
  • 00:09:45
    open- Source graphic user interface data
  • 00:09:48
    pipelines tool or ETL uh my personal big
  • 00:09:51
    thing here is to make uh system that
  • 00:09:55
    gives you your data in python or in CLI
  • 00:09:58
    you don't have to use air proper you
  • 00:09:59
    don't have to use our graphical user
  • 00:10:01
    interfaces to get your data if you have
  • 00:10:03
    hobby projects or things that you do on
  • 00:10:05
    weekends we should be able to help which
  • 00:10:07
    should be handy if you decide to
  • 00:10:09
    prototype stuff with Eddie and
  • 00:10:11
    fractional later
  • 00:10:12
    on um so what we can do um we have by
  • 00:10:15
    airb which is a CLI or python library
  • 00:10:18
    that can read data again from anywhere
  • 00:10:20
    and write it to local du dbcash and then
  • 00:10:23
    we have a bunch of destinations
  • 00:10:24
    including a bunch of vector destinations
  • 00:10:26
    and PG Vector Bine cone and such
  • 00:10:29
    yeah very interesting time let's build
  • 00:10:31
    some stuff together now I'm going to
  • 00:10:33
    pass it to Eddie um and see what we want
  • 00:10:37
    to talk about next
  • 00:10:41
    [Applause]
  • 00:11:00
    are you moderating this section cool
  • 00:11:02
    well hello everybody uh while we're
  • 00:11:04
    waiting for Teo my name is Eddie I'm the
  • 00:11:06
    CTO at at fractional AI uh where uh Dev
  • 00:11:10
    shop that is specifically focused on
  • 00:11:12
    building challenging production
  • 00:11:14
    applications that that use llms in some
  • 00:11:16
    way so you know we were're uh we helped
  • 00:11:20
    build the the AI assist feature you just
  • 00:11:23
    saw which is like a good good example
  • 00:11:25
    when you're trying to dig into the weeds
  • 00:11:26
    of what some of these production AI
  • 00:11:28
    projects look like but we've also seen
  • 00:11:30
    over a hundred of these projects at this
  • 00:11:31
    point and um yeah I'm excited to talk
  • 00:11:35
    about all things about what it really
  • 00:11:36
    means to put put AI projects into
  • 00:11:39
    production that's for you
  • 00:11:41
    ni um I'm just going to be yelling
  • 00:11:44
    because you two are the most important
  • 00:11:45
    people here and from this side of room
  • 00:11:48
    you all are very important
  • 00:11:49
    obviously um I think where I want to
  • 00:11:52
    start Eddie you already kind of gave us
  • 00:11:55
    a little bit of background fractional uh
  • 00:11:57
    on in terms of working on different
  • 00:11:58
    kinds of projects
  • 00:11:59
    I want to go a little bit more
  • 00:12:01
    into the AI assistant when you thought
  • 00:12:04
    about the kinds of kinds of ways you can
  • 00:12:07
    incorporate AI for new projects like I
  • 00:12:09
    think there's a lot of people who are
  • 00:12:10
    looking around where should I be
  • 00:12:12
    implementing AI um ni you you talk a
  • 00:12:15
    little bit about how we want to bring AI
  • 00:12:18
    into our own workflow what's your first
  • 00:12:20
    advice for anyone who's thinking about
  • 00:12:22
    how can I bring AI into my
  • 00:12:26
    Enterprise it's a good question um I
  • 00:12:28
    think there's like a lot of ideas for
  • 00:12:29
    way AI can help um but that things often
  • 00:12:32
    get stuck early in the ideation process
  • 00:12:34
    or at the PCC phase I think one critical
  • 00:12:37
    thing that happened here was a lot of
  • 00:12:41
    the best opportunities for AI exist in a
  • 00:12:43
    manual workflow that you're already
  • 00:12:45
    running somewhere today uh people were
  • 00:12:47
    already building API connectors here and
  • 00:12:50
    so it was very clear like what was hard
  • 00:12:52
    you had a clear set of input output
  • 00:12:54
    pairs to care about you had clear
  • 00:12:56
    historical data you understood your
  • 00:12:57
    domain and could measure the value of
  • 00:13:00
    this thing right this took us quite a
  • 00:13:02
    while to build um if you're going to
  • 00:13:03
    spend all this time building something
  • 00:13:05
    you got to kind of know that there's a
  • 00:13:06
    there there that it's is like going to
  • 00:13:07
    save a lot of people a lot of real time
  • 00:13:09
    and not just be some speculative um
  • 00:13:11
    thing so that would be like the number
  • 00:13:13
    one thing I would focus on is this a
  • 00:13:16
    real existing manual workflow that looks
  • 00:13:19
    like the llm sort of capability set can
  • 00:13:23
    be applied here well and is it valuable
  • 00:13:25
    enough like if we can actually get there
  • 00:13:27
    does this save us a lot of time does it
  • 00:13:29
    it you know what's what's the financial
  • 00:13:31
    impact to us on this does it save us
  • 00:13:33
    hours does it you know generate new
  • 00:13:35
    Revenue what what kind of sort of uh uh
  • 00:13:37
    impact does it have when I think about
  • 00:13:40
    like what are the core capabilities of
  • 00:13:43
    these llms I basically think about it
  • 00:13:47
    as computers can now read write
  • 00:13:53
    make junior employee level decisions and
  • 00:13:57
    they're sort of domain experts about
  • 00:13:58
    everything and like that's the set of
  • 00:14:00
    things that I would look at in these
  • 00:14:01
    manual workflows rather than like oh
  • 00:14:03
    maybe we can apply AI here and it can
  • 00:14:04
    know everything about everything is this
  • 00:14:06
    very specific oh you know we're spending
  • 00:14:08
    a lot of time reading through API docs
  • 00:14:09
    and saying like what did it say um and
  • 00:14:12
    and that's a pretty llm capable
  • 00:14:15
    task did you have anything you want to
  • 00:14:17
    add there because otherwise I'm going to
  • 00:14:18
    take it to this experience directly
  • 00:14:21
    there's the whole you Scope our project
  • 00:14:23
    you decide you want you're going to do
  • 00:14:24
    it I'd love to know what went wrong in
  • 00:14:27
    this situation
  • 00:14:29
    oh so much
  • 00:14:32
    uh the first thing that jumped to mind
  • 00:14:34
    here is that um I think we failed to
  • 00:14:36
    appreciate upfront just how hard some of
  • 00:14:39
    the pure software engineering parts of
  • 00:14:42
    the crawling of API docs would be I
  • 00:14:44
    think we initially thought about this as
  • 00:14:46
    like step one download the docs step two
  • 00:14:49
    get llm to make a bunch of decisions um
  • 00:14:53
    and does that resonate with other people
  • 00:14:55
    you know one two and you're done all
  • 00:14:57
    right we got some hands over there ni
  • 00:15:00
    um and and fundamentally that is still
  • 00:15:01
    What's Happening Here Right like we're
  • 00:15:03
    trying to build a connector into an API
  • 00:15:05
    the kind of steps involved are go to the
  • 00:15:08
    web page that describes how to connect
  • 00:15:09
    this a to this API read through the docs
  • 00:15:11
    and then make a bunch of decisions okay
  • 00:15:13
    here's how we authenticate provide our
  • 00:15:15
    credentials to log into this API here's
  • 00:15:17
    what the set of endpoints looks like uh
  • 00:15:19
    turns out these documentation pages are
  • 00:15:21
    like everything you can possibly imagine
  • 00:15:23
    times like 10 and you have to support a
  • 00:15:25
    very wide variety of use cases you have
  • 00:15:27
    to handle you know rate limiting and
  • 00:15:29
    some docs are behind authentication and
  • 00:15:31
    some docs are like uh the information is
  • 00:15:34
    not even on the web page it's like you
  • 00:15:35
    know that you've got to click on things
  • 00:15:36
    and it's going to go fetch it from the
  • 00:15:37
    server and handling this super wide
  • 00:15:39
    variety of use cases or preventing
  • 00:15:41
    yourself from going and crawling out to
  • 00:15:42
    irrelevant Pages was incredibly hard and
  • 00:15:45
    even now when we look at failure cases
  • 00:15:48
    more often than not they're not uh an
  • 00:15:51
    the AI making a poort decision based on
  • 00:15:53
    good data it's the AI making something
  • 00:15:55
    up based on no data because we failed to
  • 00:15:57
    actually find the right the right sort
  • 00:15:59
    of source material out of the
  • 00:16:01
    web you just seen me make a demo that
  • 00:16:04
    took what like a minute right to process
  • 00:16:08
    and in this minute it tries to figure
  • 00:16:10
    out the relevant docs and figure out the
  • 00:16:13
    base URL then the stream URL
  • 00:16:15
    authentication scheme
  • 00:16:16
    parameters when we started there was the
  • 00:16:20
    happy path prototype connector like woo
  • 00:16:22
    this works really fast that's great but
  • 00:16:25
    then in some cases it took like four and
  • 00:16:30
    a half something minutes in crawling
  • 00:16:32
    docks in headless Chrome and sometimes
  • 00:16:35
    it would get into Loops so you would
  • 00:16:38
    think like in 2024 crawling pages from
  • 00:16:41
    the web should be solved problem and
  • 00:16:42
    there's a bunch of products that say
  • 00:16:44
    they do it right fir crawl is the one we
  • 00:16:47
    use
  • 00:16:48
    now but can you just out of the box
  • 00:16:51
    Point them and expect them to work like
  • 00:16:54
    nope if you go read like you know a rag
  • 00:16:58
    rag tutorial right now it's going to
  • 00:17:00
    tell you uh you know go download your
  • 00:17:03
    information get get craw the docs
  • 00:17:05
    download the docs uh strip out some HTML
  • 00:17:09
    chunk it up into pieces put it into a
  • 00:17:12
    vector store and then query your vector
  • 00:17:13
    store um and actually we did kind of
  • 00:17:16
    start there the final implementation we
  • 00:17:18
    ended up with looks something more like
  • 00:17:20
    we don't pre-ra anything we wait until
  • 00:17:22
    we have a specific task we're trying to
  • 00:17:23
    do like how do you like what is the
  • 00:17:25
    authentication mechanism does this API
  • 00:17:27
    use you know http basic off to for the
  • 00:17:29
    username password does it use an API key
  • 00:17:32
    what is the method and then we purpose
  • 00:17:34
    go crawl for that we start at the
  • 00:17:36
    homepage of the docs and we ask an llm
  • 00:17:38
    to help us navigate toward you know
  • 00:17:40
    where we'd want to want to go we have so
  • 00:17:42
    many fallback mechanisms in here we have
  • 00:17:44
    multiple different Services we use for
  • 00:17:45
    this crawling because there can be rate
  • 00:17:47
    limiting issues they can be flaky um
  • 00:17:49
    there's there's all sorts of issues
  • 00:17:51
    around that we fall back on doing a
  • 00:17:52
    Google search if we can't find the
  • 00:17:54
    information we're looking for we use
  • 00:17:55
    perplexity at some points in the flow uh
  • 00:17:58
    we have a repos repository under the
  • 00:17:59
    hood of a bunch of pre-built opening API
  • 00:18:01
    specs from common repositories like it
  • 00:18:04
    is very complicated under the hood
  • 00:18:06
    there's a lot a lot going on that
  • 00:18:08
    doesn't look like you know you're uh
  • 00:18:10
    here's how you ask a question of your
  • 00:18:11
    documents or rag
  • 00:18:14
    tutorial and I kind of want to like
  • 00:18:16
    before we go towards like the next
  • 00:18:17
    question there I want to just get a
  • 00:18:19
    pulse for the room probably should have
  • 00:18:20
    started with this but I think it's
  • 00:18:21
    helpful as we're diving deeper into some
  • 00:18:23
    of these Concepts just to make sure
  • 00:18:24
    we're all kind of on that same
  • 00:18:25
    wavelength would you raise your hand if
  • 00:18:27
    you identify a builder in AI right now
  • 00:18:30
    you're building some kind of company or
  • 00:18:32
    product in the space all right great how
  • 00:18:34
    many of you are not necessarily building
  • 00:18:36
    but pretty well versed in the topic
  • 00:18:38
    you're doing a lot of independent
  • 00:18:40
    research and rais all right those two
  • 00:18:44
    together I think we have a large
  • 00:18:44
    majority for everyone else you're
  • 00:18:45
    probably where I'm at in my like Journey
  • 00:18:48
    so you can go ahead and be Googling
  • 00:18:50
    things on the side just like I'm going
  • 00:18:51
    to be doing over here um yeah yeah call
  • 00:18:53
    me out for if I'm getting too technical
  • 00:18:55
    no no no it's good we want we want to go
  • 00:18:57
    de deeper and this being live stream
  • 00:18:59
    record so you can always come back later
  • 00:19:01
    if you have more questions I want to
  • 00:19:03
    talk about that piece then like thinking
  • 00:19:04
    about all these components that go into
  • 00:19:07
    building an AI you think about
  • 00:19:08
    observability you think about the rag
  • 00:19:10
    like could you talk through what are
  • 00:19:14
    core components for you of a successful
  • 00:19:17
    AI project maybe evaluations or or
  • 00:19:19
    things of that nature where do you want
  • 00:19:20
    to take
  • 00:19:22
    this so I think the one of the earliest
  • 00:19:25
    steps in any project that's going to
  • 00:19:27
    reach this level of success ESS um if
  • 00:19:30
    it's going to have any sort of
  • 00:19:31
    meaningful complexity to it is going to
  • 00:19:33
    have to be building evales and what I
  • 00:19:36
    mean by EV vals is basically an
  • 00:19:38
    automated test suite for your
  • 00:19:40
    application but one where you're running
  • 00:19:42
    over lots of examples that you want your
  • 00:19:45
    system to be good at and you're testing
  • 00:19:46
    how well it it does at these things so
  • 00:19:48
    you define some metrics up front to
  • 00:19:50
    measure how well am I doing um and so
  • 00:19:52
    like as a concrete example here we're
  • 00:19:53
    trying to build API Integrations our
  • 00:19:55
    first step was let's go gather a bunch
  • 00:19:58
    of existing API Integrations we built
  • 00:20:01
    let's build a a sort of test harness
  • 00:20:03
    that can generate output from our system
  • 00:20:05
    test it against how well does it match
  • 00:20:08
    up with the things that actually people
  • 00:20:09
    built in the past and we produced a
  • 00:20:10
    whole bunch of metrics around these it's
  • 00:20:13
    it's actually non-trivial to get this
  • 00:20:15
    right um uh you know even though we had
  • 00:20:17
    a really rich set of ground truth to
  • 00:20:19
    look at here you know we had hundreds of
  • 00:20:20
    connectors to people that built the
  • 00:20:22
    comparisons are not very straightforward
  • 00:20:24
    like sometimes our system comes up with
  • 00:20:25
    different names than than people came up
  • 00:20:27
    with or the the community connectors
  • 00:20:29
    might have only a subset of of things
  • 00:20:32
    defined in them they could have defined
  • 00:20:34
    and that's that's okay for their use
  • 00:20:35
    case um so detecting sort of the
  • 00:20:37
    difference between we didn't generate
  • 00:20:40
    something and we should have versus we
  • 00:20:41
    didn't generate something and that's
  • 00:20:42
    fine um is is not uh it's not trivial
  • 00:20:46
    but you got to start somewhere and if
  • 00:20:48
    you don't do this your starting point is
  • 00:20:50
    gonna be very s Vibes based you're gonna
  • 00:20:52
    like run your first best idea some
  • 00:20:56
    sometimes it's going to work which is
  • 00:20:57
    going to be really encouraging and cool
  • 00:20:58
    sometimes it's not and you're not like
  • 00:21:00
    going to kind of have some intuition
  • 00:21:01
    about maybe here's how I improve it but
  • 00:21:02
    it's going to be based on whatever sort
  • 00:21:03
    sitting in front of you this is what
  • 00:21:05
    they ended up looking like at some point
  • 00:21:06
    maybe there's like can you go up a slide
  • 00:21:08
    so this is how it looked at the
  • 00:21:09
    beginning when we started we were just
  • 00:21:11
    like so if you can't see the rows here
  • 00:21:15
    are just example connectors that that
  • 00:21:18
    existed already um and we just picked
  • 00:21:21
    three uh knowing that we wanted to be
  • 00:21:23
    better than just doing these three but
  • 00:21:25
    we started somewhere and then each of
  • 00:21:26
    these columns is some some way that we
  • 00:21:28
    measure ourselves against the ground
  • 00:21:29
    truth so if we ask our system to produce
  • 00:21:31
    a Sentry connector there's already a
  • 00:21:33
    Sentry connector out there how well do
  • 00:21:35
    we do at all these things and uh and
  • 00:21:37
    produce these these metrics and we try
  • 00:21:39
    and kind of like produce a score that is
  • 00:21:41
    roughly weighted by how valuable is it
  • 00:21:43
    to a user if we screw this up or get it
  • 00:21:46
    right uh and and then you start now you
  • 00:21:49
    can actually sort of measure how well
  • 00:21:50
    you're doing this is a super powerful
  • 00:21:53
    tool there's sort of a Dark Art to like
  • 00:21:56
    you know perfect versus good on this but
  • 00:21:59
    um if you get this into a good place it
  • 00:22:02
    guides development in a very real way
  • 00:22:03
    like first of all you can tell in an
  • 00:22:04
    unbiased way like how are we doing
  • 00:22:06
    overall you can track your progress you
  • 00:22:08
    can track regressions and if you sort of
  • 00:22:11
    if you're doing some prompt engineering
  • 00:22:12
    and you're like tweaking the language
  • 00:22:13
    all the time to get better at some
  • 00:22:14
    specific failure mode you're seeing what
  • 00:22:17
    how do you know if you tweak your prompt
  • 00:22:18
    it's like not going to make you worse
  • 00:22:19
    the thing you tried to get better at
  • 00:22:20
    yesterday so this will help you track
  • 00:22:22
    regressions it also
  • 00:22:24
    drives uh the sort of anecdotal evidence
  • 00:22:28
    you want to see for where to invest your
  • 00:22:30
    attention next if you go you know
  • 00:22:32
    like you know we're doing pretty well
  • 00:22:34
    actually at this stage um but like
  • 00:22:37
    there's still some zeros in here um so
  • 00:22:40
    like my intuition from seeing this is
  • 00:22:43
    like okay we're like doing okay at
  • 00:22:44
    whatever this thing is for zenitz and
  • 00:22:46
    we're like doing not that good for this
  • 00:22:48
    schema thing for zenitz like wonder what
  • 00:22:50
    that is and i' click into what it's It
  • 00:22:52
    Go actually look at what we generated
  • 00:22:54
    and say ah okay like this the L&M got
  • 00:22:57
    this wrong because we're feeding it the
  • 00:22:58
    wrong information this is a crawling
  • 00:22:59
    problem not prompting problem and we' go
  • 00:23:02
    update our crawler and so sort of tells
  • 00:23:04
    you what to work on next and then over
  • 00:23:06
    time we expanded to that that next slide
  • 00:23:08
    that you were on a second ago which
  • 00:23:09
    is the evals just got bigger and bigger
  • 00:23:12
    and bigger we just kept getting more use
  • 00:23:13
    cases in there trying to get a wider and
  • 00:23:15
    wider set of examples to look at um and
  • 00:23:18
    it's what drove you know you showed the
  • 00:23:19
    sort of workflow diagram in your slides
  • 00:23:21
    earlier that was like kind of the
  • 00:23:22
    spaghetti look of all the different
  • 00:23:24
    steps that go into just one of the
  • 00:23:25
    questions here that evolved out of this
  • 00:23:28
    exploration trying to get better and
  • 00:23:30
    better by adding more sort of uh catches
  • 00:23:33
    for things that could go
  • 00:23:36
    wrong did you want to add anything
  • 00:23:39
    there can hope to add some context at
  • 00:23:44
    the high level this is the diagram for
  • 00:23:46
    the whole thing in the
  • 00:23:48
    beginning and so the idea was okay we're
  • 00:23:51
    going to crawl all of the documents now
  • 00:23:53
    we're going to index everything shove it
  • 00:23:55
    into a vector store and then there's
  • 00:23:57
    going to be like three four different
  • 00:23:58
    components one's going to figure out the
  • 00:24:00
    AL the other is going to figure out the
  • 00:24:02
    pagination um right and then the the the
  • 00:24:05
    different ones going to figure out the
  • 00:24:07
    list of streams basically stream is an
  • 00:24:08
    API endpoint like oh you know
  • 00:24:10
    repositories and GitHub is a stream
  • 00:24:12
    issues and GitHub is a stream if you
  • 00:24:14
    look at this one right here deaf is
  • 00:24:17
    what's we call a record selectorate air
  • 00:24:19
    bite is basically where exactly in the
  • 00:24:22
    response Json is the useful information
  • 00:24:26
    and the schema means okay what are The
  • 00:24:29
    Columns of data what are the fields of
  • 00:24:31
    the useful objects that we
  • 00:24:33
    want and as we grew into this even the
  • 00:24:37
    number of things that we've paid
  • 00:24:39
    attention to increased and each
  • 00:24:42
    particular component became this huge
  • 00:24:44
    spaghetti because it turns out that like
  • 00:24:47
    originally we thought you know what each
  • 00:24:49
    component is going to be a subset of
  • 00:24:51
    index docs the tagged and a prompt and
  • 00:24:55
    hopefully a single prompt is going to
  • 00:24:58
    just make it fine like we crawled
  • 00:25:00
    everything already right and turns out
  • 00:25:02
    in reality like every component that we
  • 00:25:04
    need answer to like every field where
  • 00:25:06
    you can get an AI assist prompt is
  • 00:25:08
    basically a program in
  • 00:25:11
    itself I want Tove the spaghetti piece
  • 00:25:14
    also we're going to change this up
  • 00:25:15
    because originally I was just going to
  • 00:25:17
    like have a point where it's purely
  • 00:25:18
    audio audience Q&A if you're having
  • 00:25:21
    questions about things as we come up
  • 00:25:23
    raise your hand and I will kind of bring
  • 00:25:25
    you into the conversation rather than
  • 00:25:27
    just wait for the end um but I'm curious
  • 00:25:30
    about how the spaghetti evolves over
  • 00:25:32
    here what surprised you the most about
  • 00:25:35
    the way your evaluation criteria early
  • 00:25:38
    on differ when you think about the end
  • 00:25:46
    state so I I'm surprised by the number
  • 00:25:49
    of random fallbacks and stuff in the
  • 00:25:51
    system like that we're still Google
  • 00:25:53
    searching in perplexity you know
  • 00:25:54
    searching under the hood to get to some
  • 00:25:56
    of the answers we want um
  • 00:26:01
    uh I think a very useful but difficult
  • 00:26:05
    thing on this project was thinking about
  • 00:26:07
    how to progress along this path to how
  • 00:26:10
    do we arrive at the right spaghetti um
  • 00:26:13
    uh because if you were to just guess it
  • 00:26:14
    up front you wouldn't guess right like
  • 00:26:16
    you have to kind of evolve your way
  • 00:26:18
    toward it and then that's intention with
  • 00:26:21
    like how do we know we're going to get
  • 00:26:22
    there like how do we
  • 00:26:24
    know how do we know this is even
  • 00:26:26
    possible um let alone that going to get
  • 00:26:28
    there in like a reasonable amount of
  • 00:26:29
    time and I think that question is very
  • 00:26:33
    challenging for AI projects right like
  • 00:26:34
    there's there's some stat that like 70%
  • 00:26:36
    of of poc's never make it to production
  • 00:26:39
    with with AI projects and I think it's
  • 00:26:42
    very challenging to know what a good POC
  • 00:26:44
    looks like and how to get from there to
  • 00:26:45
    production um and and so if you like
  • 00:26:48
    take take the AI assist project as just
  • 00:26:50
    like an example of a broader
  • 00:26:52
    theme um I mean you mentioned you guys
  • 00:26:55
    tried it a few times before right and
  • 00:26:56
    you weren't exactly sure what do we make
  • 00:26:58
    of this like I think this says this is
  • 00:27:00
    possible but I don't know how we get
  • 00:27:01
    there and like
  • 00:27:03
    the if you were just gonna try tomorrow
  • 00:27:06
    to say like is it possible to build
  • 00:27:08
    these API Integrations with llms like
  • 00:27:10
    the first thing You' try is you just
  • 00:27:12
    like go ask chat GPT to do it you'd show
  • 00:27:13
    chat GPT an example of these connectors
  • 00:27:16
    are just a file under the hood you
  • 00:27:17
    showed chat GPT an example of the file
  • 00:27:19
    and you said you know build me one like
  • 00:27:21
    this but for this
  • 00:27:22
    API and then something will come out
  • 00:27:25
    like probably something pretty good uh
  • 00:27:28
    because the files are sort of
  • 00:27:29
    inscrutable and if you don't know what
  • 00:27:30
    you're looking for it's going to look
  • 00:27:31
    right even if it's like technically
  • 00:27:32
    doesn't run later um and then you're
  • 00:27:34
    kind of stuck you don't really
  • 00:27:36
    know what did this really tell me about
  • 00:27:38
    is it possible you you can't really
  • 00:27:40
    iterate on it like how do you make chat
  • 00:27:42
    PT better at this now um how do you know
  • 00:27:45
    what array of stuff it's good at versus
  • 00:27:46
    bad at and it's not going to get you to
  • 00:27:48
    this like eventual kind of spaghetti
  • 00:27:50
    looking
  • 00:27:50
    diagram um so instead the approach we we
  • 00:27:54
    tend to take is we try and build pcc's
  • 00:27:57
    that are 100% on the critical path to
  • 00:27:58
    production um we try and be thoughtful
  • 00:28:01
    about which pieces we build early but
  • 00:28:03
    early in the project we didn't start by
  • 00:28:05
    saying let's just show like a really
  • 00:28:06
    shiny marketing demo that shows complete
  • 00:28:08
    end to end it working perfectly for one
  • 00:28:11
    connector we said let's pick three
  • 00:28:13
    connectors as examples and it's going to
  • 00:28:15
    start out kind of crappy and then we're
  • 00:28:16
    going to try and make it better over
  • 00:28:17
    time um and that that diagram you showed
  • 00:28:20
    a second ago that's like the the this
  • 00:28:23
    one yes this one this was our sketch
  • 00:28:25
    like a few weeks into the project of
  • 00:28:27
    what we we imagined the eventual
  • 00:28:29
    spaghetti might look like and it ended
  • 00:28:30
    up changing over time and what we tried
  • 00:28:32
    to do was tackle these pieces in order
  • 00:28:35
    um to try and drisk the riskiest parts
  • 00:28:37
    of the project we're like all right
  • 00:28:39
    let's try and work on the box that's
  • 00:28:40
    about authentication right now and see
  • 00:28:42
    like what's it look what's it look like
  • 00:28:44
    start to feel it out get rid of unknown
  • 00:28:45
    unknowns get that to a place where we're
  • 00:28:47
    like I believe that with iteration this
  • 00:28:48
    part is possible then tackle the next
  • 00:28:50
    piece and tackle the next piece and
  • 00:28:52
    start to flesh this out I think actually
  • 00:28:53
    the screenshot like the gray boxes were
  • 00:28:55
    like things we didn't try yet or
  • 00:28:57
    something um or de prioritize for p so
  • 00:29:00
    like you know we hadn't we didn't
  • 00:29:02
    actually tackle all of these but we
  • 00:29:03
    tried to tackle as many as we could to
  • 00:29:04
    start to drisk it and then that process
  • 00:29:08
    drove us to a more robust eval driven
  • 00:29:11
    now it feels like iteration doesn't feel
  • 00:29:13
    like we're building a V1 of something it
  • 00:29:14
    feels like we're kind of like you know
  • 00:29:15
    iterating iterating iterating and that
  • 00:29:17
    drives the ideas for where to add the
  • 00:29:19
    sort of branching Paths of that that
  • 00:29:21
    workflow
  • 00:29:23
    diagram I do like the idea that we
  • 00:29:24
    should only be talking about EV valves
  • 00:29:26
    in the context of spaghetti going
  • 00:29:29
    so let's keep Let's uh maybe keep that
  • 00:29:31
    one up all night um thinking about the
  • 00:29:38
    yeah yeah in terms of the eval how are
  • 00:29:41
    you do are you just compar
  • 00:30:01
    yeah that's that's a great question um
  • 00:30:02
    yeah so the question was like what are
  • 00:30:04
    we measuring how are we doing these
  • 00:30:05
    evals um in this case are we just
  • 00:30:07
    comparing ourselves to an existing
  • 00:30:09
    connector that we know is good or uh he
  • 00:30:11
    said he's heard of some examples of
  • 00:30:13
    using an llm to evaluate how the other
  • 00:30:15
    llm did um it's a great question uh
  • 00:30:21
    so what we see across successful
  • 00:30:24
    projects varies a lot um part of what
  • 00:30:27
    makes the actually difficult is that
  • 00:30:29
    they rarely fit this like clean academic
  • 00:30:32
    standard for what you would want to see
  • 00:30:34
    um clean input output pairs great ground
  • 00:30:37
    truth you know how to compare these
  • 00:30:38
    things and how to measure them sometimes
  • 00:30:39
    the thing we're measuring ourselves
  • 00:30:40
    against is we like ship an example
  • 00:30:43
    output to some team somewhere and we're
  • 00:30:44
    like you're the experts on this domain
  • 00:30:45
    did we do a good job or not they ship it
  • 00:30:47
    back and like trying to evaluate based
  • 00:30:48
    on that and so the the mess wrangling
  • 00:30:51
    the mess is hard um we have seen
  • 00:30:54
    successful examples of using it that
  • 00:30:57
    that technique is called llm as judge
  • 00:30:59
    where you you have an llm evaluate how
  • 00:31:01
    you're doing it's good for like very
  • 00:31:03
    subjective things if you're generating
  • 00:31:04
    free form text and you're like does this
  • 00:31:06
    seem like it answered my question that's
  • 00:31:08
    like a task for an llm in this case we
  • 00:31:10
    were able to circumvent that I think in
  • 00:31:13
    every case uh we do some like
  • 00:31:15
    deterministic fuzzy stuff where we're
  • 00:31:17
    like does this name almost match that
  • 00:31:19
    name if so we're good um uh and so there
  • 00:31:22
    is some like deep Logic for like trying
  • 00:31:26
    to score ourselves uh in a way way
  • 00:31:28
    that's not not as straightforward is
  • 00:31:29
    just like does this thing equal that
  • 00:31:31
    thing um um but we've seen sort of
  • 00:31:34
    everything and at some point you do need
  • 00:31:36
    to sort of stop like looking for the
  • 00:31:38
    perfect thing and find something
  • 00:31:39
    directionally useful um we've had
  • 00:31:41
    projects where like you have a workflow
  • 00:31:43
    diagram this pop this this complicated
  • 00:31:45
    and the only thing we're able to measure
  • 00:31:46
    is like what's going on down here um
  • 00:31:48
    because it's like the only place where
  • 00:31:49
    you can design clean EV EV vals and then
  • 00:31:52
    you just sort of put up with that and
  • 00:31:54
    and do the best you can
  • 00:32:17
    AG
  • 00:32:40
    so it's so what does the output look
  • 00:32:43
    like is actually very critical to what I
  • 00:32:45
    think made this possible here um so
  • 00:32:47
    we've actually built uh sort of uh AI
  • 00:32:51
    powered integration Builders multiple
  • 00:32:53
    times um this is this is one of them for
  • 00:32:55
    airite I think one amazing asset that
  • 00:32:58
    airb has here is they have this format
  • 00:33:01
    that they call their their well I don't
  • 00:33:03
    know what you call it your low low code
  • 00:33:04
    cdk format your your this this spec for
  • 00:33:07
    how to define an API integration as
  • 00:33:09
    configuration instead of as code big
  • 00:33:12
    file that describ and in fact in our
  • 00:33:14
    pipeline we never have an llm write this
  • 00:33:18
    thing as output we write this as output
  • 00:33:20
    deterministically using code and we use
  • 00:33:22
    the llm to answer specific questions we
  • 00:33:24
    have about this process so we ask it
  • 00:33:27
    picking off authentication method for me
  • 00:33:28
    and then we use that to
  • 00:33:29
    deterministically generate the
  • 00:33:30
    authentication part of this that's part
  • 00:33:32
    of what makes this an approachable
  • 00:33:34
    problem we've built this before uh where
  • 00:33:37
    the end goal is to write code performs
  • 00:33:40
    way worse um and even in that process we
  • 00:33:43
    have uh under the hood we have an
  • 00:33:46
    intermediate format that is not I mean
  • 00:33:49
    it's like conceptually similar to this
  • 00:33:52
    that we're using to sort of constrain
  • 00:33:53
    the problem so much of the trick with
  • 00:33:55
    these LMS is constraining the domain in
  • 00:33:56
    which they're thinking right if if you
  • 00:33:58
    say write me some code you're going to
  • 00:34:00
    get something code shaped as output
  • 00:34:01
    whether it's good nobody knows um if you
  • 00:34:04
    ask it for a very specific constrainted
  • 00:34:06
    answer where it's only allowed to answer
  • 00:34:08
    within a very specific Universe it's
  • 00:34:09
    much more tunable it's going to perform
  • 00:34:10
    a lot better just kind of made that
  • 00:34:12
    possible yeah I mean
  • 00:34:25
    I'm I can take that
  • 00:34:29
    so to clarify the last two questions
  • 00:34:33
    it's I think it's both relevant to evals
  • 00:34:35
    ands to outputs uh the way we eval is we
  • 00:34:39
    compare what the model gives us with
  • 00:34:41
    what we have in connectors we know is
  • 00:34:43
    good it's not always one to one because
  • 00:34:46
    for example if you have a stream that's
  • 00:34:48
    called capital T transactions is it
  • 00:34:51
    still the same or like is it if if the
  • 00:34:54
    wording is slightly different but the
  • 00:34:55
    scheme is very similar if the schemas
  • 00:34:58
    are compatible but the columns are not
  • 00:34:59
    the same is it is it a match is it not
  • 00:35:01
    match like that that kind of stuff the
  • 00:35:03
    output is uh are pieces of the Manifest
  • 00:35:07
    and the AI Builder thing like we have a
  • 00:35:10
    python library that enforces the format
  • 00:35:14
    of the Manifest essentially think
  • 00:35:16
    kubernetes resource definitions right
  • 00:35:18
    there are fields that are required they
  • 00:35:20
    can be only of certain format so Builder
  • 00:35:23
    before outputting that as a suggestion
  • 00:35:26
    validates that it's
  • 00:35:28
    legit and then one use case is sure
  • 00:35:32
    right just a co-pilot thing in Builder
  • 00:35:35
    itself um what we see is the match
  • 00:35:38
    success rate like we see successful good
  • 00:35:41
    suggestions very very often like it's
  • 00:35:43
    probably north of 90% on each particular
  • 00:35:46
    field today but the thing is there's a
  • 00:35:48
    bunch of fields and those probabilities
  • 00:35:50
    multiply so the probability that you get
  • 00:35:53
    full connector end to end correctly is
  • 00:35:57
    you slightly lower but we're getting
  • 00:35:59
    there this use case is okay let's get a
  • 00:36:02
    lot of connectors let's make new
  • 00:36:03
    connectors Let's help people make
  • 00:36:06
    connectors for themselves and then share
  • 00:36:07
    them with our community but also I have
  • 00:36:11
    450 connectors and like more than 250 of
  • 00:36:14
    them are in that format so the whole
  • 00:36:16
    connector is just a big manifest file
  • 00:36:18
    and what I can do is I already have a CI
  • 00:36:20
    pipeline that runs every week and you
  • 00:36:23
    see there's this thing called version
  • 00:36:24
    right like this is the version of the
  • 00:36:26
    framework that it's using
  • 00:36:28
    and my CI pipeline checks hey do I have
  • 00:36:30
    a newer version of the framework and if
  • 00:36:32
    I do I'm going to update all of my
  • 00:36:35
    manifest as long as it's not breaking
  • 00:36:37
  • 00:36:39
    Another thing we could do on CI, or on some regular cadence, is create another endpoint in our AI Assist service and have another flow where we say: here's the name of the connector, here's the API docs, here's the existing manifest — do you think there may be some new streams that we don't have? Or maybe there's a new authentication method, or some deprecations we want to clean up.
  • 00:37:07
    Today the way this works is: a connector fails for someone, a stream doesn't work anymore, somebody files a GitHub issue, we say "well, we're open source, you're very welcome to contribute," they contribute, we run regression tests to verify it's not broken, and then we merge. When we had just the Python framework that took months; now it takes days. But if I can automate this — cool.
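    (One possible shape for that drift-check flow, sketched under assumptions — the prompt, the JSON keys, and the complete() helper are illustrative, not an existing AI Assist endpoint:)

      # Hypothetical scheduled "drift check": hand the model the docs and the
      # existing manifest and ask what is missing or deprecated.
      import json

      def complete(prompt: str) -> str:
          raise NotImplementedError("call your LLM provider here")

      def check_for_drift(connector_name: str, api_docs: str, manifest_yaml: str) -> dict:
          prompt = (
              f"Connector: {connector_name}\n\n"
              f"API documentation:\n{api_docs}\n\n"
              f"Existing manifest:\n{manifest_yaml}\n\n"
              "Reply as JSON with keys 'new_streams', 'new_auth_methods', and "
              "'deprecated_streams', listing anything in the docs that the manifest does not cover."
          )
          return json.loads(complete(prompt))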
    So thank you for the suggestion. Should I — okay, I'll do it. Oh, thanks.
  • 00:37:41
    All right, I kind of want to pull on a thread a little bit more that Samantha brought up, which is that you can envision a future of an agent or something doing this. Since the GPT era started, it seems like there's always something new and exciting that people are talking about — RAG, agents, graph RAG, countless things. A year from now, do you feel any of these will continue to be just as pertinent a part of the conversation, or do you think something new will be the dominant point of discussion? And if you do think something new, what is that thing?
    I'm less of an AI futurist and more of an AI-today practitioner. But when people talk about agents, for example, I think there are multiple things they might mean. One thing they might mean is: build a thing that has a lot of autonomy around what it can do — you give it a bunch of tools and you let it decide; it's less of this deterministic "we do this, then we do this, then we do this," and you give it access to whatever it wants.
  • 00:38:54
    I've yet to see anything like that come to fruition in practice for a significant system. I could see that changing over time, but right now it seems very theoretical to me — it may happen if it gets driven by a big boost in what foundation models are capable of.
  • 00:39:13
    But I think the more interesting today-thing with agents — what people tend to mean — is less about autonomy and more about specialization: how do you break your problem down into specific components that are each in charge of a very small subdomain and are experts in that subdomain? That, I think, is going to get even more common. I think people are realizing, first, the complexity of these projects in practice — what looks at a high level like "hey ChatGPT, give me a connector" looks more like this under the hood — and also that so much of the mystery of building with LLMs is actually just software engineering under the hood. I think that is going to drive more adoption of that type of agent system.
  • 00:39:58
    And we're seeing more and more of it. We're talking about a very tech-forward company and use case here, but we also see hundred-year-old big-equipment manufacturers talking about these workflows in a very realistic way — workflows that I think are going to be in production within the next year at a company like that, and that you might call agentic workflows. So I see that part of it being very real over the next year.
  • 00:40:29
    I jumped so deep into building this thing that my horizon for thinking about AI a year from now is very, very short.
  • 00:40:40
    My personal biggest thing is this: we have manifest connectors, but we also have Python connectors and Java connectors, and we have bugs in those. So my biggest dreams are around those software-programming agents, which can be as simple as a little bash script that says: hey, here's a GitHub issue, here's the bug report, here are the logs, here's the directory with all of the source files, here's the script that builds and tests the connector, and here's the failing output — can you fix it? Then the script applies the changes proposed by the model, runs the tests, and if they fail it says "yeah, that didn't work, try again," in a while loop, just until it wraps up. This is my next hobby project, I think, after this thing is successful.
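    (A minimal sketch of that fix-and-test loop as described — the repo layout, the ask_model() call, and the test command are placeholders, not an existing Airbyte tool:)

      # Sketch of the "fix it in a while loop" idea, roughly as described.
      # ask_model() stands in for whatever LLM client you use; the diff format
      # and test command are assumptions.
      import subprocess
      from pathlib import Path

      MAX_ATTEMPTS = 5

      def ask_model(prompt: str) -> str:
          raise NotImplementedError("call your LLM provider here")

      def run_tests(repo: Path) -> subprocess.CompletedProcess:
          # e.g. the connector's own build-and-test script
          return subprocess.run(["./gradlew", "test"], cwd=repo, capture_output=True, text=True)

      def fix_bug(repo: Path, issue: str, logs: str) -> bool:
          feedback = ""
          for _ in range(MAX_ATTEMPTS):
              prompt = (
                  f"GitHub issue:\n{issue}\n\nLogs:\n{logs}\n\n"
                  f"Feedback from the previous attempt:\n{feedback}\n\n"
                  "Propose a fix as a unified diff against the repository."
              )
              diff = ask_model(prompt)
              subprocess.run(["git", "apply", "-"], cwd=repo, input=diff, text=True, check=True)
              result = run_tests(repo)
              if result.returncode == 0:
                  return True  # tests pass: open a pull request from here
              feedback = result.stdout + result.stderr  # "that didn't work, try again"
              subprocess.run(["git", "checkout", "--", "."], cwd=repo, check=True)
          return False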
    As for what that means for other industries, and for programmers and businesses that build with AI — Eddie is the boss there.
  • 00:41:44
    I don't know that we have a final one fully together — I don't think there is a full final one, and that diagram has very little framework code under the hood. There's some, but it's not substantial, so it's kind of not representative of the final thing. No, that's okay — maybe this is a good closing point. The biggest place where I think the diagram diverged is around the crawling of the docs: we don't do an upfront crawling step at all, so it stops looking like that.
  • 00:42:27
    I guess the other big change is that, at that point, we envisioned a URL to the API docs as input and a connector as output — one shot, build the whole thing all at once. Where it ended up: that is what the initial experience is like in the UI, but there are lots of little buttons you can push to fill in fields here and there, so the flow is much more decomposed into a bunch of different endpoints and smaller workflows that leverage some shared pieces under the hood. It's not exactly a left-to-right, end-to-end thing; it's more like twelve end-to-end things that share some pieces.
  • 00:43:15
    Yes — so there are actually two inputs. You can give us an OpenAPI spec as input; for those who don't know, an OpenAPI spec is a common, standard format you can use to describe an API. It's optional, but if you give it to us we'll use it. We also have our own curated repo of common APIs that are out there and their specs, which we sometimes use. The other supplemental information is all stuff living on the web — Google searching, crawling. Anything else? Yeah, I think that's all the supplemental stuff.
  • 00:44:04
    I wonder, for some stages that are disconnected and connected only through an artifact, whether you have tried to combine them. ... But sometimes it makes sense — and then a follow-up question if you have done it.
  • 00:44:47
    So I think the first part was around: instead of treating these different, alternative steps for finding information as fallbacks to one another, can you do them in parallel and then try to combine the information — is that right?
  • 00:45:09
    Right — steps that are somewhere in here, in a sequence. So have you tried to reconcile them into a single step — with, let's say, call it an agentic application, or an agentic step — in which you do both tasks?
  • 00:45:28
    Right, so the two tasks here are basically: go out and find the relevant information for a question like authentication, and then — sorry, I can't quite read that.
  • 00:45:42
    Imagine there are two simple tasks that you have separated by an artifact you generate — instead of generating one, you...
  • 00:45:57
    Yeah — I think it actually often starts the opposite way. We start with a larger problem — "build this whole thing" — and realize it needs to be broken down into subcomponents. It's possible the reverse has happened somewhere in the details I'm less familiar with; no use case is jumping to mind, but I think the tactic makes sense to me.
  • 00:46:27
    Yeah, in practice one area where we've had to break things down is deeply nested questions, where we may be asking the LLM: which of these authentication methods is used? And if it's this one I need this information, if it's that one I need that information. When you ask these deeply nested questions it sort of falls off, gets lazy, and stops following the instructions, so we've had to chop it up into the sub-pieces. So this is a little bit the opposite of the flow you're describing, but I could see, if we'd started out with the multi-step version, wondering whether we could do it all at once — which does save you on latency and cost.
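    (A sketch of what chopping a nested question into sub-pieces can look like — the prompts and the complete() helper are illustrative, not the actual AI Assist code:)

      # Ask one narrow question per prompt and branch in ordinary Python instead
      # of nesting all the conditions in a single prompt.
      def complete(prompt: str) -> str:
          raise NotImplementedError("call your LLM provider here")

      def extract_auth_config(docs_excerpt: str) -> dict:
          method = complete(
              f"{docs_excerpt}\n\nWhich authentication method does this API use? "
              "Answer with exactly one of: oauth2, api_key, basic."
          ).strip().lower()

          if method == "oauth2":
              token_url = complete(
                  f"{docs_excerpt}\n\nWhat is the OAuth2 token URL? Answer with the URL only."
              )
              return {"type": "oauth2", "token_url": token_url.strip()}
          if method == "api_key":
              header = complete(
                  f"{docs_excerpt}\n\nWhich HTTP header carries the API key? Answer with the header name only."
              )
              return {"type": "api_key", "header": header.strip()}
          return {"type": "basic"}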
  • 00:47:11
    It can be easier — have you tried that? At least tried to reconcile something? For example, when I started doing this, I basically start from functions and automate each function with an agent step, then link the steps together. But then I say, okay, maybe these two can be reconciled into a single step — so instead of having two agent steps to maintain, you have one. It doesn't have to make sense for everything, though: if you end up with a single blob of an agent that performs everything, it's not going to work, which is what you were saying at the very beginning.
    Yeah. I think this is only tangentially related to what you're asking, but we have seen, on another project — it looks pretty different from this one, but it's fundamentally a content moderation project for a company called change.org. They have a petition platform where people can post petitions about political things, local things, and so on, and they have a challenging content moderation problem, because it's not as simple as asking "did someone just post spam?" or "did someone just post hate speech?" It's actually a valid use of their platform to say something somewhat inflammatory, but it can't cross the lines of their community guidelines.
  • 00:48:37
    So getting these agents to understand the different nuances of what it means to violate the policies is challenging. Under the hood, what we do is have these specialist agents that look at the content through different lenses. They write out their reasoning — their chain of thought — and give us confidence scores at the end. Then we take a bunch of these different answers together and give them to one bigger process that says: all right, now that you understand all the nuance of these different angles, make a final decision. It's combining these different sub-viewpoints, if that makes sense. It's not exactly what you were talking about, but it's a similar sort of idea.
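    (A rough sketch of that specialists-plus-final-decision shape — the lens names, prompt wording, and complete() helper are hypothetical, not the actual change.org system:)

      # Specialist reviewers each look through one lens, then a final step
      # combines their reasoning and confidence into a single decision.
      from dataclasses import dataclass

      def complete(prompt: str) -> str:
          raise NotImplementedError("call your LLM provider here")

      @dataclass
      class LensResult:
          lens: str
          reasoning: str
          confidence: str  # e.g. "low" | "medium" | "high" | "very high"

      LENSES = ["hate_speech", "spam", "harassment"]

      def review(content: str) -> str:
          results = []
          for lens in LENSES:
              answer = complete(
                  f"You are a reviewer focused only on {lens}.\n"
                  f"Content:\n{content}\n\n"
                  "Think out loud, then end with a line 'CONFIDENCE: <low|medium|high|very high>'."
              )
              reasoning, _, confidence = answer.rpartition("CONFIDENCE:")
              results.append(LensResult(lens, reasoning.strip(), confidence.strip()))

          summary = "\n\n".join(
              f"[{r.lens}] (confidence: {r.confidence})\n{r.reasoning}" for r in results
          )
          return complete(
              "Several specialist reviewers assessed the same content:\n"
              f"{summary}\n\n"
              "Considering all of these angles, make a final decision "
              "(ALLOW or REMOVE) with a one-sentence reason."
          )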
    On the o1 question — he asked if we had tried o1 at any point — we have. The biggest drawback with o1 is that it's slow, and this is already too latency-sensitive an application: it takes a while to generate a connector, there are a lot of substeps, and if you added 20 seconds to one of the prompts it would probably be a nonstarter — especially given that the bottleneck here is less the AI's intelligence and more our ability to give the AI the right information at the right time.
  • 00:49:46
    We're getting to that point where we have a lot of pizza that people still need to eat, so I want to start putting the bows on the present here. Just to confirm: is there anything else you all wanted to share with the audience that we haven't had a chance to talk about? I'll also give the opportunity, if anyone has any burning final questions, to feel free to get those in. I know there are slides and a lot of things you all might want to show — anything you wanted to toss out? Anything else?
  • 00:50:17
    [Audience question, largely inaudible.]
    So I guess I'll start by saying that this domain sounds very hard. The thing that makes me say it sounds hard is that hiring sounds hard, and we struggle to train humans to do it today. If I struggle to picture how to get a pretty junior person to figure out how to reliably produce this output, then I also struggle to see how to get an LLM to do it.
    The analogy that jumps to mind, though, is that this kind of problem is present for AI phone-agent applications. There are a lot of people trying to put AI agents on the phone, and those agents have to be robust in the face of "people can say anything." It's hard to build a customer-support bot for an airline if you're afraid it's going to give someone a free ticket because they say "ignore previous instructions." I don't get the sense that anyone has figured this out super well.
  • 00:52:26
    The tactic they use there is sort of a hybrid between what you picture for a phone tree — press one if you're a good candidate — and still leveraging the LLM's ability to handle inputs it has never seen before. So it tends to look like a state machine: there are different states the agent can be in, it's trying to assess very specific, narrow things at each point, but the way it decides to move from state to state is based on LLM logic — logic described in English, not a very deterministic sort of thing.
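    (A minimal sketch of that hybrid state machine — the states, prompt, and classify() helper are illustrative only:)

      # Each state checks something narrow; only the state-to-state transition is
      # delegated to the model, described in English.
      def classify(prompt: str, options: list[str]) -> str:
          """Placeholder for an LLM call constrained to return one of `options`."""
          raise NotImplementedError("call your LLM provider here")

      TRANSITIONS = {
          "greeting": ["collect_issue", "escalate_to_human"],
          "collect_issue": ["resolve", "escalate_to_human"],
          "resolve": ["done", "escalate_to_human"],
      }

      def next_state(current: str, user_utterance: str) -> str:
          options = TRANSITIONS.get(current, ["done"])
          return classify(
              f"The caller is in state '{current}' and said: {user_utterance!r}\n"
              f"Which state should we move to next? Answer with exactly one of: {', '.join(options)}.",
              options,
          )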
    And then I would still take the approach of building evals based on old transcripts of calls that have gone off the rails and measuring yourself against known bad cases. Getting to perfection on this sounds pretty challenging. Also, getting LLMs to state how confident they are in something is its own sub-problem — you may eventually be able to get this to a point where it can tell you when it doesn't know, but tuning that is also going to be challenging, because they tend to overstate their confidence.
  • 00:54:00
    Yes, but the interpretation and tuning is a real challenge. A lot of our projects have steps in the middle of the workflow where we're asking for an evaluation of the form: think out loud, then come up with your answer, and then tell us how confident you are in your answer. It's usually not a number — it's usually low, medium, high, very high. And then you don't just trust what that means; you measure it against your evals: is this predictive of anything? It may turn out that "very high" means "maybe possibly correct," and so you only filter down to, maybe, the very-high bucket.
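    (A tiny sketch of measuring what a confidence label actually predicts against an eval set — the records here are invented:)

      # In practice the records would come from re-running the workflow on known
      # eval cases and checking each answer.
      from collections import defaultdict

      eval_records = [
          {"confidence": "very high", "correct": True},
          {"confidence": "very high", "correct": False},
          {"confidence": "medium", "correct": False},
          # ... many more
      ]

      def accuracy_by_confidence(records):
          buckets = defaultdict(lambda: [0, 0])  # label -> [number correct, total]
          for record in records:
              buckets[record["confidence"]][0] += int(record["correct"])
              buckets[record["confidence"]][1] += 1
          return {label: correct / total for label, (correct, total) in buckets.items()}

      # If "very high" only maps to, say, 60% accuracy, you might auto-accept only
      # that bucket and route everything else to review.
      print(accuracy_by_confidence(eval_records))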
  • 00:54:38
    So, do you have any tricks you want to talk about — things that have worked out really well? To give you one example, I found that if I put some example inputs and some perfect outputs into the context, then it spits out similar results, like a recreation.
    The examples thing does work — showing it examples usually gets us back on the rails. I'm sure you've seen all the trendy little tricks: offer it a big tip, put a bunch of exclamation points in there, offer to fire it if it's not going to do a good job. Those things may give you lift, but it's going to be challenging to know if you don't measure it.
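    (A minimal sketch of that few-shot tactic — the examples and the complete() helper are illustrative:)

      # Put example inputs and "perfect" outputs into the context before the
      # real input.
      FEW_SHOT_EXAMPLES = [
          ("GET /v1/invoices returns a paginated list of invoices ...",
           '{"name": "invoices", "url_path": "/v1/invoices"}'),
          ("GET /v1/customers returns all customers ...",
           '{"name": "customers", "url_path": "/v1/customers"}'),
      ]

      def complete(prompt: str) -> str:
          raise NotImplementedError("call your LLM provider here")

      def build_prompt(new_docs_excerpt: str) -> str:
          shots = "\n\n".join(f"Docs:\n{docs}\nOutput:\n{out}" for docs, out in FEW_SHOT_EXAMPLES)
          return f"{shots}\n\nDocs:\n{new_docs_excerpt}\nOutput:\n"

      # complete(build_prompt("GET /v1/refunds returns refunds for a charge ..."))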
  • 00:55:26
    I think more often in practice it's about finding specific cases where you did poorly and then baking them into your prompt — trying a wide variety of things, noticing that it's off on a particular case, and then describing that case to it. I don't have a handy list of things, and I bet if I polled the folks on our team, everybody would have a different favorite bag of tricks.
    Which I think is also a danger on these AI projects: it's easy to fall into. It's a really good nerd-snipe machine, right? You can be like, "I'm pretty sure tipping is going to be the great thing to try on this project," and so the evals help keep you on task. The set of tactics is out there — you can Google people's long lists of tactics.
  • 00:56:18
    One random thing we've had good success with is Anthropic's prompt generator: you can just paste in your current prompt and it'll rewrite it. We've had surprising results where visually it doesn't look any better — that's kind of what I already said in my prompt — and then the metrics just go up. But it's not one weird trick; it's try lots of things and measure your progress.
    All right — thank you, everybody, for coming tonight. We're super excited that you made the time to be with us. A quick round of applause for Eddie and —
  • 00:56:57
    The office is going to be open for the next 20 minutes or so, so like I said, there's lots of pizza to eat and there are still drinks as well, so go enjoy. Pester Eddie and the other presenters with any further questions — maybe you didn't get a chance to ask one; they're going to be around, and if they weren't planning to be, now they are. But again, thank you for being here; I hope you had a great time. Let's keep partying.
Tags
  • Airbyte
  • AI Assist
  • API connectors
  • evaluations (evals)
  • AI development
  • data platform
  • AI integration
  • co-pilot tool
  • automation
  • agentic workflows