How Airbyte Uses AI to Build Connectors
Resumen
TLDRL'événement a exposé les participants à l'importance de l'IA dans les projets technologiques modernes, avec un accent particulier sur Airite, une plateforme de mouvement de données, et leur fonction AI Assist qui utilise l'IA pour automatiser la création de connecteurs API. Les présentateurs ont discuté du cycle de développement, des défis et des succès de l'IA Assist, notamment l'importance des évaluations (evals) dans l'amélioration continue des outils IA. L'événement a servi à partager des expériences entre professionnels de l'industrie, s'attardant sur la nécessité de bien structurer les projets IA pour éviter les embûches courantes et garantir le succès en production. Enfin, l'accent a été mis sur le fait que l'intégration de l'IA doit se faire dans le cadre de flux de travail manuels déjà en place pour maximiser son efficacité.
Para llevar
- 🤝 L'événement favorise l'interaction et le partage de connaissances sur l'intégration de l'IA.
- 💡 Airite et AI Assist représentent des outils clés pour la création automatisée de connecteurs API.
- 🛠 Le développement IA nécessite des évaluations pour garantir l'efficacité et le succès.
- ⚙️ L'automatisation de tâches complexes comme les connecteurs se fait avec une IA bien structurée.
- 🔍 La compréhension et l'extraction correcte des données d'API sont cruciales.
- 🧩 Les flux de travail doivent être bien structurés pour intégrer efficacement l'IA.
- 🎯 L'importance de comprendre les besoins réels avant de démarrer des projets IA.
- 🔄 Le processus de développement IA est itératif et nécessite des ajustements constants.
- 📊 Partage des défis et solutions lors de la mise en œuvre de l'IA en production.
- 🚀 Des outils comme AI Assist peuvent rapprocher l'objectif d'un accès facile aux données.
Cronología
- 00:00:00 - 00:05:00
Teao présente Airite, une plateforme de mouvement de données, lors d'un événement interactif avec Fractional. L'objectif est de discuter des projets IA, des pratiques à faire et à éviter, et d'encourager une participation active du public.
- 00:05:00 - 00:10:00
Natik présente l'objectif d'Airite de rendre les données accessibles à tous. La société concentre ses efforts sur les frameworks capables de lire des données d'API arbitraires et a lancé AI Assist pour améliorer l'efficacité dans la création de connecteurs API.
- 00:10:00 - 00:15:00
Une démonstration d'AI Assist montre comment la création de connecteurs API peut être simplifiée de plusieurs jours à environ une heure. Cette innovation permet de faire des connexions API plus rapidement, ce qui est essentiel pour étendre la couverture API d'Airite.
- 00:15:00 - 00:20:00
AI Assist a commencé comme un projet naïf utilisant ChatGPT. L'approche était trop simple et n'a pas bien fonctionné pour des tâches complexes. L'équipe a ensuite développé une approche plus sophistiquée avec Fractional, combinant LLM avec une logique logiciel étendue.
- 00:20:00 - 00:25:00
La leçon principale tirée est que la 'magie' de l'IA nécessite beaucoup de travail technique fastidieux. Tester des solutions en dehors des environnements de production génère peu d'apprentissage jusqu'à ce que l'utilisateur final interagisse avec le logiciel.
- 00:25:00 - 00:30:00
Airbite offre un outil qui permet de récupérer des données à partir d'API et de les intégrer dans divers systèmes de base de données et destinations vectorielles. Cela facilite la vie des développeurs intéressés par les prototypes IA.
- 00:30:00 - 00:35:00
La présentation d'Eddie de Fractional met en avant l'importance de bien structurer les projets IA. Fractional a participé à la conception de l'AI Assist, en s'assurant que la production utilise des LLM d'une manière qui ajoute réellement de la valeur.
- 00:35:00 - 00:40:00
Eddie met en avant l'importance des workflows manuels existants où l'IA pourrait améliorer l'efficacité. Il souligne l'avantage des évaluations automatisées pour s'assurer que les solutions IA apportent une réelle valeur ajoutée aux utilisateurs.
- 00:40:00 - 00:45:00
Teao explique l'évolution des critères d'évaluation des projets IA, soulignant l'importance de systèmes d'évaluation robustes pour suivre les progrès et les régressions, assurant ainsi une amélioration constante du logiciel.
- 00:45:00 - 00:50:00
Eddie discute des futurs potentiels pour l'IA, notamment la montée des agents autonomes, mais souligne que la plupart des succès résideront dans la spécialisation et l'adoption de systèmes agentiques par domaine.
- 00:50:00 - 00:57:22
L'événement se conclut avec une invitation à continuer les discussions et à explorer plus en profondeur les opportunités qu'offrent les projets IA dans une ambiance détendue et collaborative.
Mapa mental
Preguntas frecuentes
Qu'est-ce que Airite ?
Airite est une plateforme de mouvement de données qui facilite l'accès aux données à partir de systèmes variés via des connecteurs API.
Quel était l'objectif principal de l'événement ?
L'événement visait à informer sur l'intégration de l'IA dans les projets technologiques et fournir un espace interactif pour discuter des bonnes pratiques et des défis, notamment après un événement disruptif récent.
Qu'est-ce que l'AI Assist mentionné lors de l'événement ?
AI Assist est une fonction co-pilot d'AI intégrée dans l'interface utilisateur graphique de construction de connecteurs d'Airite, permettant de simplifier et d'automatiser la création de connecteurs API avec l'aide de l'IA.
Quel était le but de la démo faite par Nati ?
Il était utilisé pour démontrer comment l'IA peut être intégrée dans des outils pour simplifier les processus complexes comme la construction de connecteurs API.
Quels aspects de la mise en œuvre de l'IA ont été discutés lors de l'événement ?
L'événement a mis l'accent sur le partage de connaissances autour de la mise en œuvre d'applications d'IA en production, y compris les tests d'évaluation automatisés (evals) et les flux de travail agents.
Ver más resúmenes de vídeos
- 00:00:26all right I think this works all right
- 00:00:28we're good to go hi everybody it's great
- 00:00:30to see you again fast forward from the
- 00:00:32front door my name is teao I work over
- 00:00:34here at airite we are a data Movement
- 00:00:37platform you're going to be learning all
- 00:00:38about tonight along with our partners
- 00:00:40fractional who you're going to learn all
- 00:00:41about them tonight as well thank you for
- 00:00:43making the time to join us tonight we
- 00:00:44hope you're enjoying the food the drinks
- 00:00:46the company um our our aim here is to
- 00:00:49make this a really fun night and a
- 00:00:50really informative night especially
- 00:00:52because many of you are probably still
- 00:00:54about to start the recovery process from
- 00:00:56disrupt um so we're excited to kind of
- 00:00:59be closing out with you all for the day
- 00:01:02um way we're going to do tonight nti's
- 00:01:04gonna go ahead and come in and give his
- 00:01:06presentation we're going to do our
- 00:01:07fireside chat with Eddie and de we're
- 00:01:09going to learn more about fractional and
- 00:01:10how you can think about uh the AI
- 00:01:13projects that you're working on the dos
- 00:01:14the don'ts uh and really the aim for
- 00:01:16tonight is not only to just be us
- 00:01:19talking here and you listening we want
- 00:01:21this to be interactive so if you have ai
- 00:01:23projects that you're working on it's
- 00:01:25like Eddie and everyone else from a
- 00:01:26fractional perspective give their
- 00:01:27thoughts if you want to Pepper Nati with
- 00:01:29personal questions you can do that stuff
- 00:01:31too um but really the night is meant to
- 00:01:33be all about you uh so we're going to
- 00:01:36try to live up to that but with that
- 00:01:39being said I'm going to shut up and go
- 00:01:40to the back here thank you all for
- 00:01:41joining us again natik I'm gonna hand it
- 00:01:44over to
- 00:01:48[Applause]
- 00:01:50you hello
- 00:01:52hello all right a
- 00:01:55sec I'm clumsy so all right uh my goal
- 00:02:00today is not to sell you all on airbit
- 00:02:03but to put some context on yeah a few
- 00:02:06minutes on what we are doing and why we
- 00:02:10try to do co-pilot um style AI assist in
- 00:02:14our Dev tools what we've got as a result
- 00:02:17what we've learned um how you can use it
- 00:02:20to grab data for your projects and then
- 00:02:22we're going to talk with Eddie and Eddie
- 00:02:23is going to talk to us about how to
- 00:02:26actually um be better at building with
- 00:02:29AI um and avoid common
- 00:02:33pitfalls
- 00:02:34so airite we started just a few years
- 00:02:38back we're almost four years
- 00:02:40oldish and the slide that Michelle our
- 00:02:43CEO shows to every new hire says that
- 00:02:46our mission is to make data available to
- 00:02:50anyone and anywhere if you own your data
- 00:02:53and it's in any systems databases apis
- 00:02:55you should be able to use your data
- 00:02:57that's why there's a bunch of companies
- 00:02:58like zap here or like University cases
- 00:03:00right and turns out to fulfill this
- 00:03:02Mission you know things get much easier
- 00:03:05if you have Frameworks that can read
- 00:03:07data from arbitrary apis that's what my
- 00:03:10team is doing I am an engineering
- 00:03:12manager on API extensibility team we're
- 00:03:15doing Frameworks that power all of our
- 00:03:17API
- 00:03:19connectors um in 2021 2022 we had a
- 00:03:23python cdk um connector developer kit
- 00:03:25framework we had around a 100 connectors
- 00:03:29at that time and we thought okay well
- 00:03:31how do we scale that we have 20
- 00:03:33Engineers supporting 10 certified
- 00:03:35hardcore connectors Community
- 00:03:37contributes connectors but how do we
- 00:03:38maintain all that so in 2023 we made a
- 00:03:43graphical user interface around our low
- 00:03:45code no code framework that encapsulates
- 00:03:48a connector in a basically a bunch of
- 00:03:51yaml kubernetes resource definition
- 00:03:53style and that's great people started
- 00:03:55being able to make a connector in an
- 00:03:57hour versus you know days but it's still
- 00:04:01a cool hour or more so in 2024 we've
- 00:04:03released AI assist which is essentially
- 00:04:06co-pilot for our graphical user
- 00:04:07interface
- 00:04:08tool
- 00:04:10and I want to show you how it works I'm
- 00:04:1499% confident is going to be fine but
- 00:04:16I'm going to do it one-handed so let's
- 00:04:20see just to give you a sense of what
- 00:04:22this thing is so I figured you know what
- 00:04:24are we going to build today we already
- 00:04:27have a lot of connectors so finding one
- 00:04:29that we don't have was a little bit of a
- 00:04:31challenge and my CFO was walking nearby
- 00:04:36and I thought hey juel do you think it's
- 00:04:38cool if I use our financial data for a
- 00:04:41demo for a Meetup and he said you signed
- 00:04:45an NDA you
- 00:04:52stupid my cash was about to be warmed up
- 00:04:55interesting okay this might take us a
- 00:04:57minute so we might as well continue and
- 00:05:01give it a few seconds while that is
- 00:05:09happening yeah let's almost
- 00:05:13smoothly so we're going to return to
- 00:05:15that but to give you
- 00:05:17perspective data transfer companies are
- 00:05:19only as good as the connector coverage
- 00:05:22that we have if we only support 200 apis
- 00:05:24you have your own API does your own
- 00:05:26thing you want your data we don't
- 00:05:28support it you're not going to use us
- 00:05:30so how are we doing well you know we've
- 00:05:32released AI assist and connector builder
- 00:05:34in like
- 00:05:352023 um we've
- 00:05:38added what approximately 100 connectors
- 00:05:42from August to the end of October and if
- 00:05:46like our total is less than 400 that's a
- 00:05:49lot of
- 00:05:50connectors how is this live demo thing
- 00:05:53doing oh okay so this roughly is our
- 00:05:58connector Builder and and it needs to
- 00:06:00know things about your API it needs to
- 00:06:02know your base URL which a assist
- 00:06:04guessed for me it needs to know how to
- 00:06:06authenticate and it thinks that this API
- 00:06:08is using beer
- 00:06:10token
- 00:06:11which I'm going to paste
- 00:06:15save and we have streams of data so
- 00:06:18transactions is obviously the most
- 00:06:20interesting it figured out where
- 00:06:22transactions live what HTTP method to
- 00:06:24use um where transaction records are
- 00:06:29within the HTTP
- 00:06:31response um it figured the pagination it
- 00:06:34figured where in the response is the
- 00:06:36cursor to the next page let's see if it
- 00:06:39works and if I actually pasted the
- 00:06:47token come
- 00:06:51on here it is okay I'm not going to show
- 00:06:54you the actual records but what's
- 00:06:55important is uh 100 records per page
- 00:06:58five pages test read is successful
- 00:07:00meaning I only had to paste my
- 00:07:02documentation URL and my API token and
- 00:07:05it figured out um how to get my data in
- 00:07:09fact I did this a little bit earlier
- 00:07:11today and got a bunch of streams and
- 00:07:14then I used this little button here to
- 00:07:17make a pull request and we have a pull
- 00:07:21request in our GitHub I'm going to show
- 00:07:22you that in a little bit that's how we
- 00:07:26are growing from 200 something
- 00:07:27connectors to 400 something
- 00:07:30connectors within just these few
- 00:07:34months
- 00:07:36now we tried three times to get this
- 00:07:40thing right it was a hobby project of
- 00:07:42one of our Engineers like oh LMS are
- 00:07:44cool let's build something with LMS um
- 00:07:46didn't quite work
- 00:07:48out the first attempt was very naive
- 00:07:51Eddie will walk you through some of the
- 00:07:53details but we thought you know what
- 00:07:54Chad gpts are cool let's just let's
- 00:07:56paste the docks give the docks to Chad
- 00:07:58GPT and say hey you output the Manifest
- 00:08:00file of the connector and it works on
- 00:08:03super simple things like Pokey API or
- 00:08:05like exchange rate API some something
- 00:08:07super simple with one or two streams of
- 00:08:09data doesn't work on anything serious
- 00:08:11cannot figure out authentication then we
- 00:08:13thought okay well it is very difficult
- 00:08:15for a l large language model to Output
- 00:08:18the Manifest in our format it doesn't
- 00:08:20know the constraints the schema but
- 00:08:23there's a lot of open apis specs on the
- 00:08:25internet so what if we ask it to First
- 00:08:28generate open API spec and then from
- 00:08:30that we're going to euristic generate
- 00:08:32the Manifest it's also extremely
- 00:08:34brittle and then we decided to work with
- 00:08:37fractional on this co-pilot approach
- 00:08:40this works but it's not just a single
- 00:08:43llm
- 00:08:44call it's not just prompt engineering um
- 00:08:48this diagram is probably not very
- 00:08:50visible right but there's basically four
- 00:08:52levels nested logic of how we figure out
- 00:08:56what authentication scheme a given API
- 00:08:58uses given its docs open API spec and if
- 00:09:03we don't have enough information there
- 00:09:05or if there's no open API spec we would
- 00:09:07attempt Googling and scraping Ser
- 00:09:09results uh from Google to figure out how
- 00:09:12to
- 00:09:14authenticate so Core lesson stop magic
- 00:09:17is just a lot a lot a lot of TDS
- 00:09:19software
- 00:09:20engering and the thing there is all of
- 00:09:24that time unless your users are actually
- 00:09:26benefiting from your software you're not
- 00:09:28learning anything and just having a
- 00:09:30prototype doesn't give you much you got
- 00:09:32to figure out where you host it how you
- 00:09:34monitor it how you evaluate it how you
- 00:09:36monitor your budget burn how you figure
- 00:09:38out when it moves out of beta
- 00:09:41Etc so we figured airb is not just an
- 00:09:45open- Source graphic user interface data
- 00:09:48pipelines tool or ETL uh my personal big
- 00:09:51thing here is to make uh system that
- 00:09:55gives you your data in python or in CLI
- 00:09:58you don't have to use air proper you
- 00:09:59don't have to use our graphical user
- 00:10:01interfaces to get your data if you have
- 00:10:03hobby projects or things that you do on
- 00:10:05weekends we should be able to help which
- 00:10:07should be handy if you decide to
- 00:10:09prototype stuff with Eddie and
- 00:10:11fractional later
- 00:10:12on um so what we can do um we have by
- 00:10:15airb which is a CLI or python library
- 00:10:18that can read data again from anywhere
- 00:10:20and write it to local du dbcash and then
- 00:10:23we have a bunch of destinations
- 00:10:24including a bunch of vector destinations
- 00:10:26and PG Vector Bine cone and such
- 00:10:29yeah very interesting time let's build
- 00:10:31some stuff together now I'm going to
- 00:10:33pass it to Eddie um and see what we want
- 00:10:37to talk about next
- 00:10:41[Applause]
- 00:11:00are you moderating this section cool
- 00:11:02well hello everybody uh while we're
- 00:11:04waiting for Teo my name is Eddie I'm the
- 00:11:06CTO at at fractional AI uh where uh Dev
- 00:11:10shop that is specifically focused on
- 00:11:12building challenging production
- 00:11:14applications that that use llms in some
- 00:11:16way so you know we were're uh we helped
- 00:11:20build the the AI assist feature you just
- 00:11:23saw which is like a good good example
- 00:11:25when you're trying to dig into the weeds
- 00:11:26of what some of these production AI
- 00:11:28projects look like but we've also seen
- 00:11:30over a hundred of these projects at this
- 00:11:31point and um yeah I'm excited to talk
- 00:11:35about all things about what it really
- 00:11:36means to put put AI projects into
- 00:11:39production that's for you
- 00:11:41ni um I'm just going to be yelling
- 00:11:44because you two are the most important
- 00:11:45people here and from this side of room
- 00:11:48you all are very important
- 00:11:49obviously um I think where I want to
- 00:11:52start Eddie you already kind of gave us
- 00:11:55a little bit of background fractional uh
- 00:11:57on in terms of working on different
- 00:11:58kinds of projects
- 00:11:59I want to go a little bit more
- 00:12:01into the AI assistant when you thought
- 00:12:04about the kinds of kinds of ways you can
- 00:12:07incorporate AI for new projects like I
- 00:12:09think there's a lot of people who are
- 00:12:10looking around where should I be
- 00:12:12implementing AI um ni you you talk a
- 00:12:15little bit about how we want to bring AI
- 00:12:18into our own workflow what's your first
- 00:12:20advice for anyone who's thinking about
- 00:12:22how can I bring AI into my
- 00:12:26Enterprise it's a good question um I
- 00:12:28think there's like a lot of ideas for
- 00:12:29way AI can help um but that things often
- 00:12:32get stuck early in the ideation process
- 00:12:34or at the PCC phase I think one critical
- 00:12:37thing that happened here was a lot of
- 00:12:41the best opportunities for AI exist in a
- 00:12:43manual workflow that you're already
- 00:12:45running somewhere today uh people were
- 00:12:47already building API connectors here and
- 00:12:50so it was very clear like what was hard
- 00:12:52you had a clear set of input output
- 00:12:54pairs to care about you had clear
- 00:12:56historical data you understood your
- 00:12:57domain and could measure the value of
- 00:13:00this thing right this took us quite a
- 00:13:02while to build um if you're going to
- 00:13:03spend all this time building something
- 00:13:05you got to kind of know that there's a
- 00:13:06there there that it's is like going to
- 00:13:07save a lot of people a lot of real time
- 00:13:09and not just be some speculative um
- 00:13:11thing so that would be like the number
- 00:13:13one thing I would focus on is this a
- 00:13:16real existing manual workflow that looks
- 00:13:19like the llm sort of capability set can
- 00:13:23be applied here well and is it valuable
- 00:13:25enough like if we can actually get there
- 00:13:27does this save us a lot of time does it
- 00:13:29it you know what's what's the financial
- 00:13:31impact to us on this does it save us
- 00:13:33hours does it you know generate new
- 00:13:35Revenue what what kind of sort of uh uh
- 00:13:37impact does it have when I think about
- 00:13:40like what are the core capabilities of
- 00:13:43these llms I basically think about it
- 00:13:47as computers can now read write
- 00:13:53make junior employee level decisions and
- 00:13:57they're sort of domain experts about
- 00:13:58everything and like that's the set of
- 00:14:00things that I would look at in these
- 00:14:01manual workflows rather than like oh
- 00:14:03maybe we can apply AI here and it can
- 00:14:04know everything about everything is this
- 00:14:06very specific oh you know we're spending
- 00:14:08a lot of time reading through API docs
- 00:14:09and saying like what did it say um and
- 00:14:12and that's a pretty llm capable
- 00:14:15task did you have anything you want to
- 00:14:17add there because otherwise I'm going to
- 00:14:18take it to this experience directly
- 00:14:21there's the whole you Scope our project
- 00:14:23you decide you want you're going to do
- 00:14:24it I'd love to know what went wrong in
- 00:14:27this situation
- 00:14:29oh so much
- 00:14:32uh the first thing that jumped to mind
- 00:14:34here is that um I think we failed to
- 00:14:36appreciate upfront just how hard some of
- 00:14:39the pure software engineering parts of
- 00:14:42the crawling of API docs would be I
- 00:14:44think we initially thought about this as
- 00:14:46like step one download the docs step two
- 00:14:49get llm to make a bunch of decisions um
- 00:14:53and does that resonate with other people
- 00:14:55you know one two and you're done all
- 00:14:57right we got some hands over there ni
- 00:15:00um and and fundamentally that is still
- 00:15:01What's Happening Here Right like we're
- 00:15:03trying to build a connector into an API
- 00:15:05the kind of steps involved are go to the
- 00:15:08web page that describes how to connect
- 00:15:09this a to this API read through the docs
- 00:15:11and then make a bunch of decisions okay
- 00:15:13here's how we authenticate provide our
- 00:15:15credentials to log into this API here's
- 00:15:17what the set of endpoints looks like uh
- 00:15:19turns out these documentation pages are
- 00:15:21like everything you can possibly imagine
- 00:15:23times like 10 and you have to support a
- 00:15:25very wide variety of use cases you have
- 00:15:27to handle you know rate limiting and
- 00:15:29some docs are behind authentication and
- 00:15:31some docs are like uh the information is
- 00:15:34not even on the web page it's like you
- 00:15:35know that you've got to click on things
- 00:15:36and it's going to go fetch it from the
- 00:15:37server and handling this super wide
- 00:15:39variety of use cases or preventing
- 00:15:41yourself from going and crawling out to
- 00:15:42irrelevant Pages was incredibly hard and
- 00:15:45even now when we look at failure cases
- 00:15:48more often than not they're not uh an
- 00:15:51the AI making a poort decision based on
- 00:15:53good data it's the AI making something
- 00:15:55up based on no data because we failed to
- 00:15:57actually find the right the right sort
- 00:15:59of source material out of the
- 00:16:01web you just seen me make a demo that
- 00:16:04took what like a minute right to process
- 00:16:08and in this minute it tries to figure
- 00:16:10out the relevant docs and figure out the
- 00:16:13base URL then the stream URL
- 00:16:15authentication scheme
- 00:16:16parameters when we started there was the
- 00:16:20happy path prototype connector like woo
- 00:16:22this works really fast that's great but
- 00:16:25then in some cases it took like four and
- 00:16:30a half something minutes in crawling
- 00:16:32docks in headless Chrome and sometimes
- 00:16:35it would get into Loops so you would
- 00:16:38think like in 2024 crawling pages from
- 00:16:41the web should be solved problem and
- 00:16:42there's a bunch of products that say
- 00:16:44they do it right fir crawl is the one we
- 00:16:47use
- 00:16:48now but can you just out of the box
- 00:16:51Point them and expect them to work like
- 00:16:54nope if you go read like you know a rag
- 00:16:58rag tutorial right now it's going to
- 00:17:00tell you uh you know go download your
- 00:17:03information get get craw the docs
- 00:17:05download the docs uh strip out some HTML
- 00:17:09chunk it up into pieces put it into a
- 00:17:12vector store and then query your vector
- 00:17:13store um and actually we did kind of
- 00:17:16start there the final implementation we
- 00:17:18ended up with looks something more like
- 00:17:20we don't pre-ra anything we wait until
- 00:17:22we have a specific task we're trying to
- 00:17:23do like how do you like what is the
- 00:17:25authentication mechanism does this API
- 00:17:27use you know http basic off to for the
- 00:17:29username password does it use an API key
- 00:17:32what is the method and then we purpose
- 00:17:34go crawl for that we start at the
- 00:17:36homepage of the docs and we ask an llm
- 00:17:38to help us navigate toward you know
- 00:17:40where we'd want to want to go we have so
- 00:17:42many fallback mechanisms in here we have
- 00:17:44multiple different Services we use for
- 00:17:45this crawling because there can be rate
- 00:17:47limiting issues they can be flaky um
- 00:17:49there's there's all sorts of issues
- 00:17:51around that we fall back on doing a
- 00:17:52Google search if we can't find the
- 00:17:54information we're looking for we use
- 00:17:55perplexity at some points in the flow uh
- 00:17:58we have a repos repository under the
- 00:17:59hood of a bunch of pre-built opening API
- 00:18:01specs from common repositories like it
- 00:18:04is very complicated under the hood
- 00:18:06there's a lot a lot going on that
- 00:18:08doesn't look like you know you're uh
- 00:18:10here's how you ask a question of your
- 00:18:11documents or rag
- 00:18:14tutorial and I kind of want to like
- 00:18:16before we go towards like the next
- 00:18:17question there I want to just get a
- 00:18:19pulse for the room probably should have
- 00:18:20started with this but I think it's
- 00:18:21helpful as we're diving deeper into some
- 00:18:23of these Concepts just to make sure
- 00:18:24we're all kind of on that same
- 00:18:25wavelength would you raise your hand if
- 00:18:27you identify a builder in AI right now
- 00:18:30you're building some kind of company or
- 00:18:32product in the space all right great how
- 00:18:34many of you are not necessarily building
- 00:18:36but pretty well versed in the topic
- 00:18:38you're doing a lot of independent
- 00:18:40research and rais all right those two
- 00:18:44together I think we have a large
- 00:18:44majority for everyone else you're
- 00:18:45probably where I'm at in my like Journey
- 00:18:48so you can go ahead and be Googling
- 00:18:50things on the side just like I'm going
- 00:18:51to be doing over here um yeah yeah call
- 00:18:53me out for if I'm getting too technical
- 00:18:55no no no it's good we want we want to go
- 00:18:57de deeper and this being live stream
- 00:18:59record so you can always come back later
- 00:19:01if you have more questions I want to
- 00:19:03talk about that piece then like thinking
- 00:19:04about all these components that go into
- 00:19:07building an AI you think about
- 00:19:08observability you think about the rag
- 00:19:10like could you talk through what are
- 00:19:14core components for you of a successful
- 00:19:17AI project maybe evaluations or or
- 00:19:19things of that nature where do you want
- 00:19:20to take
- 00:19:22this so I think the one of the earliest
- 00:19:25steps in any project that's going to
- 00:19:27reach this level of success ESS um if
- 00:19:30it's going to have any sort of
- 00:19:31meaningful complexity to it is going to
- 00:19:33have to be building evales and what I
- 00:19:36mean by EV vals is basically an
- 00:19:38automated test suite for your
- 00:19:40application but one where you're running
- 00:19:42over lots of examples that you want your
- 00:19:45system to be good at and you're testing
- 00:19:46how well it it does at these things so
- 00:19:48you define some metrics up front to
- 00:19:50measure how well am I doing um and so
- 00:19:52like as a concrete example here we're
- 00:19:53trying to build API Integrations our
- 00:19:55first step was let's go gather a bunch
- 00:19:58of existing API Integrations we built
- 00:20:01let's build a a sort of test harness
- 00:20:03that can generate output from our system
- 00:20:05test it against how well does it match
- 00:20:08up with the things that actually people
- 00:20:09built in the past and we produced a
- 00:20:10whole bunch of metrics around these it's
- 00:20:13it's actually non-trivial to get this
- 00:20:15right um uh you know even though we had
- 00:20:17a really rich set of ground truth to
- 00:20:19look at here you know we had hundreds of
- 00:20:20connectors to people that built the
- 00:20:22comparisons are not very straightforward
- 00:20:24like sometimes our system comes up with
- 00:20:25different names than than people came up
- 00:20:27with or the the community connectors
- 00:20:29might have only a subset of of things
- 00:20:32defined in them they could have defined
- 00:20:34and that's that's okay for their use
- 00:20:35case um so detecting sort of the
- 00:20:37difference between we didn't generate
- 00:20:40something and we should have versus we
- 00:20:41didn't generate something and that's
- 00:20:42fine um is is not uh it's not trivial
- 00:20:46but you got to start somewhere and if
- 00:20:48you don't do this your starting point is
- 00:20:50gonna be very s Vibes based you're gonna
- 00:20:52like run your first best idea some
- 00:20:56sometimes it's going to work which is
- 00:20:57going to be really encouraging and cool
- 00:20:58sometimes it's not and you're not like
- 00:21:00going to kind of have some intuition
- 00:21:01about maybe here's how I improve it but
- 00:21:02it's going to be based on whatever sort
- 00:21:03sitting in front of you this is what
- 00:21:05they ended up looking like at some point
- 00:21:06maybe there's like can you go up a slide
- 00:21:08so this is how it looked at the
- 00:21:09beginning when we started we were just
- 00:21:11like so if you can't see the rows here
- 00:21:15are just example connectors that that
- 00:21:18existed already um and we just picked
- 00:21:21three uh knowing that we wanted to be
- 00:21:23better than just doing these three but
- 00:21:25we started somewhere and then each of
- 00:21:26these columns is some some way that we
- 00:21:28measure ourselves against the ground
- 00:21:29truth so if we ask our system to produce
- 00:21:31a Sentry connector there's already a
- 00:21:33Sentry connector out there how well do
- 00:21:35we do at all these things and uh and
- 00:21:37produce these these metrics and we try
- 00:21:39and kind of like produce a score that is
- 00:21:41roughly weighted by how valuable is it
- 00:21:43to a user if we screw this up or get it
- 00:21:46right uh and and then you start now you
- 00:21:49can actually sort of measure how well
- 00:21:50you're doing this is a super powerful
- 00:21:53tool there's sort of a Dark Art to like
- 00:21:56you know perfect versus good on this but
- 00:21:59um if you get this into a good place it
- 00:22:02guides development in a very real way
- 00:22:03like first of all you can tell in an
- 00:22:04unbiased way like how are we doing
- 00:22:06overall you can track your progress you
- 00:22:08can track regressions and if you sort of
- 00:22:11if you're doing some prompt engineering
- 00:22:12and you're like tweaking the language
- 00:22:13all the time to get better at some
- 00:22:14specific failure mode you're seeing what
- 00:22:17how do you know if you tweak your prompt
- 00:22:18it's like not going to make you worse
- 00:22:19the thing you tried to get better at
- 00:22:20yesterday so this will help you track
- 00:22:22regressions it also
- 00:22:24drives uh the sort of anecdotal evidence
- 00:22:28you want to see for where to invest your
- 00:22:30attention next if you go you know
- 00:22:32like you know we're doing pretty well
- 00:22:34actually at this stage um but like
- 00:22:37there's still some zeros in here um so
- 00:22:40like my intuition from seeing this is
- 00:22:43like okay we're like doing okay at
- 00:22:44whatever this thing is for zenitz and
- 00:22:46we're like doing not that good for this
- 00:22:48schema thing for zenitz like wonder what
- 00:22:50that is and i' click into what it's It
- 00:22:52Go actually look at what we generated
- 00:22:54and say ah okay like this the L&M got
- 00:22:57this wrong because we're feeding it the
- 00:22:58wrong information this is a crawling
- 00:22:59problem not prompting problem and we' go
- 00:23:02update our crawler and so sort of tells
- 00:23:04you what to work on next and then over
- 00:23:06time we expanded to that that next slide
- 00:23:08that you were on a second ago which
- 00:23:09is the evals just got bigger and bigger
- 00:23:12and bigger we just kept getting more use
- 00:23:13cases in there trying to get a wider and
- 00:23:15wider set of examples to look at um and
- 00:23:18it's what drove you know you showed the
- 00:23:19sort of workflow diagram in your slides
- 00:23:21earlier that was like kind of the
- 00:23:22spaghetti look of all the different
- 00:23:24steps that go into just one of the
- 00:23:25questions here that evolved out of this
- 00:23:28exploration trying to get better and
- 00:23:30better by adding more sort of uh catches
- 00:23:33for things that could go
- 00:23:36wrong did you want to add anything
- 00:23:39there can hope to add some context at
- 00:23:44the high level this is the diagram for
- 00:23:46the whole thing in the
- 00:23:48beginning and so the idea was okay we're
- 00:23:51going to crawl all of the documents now
- 00:23:53we're going to index everything shove it
- 00:23:55into a vector store and then there's
- 00:23:57going to be like three four different
- 00:23:58components one's going to figure out the
- 00:24:00AL the other is going to figure out the
- 00:24:02pagination um right and then the the the
- 00:24:05different ones going to figure out the
- 00:24:07list of streams basically stream is an
- 00:24:08API endpoint like oh you know
- 00:24:10repositories and GitHub is a stream
- 00:24:12issues and GitHub is a stream if you
- 00:24:14look at this one right here deaf is
- 00:24:17what's we call a record selectorate air
- 00:24:19bite is basically where exactly in the
- 00:24:22response Json is the useful information
- 00:24:26and the schema means okay what are The
- 00:24:29Columns of data what are the fields of
- 00:24:31the useful objects that we
- 00:24:33want and as we grew into this even the
- 00:24:37number of things that we've paid
- 00:24:39attention to increased and each
- 00:24:42particular component became this huge
- 00:24:44spaghetti because it turns out that like
- 00:24:47originally we thought you know what each
- 00:24:49component is going to be a subset of
- 00:24:51index docs the tagged and a prompt and
- 00:24:55hopefully a single prompt is going to
- 00:24:58just make it fine like we crawled
- 00:25:00everything already right and turns out
- 00:25:02in reality like every component that we
- 00:25:04need answer to like every field where
- 00:25:06you can get an AI assist prompt is
- 00:25:08basically a program in
- 00:25:11itself I want Tove the spaghetti piece
- 00:25:14also we're going to change this up
- 00:25:15because originally I was just going to
- 00:25:17like have a point where it's purely
- 00:25:18audio audience Q&A if you're having
- 00:25:21questions about things as we come up
- 00:25:23raise your hand and I will kind of bring
- 00:25:25you into the conversation rather than
- 00:25:27just wait for the end um but I'm curious
- 00:25:30about how the spaghetti evolves over
- 00:25:32here what surprised you the most about
- 00:25:35the way your evaluation criteria early
- 00:25:38on differ when you think about the end
- 00:25:46state so I I'm surprised by the number
- 00:25:49of random fallbacks and stuff in the
- 00:25:51system like that we're still Google
- 00:25:53searching in perplexity you know
- 00:25:54searching under the hood to get to some
- 00:25:56of the answers we want um
- 00:26:01uh I think a very useful but difficult
- 00:26:05thing on this project was thinking about
- 00:26:07how to progress along this path to how
- 00:26:10do we arrive at the right spaghetti um
- 00:26:13uh because if you were to just guess it
- 00:26:14up front you wouldn't guess right like
- 00:26:16you have to kind of evolve your way
- 00:26:18toward it and then that's intention with
- 00:26:21like how do we know we're going to get
- 00:26:22there like how do we
- 00:26:24know how do we know this is even
- 00:26:26possible um let alone that going to get
- 00:26:28there in like a reasonable amount of
- 00:26:29time and I think that question is very
- 00:26:33challenging for AI projects right like
- 00:26:34there's there's some stat that like 70%
- 00:26:36of of poc's never make it to production
- 00:26:39with with AI projects and I think it's
- 00:26:42very challenging to know what a good POC
- 00:26:44looks like and how to get from there to
- 00:26:45production um and and so if you like
- 00:26:48take take the AI assist project as just
- 00:26:50like an example of a broader
- 00:26:52theme um I mean you mentioned you guys
- 00:26:55tried it a few times before right and
- 00:26:56you weren't exactly sure what do we make
- 00:26:58of this like I think this says this is
- 00:27:00possible but I don't know how we get
- 00:27:01there and like
- 00:27:03the if you were just gonna try tomorrow
- 00:27:06to say like is it possible to build
- 00:27:08these API Integrations with llms like
- 00:27:10the first thing You' try is you just
- 00:27:12like go ask chat GPT to do it you'd show
- 00:27:13chat GPT an example of these connectors
- 00:27:16are just a file under the hood you
- 00:27:17showed chat GPT an example of the file
- 00:27:19and you said you know build me one like
- 00:27:21this but for this
- 00:27:22API and then something will come out
- 00:27:25like probably something pretty good uh
- 00:27:28because the files are sort of
- 00:27:29inscrutable and if you don't know what
- 00:27:30you're looking for it's going to look
- 00:27:31right even if it's like technically
- 00:27:32doesn't run later um and then you're
- 00:27:34kind of stuck you don't really
- 00:27:36know what did this really tell me about
- 00:27:38is it possible you you can't really
- 00:27:40iterate on it like how do you make chat
- 00:27:42PT better at this now um how do you know
- 00:27:45what array of stuff it's good at versus
- 00:27:46bad at and it's not going to get you to
- 00:27:48this like eventual kind of spaghetti
- 00:27:50looking
- 00:27:50diagram um so instead the approach we we
- 00:27:54tend to take is we try and build pcc's
- 00:27:57that are 100% on the critical path to
- 00:27:58production um we try and be thoughtful
- 00:28:01about which pieces we build early but
- 00:28:03early in the project we didn't start by
- 00:28:05saying let's just show like a really
- 00:28:06shiny marketing demo that shows complete
- 00:28:08end to end it working perfectly for one
- 00:28:11connector we said let's pick three
- 00:28:13connectors as examples and it's going to
- 00:28:15start out kind of crappy and then we're
- 00:28:16going to try and make it better over
- 00:28:17time um and that that diagram you showed
- 00:28:20a second ago that's like the the this
- 00:28:23one yes this one this was our sketch
- 00:28:25like a few weeks into the project of
- 00:28:27what we we imagined the eventual
- 00:28:29spaghetti might look like and it ended
- 00:28:30up changing over time and what we tried
- 00:28:32to do was tackle these pieces in order
- 00:28:35um to try and drisk the riskiest parts
- 00:28:37of the project we're like all right
- 00:28:39let's try and work on the box that's
- 00:28:40about authentication right now and see
- 00:28:42like what's it look what's it look like
- 00:28:44start to feel it out get rid of unknown
- 00:28:45unknowns get that to a place where we're
- 00:28:47like I believe that with iteration this
- 00:28:48part is possible then tackle the next
- 00:28:50piece and tackle the next piece and
- 00:28:52start to flesh this out I think actually
- 00:28:53the screenshot like the gray boxes were
- 00:28:55like things we didn't try yet or
- 00:28:57something um or de prioritize for p so
- 00:29:00like you know we hadn't we didn't
- 00:29:02actually tackle all of these but we
- 00:29:03tried to tackle as many as we could to
- 00:29:04start to drisk it and then that process
- 00:29:08drove us to a more robust eval driven
- 00:29:11now it feels like iteration doesn't feel
- 00:29:13like we're building a V1 of something it
- 00:29:14feels like we're kind of like you know
- 00:29:15iterating iterating iterating and that
- 00:29:17drives the ideas for where to add the
- 00:29:19sort of branching Paths of that that
- 00:29:21workflow
- 00:29:23diagram I do like the idea that we
- 00:29:24should only be talking about EV valves
- 00:29:26in the context of spaghetti going
- 00:29:29so let's keep Let's uh maybe keep that
- 00:29:31one up all night um thinking about the
- 00:29:38yeah yeah in terms of the eval how are
- 00:29:41you do are you just compar
- 00:30:01yeah that's that's a great question um
- 00:30:02yeah so the question was like what are
- 00:30:04we measuring how are we doing these
- 00:30:05evals um in this case are we just
- 00:30:07comparing ourselves to an existing
- 00:30:09connector that we know is good or uh he
- 00:30:11said he's heard of some examples of
- 00:30:13using an llm to evaluate how the other
- 00:30:15llm did um it's a great question uh
- 00:30:21so what we see across successful
- 00:30:24projects varies a lot um part of what
- 00:30:27makes the actually difficult is that
- 00:30:29they rarely fit this like clean academic
- 00:30:32standard for what you would want to see
- 00:30:34um clean input output pairs great ground
- 00:30:37truth you know how to compare these
- 00:30:38things and how to measure them sometimes
- 00:30:39the thing we're measuring ourselves
- 00:30:40against is we like ship an example
- 00:30:43output to some team somewhere and we're
- 00:30:44like you're the experts on this domain
- 00:30:45did we do a good job or not they ship it
- 00:30:47back and like trying to evaluate based
- 00:30:48on that and so the the mess wrangling
- 00:30:51the mess is hard um we have seen
- 00:30:54successful examples of using it that
- 00:30:57that technique is called llm as judge
- 00:30:59where you you have an llm evaluate how
- 00:31:01you're doing it's good for like very
- 00:31:03subjective things if you're generating
- 00:31:04free form text and you're like does this
- 00:31:06seem like it answered my question that's
- 00:31:08like a task for an llm in this case we
- 00:31:10were able to circumvent that I think in
- 00:31:13every case uh we do some like
- 00:31:15deterministic fuzzy stuff where we're
- 00:31:17like does this name almost match that
- 00:31:19name if so we're good um uh and so there
- 00:31:22is some like deep Logic for like trying
- 00:31:26to score ourselves uh in a way way
- 00:31:28that's not not as straightforward is
- 00:31:29just like does this thing equal that
- 00:31:31thing um um but we've seen sort of
- 00:31:34everything and at some point you do need
- 00:31:36to sort of stop like looking for the
- 00:31:38perfect thing and find something
- 00:31:39directionally useful um we've had
- 00:31:41projects where like you have a workflow
- 00:31:43diagram this pop this this complicated
- 00:31:45and the only thing we're able to measure
- 00:31:46is like what's going on down here um
- 00:31:48because it's like the only place where
- 00:31:49you can design clean EV EV vals and then
- 00:31:52you just sort of put up with that and
- 00:31:54and do the best you can
- 00:32:17AG
- 00:32:40so it's so what does the output look
- 00:32:43like is actually very critical to what I
- 00:32:45think made this possible here um so
- 00:32:47we've actually built uh sort of uh AI
- 00:32:51powered integration Builders multiple
- 00:32:53times um this is this is one of them for
- 00:32:55airite I think one amazing asset that
- 00:32:58airb has here is they have this format
- 00:33:01that they call their their well I don't
- 00:33:03know what you call it your low low code
- 00:33:04cdk format your your this this spec for
- 00:33:07how to define an API integration as
- 00:33:09configuration instead of as code big
- 00:33:12file that describ and in fact in our
- 00:33:14pipeline we never have an llm write this
- 00:33:18thing as output we write this as output
- 00:33:20deterministically using code and we use
- 00:33:22the llm to answer specific questions we
- 00:33:24have about this process so we ask it
- 00:33:27picking off authentication method for me
- 00:33:28and then we use that to
- 00:33:29deterministically generate the
- 00:33:30authentication part of this that's part
- 00:33:32of what makes this an approachable
- 00:33:34problem we've built this before uh where
- 00:33:37the end goal is to write code performs
- 00:33:40way worse um and even in that process we
- 00:33:43have uh under the hood we have an
- 00:33:46intermediate format that is not I mean
- 00:33:49it's like conceptually similar to this
- 00:33:52that we're using to sort of constrain
- 00:33:53the problem so much of the trick with
- 00:33:55these LMS is constraining the domain in
- 00:33:56which they're thinking right if if you
- 00:33:58say write me some code you're going to
- 00:34:00get something code shaped as output
- 00:34:01whether it's good nobody knows um if you
- 00:34:04ask it for a very specific constrainted
- 00:34:06answer where it's only allowed to answer
- 00:34:08within a very specific Universe it's
- 00:34:09much more tunable it's going to perform
- 00:34:10a lot better just kind of made that
- 00:34:12possible yeah I mean
- 00:34:25I'm I can take that
- 00:34:29so to clarify the last two questions
- 00:34:33it's I think it's both relevant to evals
- 00:34:35ands to outputs uh the way we eval is we
- 00:34:39compare what the model gives us with
- 00:34:41what we have in connectors we know is
- 00:34:43good it's not always one to one because
- 00:34:46for example if you have a stream that's
- 00:34:48called capital T transactions is it
- 00:34:51still the same or like is it if if the
- 00:34:54wording is slightly different but the
- 00:34:55scheme is very similar if the schemas
- 00:34:58are compatible but the columns are not
- 00:34:59the same is it is it a match is it not
- 00:35:01match like that that kind of stuff the
- 00:35:03output is uh are pieces of the Manifest
- 00:35:07and the AI Builder thing like we have a
- 00:35:10python library that enforces the format
- 00:35:14of the Manifest essentially think
- 00:35:16kubernetes resource definitions right
- 00:35:18there are fields that are required they
- 00:35:20can be only of certain format so Builder
- 00:35:23before outputting that as a suggestion
- 00:35:26validates that it's
- 00:35:28legit and then one use case is sure
- 00:35:32right just a co-pilot thing in Builder
- 00:35:35itself um what we see is the match
- 00:35:38success rate like we see successful good
- 00:35:41suggestions very very often like it's
- 00:35:43probably north of 90% on each particular
- 00:35:46field today but the thing is there's a
- 00:35:48bunch of fields and those probabilities
- 00:35:50multiply so the probability that you get
- 00:35:53full connector end to end correctly is
- 00:35:57you slightly lower but we're getting
- 00:35:59there this use case is okay let's get a
- 00:36:02lot of connectors let's make new
- 00:36:03connectors Let's help people make
- 00:36:06connectors for themselves and then share
- 00:36:07them with our community but also I have
- 00:36:11450 connectors and like more than 250 of
- 00:36:14them are in that format so the whole
- 00:36:16connector is just a big manifest file
- 00:36:18and what I can do is I already have a CI
- 00:36:20pipeline that runs every week and you
- 00:36:23see there's this thing called version
- 00:36:24right like this is the version of the
- 00:36:26framework that it's using
- 00:36:28and my CI pipeline checks hey do I have
- 00:36:30a newer version of the framework and if
- 00:36:32I do I'm going to update all of my
- 00:36:35manifest as long as it's not breaking
- 00:36:37another thing we could do basically on
- 00:36:39CI uh or regularly is uh create another
- 00:36:42endpoint in our AI assist thing and have
- 00:36:46another flow where we say hey here's the
- 00:36:49name of the connector here's the API
- 00:36:51docs here's the existing manifest do you
- 00:36:54think there may be some new streams that
- 00:36:56we don't have
- 00:36:59and like these or you know like maybe
- 00:37:01there's a new authentication method
- 00:37:03maybe there are some deprecations that
- 00:37:04we want to clean up today the way this
- 00:37:07works is connector fails for someone the
- 00:37:09stream doesn't work anymore somebody
- 00:37:11files in a GitHub issue they say well
- 00:37:13we're open source you're very welcome to
- 00:37:14contribute they contribute we run
- 00:37:16regression tests verify it's not broken
- 00:37:18then we merge when we had just the
- 00:37:20python framework it took months now it
- 00:37:23takes days but if I can automate this
- 00:37:27cool so thank you for the
- 00:37:34suggestion should I okay I'll do it
- 00:37:37you're oh thanks um all right I kind of
- 00:37:41want to like pull on a Thro a little bit
- 00:37:43more that Samantha brought up which is
- 00:37:44like you can Envision a future of like
- 00:37:47an an agent or something doing this like
- 00:37:50since the GPT era started it seems like
- 00:37:54there's always something new it's
- 00:37:55exciting that people are talking about
- 00:37:56you know it was rag agents um graph rag
- 00:38:01there's countless things in a year from
- 00:38:03now do you feel any of these will
- 00:38:05continue to be just as pertinent a part
- 00:38:07of the conversation or do you think
- 00:38:09something new will be the dominant point
- 00:38:11of
- 00:38:16discussion and if you do think something
- 00:38:18new what is that
- 00:38:21thing I'm less of an AI futurist and
- 00:38:25more of an AI today practitioner uh but
- 00:38:29um you know when people talk about
- 00:38:32agents for example I think there's like
- 00:38:33multiple things they might mean um think
- 00:38:37one thing they might mean is like build
- 00:38:41a thing that's got a lot of autonomy
- 00:38:42around what it can do you give give
- 00:38:44something a bunch of tools and you let
- 00:38:46it sort of decide it's less of this
- 00:38:47deterministic we do this then we do this
- 00:38:49then we do this and you sort of give it
- 00:38:50access to whatever it wants
- 00:38:54um I've yet to see anything like that
- 00:38:56come to fruition in practice for a
- 00:38:58significant system that could see that
- 00:39:00changing over time um but right now it
- 00:39:02seems very um theoretical to me and like
- 00:39:06may may happen if it gets driven by you
- 00:39:08know big big boost to what Foundation
- 00:39:11models are capable of
- 00:39:13um but I think the more interesting
- 00:39:15today thing for for agents what people
- 00:39:18tend to mean is like less around
- 00:39:20autonomy more around specialization like
- 00:39:22how do you break your problem down into
- 00:39:24specific components that are in charge
- 00:39:26of a very small subdomain and are
- 00:39:28experts in that subdomain that I think
- 00:39:30is going to get even more common I think
- 00:39:31people are
- 00:39:33realizing a the complexity of these
- 00:39:35projects in practice you know what looks
- 00:39:37at a high level like hey chat GPT give
- 00:39:39me give me a connector it looks more
- 00:39:40like this under the hood and also that
- 00:39:44so much of uh the sort of mystery of
- 00:39:46what it's like to build with LMS is
- 00:39:48actually just software engineering under
- 00:39:49the hood um I think that is going to
- 00:39:51drive more adoption of these sort of
- 00:39:54that type of agent system um and I we're
- 00:39:58seeing more and more of it we're talking
- 00:39:59about a very sort of tech tech forward
- 00:40:00company Tech forward use case but we
- 00:40:02also see like you know 100y old big
- 00:40:05equipment manufacturers talking about
- 00:40:06these workflows in a very realistic way
- 00:40:09that I think is is going to be in
- 00:40:10production within the next year at at a
- 00:40:12company like that um that you might call
- 00:40:14an agentic workflow um so I see that
- 00:40:16that part of it being very real over the
- 00:40:17next
- 00:40:25year Tak only
- 00:40:29jumped to deep in building this thing my
- 00:40:32Horizon of thinking about AI things a
- 00:40:34year from now is very very
- 00:40:36short my personal biggest thing is like
- 00:40:40we we have manifest connect also have
- 00:40:43python connectors and Java connectors
- 00:40:46and we also have bug bugs in those so my
- 00:40:48biggest dreams are around just those
- 00:40:51software programming agents which can be
- 00:40:53as simple as a little bash script that
- 00:40:55says hey here's a GitHub un call issue
- 00:40:59here's the bug report here are the logs
- 00:41:01here's the directory with all of the
- 00:41:03source files and here's the script that
- 00:41:05builds and tests the connector here's
- 00:41:07the bug
- 00:41:08output can you fix it and then the
- 00:41:11script applies the changes proposed by
- 00:41:13the model runs the tests and if they
- 00:41:15fail it says yeah that didn't work try
- 00:41:18again in a while loop just until it
- 00:41:21wraps up this is my next hobby project I
- 00:41:24think after this thing is successful
- 00:41:27what that means for other Industries and
- 00:41:31for programmers and businesses that
- 00:41:33build with AI EDD is the boss
- 00:41:44there I don't know that we have a final
- 00:41:46one fully
- 00:41:51together I I don't think there is a full
- 00:41:54final one and
- 00:42:03very little framework code under the
- 00:42:05hood there's there's some but it's it's
- 00:42:06not
- 00:42:10substantial it's kind of not
- 00:42:12representative of the final no that's
- 00:42:14okay maybe this is closing the biggest
- 00:42:16place where I think the the diagram
- 00:42:18diverged is like around the The Crawling
- 00:42:21of the docks like we don't do an upfront
- 00:42:23crawling Step at all um and so it's it
- 00:42:26stops looking like
- 00:42:27I guess the other big change is like at
- 00:42:29that point we
- 00:42:31envisioned URL to API docs as input
- 00:42:35connector as output one shot build the
- 00:42:37whole thing all at
- 00:42:38once and uh where it ended up going was
- 00:42:42that is what the initial experience is
- 00:42:44like in the UI but there's lots of
- 00:42:45little buttons you can push to fill in
- 00:42:46fields here and there and so the flow is
- 00:42:48much more decomposed into a set of a
- 00:42:50bunch of different endpoints and smaller
- 00:42:52workflows that leverage some shared
- 00:42:54shared stuff under the hood and so it's
- 00:42:56not exactly left to right end to end
- 00:42:58thing it's like 12 end to end things
- 00:43:00that have some shared
- 00:43:15stuff yes so so there's actually two
- 00:43:17inputs uh you can give us an open API
- 00:43:20spec uh as as input for those that don't
- 00:43:23know an open API spec is like a it's a
- 00:43:25common standard format you can use to
- 00:43:27describe uh an API um it's optional but
- 00:43:30if you give it to us we'll we'll use it
- 00:43:32um we also have our own curated kind of
- 00:43:36repo like common common apis that that
- 00:43:39are out there in their specs that we
- 00:43:41sometimes use um other supplemental
- 00:43:44information is is it's all stuff living
- 00:43:48on the web it's like Google searching um
- 00:43:52uh
- 00:43:54crawling anything El yeah I think that's
- 00:43:56all the supplemental stuff
- 00:44:04I wonder if you for some stages that are
- 00:44:06disconnected andon to an artifact have
- 00:44:10you tried to combine
- 00:44:21them I ended
- 00:44:24up but then
- 00:44:34I
- 00:44:37you but but sometimes it makes sense and
- 00:44:40the following question if you have
- 00:44:47Doney so I think the first part was
- 00:44:50around like instead of treating these
- 00:44:52these different alternative steps for
- 00:44:53finding information as as fallbacks to
- 00:44:55one another can you sort of do them in
- 00:44:56parallel and then try and try and
- 00:44:58combine the information is that is that
- 00:45:09right that are somewhere in here in a
- 00:45:12sequence so have you tried to reconcile
- 00:45:14them in a single step you know with
- 00:45:18let's say let's call it a gentic
- 00:45:19application or a gentic step in which
- 00:45:23you do both tasks you can
- 00:45:28right so the two tasks here are like
- 00:45:31um they're
- 00:45:33basically go out and find the relevant
- 00:45:36information to a question like
- 00:45:38authentication and
- 00:45:39then I cannot
- 00:45:42read imagine that there are two simple
- 00:45:44Tas that you have separated by an
- 00:45:47artifact you generate
- 00:45:52one you instead
- 00:45:57yeah I think it actually often starts
- 00:46:00the opposite way it's like we start with
- 00:46:01a larger problem we're like build this
- 00:46:03whole thing and we're like this needs to
- 00:46:04be broken down and sub
- 00:46:12components possible that's happened
- 00:46:14somewhere in the details I'm like less
- 00:46:15less familiar no use case there is
- 00:46:18jumping to mind but like I think the
- 00:46:19tactic makes sense to me
- 00:46:22um
- 00:46:24uh yeah in practice like one area we've
- 00:46:27had to break things down is like sort of
- 00:46:29deeply nested questions um where like we
- 00:46:33may be asking the the llm like which of
- 00:46:36these authentication methods is used and
- 00:46:37like if it's this one I need this
- 00:46:38information if it's that one I need that
- 00:46:40information it's sort of asking these
- 00:46:41deeply nested questions it like sort of
- 00:46:42falls off and gets lazy and stops
- 00:46:44following the instructions so we've had
- 00:46:45to sort of chop it up into the sub
- 00:46:47pieces so this a little bit like the
- 00:46:48opposite of the flow you're describing
- 00:46:50but like I could see if we if we'
- 00:46:52started out with the sort of multi-step
- 00:46:54version being like I wonder if we can do
- 00:46:55this all at once which does save you on
- 00:46:58latency and
- 00:47:03cost more easier you have that you
- 00:47:11try yeah at least try to recile
- 00:47:16something for example when I started
- 00:47:19doing group
- 00:47:22ofel so I start basically from functions
- 00:47:25and I automate function with a agent
- 00:47:29step AG step and then I I I link
- 00:47:33together but then I say okay this two
- 00:47:36maybe can recile single yeah you instead
- 00:47:39of
- 00:47:41having step agent to maintain have it
- 00:47:45doesn't have to make sense for
- 00:47:46everything and end up a single blob of
- 00:47:49agent that that performs everything it's
- 00:47:51not going to work what you were saying
- 00:47:52at the very begin yeah we've seen I
- 00:47:55think this is only tangentially Rel
- 00:47:56related to what you're asking but we
- 00:47:57have seen on another another project so
- 00:48:00it looks pretty different to this but
- 00:48:02it's fundamentally basically like it's a
- 00:48:04Content moderation projects it's for a a
- 00:48:06company called change.org where they
- 00:48:08they have like a petition uh platform
- 00:48:11where people can can post petitions
- 00:48:13about you know political things and
- 00:48:15local things and stuff like that um and
- 00:48:17they have kind of a challenging content
- 00:48:20moderation problem because it's not as
- 00:48:21simple as saying like did someone just
- 00:48:24post spam or did someone just post hate
- 00:48:25speech it's actually like a valid use of
- 00:48:27their platform to say something like
- 00:48:29somewhat inflammatory but like it can't
- 00:48:31cross the lines of of their Community
- 00:48:33guidelines and so um getting uh these
- 00:48:37agents to sort of understand the
- 00:48:38different nuances of like what does it
- 00:48:40mean to to um to violate our policies is
- 00:48:44is challenging and under the hood what
- 00:48:46we do is we have these sort of
- 00:48:47specialist agents that do look at this
- 00:48:49through different lenses they write out
- 00:48:51their sort of reasoning their Chain of
- 00:48:53Thought they give us confidence scores
- 00:48:54at the end and then we take a bunch of
- 00:48:56these different answers together at the
- 00:48:57end and we give it to one bigger process
- 00:48:59that's like all right now that you
- 00:49:00understand all the Nuance of these
- 00:49:01different angles make a final decision
- 00:49:03and it's sort of combining um these
- 00:49:05different sort of Sub sub viewpoints if
- 00:49:06that makes sense it's not exactly what
- 00:49:08you were talking about but it's a sort
- 00:49:09similar idea um on the 01 question uh he
- 00:49:13asked if we had tried o one at any point
- 00:49:15um we have um uh the biggest drawback
- 00:49:20with o one is that it's slow um so this
- 00:49:23is like just too latency sensitive of an
- 00:49:25application we already have um takes a
- 00:49:27while to build to generate a connector
- 00:49:29here there's a lot of substeps if you
- 00:49:30added 20 seconds to one of the prompts
- 00:49:32it would probably be a nonstarter
- 00:49:34especially given that the bottleneck
- 00:49:35here is less the ai's intelligence and
- 00:49:39more our ability to give the AI the
- 00:49:40right information at the right
- 00:49:42time we are getting that point where we
- 00:49:46have a lot of pizza that people still
- 00:49:49toat so I I want to start putting the
- 00:49:52bows on the present here and just
- 00:49:54confirm is there anything else that you
- 00:49:55all wanted share with the audience that
- 00:49:57we haven't had a chance to talk about
- 00:49:59and I will also give the opportunity if
- 00:50:01you have any burning final questions
- 00:50:03feel free getting those in there but I
- 00:50:05know there's slides there's a lot of
- 00:50:06things that you all might want to show
- 00:50:08anything you wanted to kind of TOS
- 00:50:13out anything else
- 00:50:17yes to cont
- 00:50:36so we're trying toise some
- 00:50:49to
- 00:50:51resp speak
- 00:51:24spaghet so I guess I'll start by saying
- 00:51:26this domain sounds very hard
- 00:51:29um the the thing that makes me say it
- 00:51:31sounds hard is that um hirings sounds
- 00:51:35hard and like uh we struggle to train
- 00:51:37humans to do it today um so getting
- 00:51:43getting if I struggle to picture how to
- 00:51:45get uh a pretty Junior uh person to
- 00:51:49figure out how to reliably produce this
- 00:51:50output then I also struggle to see how
- 00:51:52to get an LM to do it the the analogy
- 00:51:54that jumps to mind though is um
- 00:51:58this kind of problem is present for AI
- 00:52:01phone agent applications there's a lot
- 00:52:03of you know people trying to put AI
- 00:52:04agents on the phone they have to sort of
- 00:52:06be robust in the face of people can say
- 00:52:09anything um it's hard to build
- 00:52:12uh customer support bot for an airline
- 00:52:14if if you're afraid that it's you know
- 00:52:16gonna just like give someone a free
- 00:52:17ticket because you say ignore previous
- 00:52:19instructions you know um I don't get the
- 00:52:22sense that anyone's like figured this
- 00:52:23out super well um the tactic they use
- 00:52:26there is is sort of a hybrid
- 00:52:28between a um almost like a what you
- 00:52:32picture for like a phone tree where you
- 00:52:33can just you know press press one if
- 00:52:35you're a good candidate um and and and
- 00:52:38still leveraging uh you know like the
- 00:52:40the lm's ability to handle inputs as
- 00:52:43never seen before and so it tends to
- 00:52:45look like a state machine where you have
- 00:52:47different states that the agent can be
- 00:52:49in it's trying to assess it's very
- 00:52:50specific narrow things at each point in
- 00:52:52the state but that the way it decides to
- 00:52:54move from state to state is B based on
- 00:52:56llm logic you know logic described in
- 00:52:58English not a very deterministic uh sort
- 00:53:01of thing um and then I would still take
- 00:53:04the approach of build evals based on Old
- 00:53:07transcripts of calls that have gone off
- 00:53:08the rails and measure yourself against
- 00:53:09like known known bad use cases um
- 00:53:13getting to Perfection on this sounds
- 00:53:15sounds pretty challenging um also
- 00:53:17getting nlms to to state how confident
- 00:53:20they are in something is his own sort of
- 00:53:21sub problem and so like you may be able
- 00:53:23to get this eventually to a point where
- 00:53:25it can tell you when it doesn't know but
- 00:53:27tuning that is also going to be
- 00:53:28challenging because they sort of
- 00:53:29overstate their confidence
- 00:54:00yes but the interpretation and tuning is
- 00:54:02is like a real challenge like
- 00:54:04um a lot of our projects have have steps
- 00:54:07in the middle of the workflow where
- 00:54:09we're asking we're asking for an
- 00:54:12evaluation of the form of like think out
- 00:54:14loud then come up with your answer and
- 00:54:17then tell us you know how confident are
- 00:54:19you in your answer it's usually not a
- 00:54:20number it's usually like low medium high
- 00:54:22very high and then you don't just trust
- 00:54:24what that means you measure it against
- 00:54:25your EV like is this predictive of
- 00:54:27anything like um seems like very high
- 00:54:30means like maybe possibly correct and so
- 00:54:32you only filter down to maybe
- 00:54:35high so do you have
- 00:54:38any talking you have a things that are
- 00:54:43work out really well like give you one
- 00:54:45example I found out that for me if I put
- 00:54:48in uh some example inputs and some
- 00:54:51perfect outputs into the context then
- 00:54:54you know splits out result like simar
- 00:54:57like
- 00:55:01recation the examples thing does work uh
- 00:55:04um showing it examples usually gets us
- 00:55:06sort back on the rails like I'm sure
- 00:55:07you've seen all the sort of trendy
- 00:55:08little tricks you know offered a big tip
- 00:55:10like say put a bunch of exclamation
- 00:55:12points in there offer to fire it if it's
- 00:55:13not going to do a good job like those
- 00:55:14those things I think you
- 00:55:16know may give you lift uh it's going to
- 00:55:19be challenging to know if you don't if
- 00:55:20you don't measure it
- 00:55:23um I think more often in practice it's
- 00:55:26it's around
- 00:55:28um finding specific cases where you did
- 00:55:31poorly and then baking them into your
- 00:55:32prompt um uh you know trying a wide
- 00:55:35variety of things noticing that it's
- 00:55:37sort of off on this case and then
- 00:55:39describing that case to it
- 00:55:42um I don't have handy like a list of
- 00:55:44things I mean um but uh and I bet if I
- 00:55:48pulled the folks on our team everybody's
- 00:55:50got a different uh set of favorite bag
- 00:55:53of tricks um which I think is is um
- 00:55:57it's also a danger on on these AR
- 00:55:59projects is that um it's easy to fall
- 00:56:02into like you just like it's like a
- 00:56:04really good nerd snip machine right like
- 00:56:06you can be like I'm pretty sure tipping
- 00:56:07is g to be a great the great thing to
- 00:56:09try on this project and so the evals
- 00:56:11help keep you keep you on on task there
- 00:56:13um the set of tactics is out there right
- 00:56:15you can Google search for for people's
- 00:56:16long list of tactics one random thing
- 00:56:18we've had good success with is is
- 00:56:20anthropic has a this prompt generator um
- 00:56:23and you can just paste in your your
- 00:56:25current prompt and it'll rewrite it
- 00:56:27we've had surprising results where like
- 00:56:28visually it doesn't look any better
- 00:56:30we're like that's kind of what I already
- 00:56:31said in my prompt and then like the
- 00:56:33metrics just go up
- 00:56:35um but it's not one weird trick it's
- 00:56:38like try lots of things and measure your
- 00:56:41progress all right thank you everybody
- 00:56:44for coming tonight we're super excited
- 00:56:47uh that you made their time to be with
- 00:56:49us and quick round Applause for Eddie
- 00:56:51and
- 00:56:57the office is going to be open for the
- 00:56:58next 20 minutes or so so like I said
- 00:57:00lots of pizza to eat there's still
- 00:57:02drinks as well so go enjoy pester Eddie
- 00:57:05and with any further questions maybe you
- 00:57:07didn't get a chance to ask it now they
- 00:57:09are going to be around and if they
- 00:57:10weren't planning to now they are um but
- 00:57:13again thank you for being here hope you
- 00:57:14had a great time let's keep partying
- Airite
- IA Assist
- connecteurs API
- évaluations (evals)
- développement IA
- plateforme de données
- intégration IA
- outil co-pilot
- automatisation
- flux de travail agents