Autonomous AI SRE: The Future of Site Reliability Engineering

00:55:58
https://www.youtube.com/watch?v=24Ow2HnnoRc

Summary

TL;DR: In this podcast episode, host Demetrios interviews Willem, CTO of Cleric AI, discussing the role of AI Site Reliability Engineers (SREs) and the challenges they face in dynamic production environments. They delve into the use of knowledge graphs to diagnose the root causes of system issues. Willem shares insights on confidence scoring, memory management in AI, tool integrations, and chaos engineering, highlighting the complexity of operational environments and how Cleric aims to enhance engineer productivity without increasing alert fatigue. They also touch on pricing strategies that encourage usage while managing operational costs.

Key takeaways

  • ☕️ Cleric AI is solving tough problems in AI and infrastructure.
  • 🧠 Knowledge graphs aid in diagnosing issues efficiently.
  • 🔍 Confidence scoring helps prioritize alerts for engineers.
  • 💡 AI agents learn from episodic and procedural memories.
  • 📊 Usage-based pricing is being explored to optimize adoption.
  • ⚙️ Integration with tools like DataDog is crucial for performance.
  • 🌪️ Chaos engineering is used to test AI robustness in production.
  • ⚠️ Engineers are cautious about AI making changes to critical systems.
  • 📈 The complexity of AI in dynamic environments poses unique challenges.
  • 🤝 Building trust is essential for AI adoption among engineers.

Timeline

  • 00:00:00 - 00:05:00

    Introduction of the hosts and the context of the discussion, focusing on AI SRE and knowledge graphs; Willem's background with Cleric AI and Feast. Begins with light notes about Christmas sweaters and caffeine consumption, setting a casual tone for the conversation.

  • 00:05:00 - 00:10:00

    Discussion of AI SRE being a complex problem tied to MLOps; highlighted the differences between development and production environments, and how operational complexity increases with deployment across different applications, leading to challenges in root cause analysis.

  • 00:10:00 - 00:15:00

    Exploration of how agent systems are being built with modular components, balancing understanding and responsibility versus the productivity gains, leading to operational instability in larger organizations where pressure is high.

  • 00:15:00 - 00:20:00

    Innovation through the use of knowledge graphs to diagnose incidents within systems. Discussion emphasizes the complexity of relationships within IT infrastructures, and how mappings of even small clusters begin to reveal potential issues.

  • 00:20:00 - 00:25:00

    Deep dive into how agents scan systems to identify problems prior to alerts being raised, and the necessity for ongoing updating of knowledge graphs to maintain relevancy in fast-paced production environments.

  • 00:25:00 - 00:30:00

    Relationship between proactive monitoring of production state and agents' ability to query information; the graph's dual role in driving effective diagnosis and enabling exploration of root causes through structured data.

  • 00:30:00 - 00:35:00

    Understanding the background job of graph building and updating during investigations; the potential for agents to uncover hidden issues through continuous environment scanning, including the cost impacts of these processes.

  • 00:35:00 - 00:40:00

    Confidence scoring approach for agents ensuring they don't overwhelm engineers with false positives, emphasizing trust and utility; integrating human feedback into performance evaluations to boost efficiency.

  • 00:40:00 - 00:45:00

    The challenge of decision-making in AI-driven environments, where agents need to discern context to retrieve useful information without causing disruption; exploring trade-offs between human input and automated systems.

  • 00:45:00 - 00:50:00

    Agents learning from experience, including procedural and episodic memory management in layered structures; feedback-driven iteration enhances performance, building a feedback loop with engineers to improve knowledge repositories.

  • 00:50:00 - 00:55:58

    Examining market pricing strategies for AI agents; approaching pricing based on usage while ensuring engineers can operate without cost-induced anxiety over their investigative actions. Emphasis on a thoughtful revenue model to drive engagement instead of risking underutilization.



Video Q&A

  • What is Cleric AI doing?

    Cleric AI is developing innovative solutions using knowledge graphs and AI to help diagnose issues in production environments.

  • What is an AI SRE?

    An AI SRE refers to an AI Site Reliability Engineer who uses AI technologies to manage and maintain the reliability of systems.

  • What challenges do AI agents face in production environments?

    Key challenges include the dynamic nature of systems, the unsupervised nature of problems, and the complexity of understanding relations among various components.

  • How does the knowledge graph help in troubleshooting?

    The knowledge graph maps relationships and dependencies in the production environment, helping diagnose root causes of issues efficiently.
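
To make this concrete, here is a minimal Python sketch of walking a dependency graph upstream from an alerting service, assuming a toy dependency map with hypothetical service names; it illustrates the idea, not Cleric's implementation.

```python
from collections import deque

# Hypothetical dependency edges: service -> the services it calls.
DEPENDS_ON = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments", "cart"],
    "cart": ["redis"],
    "payments": [],
    "catalog": [],
    "redis": [],
}

UNHEALTHY = {"frontend", "checkout", "cart", "redis"}  # e.g. from alerts

def probable_root_causes(alerting: str) -> set[str]:
    """Walk upstream from the alerting service and return the deepest
    unhealthy dependencies: unhealthy nodes none of whose own
    dependencies are unhealthy."""
    seen, queue, roots = set(), deque([alerting]), set()
    while queue:
        svc = queue.popleft()
        if svc in seen:
            continue
        seen.add(svc)
        unhealthy_deps = [d for d in DEPENDS_ON.get(svc, []) if d in UNHEALTHY]
        if svc in UNHEALTHY and not unhealthy_deps:
            roots.add(svc)            # nothing it depends on is failing
        queue.extend(unhealthy_deps)  # keep walking toward the root
    return roots

print(probable_root_causes("frontend"))  # {'redis'}
```
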

  • What role does confidence scoring play?

    Confidence scoring helps prioritize alerts and determine the reliability of findings before presenting them to engineers.

  • How does Cleric AI handle memory?

    Cleric AI uses episodic and procedural memories to learn from past actions and improves its diagnostics based on feedback.

  • What pricing model is Cleric AI exploring?

    Cleric AI is considering a usage-based pricing model to encourage adoption while covering operational costs.

  • What external tools does Cleric AI integrate with?

    Cleric AI integrates with tools like DataDog to access logs, monitor system health, and gather necessary data for analysis.
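
As a sketch of what a read-only log-fetching tool might look like, the example below calls Datadog's Logs Search API (v2). The endpoint shape and headers follow Datadog's public API, but the site URL, environment variable names, and query string are assumptions for the example.

```python
import os
import requests

DD_SITE = "https://api.datadoghq.com"  # assumes the US1 Datadog site

def search_logs(query: str, minutes: int = 15, limit: int = 25) -> list[str]:
    """Read-only tool the agent can call: fetch recent log lines
    matching a query via Datadog's Logs Search API (v2)."""
    resp = requests.post(
        f"{DD_SITE}/api/v2/logs/events/search",
        headers={
            "DD-API-KEY": os.environ["DD_API_KEY"],        # assumed env vars
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
        json={
            "filter": {"query": query, "from": f"now-{minutes}m", "to": "now"},
            "page": {"limit": limit},
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [e["attributes"]["message"] for e in resp.json().get("data", [])]

# e.g. search_logs("service:checkout status:error")
```
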

  • How does Cleric AI incorporate chaos engineering?

    Cleric AI employs chaos engineering by simulating failures and evaluating agent performance in controlled scenarios.
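
A minimal sketch of the evaluation idea, with invented fault scenarios and a naive keyword-matching stand-in for the real agent: inject a known fault, let the agent diagnose from the symptoms, and score localization accuracy.

```python
import random
from dataclasses import dataclass

@dataclass
class ChaosScenario:
    fault: str           # what we broke (all values here are made up)
    target: str          # where we broke it
    symptoms: list[str]  # what the agent gets to observe

SCENARIOS = [
    ChaosScenario("kill-pod", "cart", ["cart 5xx rate up", "frontend latency up"]),
    ChaosScenario("fill-disk", "postgres", ["postgres write errors", "checkout timeouts"]),
    ChaosScenario("bad-config", "ingress", ["ingress reload failed", "404s on /api"]),
]

def diagnose(symptoms: list[str]) -> str:
    """Stand-in for the real agent: here, naive keyword matching."""
    return symptoms[0].split()[0]

def run_eval(trials: int = 100) -> float:
    """Inject known faults and measure how often the agent localizes them."""
    hits = 0
    for _ in range(trials):
        s = random.choice(SCENARIOS)
        hits += diagnose(s.symptoms) == s.target
    return hits / trials

print(f"localization accuracy: {run_eval():.0%}")
```
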

  • What are the major concerns with AI changes in production environments?

    Concerns include maintaining system reliability, avoiding unnecessary disruptions, and ensuring that AI assists rather than complicates.

Subtitles (English)
  • 00:00:00
    Willem Pienaar CTO of Cleric we're
  • 00:00:03
    building an
  • 00:00:04
    AI SRE we're based in San
  • 00:00:07
    Francisco black coffee is the way to go
  • 00:00:10
    and if you want to join a team of
  • 00:00:12
    veterans in AI and infrastructure on a
  • 00:00:15
    really tough problem yeah come and chat
  • 00:00:17
    to us boom welcome back to the
  • 00:00:20
    MLOps Community podcast I'm your host
  • 00:00:22
    Demetrios today we are talking with my
  • 00:00:25
    good friend William some of you may know
  • 00:00:27
    him as the CTO of Cleric AI doing some
  • 00:00:31
    pretty novel stuff with the AI SRE which
  • 00:00:36
    we dive into very deep in this next hour
  • 00:00:41
    we talk all about how he's using
  • 00:00:43
    knowledge graphs to triage root cause
  • 00:00:46
    issues with their AI agent solution and
  • 00:00:50
    others of you may know Willem because he
  • 00:00:52
    is also the same guy that built the open
  • 00:00:56
    source feature store Feast that's where
  • 00:00:59
    I got to know him back four five years
  • 00:01:02
    ago and since then I've been following
  • 00:01:05
    what he is doing very closely and it's
  • 00:01:08
    safe to say this guy never fails to
  • 00:01:11
    impress let's get into the
  • 00:01:13
    conversation right
  • 00:01:15
    [Music]
  • 00:01:20
    now let's start by prefacing this
  • 00:01:23
    conversation with we are recording 2
  • 00:01:26
    days before Christmas so when it comes
  • 00:01:29
    out
  • 00:01:30
    this sweater that I'm wearing is not
  • 00:01:32
    going to be okay but today it is totally
  • 00:01:35
    inbounds for me being able to wear
  • 00:01:38
    it unfortunately I don't have a cool
  • 00:01:40
    sweater like you and I'm in Sunny San
  • 00:01:42
    Francisco but I guess it's got the fog
  • 00:01:46
    yeah it's Christmas
  • 00:01:48
    vibes dude I found out three four days
  • 00:01:52
    ago that if you have
  • 00:01:55
    this magic pill with caffeine it
  • 00:02:00
    like minimizes the Jitters so I have
  • 00:02:04
    taken that as an excuse L-theanine or whatever
  • 00:02:07
    yeah you've heard of it yeah yeah dude I
  • 00:02:10
    I've just been abusing my caffeine
  • 00:02:12
    intake and pounding these pills with it
  • 00:02:16
    it's amazing I am so much more
  • 00:02:18
    productive so that's my 2025 secret for
  • 00:02:20
    everyone okay I and a bit of magnesium
  • 00:02:23
    for better sleep or actual
  • 00:02:26
    sleep all right man enough of that you
  • 00:02:29
    been building cleric you've been coming
  • 00:02:32
    on occasionally to the
  • 00:02:35
    different conferences that we've had and
  • 00:02:38
    sharing your learnings but recently you
  • 00:02:41
    put out a blog post and I want to go
  • 00:02:42
    super deep on this blog post on what an
  • 00:02:45
    AI SRE is just because it feels like
  • 00:02:49
    SREs are very close to the MLOps world
  • 00:02:52
    and AI agents are very much what we've
  • 00:02:56
    been talking about a lot as we were
  • 00:02:58
    presenting at the Agents in Production
  • 00:03:00
    conference the first thing that we
  • 00:03:03
    should start with is just what a hard
  • 00:03:06
    problem this is and why is it hard we
  • 00:03:10
    can dive into those areas and I think
  • 00:03:11
    we're going to get into that in this
  • 00:03:13
    this conversation maybe just a set the
  • 00:03:15
    stage everyone is building agents like
  • 00:03:17
    agents are all the hype right now but
  • 00:03:19
    every use case is different right you've
  • 00:03:22
    got agents in law you've got agents for
  • 00:03:24
    writing blog post you've got agents for
  • 00:03:26
    social media one of the tricky things
  • 00:03:28
    about our space is really if you
  • 00:03:31
    consider two main things that an
  • 00:03:33
    engineer does is they create software and
  • 00:03:36
    then they deploy it into a production
  • 00:03:37
    environment and it runs and operates
  • 00:03:38
    actually has to have an impact on the
  • 00:03:39
    real world that second world the
  • 00:03:42
    operational environment is quite
  • 00:03:44
    different from the development
  • 00:03:45
    environment the development environment
  • 00:03:46
    has tests it has an IDE it has tight
  • 00:03:49
    feedback Cycles often it has ground
  • 00:03:51
    truth right so you can make a change and
  • 00:03:54
    see if your tests pass there's
  • 00:03:56
    permissionless data sets that are off
  • 00:03:57
    there so you can go to GitHub and you
  • 00:03:59
    can find like
  • 00:04:00
    millions of issues that people create PRs
  • 00:04:02
    that are like the solutions to those
  • 00:04:04
    issues yeah but consider like the
  • 00:04:07
    production environment of an Enterprise
  • 00:04:09
    company where do you find the data set
  • 00:04:13
    that represents all the problems that
  • 00:04:14
    they've had and all the solutions it's
  • 00:04:16
    not just laying out there right you can
  • 00:04:18
    get some like root causes and things
  • 00:04:20
    that people have posted as blog posts
  • 00:04:22
    but this is an unsupervised problem for
  • 00:04:24
    the most part it's a very complicated
  • 00:04:26
    problem I I guess we can get get into
  • 00:04:28
    those details in a in a bit but that's
  • 00:04:30
    really what makes this challenging it's
  • 00:04:32
    complex sprawling Dynamic
  • 00:04:35
    systems yeah the complexity of the
  • 00:04:38
    systems does not help and I also think
  • 00:04:40
    with the rise of the coding
  • 00:04:44
    co-pilots does that not also make things
  • 00:04:47
    more complex because you're running
  • 00:04:49
    stuff in a production environment that
  • 00:04:52
    maybe you know how it got created maybe
  • 00:04:55
    you don't massively and I think even at
  • 00:04:58
    our scale small startup it's become a
  • 00:05:01
    topic
  • 00:05:02
    internally how much do we delegate to AI
  • 00:05:05
    because we're also Outsourcing and
  • 00:05:07
    delegating to our own agents internally
  • 00:05:09
    that produce code so I think all teams
  • 00:05:12
    are trying to get to the boundaries of
  • 00:05:14
    understanding and confidence so you're
  • 00:05:16
    building these modular components like
  • 00:05:18
    Lego blocks with internals you're unsure
  • 00:05:20
    about but you're shipping into
  • 00:05:21
    production and seeing how that succeeds
  • 00:05:23
    and fails because it gives you so much
  • 00:05:25
    velocity so the ROI is there but the
  • 00:05:27
    understanding is like one of the things
  • 00:05:28
    you lose over time and I think at scale
  • 00:05:31
    where the incentives aren't aligned
  • 00:05:32
    where you have many different teams and
  • 00:05:34
    they're all being pressured to ship more
  • 00:05:37
    belts are being tightened so there's not
  • 00:05:39
    a lot of head count and they have to do
  • 00:05:40
    more the production environment is
  • 00:05:43
    really people are like putting their
  • 00:05:45
    fingers in that dam wall but eventually
  • 00:05:47
    it's going to break it's unstable at a
  • 00:05:49
    lot of companies yeah so coding is going
  • 00:05:52
    to make or AI generated coding is really
  • 00:05:54
    going to make this a much more complex
  • 00:05:56
    system to deal with so the Dynamics
  • 00:05:59
    between these components that
  • 00:06:01
    interrelate where there's much less
  • 00:06:03
    understanding is going to explode yeah
  • 00:06:06
    we're already seeing that dude there's
  • 00:06:08
    so many different pieces on the complex
  • 00:06:10
    systems that I want to dive into but the
  • 00:06:13
    first one that stood out to me and has
  • 00:06:15
    continued to replay in my mind is this
  • 00:06:19
    knowledge graph that you presented at
  • 00:06:21
    the conference and then subsequently in
  • 00:06:24
    your blog post and you made the point of
  • 00:06:27
    saying this is a
  • 00:06:30
    Knowledge Graph that we created on a
  • 00:06:32
    production environment but it's not like
  • 00:06:34
    it's a gigantic kubernetes cluster it
  • 00:06:38
    was a fairly small kubernetes cluster
  • 00:06:41
    and all of the different relations from
  • 00:06:43
    that and all the slack messages and all
  • 00:06:45
    the GitHub issues and everything that is
  • 00:06:48
    involved in that kubernetes cluster
  • 00:06:50
    you've mapped out and that's just for
  • 00:06:52
    one kubernetes cluster so I can't
  • 00:06:54
    imagine across in a whole entire
  • 00:06:56
    organization like an Enterprise size how
  • 00:06:59
    complex this gets yeah so if you
  • 00:07:03
    consider that specific cluster or graph
  • 00:07:05
    I showed you was the OpenTelemetry
  • 00:07:07
    reference architecture it's like a demo
  • 00:07:09
    stack it's like an e-commerce store
  • 00:07:10
    it's got about 12 13 Services yeah
  • 00:07:14
    roughly in that
  • 00:07:15
    range I've only shown you literally like
  • 00:07:18
    10% of the relations maybe even less and
  • 00:07:20
    it's only at the infrastructure layer
  • 00:07:21
    right so it's not even talking about
  • 00:07:22
    like buckets and Cloud infras nothing
  • 00:07:25
    about nodes nothing about application
  • 00:07:27
    internals right so if you consider one
  • 00:07:28
    cloud project like a gcp project or AWS
  • 00:07:32
    project mhm there's a whole tree there
  • 00:07:34
    the networks the regions down to the
  • 00:07:36
    kubernetes Clusters within a cluster
  • 00:07:38
    there's the nodes and within the nodes
  • 00:07:40
    there's the pods
  • 00:07:42
    there's multiple containers potentially
  • 00:07:44
    within each of those many processes each
  • 00:07:46
    process has code with variables and each
  • 00:07:50
    level so it creates this tree structure but
  • 00:07:52
    then between those nodes in the tree there can
  • 00:07:54
    also have inter relations right like a
  • 00:07:55
    piece of code here would be referencing
  • 00:07:57
    an IP address but that IP address is
  • 00:08:00
    owned by some cloud service somewhere and it's
  • 00:08:02
    also connected to some other
  • 00:08:04
    systems and you can't not use that
  • 00:08:07
    information right because if a problem
  • 00:08:09
    arrives and you know lands in your
  • 00:08:11
    lap and you have to causally walk that
  • 00:08:14
    graph to go Upstream to find the root
  • 00:08:15
    cause in the security space this is a
  • 00:08:19
    pretty well studied problem and there
  • 00:08:21
    are traditional techniques people have
  • 00:08:23
    been using to extract this from cloud
  • 00:08:25
    environments but LLMs really unlock a new
  • 00:08:28
    level of understanding there so they're
  • 00:08:29
    extremely good at extracting these
  • 00:08:32
    relationships taking really unstructured
  • 00:08:33
    data so it can be conversations that you
  • 00:08:36
    and I have it can be kubernetes objects
  • 00:08:38
    it can be all of these like the whole
  • 00:08:40
    Spectrum from unstructured to structured
  • 00:08:42
    you can extract structured information
  • 00:08:43
    so you can build these graphs the
  • 00:08:46
    challenge really is twofold so you know
  • 00:08:48
    you need to use this graph to get to a
  • 00:08:50
    root cause but it's fuzzy right as soon
  • 00:08:54
    as you extract that information you
  • 00:08:56
    build that graph it's out of date almost
  • 00:08:58
    instantly because systems change so
  • 00:08:59
    quickly right so somebody's deploying
  • 00:09:01
    something an IP address gets rolled pod
  • 00:09:01
    names change and so you need to be
  • 00:09:08
    able to make efficient decisions with
  • 00:09:10
    your agent right so just to uh anchor
  • 00:09:13
    this our agent is essentially a
  • 00:09:16
    diagnostic agent right now so it helps
  • 00:09:18
    teams quickly root cause a problem so if
  • 00:09:21
    you've got an alert that fires or could
  • 00:09:23
    an engineer presents an issue to the
  • 00:09:25
    agent it quickly navigates this
  • 00:09:28
    graph and its awareness of your
  • 00:09:30
    production environment to find the root
  • 00:09:32
    cause if it didn't have the graph it could
  • 00:09:35
    still do it through first principles
  • 00:09:36
    right it could still say looking at
  • 00:09:38
    everything that's available I'll try
  • 00:09:40
    this I'll try that but the graph allows
  • 00:09:42
    it to very efficiently get to the root
  • 00:09:44
    cause um and so that fuzziness is one
  • 00:09:48
    of the challenges that the fact that
  • 00:09:49
    it's out of date so quickly but it's so
  • 00:09:52
    important to still have it regardless
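
The layered-graph idea described above can be sketched as follows, with all names hypothetical: a deterministic base layer of facts you can enumerate mechanically, plus a lower-confidence fuzzy layer of LLM-extracted relations that goes stale faster and is stored separately. This is an illustration of the approach, not Cleric's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str
    confidence: float  # 1.0 = deterministic, <1.0 = LLM-inferred

def deterministic_layer(pods: dict[str, str]) -> list[Edge]:
    """Base layer: facts you can walk mechanically (e.g. via `kubectl get`),
    such as which node each pod is scheduled on."""
    return [Edge(pod, node, "runs-on", 1.0) for pod, node in pods.items()]

def fuzzy_layer(mentions: list[tuple[str, str]]) -> list[Edge]:
    """Fuzzy layer: relations an LLM extracted from unstructured sources
    (Slack threads, ConfigMaps, code comments). Confidence is lower and
    these edges go stale faster, so they are stored separately."""
    return [Edge(a, b, "mentions", 0.6) for a, b in mentions]

# Hypothetical pod/node names and one fuzzy cross-reference:
graph = deterministic_layer({"checkout-7d9f": "node-1", "cart-5c2a": "node-2"})
graph += fuzzy_layer([("checkout ConfigMap", "payments VPN endpoint")])

for e in sorted(graph, key=lambda e: -e.confidence):
    print(f"{e.src} -[{e.kind}, p={e.confidence}]-> {e.dst}")
```
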
  • 00:09:54
    there's a few things that you
  • 00:09:56
    mentioned about how with the vision or
  • 00:10:00
    the understanding of the graph you can
  • 00:10:03
    escalate up issues that may have been
  • 00:10:06
    looked at in isolation as not that big
  • 00:10:08
    of a deal and so can you explain how
  • 00:10:11
    that works a little
  • 00:10:12
    bit so the graph is essentially there's
  • 00:10:16
    two if you draw a box around the
  • 00:10:17
    production environment right there are
  • 00:10:19
    two kinds of issues right there's ones
  • 00:10:21
    you have alerts for and ones you're not aware
  • 00:10:23
    of so you tell us like okay my alert
  • 00:10:26
    fired here's a problem go look at it
  • 00:10:27
    another is we scan the environment and
  • 00:10:30
    we identify problems the graph is built
  • 00:10:33
    in two ways one is a background job
  • 00:10:36
    where it's just like looking through
  • 00:10:37
    your infrastructure and finding new
  • 00:10:39
    things and updating itself continuously
  • 00:10:41
    and the other is when the agent's doing
  • 00:10:42
    investigation and it sees new
  • 00:10:44
    information and it just throws that back
  • 00:10:45
    into the graph because it's got the
  • 00:10:47
    information it might as well just update the
  • 00:10:49
    graph but in this background scanning
  • 00:10:51
    process it might uncover things that it
  • 00:10:54
    didn't realize was a problem but then it
  • 00:10:56
    sees that this is actually a problem
  • 00:10:58
    for example it could process your
  • 00:11:01
    metrics or it could look at your
  • 00:11:03
    configuration of your objects in
  • 00:11:05
    kubernetes or maybe it finds a bucket
  • 00:11:07
    and it's trying to create that node the
  • 00:11:10
    updated state of the bucket and it sees
  • 00:11:12
    it's exposed publicly so then you could
  • 00:11:14
    surface that to an engineer and say
  • 00:11:16
    your data is being exposed publicly or
  • 00:11:19
    they've misconfigured this pod and the
  • 00:11:21
    memory is growing in this application and
  • 00:11:24
    in about an hour or two this is going to
  • 00:11:26
    crash yeah so there's a massive opportunity
  • 00:11:29
    for LLMs to be used as reasoning engines
  • 00:11:32
    where it can infer and predict a failure
  • 00:11:35
    imminently and you can prevent that so you
  • 00:11:37
    get to a proactive state of alerting
  • 00:11:40
    that is of course quite inefficient
  • 00:11:42
    today if you just slap an LLM or
  • 00:11:45
    a vision model onto a metrics graph or
  • 00:11:48
    yeah onto your objects in your Cloud
  • 00:11:51
    infrastructure but there's a massive low
  • 00:11:53
    hanging fruit there where you distill
  • 00:11:53
    a lot of those inferencing capabilities
  • 00:11:55
    into fine-tuned or more purpose-built
  • 00:11:59
    models for each one of these tasks
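
One way to picture the proactive "memory is growing, this will crash in an hour or two" inference is a simple trend extrapolation. The sketch below fits a least-squares slope to a pod's memory samples and estimates time to the limit; it is a toy model with made-up numbers, not the inference Cleric runs.

```python
def minutes_until_oom(samples: list[tuple[float, float]],
                      limit_mb: float) -> float | None:
    """Fit a least-squares slope over (minute, used_mb) samples and
    extrapolate when usage crosses the container's memory limit.
    Returns None if usage isn't growing."""
    n = len(samples)
    xs, ys = [s[0] for s in samples], [s[1] for s in samples]
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in samples) / denom
    if slope <= 0:
        return None
    return (limit_mb - ys[-1]) / slope

# Hypothetical pod memory sampled once a minute, with a 512 MB limit:
samples = [(0, 300.0), (1, 304.0), (2, 309.0), (3, 313.0), (4, 318.0)]
print(minutes_until_oom(samples, 512.0))  # roughly 43 minutes to OOM
```
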
  • 00:12:02
    but how does the scanning work because I
  • 00:12:05
    know that you also mention the agents
  • 00:12:09
    will go until they run out of credit or
  • 00:12:13
    something or until they hit their like
  • 00:12:14
    spend limit when they're trying to do root
  • 00:12:17
    cause analysis some kind of a problem
  • 00:12:21
    but I can imagine that you're not just
  • 00:12:24
    continuously scanning or are you kicking
  • 00:12:26
    off scans every x amount of seconds or
  • 00:12:28
    minutes or days yeah so there are
  • 00:12:31
    different parts to this if we do
  • 00:12:33
    background scanning graph building we
  • 00:12:35
    try and use more efficient models so
  • 00:12:39
    because of the volume of data you don't
  • 00:12:41
    use expensive models that are used for
  • 00:12:43
    like you know very accurate reasoning
  • 00:12:46
    yeah and so the costs are all lower and so
  • 00:12:47
    you set it like a daily budget of that
  • 00:12:49
    and then you run up to the budget this
  • 00:12:52
    is not something that's constantly
  • 00:12:53
    running and processing large amounts of
  • 00:12:55
    information think about it as like a
  • 00:12:57
    human right you wouldn't process all
  • 00:12:59
    logs and all information your Cloud
  • 00:13:01
    infrastructure you just get like a lay
  • 00:13:03
    of the land like like what are the most
  • 00:13:05
    recent deployments what are the most
  • 00:13:06
    recent conversations people are having
  • 00:13:08
    it's like getting a play-by-play so
  • 00:13:11
    that when an issue comes up you can
  • 00:13:13
    quickly jump into action you can
  • 00:13:14
    do fast thinking you can make the right
  • 00:13:16
    decisions quickly but in investigation
  • 00:13:19
    we set a cap we say per
  • 00:13:23
    investigation let's say make it 10 cents
  • 00:13:25
    or make it a dollar or whatever and then
  • 00:13:28
    we tell the agent this is how much you've
  • 00:13:30
    been assigned use it as best you can go
  • 00:13:33
    find information that you need through
  • 00:13:35
    your
  • 00:13:35
    tools and then we allow the human to say
  • 00:13:38
    okay go a bit further or stop here I'll
  • 00:13:41
    take over wow and so we bring the human
  • 00:13:43
    in the loop as soon as the agent has
  • 00:13:45
    something valuable to present to them so
  • 00:13:48
    if the agent goes off on a quest and it
  • 00:13:50
    finds almost nothing it can present that
  • 00:13:52
    to the human say yep nothing or say okay
  • 00:13:54
    couldn't find anything or just remain
  • 00:13:56
    quiet depends on how you've configured
  • 00:13:58
    it but it'll always stop at that budget limit
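
Here is a minimal sketch of a per-investigation spend cap, with hypothetical tools and costs: the agent works through its tools until it would exceed the assigned budget, then hands whatever it found back to the human, who can stop or extend it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tool:
    name: str
    estimated_cost: float                 # rough dollars per call (made up)
    run: Callable[[str], Optional[str]]

def investigate(issue: str, tools: list[Tool], budget_usd: float = 0.10) -> dict:
    """Call tools in order until the per-investigation spend cap is hit;
    whatever was found is handed back so the human can stop or top up."""
    spent, findings = 0.0, []
    for tool in tools:
        if spent + tool.estimated_cost > budget_usd:
            return {"status": "budget_exhausted", "spent": spent,
                    "findings": findings}
        spent += tool.estimated_cost
        if (result := tool.run(issue)) is not None:
            findings.append(f"{tool.name}: {result}")
    return {"status": "done", "spent": spent, "findings": findings}

# Hypothetical tools with canned results:
tools = [
    Tool("recent_deploys", 0.02, lambda i: "checkout deployed 14m ago"),
    Tool("error_logs", 0.04, lambda i: "OOMKilled in checkout pod"),
    Tool("trace_search", 0.08, lambda i: None),  # would blow the budget
]
print(investigate("checkout latency alert", tools))
```
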
  • 00:14:01
    yeah the benefit of it not finding
  • 00:14:04
    anything also is that it will narrow
  • 00:14:07
    down where the human has to go and
  • 00:14:09
    search so now the human doesn't have to
  • 00:14:12
    go and look through all this crap that
  • 00:14:14
    the AI agent just looked through because
  • 00:14:17
    ideally if the agent didn't catch
  • 00:14:19
    anything it's hopefully not there and so
  • 00:14:22
    the human can go and look in other
  • 00:14:24
    places first and if they exhaust all
  • 00:14:26
    their options they can go back and try
  • 00:14:28
    and see where the agent was looking and
  • 00:14:30
    see if that's where the problem
  • 00:14:32
    is I think this comes back to the
  • 00:14:34
    fundamental problem here and maybe we
  • 00:14:37
    glossed over some of this like tools
  • 00:14:41
    don't solve the problem of operations
  • 00:14:43
    operations is on call no amount of
  • 00:14:46
    DataDogs or dashboards or kubectl
  • 00:14:49
    commands will free your senior Engineers
  • 00:14:52
    up from getting into the production
  • 00:14:54
    environment
  • 00:14:55
    so what we're trying to get to is end to
  • 00:14:59
    end resolution when we find a problem
  • 00:15:02
    can the agent go all the way multiple
  • 00:15:05
    steps which today requires Engineers
  • 00:15:07
    reasoning and judgment looking at
  • 00:15:09
    different tools understanding tribal
  • 00:15:11
    knowledge understanding why systems have
  • 00:15:12
    been deployed we want to get the agents
  • 00:15:15
    there but you can't start there because
  • 00:15:17
    this is an unsupervised problem you
  • 00:15:19
    can't just start changing things in
  • 00:15:20
    production nobody would do that right
  • 00:15:23
    now if you scale that back from
  • 00:15:25
    resolution meaning change like code
  • 00:15:27
    level change Terraform changes in your
  • 00:15:31
    repos if you walk it back from that
  • 00:15:33
    it's understanding what the problem is
  • 00:15:34
    and if you walk it back further from
  • 00:15:35
    that it's search space reduction
  • 00:15:37
    triangulating the problem into a
  • 00:15:39
    specific area maybe not saying the line
  • 00:15:41
    of code but saying here's the service or
  • 00:15:43
    here's the cluster and that's already
  • 00:15:45
    very compelling to a human or you can
  • 00:15:47
    say it's not these 400 other Cloud
  • 00:15:50
    clusters or providers or Services it's
  • 00:15:53
    probably in this one and that is
  • 00:15:56
    extremely useful to an engineer today so
  • 00:15:59
    search space reduction is one of the things
  • 00:16:01
    that we are very reliable at and where
  • 00:16:02
    we've started and we start in a kind of
  • 00:16:05
    collaborative mode so we quickly reduce
  • 00:16:08
    the search space we tell you what we
  • 00:16:09
    checked and what we didn't and then you
  • 00:16:11
    as an engineer can say okay here's
  • 00:16:12
    some more context go a bit further and
  • 00:16:14
    try this piece of information and in
  • 00:16:17
    that steering and then collaboration we
  • 00:16:20
    learn from engineers and they teach us
  • 00:16:22
    and we get better and better over time
  • 00:16:23
    on this like road to
  • 00:16:25
    resolution yeah I know you mentioned
  • 00:16:27
    memory and I want to get into that in a
  • 00:16:28
    sec but but keeping on the theme of
  • 00:16:31
    money and cost and the Agents having
  • 00:16:36
    more or less a budget that they can go
  • 00:16:38
    expend and try and find what they're
  • 00:16:40
    looking for do you see that agents will
  • 00:16:44
    get stuck in recursive loops and then
  • 00:16:46
    use their whole budget and not really
  • 00:16:49
    get much of anything or is that
  • 00:16:50
    something that was fairly
  • 00:16:53
    common six or 10 months ago but now
  • 00:16:57
    you've found ways to counterbalance that
  • 00:17:01
    problem this problem space is one where
  • 00:17:04
    small little additions or
  • 00:17:06
    improvements to your product make a big
  • 00:17:08
    difference over time because they
  • 00:17:10
    compound we've learned a lot from
  • 00:17:12
    the coding agents like SWE-agent and others
  • 00:17:15
    so one of the things they found was that
  • 00:17:17
    when the agent succeeds it succeeds very
  • 00:17:19
    quickly when it fails it fails very slowly so
  • 00:17:22
    typically you can even see as a proxy
  • 00:17:23
    has the agent run for 3 4 5 6 7 minutes
  • 00:17:27
    it's probably wrong even if you don't
  • 00:17:29
    score it at all and if it ran into like
  • 00:17:32
    it came to a conclusion quickly like in
  • 00:17:33
    30 seconds it's probably going to be
  • 00:17:35
    right our agents sometimes do chase
  • 00:17:38
    their tails so we have a confidence
  • 00:17:40
    score and we have a critiquer at the end
  • 00:17:42
    that assesses the agent so we try and
  • 00:17:45
    not you know spam the human ultimately
  • 00:17:48
    it's about attention and saving them
  • 00:17:49
    time so if you keep throwing like bad
  • 00:17:51
    findings and bad information they really
  • 00:17:53
    they'll just rip you out of their
  • 00:17:55
    production environment because it's
  • 00:17:56
    going to be noisy right that's the last
  • 00:17:57
    thing they want so yes depending on the
  • 00:18:01
    use case the agent can go in a recursive
  • 00:18:04
    Loop or it can go in a direction that it
  • 00:18:06
    shouldn't so for us a really effective
  • 00:18:09
    mechanism to manage that is
  • 00:18:12
    understanding where we're good and where
  • 00:18:13
    we're bad so for each issue or event
  • 00:18:16
    that comes in we do an enrichment and
  • 00:18:17
    then we build the full context of that
  • 00:18:19
    issue and then we look at have we seen
  • 00:18:21
    this in the past similar issues have we
  • 00:18:24
    solved it how have we solved this in the past and
  • 00:18:26
    have we had positive feedback and so if
  • 00:18:27
    we fetch the right historical context we
  • 00:18:30
    get a good idea of our confidence on
  • 00:18:31
    something before presenting that
  • 00:18:33
    information to a human like the the
  • 00:18:34
    ultimate set of findings but yeah
  • 00:18:37
    sometimes it does go awry
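
The enrich-then-check-history step can be sketched like this, with invented episodes, similarity measure, and weights: start from a base self-assessment, shift it by feedback on similar past issues, then gate notification on a threshold.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    issue: str
    diagnosis: str
    feedback: int  # +1 engineer approved, -1 rejected

# Hypothetical episodic memory:
MEMORY = [
    Episode("checkout OOMKilled", "memory limit too low", +1),
    Episode("checkout OOMKilled", "memory limit too low", +1),
    Episode("cart 5xx spike", "bad config rollout", -1),
]

def overlap(a: str, b: str) -> float:
    """Crude word-overlap similarity (Jaccard) as a stand-in for real retrieval."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def confidence(issue: str, base: float = 0.5) -> float:
    """Start from the model's self-assessment and shift it by feedback on
    similar past episodes; clamp to [0, 1]. Weights here are made up."""
    score = base
    for ep in MEMORY:
        score += 0.15 * ep.feedback * overlap(issue, ep.issue)
    return max(0.0, min(1.0, score))

THRESHOLD = 0.7  # below this, stay quiet rather than ping an engineer
issue = "checkout pod OOMKilled again"
print(confidence(issue), confidence(issue) >= THRESHOLD)
```
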
  • 00:18:39
    I'm trying to think is the
  • 00:18:41
    knowledge graph something that you are
  • 00:18:44
    creating once getting an idea the lay of
  • 00:18:47
    the land and then there's almost like
  • 00:18:51
    stuff doesn't really get updated until
  • 00:18:53
    there's an incident and you go and you
  • 00:18:55
    Explore More and what kind of knowledge
  • 00:18:58
    graphs are you using or do you use many
  • 00:18:59
    different knowledge graphs is it just
  • 00:19:01
    one big one how does that even look in
  • 00:19:04
    practice we originally started with one
  • 00:19:06
    big Knowledge Graph the thing with these
  • 00:19:08
    knowledge graphs is that often the
  • 00:19:10
    foundation of them is deterministic
  • 00:19:12
    methods so you can run kubectl and you
  • 00:19:15
    can just walk the cluster with
  • 00:19:17
    traditional techniques there's no AI
  • 00:19:19
    or LLM involved but then you want to
  • 00:19:22
    layer on top of that this the fuzzy
  • 00:19:24
    relationships where you see this
  • 00:19:25
    container has this sort of a reference
  • 00:19:27
    to something over there or a ConfigMap
  • 00:19:29
    mentions something that I've seen
  • 00:19:32
    somewhere else and so what we've gone
  • 00:19:35
    towards is a more layered approach so we
  • 00:19:38
    have like multiple graph layers where
  • 00:19:40
    some of them have a higher confidence
  • 00:19:42
    and durability and can be updated
  • 00:19:44
    quickly or perhaps using different
  • 00:19:46
    techniques and then you layer on the
  • 00:19:48
    more fuzzy layers on top of that or or
  • 00:19:51
    different layers so you could use an LLM
  • 00:19:52
    to kind of canvas the landscape between
  • 00:19:55
    clusters or from a Kubernetes cluster to maybe
  • 00:19:59
    the application layer or to the layers
  • 00:20:00
    below but using smaller micro graphs has
  • 00:20:03
    been easier for us from like a data
  • 00:20:05
    management
  • 00:20:07
    perspective what are other data points
  • 00:20:09
    that you're then mapping out for the
  • 00:20:11
    knowledge graph that can be helpful
  • 00:20:13
    later on when the AI SRE is trying to
  • 00:20:19
    triage different
  • 00:20:21
    problems in most teams there's an
  • 00:20:25
    80/20 like Pareto distribution of value
  • 00:20:29
    um yeah so some of the key factors are
  • 00:20:31
    often found in the same systems I think it
  • 00:20:33
    was Meta yeah that had some
  • 00:20:37
    internal survey where they found out
  • 00:20:39
    that 50 or 60% of their production
  • 00:20:41
    issues were just due to config or code
  • 00:20:43
    changes anything that disrupted their
  • 00:20:45
    prod environment so if you're just
  • 00:20:48
    looking at what people are deploying
  • 00:20:49
    like you're following the humans you're
  • 00:20:50
    going to probably find a lot of the
  • 00:20:51
    problems so monitoring slack monitoring
  • 00:20:55
    deployments is one of the most effective
  • 00:20:57
    things to do looking at like releases or
  • 00:21:01
    changes that people are scheduling and
  • 00:21:03
    understanding those events so having an
  • 00:21:05
    assessment of that and then in the
  • 00:21:07
    resolution path there's also or the way
  • 00:21:10
    to build the resolution looking at run
  • 00:21:12
    books looking at how people have solved
  • 00:21:14
    problems in the past
  • 00:21:16
    like often what happens is like a slack
  • 00:21:19
    thread is created right so the slack
  • 00:21:21
    thread is like a contextual container
  • 00:21:24
    for how do you go from a problem which
  • 00:21:26
    somebody creates a thread for to a
  • 00:21:28
    solution
  • 00:21:29
    and summarizing these slack threads is
  • 00:21:31
    extremely useful so you can basically
  • 00:21:33
    say like this engineer ran into this
  • 00:21:35
    problem this was the discussion and this
  • 00:21:37
    is the final conclusion and there's
  • 00:21:38
    often like a PR attached to that so you
  • 00:21:40
    can condense that down to almost like a
  • 00:21:42
    guidance or like a run book yeah and
  • 00:21:45
    attaching that into like novel scenarios
  • 00:21:48
    is useful because it shows you how this
  • 00:21:50
    team does things and they often
  • 00:21:52
    contain tribal knowledge right so this
  • 00:21:54
    is how we solve problems at our company
  • 00:21:56
    we connect to our VPNs like this we
  • 00:21:59
    access these systems these are the Key
  • 00:22:00
    Systems right the the most important
  • 00:22:02
    systems in your production environment
  • 00:22:03
    will be referenced by Engineers
  • 00:22:05
    constantly yeah um often through shorthand
  • 00:22:09
    notations um and if you speak to
  • 00:22:11
    Engineers of most companies those will
  • 00:22:14
    be the two biggest problems right one is
  • 00:22:17
    you don't understand our systems and our
  • 00:22:20
    processes and our context and the second
  • 00:22:23
    one is that you don't know how to
  • 00:22:24
    integrate or access these because
  • 00:22:26
    they're custom and bespoke and homegrown
  • 00:22:29
    and so those are the two challenges that
  • 00:22:31
    we face as like agents basically we're
  • 00:22:34
    like a new engineer on the team and you
  • 00:22:35
    need to be taught by this engineering
  • 00:22:37
    team if you're not taught then you're
  • 00:22:39
    never going to succeed I hope that answered your question
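
A sketch of condensing a problem-to-solution Slack thread into a run-book-style memory, assuming the openai Python client and a JSON-mode-capable model; the model name, prompt, and output keys are placeholders, not Cleric's actual pipeline.

```python
import json
from openai import OpenAI  # assumes the openai client; any LLM would do

client = OpenAI()  # assumes OPENAI_API_KEY is set

PROMPT = """Summarize this incident Slack thread as JSON with keys:
problem, discussion, conclusion, linked_pr (or null). Be concise."""

def thread_to_runbook(messages: list[str]) -> dict:
    """Condense a problem-to-solution Slack thread into a run-book style
    memory the agent can retrieve for similar future issues."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": "\n".join(messages)}],
    )
    return json.loads(resp.choices[0].message.content)

# e.g. thread_to_runbook(["alice: checkout is 500ing", "bob: rolling back",
#                         "alice: fixed by PR #123"])
```
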
  • 00:22:41
    yeah and how do
  • 00:22:43
    you overcome that you just are creating
  • 00:22:47
    some kind of a glossary with these
  • 00:22:50
    shorthand things that the that are
  • 00:22:53
    fairly common within the organization or
  • 00:22:56
    what yeah so you there's multiple layers
  • 00:22:58
    to this and I think this is quite an
  • 00:23:01
    evolving space thankfully LLMs are
  • 00:23:03
    pretty adaptive and forgiving in this
  • 00:23:06
    regard so we can experiment with
  • 00:23:08
    different ways to summarize different
  • 00:23:09
    levels of granularity so we've looked at
  • 00:23:11
    okay can you just take like a massive
  • 00:23:13
    amount of information and just shove
  • 00:23:15
    that into the context window give it in a
  • 00:23:17
    relatively raw form and that works but
  • 00:23:19
    it's quite expensive yeah and then you
  • 00:23:21
    show it a more condensed form and you
  • 00:23:23
    say this is just the like tip of the
  • 00:23:25
    iceberg for any one of these topics you
  • 00:23:27
    can query using this tool to get
  • 00:23:29
    more information yeah and it's not
  • 00:23:33
    always easy to know which one is the
  • 00:23:36
    best because it's dependent on the issue
  • 00:23:37
    at hand right because sometimes a key
  • 00:23:39
    fact the needle in the haystack is buried one
  • 00:23:41
    level deeper and the agent can't see it
  • 00:23:44
    because it has to call a tool to get to
  • 00:23:45
    it so we typically err on the side of
  • 00:23:48
    spending more money and just having the
  • 00:23:51
    agent see it and then optimizing cost
  • 00:23:53
    and latency over time for us it's really
  • 00:23:56
    about being valuable out of the gate
  • 00:23:59
    Engineers should find this valuable and
  • 00:24:01
    in that value the collaboration starts
  • 00:24:04
    and then it creates a virtuous cycle
  • 00:24:06
    where they feed us more information
  • 00:24:07
    they give us more information they get
  • 00:24:09
    more value because we take more grunt
  • 00:24:11
    work off their plate and and it's it's
  • 00:24:14
    like training a new person on your team
  • 00:24:16
    if you see that oh this person is taking
  • 00:24:18
    more and more tasks yeah I'll just get
  • 00:24:20
    them more information I'll give them
  • 00:24:21
    more scope yeah I want to go into a
  • 00:24:23
    little bit of the ideas that you're
  • 00:24:27
    talking about there like how you can
  • 00:24:28
    interact with the agent and but I feel
  • 00:24:32
    like the gravitational pull towards
  • 00:24:35
    asking you about memory and how you're
  • 00:24:38
    doing that is too strong so we got to go
  • 00:24:40
    down that route first and
  • 00:24:43
    specifically are you just caching these
  • 00:24:47
    answers are you caching like successful
  • 00:24:50
    runs how do you go about knowing that a
  • 00:24:53
    something was successful and then where
  • 00:24:55
    do you store it how do you like give
  • 00:24:57
    that access
  • 00:24:58
    or agents get access to that and they
  • 00:25:01
    know that oh we've seen this before yeah
  • 00:25:03
    cool boom it feels like that is quite
  • 00:25:07
    complex in theory you would be like yeah
  • 00:25:10
    of course we're just going to store
  • 00:25:11
    these successful runs but then when you
  • 00:25:13
    break it down and you say all right what
  • 00:25:15
    does success mean and where are we
  • 00:25:17
    going to store it and who's going to
  • 00:25:20
    have access to that and how are we going to
  • 00:25:21
    label that as successful like I was
  • 00:25:23
    thinking how do you even go about
  • 00:25:25
    labeling this kind of thing because is it
  • 00:25:28
    you sitting there clicking and human
  • 00:25:30
    annotating stuff or is it you're
  • 00:25:33
    throwing it to another llm to say yay
  • 00:25:36
    success what does it look like break
  • 00:25:38
    that whole thing down for me because
  • 00:25:40
    memory feels quite complex and that when
  • 00:25:43
    you really look at
  • 00:25:45
    it a big part of this is also the
  • 00:25:48
    ux challenge because people don't want
  • 00:25:50
    to just sit there and label I think
  • 00:25:52
    people are just like especially
  • 00:25:54
    Engineers are really tired of slop code
  • 00:25:56
    and they're just being thrown this like
  • 00:25:58
    slop and then they have to review they
  • 00:26:00
    want to create and I think that's what
  • 00:26:02
    we're trying to do is free them up from
  • 00:26:03
    support but in doing so you don't want
  • 00:26:05
    to get them to like constantly review
  • 00:26:08
    your work with no benefit so that's the
  • 00:26:11
    key thing there has to be interaction
  • 00:26:13
    where there's implicit feedback and they
  • 00:26:15
    get value out of that and so I'm getting
  • 00:26:19
    to your point about memory so
  • 00:26:21
    effectively there is three types of
  • 00:26:23
    memory there's the like Knowledge Graph
  • 00:26:25
    which captures the system State and the
  • 00:26:27
    relations between
  • 00:26:28
    things then there's episodic and
  • 00:26:31
    procedural memory so the procedural
  • 00:26:33
    memory is like how to ride a bicycle
  • 00:26:35
    you've got your brakes here you your
  • 00:26:37
    pedal here it's like the guide It's
  • 00:26:39
    almost like the Run book but the Run
  • 00:26:41
    book doesn't describe for this specific
  • 00:26:45
    issue that we had on this date what did
  • 00:26:47
    we do the instance of that is the
  • 00:26:50
    episode or the episodic memory and both
  • 00:26:53
    of those need to be captured right so
  • 00:26:55
    when we start we're indexing your
  • 00:26:56
    environment getting all these like
  • 00:26:58
    relations and things and then we also look
  • 00:27:00
    at okay are there things that we can
  • 00:27:02
    extract from this world where we've got
  • 00:27:04
    procedures and then finally as we
  • 00:27:08
    experience things or as we understand
  • 00:27:10
    the experiences of others within this
  • 00:27:12
    environment we can store those as well
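
The procedural-versus-episodic split might be modeled like this (a minimal sketch with invented fields, not Cleric's schema): episodes are dated instances, and repeated successful episodes get distilled into a reusable procedure, i.e. a run-book entry.

```python
from dataclasses import dataclass

@dataclass
class Procedure:            # "how to ride the bicycle": the durable how-to
    name: str
    steps: list[str]

@dataclass
class Episode:              # one dated instance of applying it
    date: str
    issue: str
    actions: list[str]
    outcome: str

class MemoryStore:
    def __init__(self) -> None:
        self.procedures: dict[str, Procedure] = {}
        self.episodes: list[Episode] = []

    def remember_episode(self, ep: Episode) -> None:
        self.episodes.append(ep)

    def promote(self, name: str, matching: list[Episode]) -> None:
        """If the same actions worked two or three times, distill the
        episodes into a reusable procedure (a run-book entry)."""
        if len(matching) >= 2:
            self.procedures[name] = Procedure(name, matching[-1].actions)

store = MemoryStore()
for d in ("2024-12-01", "2024-12-09"):  # hypothetical incidents
    store.remember_episode(Episode(d, "cluster fell over under load",
                                   ["scale node pool up"], "recovered"))
store.promote("load-induced failure", store.episodes)
print(store.procedures)
```
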
  • 00:27:14
    we have really spent a lot of time and
  • 00:27:17
    most companies care about this a lot
  • 00:27:20
    securing data so we are deployed in your
  • 00:27:23
    production environment and we only have
  • 00:27:25
    read-only APIs so our agent cannot make
  • 00:27:27
    changes it only makes suggestions so all
  • 00:27:30
    your data you want to change that right that
  • 00:27:32
    later we'll talk about like how you want
  • 00:27:35
    to eventually get to a different state
  • 00:27:37
    but yeah continue yeah yeah we want to
  • 00:27:39
    get to closed-loop resolution but that's a
  • 00:27:42
    that's a longer part so we're storing
  • 00:27:44
    all of these memories mostly As I think
  • 00:27:48
    the valuable ones are the episodes right
  • 00:27:50
    those are the like the instances like if
  • 00:27:53
    this happened or this happened and I
  • 00:27:54
    solved it in this way we had a bad rollout
  • 00:27:57
    the cluster fell over we scaled it up
  • 00:28:01
    and then later we saw it was working and
  • 00:28:03
    it was done and we did that two or
  • 00:28:06
    three times and we think that's a good
  • 00:28:08
    pattern like scaling is effective but
  • 00:28:10
    that's all captured in the environment
  • 00:28:14
    um of the customer our primary means
  • 00:28:16
    of feedback is monitoring
  • 00:28:19
    system Health post change oh nice we
  • 00:28:23
    can look at the system and see that this
  • 00:28:26
    change has been effective and we can
  • 00:28:27
    look at the code of the environment
  • 00:28:29
    whether it's the application code or the
  • 00:28:31
    infrastructure code basically as
  • 00:28:33
    like a masking problem do we see or can
  • 00:28:37
    we predict the change the human will
  • 00:28:39
    make in order to solve this problem and
  • 00:28:40
    if they do then make that change
  • 00:28:42
    especially if it's a recommendation then
  • 00:28:44
    we see that as a big green light look what
  • 00:28:46
    we've done right they've actually
  • 00:28:48
    approved our suggestion yeah that is not a
  • 00:28:52
    super rich data source because the
  • 00:28:54
    change that they make may be slightly
  • 00:28:56
    different or we may not have access to
  • 00:28:58
    those systems a more effective way is
  • 00:29:02
    interaction so if we present findings
  • 00:29:04
    and say Here's five findings and here's
  • 00:29:06
    our diagnosis and you say this is dumb
  • 00:29:09
    try something else then we know that was
  • 00:29:10
    bad so we get a lot of negative examples
  • 00:29:13
    right so this is bad and so it's a
  • 00:29:15
    little bit lopsided but then when you
  • 00:29:17
    eventually say oh okay I'm going to
  • 00:29:19
    prove this and I'm going to blast this
  • 00:29:21
    out to the engineering team or I'm going
  • 00:29:22
    to update my PagerDuty notes or I'm going
  • 00:29:24
    to I want you to generate a pull request
  • 00:29:27
    from the information then suddenly we've
  • 00:29:30
    got like positive feedback on that in
  • 00:29:32
    the user experience it's really an implicit
  • 00:29:35
    source of information the interaction
  • 00:29:37
    with the engineer and that gets attached
  • 00:29:39
    to these memories But ultimately at the
  • 00:29:42
    end of the day it's still a very sparse
  • 00:29:44
    data set so these memories you you may
  • 00:29:47
    not have true labels and so for us a
  • 00:29:51
    massive investment has been our
  • 00:29:53
    evaluation bench which is external from
  • 00:29:56
    customers where we train our agents and
  • 00:29:58
    we do a lot of really
  • 00:30:00
    handcrafted labeling where even a
  • 00:30:02
    smaller data set gets the agent to a
  • 00:30:04
    much much higher degree of accuracy so
  • 00:30:07
    you want a bit of both right you want
  • 00:30:08
    the real production use cases with
  • 00:30:09
    engineering feedback which does
  • 00:30:12
    present good information but the eval
  • 00:30:14
    bench is ultimately the firm
  • 00:30:17
    foundation that gives you that coverage at the moment
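
A sketch of turning interaction into implicit labels, with guessed-at actions and weights: negative signals like dismissal and strong positive signals like generating a PR get summed per finding and attached to the corresponding episodic memories.

```python
# Map UI actions to implicit labels on the findings they touched.
# The actions and weights below are illustrative, not Cleric's actual mapping.
ACTION_LABELS = {
    "dismissed": -1,          # "this is dumb, try something else"
    "asked_followup": 0,      # engaged, but not yet an endorsement
    "shared_with_team": +1,
    "updated_pagerduty": +1,
    "generated_pr": +1,       # strongest positive signal
}

def label_findings(events: list[tuple[str, str]]) -> dict[str, int]:
    """events: (finding_id, ui_action) pairs; returns summed labels that
    get attached to the corresponding episodic memories."""
    labels: dict[str, int] = {}
    for finding_id, action in events:
        labels[finding_id] = labels.get(finding_id, 0) + ACTION_LABELS[action]
    return labels

print(label_findings([("f1", "dismissed"), ("f2", "asked_followup"),
                      ("f2", "generated_pr")]))  # {'f1': -1, 'f2': 1}
```
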
  • 00:30:18
    but it feels like the
  • 00:30:21
    evals have to be specific to customers
  • 00:30:24
    don't they and it also feels like each
  • 00:30:26
    deployment of each agent has to be a bit
  • 00:30:29
    bespoke and custom per agent or am I
  • 00:30:34
    mistaken in that the patterns
  • 00:30:37
    vary so the agents are pretty
  • 00:30:39
    generalized the agents get contextual
  • 00:30:41
    information per customer so it gets
  • 00:30:44
    injected like localized customer
  • 00:30:46
    specific procedures and memories and all
  • 00:30:49
    those things but those are layered on the
  • 00:30:52
    base which is developed inside of our
  • 00:30:55
    product right like in the mothership or
  • 00:30:57
    actually it's called the Temple of Cleric
  • 00:31:00
    um so we distribute like new versions of
  • 00:31:03
    cleric and our prompts our logic our
  • 00:31:06
    reasoning generalized memories or
  • 00:31:09
    approaches to solving problems are
  • 00:31:12
    imbued in a Divine way into the cleric
  • 00:31:14
    and it's sent up it's a layering
  • 00:31:17
    challenge right because you do want to
  • 00:31:18
    have cross cutting benefits to all
  • 00:31:21
    customers and accuracy driven by the
  • 00:31:23
    eval bench but also customization at
  • 00:31:27
    on their processes and like
  • 00:31:29
    customer specific approaches all right
  • 00:31:32
    so there's a few other things that are
  • 00:31:36
    fascinating to me when it comes to the
  • 00:31:37
    UI and the ux of how you're doing things
  • 00:31:41
    specifically how you are very keen on
  • 00:31:46
    not giving Engineers more alerts unless
  • 00:31:50
    it absolutely needs to happen and I
  • 00:31:53
    think that's something that I've been
  • 00:31:54
    hearing since
  • 00:31:56
    2018 and it was all about alert fatigue and
  • 00:32:00
    how when you have complex systems and
  • 00:32:02
    you set up all of this monitoring and
  • 00:32:04
    observability you inevitably are just
  • 00:32:06
    getting pinged continuously because
  • 00:32:09
    something is out of whack and so the
  • 00:32:14
    ways that you made sure to do this and I
  • 00:32:17
    thought this was fascinating is a) have a
  • 00:32:19
    confidence score so be able to say look
  • 00:32:22
    we think that this is like this and
  • 00:32:26
    we're giving it 75%
  • 00:32:28
    confidence that this is going to happen
  • 00:32:30
    or this could be bad or whatever it may
  • 00:32:33
    be and then b) if it is under a certain
  • 00:32:38
    percent confidence score you just don't
  • 00:32:41
    even tell anyone and you try and figure
  • 00:32:43
    out if it's actually a problem and I'm
  • 00:32:45
    guessing you continue working or you
  • 00:32:47
    just forget about it explain that whole
  • 00:32:51
    user experience and how you came about
  • 00:32:53
    that yeah we realized because this is a
  • 00:32:55
    trust building exercise we can't just
  • 00:32:58
    respond with whatever we find and the
  • 00:33:00
    Agents
  • 00:33:02
    can sometimes be just not right
  • 00:33:04
    especially during the onboarding
  • 00:33:06
    phase they don't
  • 00:33:07
    have the necessary access and they don't
  • 00:33:09
    have the context right and so at least
  • 00:33:11
    at the start when you're training the
  • 00:33:13
    agent you don't want it to just spam you
  • 00:33:15
    with these raw ideas and so the
  • 00:33:17
    confidence score was one that I think a
  • 00:33:20
    lot of teams are actually trying to
  • 00:33:22
    build into their products as agent
  • 00:33:23
    Builders it's extremely hard in this
  • 00:33:26
    case because it's such an
  • 00:33:28
    unsupervised
  • 00:33:30
    problem I'm trying to not get into the
  • 00:33:32
    raw details because there's a lot of like
  • 00:33:34
    effort we've put into that like building
  • 00:33:36
    this confidence score as a big part of
  • 00:33:38
    our IP is like how do we measure our own
  • 00:33:41
    success you need a Divine name for the
  • 00:33:45
    IP or something it's not your IP it's
  • 00:33:48
    your what was it when Moses was up on
  • 00:33:50
    the hill and he got the Revelation it
  • 00:33:53
    was yeah this is not your IP this is
  • 00:33:55
    your Revelations that you've had yeah
  • 00:33:57
    but so the high level is basically
  • 00:34:00
    that it's really driven by this data fly
  • 00:34:04
    wheel it's really driven by experience
  • 00:34:06
    and that's also how an engineer does
  • 00:34:08
    things but those can be again like two
  • 00:34:10
    layered like from the base layers of the
  • 00:34:12
    product but also experiences in this
  • 00:34:15
    company so we do use an LLM for self
  • 00:34:18
    assessment but it's also driven and
  • 00:34:20
    grounded by existing experiences so we
  • 00:34:24
    inject a lot of those experiences and
  • 00:34:26
    whether those are positive or negative
  • 00:34:28
    outcomes and as an engineer you can set
  • 00:34:32
    the threshold so you can say oh nice
  • 00:34:35
    only extremely high relevance findings
  • 00:34:38
    or diagnoses should be shown and you
  • 00:34:41
    can set the conciseness and specificity
  • 00:34:44
    so you can say I just wanted one
  • 00:34:45
    sentence or just give me a word or give
  • 00:34:49
    me all the raw information so what we do
  • 00:34:53
    today is we're very asynchronous so an
  • 00:34:56
    alert fires we'll go on a quest find
  • 00:34:56
    whatever information we can and come
  • 00:34:59
    back if we're confident we'll respond if
  • 00:35:02
    not we'll just be quiet but then you can
  • 00:35:05
    engage with us in a synchronous way so
  • 00:35:07
    it starts async and then you can kick
  • 00:35:09
    the ball back and forth in a synchronous
  • 00:35:11
    way and in the synchronous mode
  • 00:35:14
    it's very
  • 00:35:16
    interactive and lower latency we
  • 00:35:18
    will almost always respond if you ask us
  • 00:35:21
    a question we'll respond so then the
  • 00:35:22
    confidence score is less important
  • 00:35:24
    because then it's like the user is
  • 00:35:26
    refining that answer saying go back try
  • 00:35:28
    this go back try this but for us the key
  • 00:35:31
    thing is we have to come back with good
  • 00:35:33
    initial findings and that's why the
  • 00:35:35
    confidence score is so important but
  • 00:35:36
    again it's really driven by
  • 00:35:39
    experiences just to reiterate
  • 00:35:41
    like why this is such a complex problem
  • 00:35:44
    to solve you can't just take a
  • 00:35:46
    production environment and say okay I'm
  • 00:35:48
    going to spin this up in a Docker
  • 00:35:49
    container and reproduce it at a specific
  • 00:35:51
    point in time at many companies you
  • 00:35:53
    can't even do a load test across services
  • 00:35:56
    it's so complex it's all different
  • 00:35:57
    different teams they're all interrelated
  • 00:35:59
    you can do this for a small startup
  • 00:36:01
    with one application running on Heroku
  • 00:36:02
    or Vercel but doing this at scale is
  • 00:36:05
    virtually impossible at most companies
  • 00:36:07
    so you don't have that ground truth
  • 00:36:10
    you can't say with 100% certainty
  • 00:36:12
    whether you're right or wrong and that's
  • 00:36:13
    just the state we're in right now
  • 00:36:15
    despite that the confidence score has
  • 00:36:18
    been a very powerful technique to at
  • 00:36:21
    least eliminate most false positives
  • 00:36:25
    and when we know that we
  • 00:36:26
    don't have anything of substance just
  • 00:36:29
    being quiet
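
The async-versus-sync behavior described here reduces to a small gate; a minimal sketch, with an assumed threshold: stay quiet below the confidence bar when running from an alert, but always answer once an engineer engages directly.

```python
def should_respond(confidence: float, mode: str, threshold: float = 0.7) -> bool:
    """Async (alert-triggered) runs stay quiet below the threshold; once an
    engineer engages synchronously, always answer and let them steer.
    The 0.7 threshold is an assumption for the example."""
    if mode == "sync":
        return True
    return confidence >= threshold

for mode, conf in [("async", 0.4), ("async", 0.8), ("sync", 0.4)]:
    verdict = "respond" if should_respond(conf, mode) else "stay quiet"
    print(mode, conf, "->", verdict)
```
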
  • 00:36:30
    but how do you know if you got
  • 00:36:35
    enough
  • 00:36:36
    information when you were doing the scan
  • 00:36:39
    or you were doing the search to go back
  • 00:36:42
    to the human and give that information
  • 00:36:46
    and also how do you know that you are
  • 00:36:49
    fully understanding what the human is
  • 00:36:52
    asking for when you're doing that back
  • 00:36:53
    and forth honestly this is one of the
  • 00:36:56
Honestly, this is one of the key parts that's very challenging. A human will say "the checkout service is down," and you need to know that they're probably, maybe based on who the engineer is, talking about production; or, if they've been talking about developing a new feature, they're probably talking about the dev environment. And if you go down the wrong path, you can spend some money and a lot of time investigating something that's useless. So what we do, even with the initial message that comes in, is ask a clarifying question if we are not sure what you're asking, if you've not been specific enough. Most agent builders, even Cognition's Devin, do this: initially they'll say "okay, do you mean X, Y, and Z? Okay, this is my plan. Okay, I'm going to go do it now." So there is a sense of confidence built into these products at the UX layer, and that's where we are right now. With ChatGPT, or with Claude, you can sometimes say something very inaccurate or vague and it can probably guess the right answer, because the cost is not multi-step; it's very cheap, you can just quickly fix your text. But for us, we have to short-circuit that and make sure that you're specific enough in your initial instructions, and then over time loosen that a bit: as we understand a bit more about what your teams are doing, what things are, what you're up to, you can be more vague. But for now it requires a bit more specificity and guidance.
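A sketch of that clarification gate, with every call a hypothetical stand-in: the point is simply that a cheap check runs before the expensive multi-step investigation.

```python
def handle_request(request: str, context: list[str], agent):
    """Gate an expensive multi-step investigation behind a cheap
    clarification step when the request is under-specified
    (e.g. "the checkout service is down": prod or dev?)."""
    intent = agent.parse(request, context)   # hypothetical call
    if intent.ambiguity > 0.5:               # not specific enough yet
        return agent.ask_clarifying(intent)  # "do you mean X, Y, or Z?"
    plan = agent.plan(intent)                # "okay, this is my plan"
    return agent.execute(plan)               # now spend the real money
```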
  • 00:38:20
Speaking of the multi-turns, and spending money on things, or trying not to waste money going down the wrong tree branch or rabbit hole: how do you think about pricing for agents? Is it all consumption-based? Are you looking at what the price of an SRE would be and saying "oh, we'll price at a percentage of that, because we're saving you time"? What, in your mind, is the right way to base pricing off of?
  • 00:38:55
Well, we're trying to build a product that engineers love to use, and so we want it to be a toothbrush: we want it to be something that you reach for instead of your observability platform, instead of going into the console. So for us usage is very important, and we don't necessarily want procurement to stand in the way. But the reality is there are costs, and this is a business; we want to add value, and money is how you show us that we're valuable. The original idea with agents was that there would be this augmentation of engineering teams, and that you could charge some order of magnitude less, a fraction of engineering or employee headcount, by augmenting teams; I think the jury is still out on that. I think most agent builders today are pricing to get into production environments, or into these systems that they need to use to solve problems, to get close to their persona. And if you look at what Devin did, I think they also started at $10K per year or some such pricing, and I think it's now more like $500 a month, but it's mostly a consumption-based model: you get some committed amount of compute hours that is effectively giving you time to use the product. For us, we're also orienting around that model. Because we're not GA, our pricing is still a little bit in flux, and we're working with our initial customers to figure out what they think is reasonable and what they think is fair. But I think we're going to land on something that's mostly similar to the Devin model, where it's usage-based. We don't want engineers to think "okay, there's an investigation, it's going to cost me X"; they should just be able to run it, see whether it's valuable or not, and increase usage. But it will be something like a tiered amount of compute that you can use, so maybe you get 5,000 investigations a month, or something in that order.
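As a toy model only, with invented numbers rather than anything Cleric has announced, committed-use pricing with overage might look like:

```python
# All names and numbers are invented for illustration.
TIERS = {
    "starter": {"committed_investigations": 500,   "monthly_fee": 0},
    "team":    {"committed_investigations": 5_000, "monthly_fee": 2_000},
}
OVERAGE_PER_INVESTIGATION = 1.50

def monthly_bill(tier: str, investigations_run: int) -> float:
    t = TIERS[tier]
    overage = max(0, investigations_run - t["committed_investigations"])
    return t["monthly_fee"] + overage * OVERAGE_PER_INVESTIGATION

print(monthly_bill("team", 5_250))  # 2000 + 250 * 1.50 = 2375.0
```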
  • 00:40:50
Okay, nice. Yeah, because what instantly came to my mind was: you want folks to just reach for this and use it as much as possible, but if you are on usage-based pricing then inevitably you're going to hit that friction where it's "yeah, I want to use it, but, hmm, it's going to cost me."
  • 00:41:16
Yeah, yeah. So you do want to have a committed amount set aside at the front, and we're also exploring having a free tier, or a free band: maybe for the first X investigations you can just kick the tires and try it out, and as you get to higher limits then you can set up payment.
  • 00:41:35
So, we haven't even talked about tool usage, but that's another piece that feels so complex, because you're using an array of tools. How do you tap into each of these tools? Are you looking at logs, or are you syncing directly with the Datadogs of the world? How do you see tool usage for this, and what have been some specifically hard challenges to overcome in that arena?
  • 00:42:10
Again, this kind of goes back to why this is so challenging. One of the key things we've seen is that agents solve problems very differently from humans, but they need a lot of the things humans need; they need the same tools. If you're storing all of your data in Datadog, we may not be able to find all the information we need to solve a problem by just looking at your actual application running and your cloud infra, so we need to go to Datadog; we need access there, and engineering teams give us that access. If you've then constructed a bunch of dashboards and metrics, and that's how you've laid out, let's say, your runbooks and your processes to debug issues, we need to do things like look at multiple charts or graphs and infer across them, in the time ranges an issue happened: what are the anomalies that happened across multiple services? If two of them are spiking in CPU, are they interrelated? Should we look at the relations between them? These are extremely hard problems for LLMs to solve; even vision models are not purpose-built for that. So when it comes to tool usage, LLMs, or foundation models, are good at certain types of information, especially semantic ones: code, config, logs. They're slightly less good at traces, but still pretty decent. But they really suck at metrics; they really suck at time series. So how useful it's going to be is really dependent on your observability stack, because a human can just sit back, look at a bunch of dashboards, and pattern-match instantly; you can see these are spikes. An LLM sees something different. So what we'll find over time is that these observability tools will probably become less and less human-centric, and may even become redundant; you may see completely different means of diagnosing problems. I think the Honeycomb approach, the trace-based approach with these high-cardinality events, is probably the thing I'd put my money on as the dominant pattern I see winning.
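One common workaround for the time-series weakness, an assumption on my part rather than anything William confirms Cleric does, is to pre-digest metrics into short semantic statements a model can reason over:

```python
import numpy as np

def describe(name: str, series: np.ndarray, window: int = 30) -> str:
    """Collapse a raw metric series into a one-line semantic summary,
    since models handle text far better than columns of floats."""
    baseline, recent = series[:-window], series[-window:]
    z = (recent.mean() - baseline.mean()) / (baseline.std() + 1e-9)
    if z > 3:
        return f"{name}: spiking ~{z:.1f} sigma above baseline"
    return f"{name}: nominal"

def move_together(a: np.ndarray, b: np.ndarray) -> bool:
    """Flag two services whose metrics are strongly correlated,
    e.g. both spiking in CPU over the incident window."""
    return abs(np.corrcoef(a, b)[0, 1]) > 0.8
```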
  • 00:44:19
Can you explain that real fast? I don't know what that is.
  • 00:44:22
So basically, what they do, or what Charity Majors and some others have been promoting for years, is logging out traces, but with rich events attached to them. You can basically follow a request through your whole application stack, and you can log out a complete object payload at multiple steps along the way, and store that in a system where you can query all the information. So you've got the point in time, you've got the whole tree of the trace as well, and then at each point you can see the individual attributes and fields. You get a lot more detail in that, versus a time series, where you're basically seeing "okay, CPU goes up, CPU goes down," and what can you glean from that? It's like witchcraft, trying to find the root cause. But the Datadogs of the world have been making a lot of money selling consumption, and selling that witchcraft to engineers, for years, so there's a real incentive to keep this data store going. But as agents become more dominant, we'll see them gravitate to the most valuable sources of information, and then if you give your agent more and more scope, you'll see Datadog is rarely involved in these root causes, so why are we still paying for it? I'm not sure what it's going to look like in the next two or three years, but it's going to be interesting how things play out as agents become the go-to for diagnosing and solving problems.
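In code, the wide-event idea is roughly this (a self-contained sketch; the field names are illustrative, not a real Honeycomb schema):

```python
import json, time, uuid

def emit_event(trace_id: str, span: str, **payload) -> None:
    """Log one wide, high-cardinality event per step of a request,
    keyed by trace_id so the whole request tree can be queried later."""
    print(json.dumps({"ts": time.time(), "trace_id": trace_id,
                      "span": span, **payload}))

trace = uuid.uuid4().hex
emit_event(trace, "checkout.start", user_id="u123", cart_total=59.99)
emit_event(trace, "checkout.charge_card", user_id="u123",
           processor="stripe", latency_ms=840, retries=1)
```

Because every event carries the full payload, a question like "show me all checkouts over $50 that retried the card charge" needs no pre-built dashboard, just a query over the event store.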
  • 00:45:52
Yeah, I hadn't even thought about that: how for human usage maybe Datadog is set up wonderfully, because we look at it and it gives us everything we need, and we can root-cause very quickly by pattern matching. But if that turns out to be one of the harder things for agents to do, instead of making an agent better at understanding metrics, maybe you just give it different data, so that it can root-cause without those metrics, and it will shift away from reading the information from those services.
  • 00:46:29
Yeah, if you look at chess and AI, the Stockfishes of the world, that's just one AI that plays against grandmasters. Even the top players have learned from the AI, so they know that a pawn push on the side has been extremely powerful, or a rook lift has been very powerful; so now the top players in the world adopt these techniques, they learn from the AIs. But that's also because there's always a human in the loop; we still want to see people playing people. If you just leave it up to the AI, the way they play the game is completely different: they see things that we don't. And I know I didn't fully answer your question at the start, but these tools are grounding actions for us. The observability stack is one of them, but ultimately we build a complete abstraction over the production environment, so the agent uses these tools, learns how to use these tools, and knows which tools are the most effective. But we also build a transferability layer, so you can shift the agent from the real production environment into the eval stack, and it doesn't even know that it's running in an eval stack; it's just suddenly looking at fake services, fake Kubernetes clusters, fake Datadogs, fake scenarios, a fake world. So these tools are an incredibly important abstraction; it's one of the key abstractions that the agent needs. And honestly, memory management and tools are the two big things that agent teams should be focusing on right now, I'd say.
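The transferability idea can be sketched as a tool interface with interchangeable real and fake backends; the interface below is hypothetical, but it shows why the agent can't tell which world it is in.

```python
from typing import Protocol

class KubernetesTool(Protocol):
    # The agent is written against this interface and nothing else.
    def get_pods(self, namespace: str) -> list[dict]: ...
    def get_logs(self, pod: str) -> str: ...

class FakeKubernetes:
    """Eval-bench double: serves a scripted scenario (including any
    injected chaos) through the exact same interface, so the agent
    can't tell it has left production."""
    def __init__(self, scenario: dict):
        self.scenario = scenario

    def get_pods(self, namespace: str) -> list[dict]:
        return self.scenario["pods"][namespace]

    def get_logs(self, pod: str) -> str:
        return self.scenario["logs"][pod]
```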
  • 00:47:59
Wait, why do you switch it to this fake world?

Because that's where you've got
  • 00:48:05
full control; that's where you can introduce your own scenarios, your own chaos, and stretch your agent. But if you do so in a way where the tools are different, the worlds are different, the experiences are different, there's less transferability when you then take it into the production environment, and suddenly it's going to fall flat. So you want a real facsimile of the environment in your tool, or your eval bench.

And are you doing any type of chaos engineering, just to see how the agents perform?

Yes, that's pretty much what our eval stack is: it's chaos. We produce a world in which we reproduce chaos, and then we say "given this problem, what's up? What's the underlying cause?" and we see how close we can get to the ground-truth cause.
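A minimal sketch of such a chaos bench, assuming hypothetical `world` and `agent` objects: plant a known fault, let the agent investigate, and score its diagnosis against the cause you planted.

```python
# Hypothetical eval harness: inject a known fault, run the agent,
# and score the diagnosis against the ground-truth cause we planted.
SCENARIOS = [
    {"inject": "kill_db_connection_pool", "ground_truth": "db pool exhaustion"},
    {"inject": "fill_disk_on_node_3",     "ground_truth": "disk full on node 3"},
]

def run_chaos_evals(agent, world) -> float:
    correct = 0
    for s in SCENARIOS:
        world.reset()
        world.inject(s["inject"])                  # reproduce the chaos
        diagnosis = agent.investigate(world.alert())  # returns text here
        correct += s["ground_truth"] in diagnosis.lower()
    return correct / len(SCENARIOS)                # closeness to ground truth
```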
  • 00:48:54
Yeah, a perfect opportunity for an incredible name, like Lucifer: "this is the seventh layer of hell." I don't know, something along those lines.
  • 00:49:08
Yeah, we've got some ideas; the blog post will have some more detail on this idea, so TBD. I think one thing to note is that this is a very deep space. If you look at self-driving cars, lives are on the line, so people care a lot, and you have to hit a much higher bar than a human driving a car. It's very similar in this space: these production environments are sacred, they are important to these companies. If they go down, or if there's a data breach or anything, their business is on the line; CTOs really care. The bar that we have to hit is very high, and so we take security very seriously, but the whole product that we're building requires a lot of care, and there's a lot of complexity that goes into that. I think it's extremely compelling as an engineer to work in this space, because there are so many compelling problems to solve: the knowledge graph building, the confidence scoring, how you do evaluation, how you learn from these environments and build that into your core product, the tooling layers, the chaos benches, all these things, and how you do that in a reliable, repeatable way. I think that's the other big challenge: if you're on AWS or GCP, using this stack or a different stack, if you're going from e-commerce to gaming to social media, how generalizable is your agent? Can you just ship it, or can you only solve one class of problem? So that's one of the things that we're really leaning into right now: the repeatability of the product, and scaling this out to more and more enterprises. But yeah, I'd say it's an extremely complex problem to solve, and even though we're valuable today, true resolution, end-to-end resolution, is maybe multiple years out, just like self-driving cars; it took years to get to a point where we've got Waymos on the roads.
  • 00:50:57
Yeah, that's what I wanted to ask you about: true resolution. That just scares me to think about, first of all, and I don't have anything running in production, let alone a multi-million dollar system, so I can only imagine that you would encounter a lot of resistance when you bring that up to engineers.
  • 00:51:23
Surprisingly, no. There's definitely hesitation, but the hesitation is mostly based on uncertainty: what exactly can you do? If you show them "we literally can't change things, we don't have the access, the API keys are read-only," or "we're constrained to these environments," and if you introduce change through the processes they already have, so pull requests, with guardrails in place, then they're very open to those ideas. I think a big part of this is that engineers really hate infra and support work, so they yearn for something that can help free them from that; but it's a progressive trust-building exercise. We've spoken to quite a lot of enterprises, and almost all of them have different classes of sensitivity. You have your big-fish customers, for example, whose critical systems you don't want to touch, but then you've got your internal Airflow deployments, your CI/CD, your GitLab deployment; if that thing falls over, we can scale it up, or we could try to make a change with zero customer impact. So the areas where we're really helping teams today are the lower-severity or low-risk places where we can make changes, and if you're crushing those changes over time, then engineers will introduce you to the more high-value places. But yes, right now we're steering clear of the critical systems, because we don't want to make a change that is dangerous.
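The trust model he describes, read-only credentials plus changes routed through pull requests, might be enforced roughly like this; `repo.create_pull_request` is a hypothetical stand-in for whatever code-review integration a team uses.

```python
# Sketch: read-only access for diagnosis, and any proposed fix goes
# through the team's existing pull-request process, never applied
# directly by the agent.
class ReadOnlyClient:
    ALLOWED = {"GET"}  # diagnosis only; mutations are impossible

    def request(self, method: str, path: str):
        if method not in self.ALLOWED:
            raise PermissionError(f"{method} blocked: agent is read-only")
        ...  # perform the read against the real system

def propose_fix(repo, diff: str, rationale: str):
    """Open a PR so humans and existing guardrails review the change."""
    return repo.create_pull_request(   # hypothetical integration call
        title="cleric: proposed remediation",
        body=rationale,
        diff=diff,
    )
```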
  • 00:52:48
Yeah, and it just feels
  • 00:52:52
like it's too loaded. So even if you are doing everything right, because it is so high-maintenance, you don't want to stick yourself in there just yet; let the engineers bring you in when they're ready, and when you feel like it's ready. I can see that, for sure.

Yeah, also
  • 00:53:12
behaviorally, engineers won't change, you know, the tools they reach for, the processes, in a wartime scenario. When it's a relaxed environment, they're willing to try AI, experiment with it, and adopt it; but if it's a critical situation, they don't want to introduce an AI and add more chaos into the mix, right? So they want something that reduces the uncertainty.
  • 00:53:37
Yeah, that reminds me of one of the major things that I notice whenever I'm working with agents, or building systems that involve AI: the prompts can be the biggest hang-ups. The prompts for me sometimes feel like... obviously I'm not building a product that relies on agents most of the time, so I don't have the drive to see it through, but a lot of times I will fiddle with prompts for so long that I get angry, because I feel like I should just do the thing that I am trying to do, and not get AI to do it.

I don't really have an answer for you; that's just the nature of the beast.

Yes, exactly.

I do want to just
  • 00:54:34
double click and say: everybody has that problem, everybody struggles with that. You don't know if you're one prompt change away or twenty, and they're very good at making it seem like you're getting closer and closer, but you may not be. We found success in building frameworks and evaluations, so that we can at least extract, either from production or evals, the samples, the ground truth, that lets us know, or gives us confidence, that we're getting to the answer. Otherwise you can just go on forever, tweaking things and never getting there.
  • 00:55:07
That's it, and that's frustrating, because sometimes you take one step forward and two steps back, and you're like "oh my god." It's quite hard with content creation; I think it's harder in your space, though.

I have all but stopped using it for content creation, that's for sure. Maybe to help me fill up a blank page and get directionally correct, but for the most part, yeah, I don't like the way it writes. Even if I prompt it to the maximum, it doesn't feel like it gives me deep insights, so I stopped that.

But you're still on GPT-3.5, right?
Tags
  • AI
  • SRE
  • Knowledge Graphs
  • Cleric AI
  • Diagnostics
  • Root Cause Analysis
  • Confidence Scoring
  • Operational Complexity
  • Chaos Engineering
  • Memory Management