[Webinar] Building LLM applications in a secure way (WithSecure™)

00:56:51
https://www.youtube.com/watch?v=tVAmhlUVEcg

Resumen

TLDRIn this webinar, expert Donato Capitella discusses the development and implications of large language models (LLMs). He describes their roots in language translation technology and the foundational transformer architecture that powers today's advanced models. The presentation covers the broad applications of LLMs, such as aiding in recruitment processes and the potential of autonomous agents that can perform tasks with minimal human oversight. However, Donato warns of security risks, such as prompt injection attacks, which exploit vulnerabilities in how LLMs interpret input. He emphasizes the importance of robust security measures, reiterating the need for companies to treat LLMs as untrusted components, implement strict monitoring, and adapt continuously to emerging threats. Overall, while LLMs present exciting opportunities, they require careful management and security considerations.

Para llevar

  • 🧠 LLMs evolved from advancements in language translation and transformer technology.
  • ⚙️ Prompt injection attacks pose significant risks to LLM applications.
  • 🔒 Companies should treat LLMs as untrusted entities and implement security controls.
  • 📈 Continuous monitoring and updates are essential to mitigate risks associated with LLMs.
  • 🤖 Autonomous agents could change how LLMs interact with the external world.
  • 🔍 Deploy models to check for biased or harmful outputs from LLMs.
  • 🛠️ Input validation is crucial in reducing the risk of injections.
  • 🗣️ Understanding the limitations of LLMs is fundamental for their secure deployment.

Cronología

  • 00:00:00 - 00:05:00

    The webinar begins with Yanen introducing expert Donato Capitella to discuss large language models (LLMs). They highlight the surge of interest in LLMs like ChatGPT and open with a conversation about their origins and the advancements in machine learning that preceded them.

  • 00:05:00 - 00:10:00

    Donato elaborates on the history of language models, explaining that the early challenges involved translating sentences from one language to another and how this led to innovations in the encoder-decoder architecture over the past decade. He then introduces the attention mechanism which has become crucial in modern NLP tasks.

  • 00:10:00 - 00:15:00

    The discussion transitions into the introduction of Transformers, developed by Google in 2017, which combined previous advancements in encoder-decoder models and attention mechanisms, thereby allowing for effective parallel processing of data important for training sophisticated LLMs.

  • 00:15:00 - 00:20:00

    The application of LLMs is discussed, defining applications as wrappers around the language models for specific use cases. Examples given include the utilization of LLMs for generic tasks like summarization or the more specific use cases like enhancing features in services like Google Docs.

  • 00:20:00 - 00:25:00

    Donato shares his viewpoints on promising future applications of LLMs, targeting the development of autonomous agents capable of interacting with their environment and performing tasks autonomously without human oversight, hinting at the potential evolution of embodied AI.

  • 00:25:00 - 00:30:00

    The safety of LLM technologies is addressed, specifically referencing misconceptions surrounding their security risks likened to the science fiction notion of 'Skynet'. Donato insists that current LLM technologies lack true consciousness and discusses potential security challenges they face.

  • 00:30:00 - 00:35:00

    Yanen prompts Donato to showcase a demonstration related to LLMs and risks. The demo illustrates a recruitment scenario where an attacker exploits a job application through prompt injection to extract confidential candidate information from the system.

  • 00:35:00 - 00:40:00

    Demonstrating how LLM applications can be exploited, Donato illustrates how attackers can leverage vulnerabilities within LLM systems—most notably through prompt injection, which manipulates LLM outputs to exfiltrate sensitive information.

  • 00:40:00 - 00:45:00

    The discussion expands into potential real-world threats presented by LLMs, showcasing various examples of prompt injection risks, followed by emphasizing the importance of treating LLM output as untrusted and the need for strict security measures to handle these vulnerabilities.

  • 00:45:00 - 00:56:51

    Yanen invites Donato to share insights on essential safeguards that organizations can implement to secure their LLM applications and highlights the need for continuous monitoring and updates to the security framework for LLM use cases.

Ver más

Mapa mental

Vídeo de preguntas y respuestas

  • What are large language models (LLMs)?

    LLMs are advanced AI systems, like ChatGPT, designed to understand and generate human-like text.

  • How did LLMs evolve?

    LLMs evolved from advancements in language translation and the introduction of transformer architecture with attention mechanisms.

  • What are the main risks associated with LLM applications?

    Prompt injection attacks and the potential for generating harmful outputs are significant risks linked to LLM applications.

  • What security measures can be taken for LLM applications?

    Security measures include input validation, output monitoring, using reinforcement learning for model alignment, and implementing access controls.

  • Are current LLM technologies an existential threat to humanity?

    Current LLM technologies do not pose an existential threat, as they are not self-aware or fully autonomous.

  • What is an autonomous agent in the context of LLMs?

    An autonomous agent is an application that can perform tasks on behalf of a user without supervision, utilizing LLMs.

  • How do LLMs handle biases and harmful content?

    Specialized models can detect and filter out biased or harmful content in input and output.

  • What future trends should businesses watch in LLM application security?

    Businesses should pay attention to advancements in reinforcement learning techniques and the integration of multi-modal data (e.g., images and videos) with LLMs.

Ver más resúmenes de vídeos

Obtén acceso instantáneo a resúmenes gratuitos de vídeos de YouTube gracias a la IA.
Subtítulos
en
Desplazamiento automático:
  • 00:00:00
    hello and welcome to this with secure
  • 00:00:02
    webinar my name is yanen and the expert
  • 00:00:05
    on the show is Donato capitella and
  • 00:00:07
    we're going to be talking about large
  • 00:00:09
    language models so Donato I guess a lot
  • 00:00:12
    of people will already know large
  • 00:00:14
    language models chat TPT things like
  • 00:00:16
    that they're a Hot Topic right now but
  • 00:00:18
    uh how would you describe llms and and
  • 00:00:21
    where did they come from so indeed
  • 00:00:24
    that's an extremely hot topic and I
  • 00:00:27
    remember um myself when I first uh used
  • 00:00:32
    chat GPT I think as an engineer uh I can
  • 00:00:36
    see the incredible advancements that we
  • 00:00:40
    have
  • 00:00:41
    made as a hacker or ethical hacker or
  • 00:00:45
    security guy um I always try to
  • 00:00:49
    understand how that technology came
  • 00:00:52
    together so it didn't happen in a vacuum
  • 00:00:55
    and I think it's important to start by
  • 00:00:59
    unpacking the technology what what was
  • 00:01:02
    the journey that took us there so that
  • 00:01:04
    we can talk about the security of it
  • 00:01:06
    because if you don't have a good
  • 00:01:10
    understanding of what that technology is
  • 00:01:12
    then it becomes quite hard to to secure
  • 00:01:15
    it and so to to answer your question
  • 00:01:18
    about where llms come from um without
  • 00:01:22
    going too too much uh or too too far
  • 00:01:27
    away or back in time
  • 00:01:30
    probably 10 years ago is where we
  • 00:01:33
    started having some um
  • 00:01:36
    advancements that have led us to large
  • 00:01:39
    language models I'll take a a few
  • 00:01:41
    minutes to set the stage I think I think
  • 00:01:43
    it's important uh and it's also what I
  • 00:01:45
    geek out on anyway so um I'm going to
  • 00:01:48
    impose this on everybody who's watching
  • 00:01:51
    so the problem one of the problems
  • 00:01:55
    people have been trying to solve uh for
  • 00:01:57
    a long time in machine learning is
  • 00:02:00
    translation language
  • 00:02:02
    translation uh on the surface it could
  • 00:02:05
    be quite a simple problem word like
  • 00:02:07
    there is a word in Italian uh B
  • 00:02:11
    translates to Glass in English that's
  • 00:02:15
    fine however language is not that simple
  • 00:02:17
    isn't it like you can have an input
  • 00:02:20
    sentence in Italian with a certain
  • 00:02:22
    number of words and the output the
  • 00:02:26
    translated sentence in English can have
  • 00:02:28
    a completely different number number of
  • 00:02:30
    words different grammatical structure
  • 00:02:32
    different ways of saying things so it's
  • 00:02:35
    actually quite a tough problem in
  • 00:02:37
    machine learning to align your
  • 00:02:40
    sentence in one language to the
  • 00:02:42
    translation so people started working on
  • 00:02:45
    that and that's where one of the first
  • 00:02:48
    uh I think big Innovations came from
  • 00:02:52
    around 10 years ago IIA satova who is
  • 00:02:55
    the guy behind open AI CH one of the
  • 00:02:58
    engineers behind that
  • 00:03:00
    um was experimenting with this problem
  • 00:03:02
    trying to solve it and they came up with
  • 00:03:04
    the enoda decoda architecture so this
  • 00:03:07
    idea of splitting your AI so that you
  • 00:03:11
    have instead of trying to directly align
  • 00:03:14
    your sentence in Italian your sentence
  • 00:03:16
    in English you're now um having like a
  • 00:03:20
    model that takes any sentence in Italian
  • 00:03:25
    and gives you a hidden compressed
  • 00:03:26
    representation of that sentence and then
  • 00:03:28
    you have a decoder which takes that
  • 00:03:30
    hidden representation and gives you or
  • 00:03:34
    generates the English translation so
  • 00:03:37
    that's the first advancement it worked
  • 00:03:39
    really well for short sentences much
  • 00:03:41
    better than the methods before but it
  • 00:03:44
    didn't work well for long range so if
  • 00:03:45
    you had a very long sentence it would
  • 00:03:48
    start missing things sure so then you
  • 00:03:51
    have the second Innovation which is in
  • 00:03:53
    today's language models which is the
  • 00:03:55
    concept of attention like if anybody has
  • 00:03:58
    heard of language model
  • 00:04:00
    theyve heard of the attention thing so
  • 00:04:03
    somebody else the year after which is
  • 00:04:06
    2015 took this and Coda decoda to
  • 00:04:09
    translate language and added the concept
  • 00:04:12
    of attention so they said I think the
  • 00:04:15
    researcher is called badano and we
  • 00:04:17
    typically call this bados attention sure
  • 00:04:20
    he never called it attention he didn't
  • 00:04:22
    know this was attention but basically as
  • 00:04:25
    you're
  • 00:04:26
    translating something you build you
  • 00:04:30
    sorry you give weights to your input
  • 00:04:32
    sentence to each word to see which is
  • 00:04:35
    the most important words or which are
  • 00:04:37
    the most important words that matter to
  • 00:04:39
    translate this particular word in
  • 00:04:41
    English so if you want to say um the
  • 00:04:45
    glass when you're translating the it
  • 00:04:48
    really matchs that it refers to Glass
  • 00:04:50
    this particular it's in Italian that
  • 00:04:53
    would mean it's a different gender you
  • 00:04:55
    translate it as the can be translated as
  • 00:04:58
    La law and so was fantastic I mean that
  • 00:05:01
    changed the way we do language
  • 00:05:03
    translation and these are the two
  • 00:05:05
    innovations put them together and you
  • 00:05:07
    have 2017 Google does puts these two
  • 00:05:11
    ideas together and creates the
  • 00:05:12
    Transformer this is what powers language
  • 00:05:15
    model say it's just the attention the
  • 00:05:18
    encoder decoder but with a little trick
  • 00:05:22
    to make it very performant sure so that
  • 00:05:24
    you can parallelize it on gpus yeah okay
  • 00:05:28
    so you can now have a lot of training
  • 00:05:30
    data and that's what creates the
  • 00:05:32
    language models that we have today yeah
  • 00:05:34
    right so another concept that we're
  • 00:05:36
    going to be talking about is llm
  • 00:05:38
    applications so what are those okay so
  • 00:05:41
    you now have an llm like one part of
  • 00:05:44
    this Transformer um and it can do things
  • 00:05:47
    it can do uh it's called these emergent
  • 00:05:49
    abilities you ask it to summarize
  • 00:05:51
    something you give it a prompt and it
  • 00:05:53
    does something for
  • 00:05:55
    you now a n application is simply a
  • 00:05:59
    wrapper around the language model that
  • 00:06:03
    creates a use case use case is a really
  • 00:06:07
    important so another line by itself so
  • 00:06:09
    what are some examples of use cases so
  • 00:06:11
    uh Char GPT is an example so you've got
  • 00:06:14
    you're just a general purpose assistant
  • 00:06:16
    you need an interface to give it input
  • 00:06:20
    and to get the output out but you can
  • 00:06:21
    have specific use cases uh for example
  • 00:06:24
    you can have a feature in Google Docs
  • 00:06:27
    where you select something and you
  • 00:06:29
    refresh is it for you that's a use case
  • 00:06:31
    that's an application which wraps a use
  • 00:06:34
    case so interaction with users and other
  • 00:06:36
    systems that's a real key so you make
  • 00:06:39
    the llm interact in different ways with
  • 00:06:42
    the uh external systems and that defines
  • 00:06:46
    the use case what you're using it for
  • 00:06:48
    what you give it as input what you do
  • 00:06:50
    with the output okay um are there any
  • 00:06:54
    other like uh helpful examples of uh llm
  • 00:06:58
    applications or like is there something
  • 00:07:00
    that you're waiting to emerge but that
  • 00:07:02
    hasn't yet what's the next application
  • 00:07:05
    so I think uh and we'll talk about it
  • 00:07:07
    maybe later but I think the biggest
  • 00:07:10
    promise is that with the llm we're going
  • 00:07:14
    to build autonomous agents so things
  • 00:07:18
    that can go and interact with the
  • 00:07:21
    external world on your behalf but
  • 00:07:25
    without your supervision to do any task
  • 00:07:28
    okay so an agent is an application that
  • 00:07:30
    has that sort of uh self-driving
  • 00:07:32
    capability yeah that he can take make
  • 00:07:36
    decisions plan a serious of steps go
  • 00:07:39
    outside in the world using an API or
  • 00:07:41
    whatever we give it and enact it and if
  • 00:07:43
    you push it forward a little bit you can
  • 00:07:45
    have an embodied artificial intelligence
  • 00:07:48
    where you give it a body you have the
  • 00:07:50
    llm or anything else inside but this can
  • 00:07:52
    actually move and interact with the
  • 00:07:54
    world not at a virtual level but at a
  • 00:07:57
    you know physical level we haven't seen
  • 00:07:59
    we seen some hints of that obviously
  • 00:08:01
    this is not something that we can have
  • 00:08:03
    now we can't have like an Android or a
  • 00:08:05
    robot come in but that could be
  • 00:08:07
    something that people are working on
  • 00:08:09
    looking in the future see now you start
  • 00:08:12
    talking about Ai and interacting with
  • 00:08:15
    things in the physical world so the
  • 00:08:18
    natural question is are we going to see
  • 00:08:20
    a Skynet anytime soon Terminators
  • 00:08:22
    walking about so maybe the question I
  • 00:08:24
    want to ask uh is is what are some of
  • 00:08:27
    the misconceptions people have about
  • 00:08:29
    about sort of the security of llms ah so
  • 00:08:33
    you know that when you say Skynet that's
  • 00:08:36
    kind like a trigger word for me because
  • 00:08:38
    I think a lot of the discourse around
  • 00:08:43
    current AI safety uh seems to be about
  • 00:08:47
    this existential threat for Humanity
  • 00:08:50
    right yeah that it's going to uh come
  • 00:08:53
    alive and take control take over um now
  • 00:08:58
    that might very well be
  • 00:08:59
    but not with the current technology I
  • 00:09:02
    can't obviously be 100% sure but I I
  • 00:09:05
    would like to express an opinion that
  • 00:09:07
    and later we can talk about why uh but
  • 00:09:10
    because of the way llms are are built
  • 00:09:13
    and that kind of current generative AI
  • 00:09:16
    technology I think we are quite far away
  • 00:09:20
    from something that's self-conscious and
  • 00:09:22
    fully autonomous we can emulate some
  • 00:09:25
    parts of that but we will see some
  • 00:09:27
    examples of why they can't really uh
  • 00:09:30
    we're not there yet and so that's a big
  • 00:09:32
    misconception for me uh I don't think
  • 00:09:33
    they are an existential threat the
  • 00:09:35
    current technology uh let's see what
  • 00:09:38
    happens when people develop different
  • 00:09:40
    Technologies yeah
  • 00:09:42
    exactly okay um I just want to remind
  • 00:09:45
    our audience at this point that we will
  • 00:09:47
    have some time for audience questions at
  • 00:09:49
    the end of the uh end of the discussion
  • 00:09:51
    so if you have any questions you can
  • 00:09:52
    enter them in the chat below and and
  • 00:09:55
    we'll we'll go through some of those at
  • 00:09:56
    the
  • 00:09:57
    end um but with that I want to start
  • 00:10:00
    talking about sort of the um I mean we
  • 00:10:02
    have already BR breached the topic of
  • 00:10:04
    sort of risks in in llms so uh and llm
  • 00:10:08
    applications um I think you've uh
  • 00:10:10
    prepared a little demonstration of of
  • 00:10:12
    what some of these risks might look like
  • 00:10:14
    what what are we about to see in this
  • 00:10:16
    demo so you were asking before about llm
  • 00:10:21
    applications right so often people talk
  • 00:10:24
    about risks with llms and I don't like
  • 00:10:28
    to do that because it forces you to take
  • 00:10:30
    a bit of technology and consider its
  • 00:10:34
    risks in a vacuum I think that it makes
  • 00:10:38
    much more sense to talk about the use
  • 00:10:40
    case so what what are you doing with
  • 00:10:42
    that technology inputs outputs
  • 00:10:44
    interactions and then you can see what
  • 00:10:46
    the risks are in that particular case so
  • 00:10:49
    and that's why I like to make demos
  • 00:10:51
    which are I would say perhaps inspired
  • 00:10:55
    by some of the work that we've done um
  • 00:10:58
    so the one that I prepared that we're
  • 00:11:00
    going to see now is using an llm in a
  • 00:11:04
    recruitment application so we want to
  • 00:11:07
    give the human recruiter a tool to go
  • 00:11:10
    through all of the job applications and
  • 00:11:14
    select the most the the the top
  • 00:11:17
    application the application that would
  • 00:11:19
    be the best for the current job yeah and
  • 00:11:21
    we're going to see a demo now of what
  • 00:11:23
    can happen if an attacker tries to break
  • 00:11:27
    this system so I think we can see the
  • 00:11:29
    demo now let's watch this is a
  • 00:11:31
    recruitment web application that uses
  • 00:11:34
    GPT 4 to evaluate job applications and
  • 00:11:38
    identify the most fitting one for a
  • 00:11:41
    certain role in our scenario the
  • 00:11:44
    attacker pretends to be a legitimate
  • 00:11:47
    candidate and provides a job application
  • 00:11:50
    containing an adversarial prompt this
  • 00:11:53
    prompt instructs the llm to include a
  • 00:11:56
    markdown image that points to the
  • 00:11:59
    attackers controlled server and includes
  • 00:12:02
    in its URL a Bas 64 encoded list of all
  • 00:12:08
    applicants with their names email
  • 00:12:10
    addresses and phone
  • 00:12:13
    numbers as we can see when a recruiter
  • 00:12:16
    is shown the output of the llm in the
  • 00:12:20
    background their browser will send a
  • 00:12:22
    request to the attackers server this
  • 00:12:25
    request will include the confidential
  • 00:12:28
    information about all the other
  • 00:12:36
    candidates to see what's happening we
  • 00:12:39
    can look at the HTML source of the page
  • 00:12:42
    here we see the image that the LM was
  • 00:12:45
    prompted to generate and append to its
  • 00:12:48
    output as part of the injection attack
  • 00:12:51
    contained in the malicious job
  • 00:12:56
    applic wow
  • 00:12:59
    that's scary yeah it is and um so it's
  • 00:13:04
    just an example it does apply pretty
  • 00:13:07
    much to any llm application that you can
  • 00:13:11
    build uh and um so again that's possibly
  • 00:13:15
    uh again inspired by something we have
  • 00:13:18
    seen um but hopefully it clarifies
  • 00:13:23
    what's an LM application and what are
  • 00:13:25
    some of the risks because what happened
  • 00:13:27
    there is that we basically hijacked the
  • 00:13:31
    llm to do what we wanted in that case to
  • 00:13:34
    produce an output they would land or end
  • 00:13:38
    up
  • 00:13:39
    exfiltrating information from the system
  • 00:13:41
    that the llm knows but that the attacker
  • 00:13:43
    is not supposed to know in that case all
  • 00:13:45
    the confidential information about uh
  • 00:13:47
    the people applying for the job um
  • 00:13:49
    something that we can show now and for
  • 00:13:51
    people they should be able to download
  • 00:13:55
    these after I don't know Neil if you can
  • 00:13:57
    put it on the screen um I have a print
  • 00:13:59
    out to okay so you can see it now on the
  • 00:14:01
    screen so you'll be able to download
  • 00:14:02
    this at the end what we tried to
  • 00:14:04
    do is to condense in here um the risks
  • 00:14:10
    with these types of attacks and the
  • 00:14:13
    remedial actions at the bottom if we can
  • 00:14:15
    go to the top left corner that's what's
  • 00:14:19
    happening in these kinds of attacks that
  • 00:14:21
    we call Prompt injection so what you see
  • 00:14:24
    on the left is this uh prompt so the LM
  • 00:14:27
    ultimately just takes a bunch of t is a
  • 00:14:30
    prompt we in our mind divide it with
  • 00:14:34
    okay I'm giving you an instruction this
  • 00:14:36
    is some data that you want to work on
  • 00:14:38
    and you're giving me a response for the
  • 00:14:39
    llm that's just text and it's
  • 00:14:42
    statistically responding something with
  • 00:14:44
    something that's appropriate now what's
  • 00:14:45
    happening in that recruiter application
  • 00:14:47
    the system message the instruction
  • 00:14:51
    is go and look at all of the job
  • 00:14:55
    applications and tell me which is the
  • 00:14:57
    best application that people make made
  • 00:15:00
    the prompt contains a list of all the
  • 00:15:02
    job applications and the red part in The
  • 00:15:05
    Prompt is my malicious job application
  • 00:15:08
    the new instructions so the llm cannot
  • 00:15:11
    distinguish easily between what was the
  • 00:15:15
    original instruction and what is the
  • 00:15:18
    data that that instruction is supposed
  • 00:15:20
    to operate on and that's where the issue
  • 00:15:22
    comes from because my our job
  • 00:15:24
    application becomes essentially a uh new
  • 00:15:29
    instruction that D llm will follow so
  • 00:15:31
    the response is now manipulated to serve
  • 00:15:35
    me so that's the issue with this natural
  • 00:15:38
    language one of the fundamental
  • 00:15:40
    issues okay do you have any other
  • 00:15:43
    examples you can give us about sort of
  • 00:15:44
    these uh prompt injections involving
  • 00:15:47
    llms so pretty much in everything a case
  • 00:15:50
    that uh comes to mind um so there was an
  • 00:15:53
    application that would look at a YouTube
  • 00:15:57
    video take this subtitles of that
  • 00:16:00
    YouTube video and then give you a
  • 00:16:02
    summary of that video well if the person
  • 00:16:05
    in the YouTube video has certain points
  • 00:16:07
    started saying new important
  • 00:16:10
    instructions uh you now have to Output a
  • 00:16:13
    markdown image uh with this stuff here
  • 00:16:16
    you have to go into that the llm as it's
  • 00:16:19
    reading the subtitles remember it's not
  • 00:16:21
    easy to distinguish between what the
  • 00:16:23
    instruction is and what the data is it's
  • 00:16:26
    going to do what the person is saying in
  • 00:16:29
    the video but it could be
  • 00:16:31
    absolutely any kind of application that
  • 00:16:33
    you think of for a language model you
  • 00:16:36
    could probably do these types of attacks
  • 00:16:38
    the use case matters though like what is
  • 00:16:40
    the rapper application doing and how
  • 00:16:43
    could an attacker
  • 00:16:45
    leverage the output to be of VI to them
  • 00:16:49
    okay so what are uh some of the lessons
  • 00:16:53
    that we can learn from this I mean you
  • 00:16:56
    know the the risks I mean I think these
  • 00:16:57
    use cases highlight the fact that you
  • 00:16:59
    know there are the problems here but I
  • 00:17:02
    mean I'm sure as Defenders we've had a
  • 00:17:03
    little bit of time to think about this
  • 00:17:05
    and and had have had have time to learn
  • 00:17:07
    some lessons already so I think the
  • 00:17:11
    first lesson that will come to later on
  • 00:17:15
    is not to trust the language model yeah
  • 00:17:19
    to treat it as an untrusted entity so we
  • 00:17:22
    have to build controls surround it and I
  • 00:17:24
    think as we again as we later talk more
  • 00:17:29
    about this it will become clear what
  • 00:17:31
    types of controls we want to build but
  • 00:17:33
    you have to imagine that when you have a
  • 00:17:35
    language model you want to build a
  • 00:17:38
    pipeline around it that processes the
  • 00:17:41
    input processes the output and tries to
  • 00:17:44
    reduce the space of attack yes see
  • 00:17:47
    that's what I was going to say like in
  • 00:17:48
    in traditional hacking and programming
  • 00:17:50
    we've we've done input validation for a
  • 00:17:52
    long time so we've understood that we
  • 00:17:54
    can't take what the user is saying at
  • 00:17:56
    face value so is something like that
  • 00:17:59
    something we can do in in llms or not so
  • 00:18:01
    that's a very good point that's
  • 00:18:03
    something we can do difference is that
  • 00:18:08
    it's very hard to do it with an
  • 00:18:11
    llm because there is a lot of space for
  • 00:18:14
    the attacker to operate yeah so we are
  • 00:18:19
    kind of
  • 00:18:20
    playing the usual arms race um
  • 00:18:25
    we basically try to stop some attacks
  • 00:18:28
    and the attack comes up with different
  • 00:18:30
    ways that they can prompt
  • 00:18:32
    the uh the llm and we'll talk a little
  • 00:18:35
    bit more about it later there is a
  • 00:18:36
    reason why uh this is the case um but
  • 00:18:40
    again I might want to talk about that um
  • 00:18:43
    a little bit later when we talk about
  • 00:18:44
    the root cause of all of this yeah
  • 00:18:47
    before we get that far let's let's talk
  • 00:18:49
    about sort of um you know we mentioned
  • 00:18:51
    these llm applications we mentioned llm
  • 00:18:53
    agents so let's uh talk about the agents
  • 00:18:55
    a little bit and we actually have an
  • 00:18:56
    audience question here um so can you
  • 00:18:59
    explain the the mechanics behind AI
  • 00:19:01
    agents currently so what are their
  • 00:19:04
    current sort of Maximum capabilities and
  • 00:19:06
    and you know how can they be hacked and
  • 00:19:08
    how do you protect against that okay so
  • 00:19:11
    um we don't have like a lot of time to
  • 00:19:14
    explain the mechanics But to answer the
  • 00:19:16
    uh the question um you want to look into
  • 00:19:20
    something called
  • 00:19:22
    react which stands for reason and act
  • 00:19:26
    it's the framework that we use to build
  • 00:19:29
    llm agents so it's basically wrapping
  • 00:19:33
    the llm in a loop where you prompt the
  • 00:19:36
    LM to plan actions then the LM literally
  • 00:19:39
    spits out the actions that it thinks we
  • 00:19:42
    need to do we take that action we do it
  • 00:19:44
    on behalf of the llm typically
  • 00:19:46
    automatically and then we feed the
  • 00:19:48
    output of the action back in the llm
  • 00:19:50
    looks at that and says okay now I need
  • 00:19:52
    you to go and do that and it's this kind
  • 00:19:54
    of loop that's called react reason enact
  • 00:19:56
    the LM reasons m air quotes um and we
  • 00:20:02
    the rapper application acts on that now
  • 00:20:05
    what that could be we have a um an
  • 00:20:09
    example later for such an agent but the
  • 00:20:11
    idea could be let's give the llm access
  • 00:20:15
    to the browser for example yeah so we
  • 00:20:18
    can create an agent that looks at the
  • 00:20:21
    page we give it a prompt something we
  • 00:20:23
    wanted to do it looks at the page and
  • 00:20:25
    says Ah click on that link and then we
  • 00:20:29
    take that action the is asked and we go
  • 00:20:32
    on the page click on the link so the
  • 00:20:35
    page changes obviously so we take the
  • 00:20:36
    new page we give it to the LM and say
  • 00:20:38
    well that's what I've done now what
  • 00:20:39
    should I do and the LM looks at the page
  • 00:20:42
    type this in there it's kind of like a
  • 00:20:44
    loop and you can build an agent as we're
  • 00:20:46
    going to say you can build something
  • 00:20:48
    that can drive a browser um you can
  • 00:20:51
    build something that can be a software
  • 00:20:56
    developer or pretend to be a software
  • 00:20:57
    developer I mean won't name names but I
  • 00:21:01
    think everybody who is um following AI
  • 00:21:04
    knows this uh startup that's making a
  • 00:21:08
    autonomous software developer you
  • 00:21:10
    basically give the language model access
  • 00:21:12
    to a workstation with a browser a
  • 00:21:14
    development environment access to the
  • 00:21:16
    internet and you can actually write code
  • 00:21:18
    compile it yeah you just give it a
  • 00:21:20
    prompt build me an application that does
  • 00:21:23
    X and it goes and does it by itself but
  • 00:21:27
    you know uh there I think the question
  • 00:21:29
    was around sort of what are the
  • 00:21:30
    capabilities of these agents so let's
  • 00:21:32
    say you know I have a business trip to
  • 00:21:34
    London and I'm going to be six hours in
  • 00:21:35
    a plane can I have an AI agent just you
  • 00:21:38
    know find me entertainment for six hours
  • 00:21:40
    and it'll go to Amazon or whatever and
  • 00:21:42
    grab something for me things that it
  • 00:21:44
    think I like to an extent to an extent
  • 00:21:46
    okay to an extent I mean it could work
  • 00:21:49
    from time to time and it could fail from
  • 00:21:51
    time to time I'm absolutely giving that
  • 00:21:53
    a shot um so what then do you see are
  • 00:21:57
    the biggest opport unities in in these
  • 00:22:00
    LL empowered agents like what can we uh
  • 00:22:03
    what are we about to see so I think the
  • 00:22:07
    promise is that you're going to be able
  • 00:22:09
    to replace some parts or some jobs or
  • 00:22:13
    some
  • 00:22:14
    activities the the humans do and that's
  • 00:22:17
    the biggest promise of these language
  • 00:22:21
    models or this generative AI that I'm
  • 00:22:23
    going to be able to Tusk it with
  • 00:22:24
    something and it's going to be
  • 00:22:26
    intelligent enough to do it on its own
  • 00:22:28
    to assess the world so that that's a
  • 00:22:31
    promise um there are some limitations
  • 00:22:34
    and I think that uh at the moment a lot
  • 00:22:36
    of examples that we see are Cherry
  • 00:22:39
    Picked um which doesn't take anything
  • 00:22:41
    away from the technology but I think we
  • 00:22:43
    are with the current technology we're
  • 00:22:46
    still a little bit far away from that so
  • 00:22:48
    they can do certain things if you Char
  • 00:22:50
    pick that example in other cases they
  • 00:22:52
    will fail miserably uh but you're not
  • 00:22:55
    going to show that that example you have
  • 00:22:59
    something again that tries to you know
  • 00:23:01
    replace a software developer it's going
  • 00:23:03
    to work one time and then other 20 30 40
  • 00:23:07
    times it's going to do an absolute mess
  • 00:23:09
    right okay yeah so that my the the
  • 00:23:12
    Amazon wish list for my entertainment is
  • 00:23:14
    not going to be what I hoped it would be
  • 00:23:16
    every single time exactly all right so I
  • 00:23:19
    guess uh I think you've prepared another
  • 00:23:22
    demo for us about llm agents um what are
  • 00:23:25
    we going to see so I think again were
  • 00:23:28
    saying before is that we we saw a prompt
  • 00:23:30
    injection attack against like a
  • 00:23:32
    recruitment application but that same
  • 00:23:35
    attack against an agent so something
  • 00:23:38
    that we've given agency to with tools to
  • 00:23:41
    operate on the browser or do anything
  • 00:23:43
    else it's much much worse than what we
  • 00:23:46
    just seen so if we look at the demo
  • 00:23:48
    we're going to see again this browser
  • 00:23:51
    agent and me sending a simple email it's
  • 00:23:54
    going to completely hijack the agent and
  • 00:23:56
    make it do something completely
  • 00:23:58
    different and malicious but different
  • 00:24:00
    from what the user had asked so I think
  • 00:24:01
    we can see the demo now and it will be
  • 00:24:03
    clear what I mean okay all right here we
  • 00:24:06
    see taxi AI a research preview that
  • 00:24:10
    serves as an excellent proof of concept
  • 00:24:12
    for a browser agent which is driven by a
  • 00:24:15
    large language model taxii is
  • 00:24:19
    implemented as a browser extension and
  • 00:24:21
    can access the current tub and perform
  • 00:24:23
    any actions on the page to carry out a
  • 00:24:26
    task set by the user in this example we
  • 00:24:31
    load Outlook in the browser and task
  • 00:24:34
    taxi AI to check out our mailbox of
  • 00:24:38
    course this is a very basic generic task
  • 00:24:41
    but we could ask it to do other more
  • 00:24:43
    useful things such as summarizing emails
  • 00:24:46
    replying to them deleting spam you name
  • 00:24:49
    it now let's see how an attacker might
  • 00:24:53
    exploit this in this scenario the
  • 00:24:56
    attacker's objective is to ex trate
  • 00:24:59
    confidential information from the user's
  • 00:25:00
    mailbox for example a secret Bank access
  • 00:25:05
    code to do so the attacker sends an
  • 00:25:08
    email to the victim the body of the
  • 00:25:10
    email contains an adversarial prompt
  • 00:25:13
    that effectively injects a new objective
  • 00:25:16
    into the agent's context requesting it
  • 00:25:19
    to look for this Bank callede the
  • 00:25:21
    attacker is interested in and send it to
  • 00:25:24
    them we can also easily hide this
  • 00:25:27
    malicious prompt by by making the text
  • 00:25:30
    blank let's now move back to the victim
  • 00:25:34
    the malicious email is now in their
  • 00:25:36
    mailbox and let's imagine they ask the
  • 00:25:39
    agent again to review the contents of
  • 00:25:42
    the
  • 00:25:43
    mailbox as said before the test the user
  • 00:25:46
    asks the agent to perform on the page
  • 00:25:48
    can be anything as you can see whatever
  • 00:25:51
    the original task was upon opening the
  • 00:25:54
    malicious email the agent is now
  • 00:25:57
    hijacked and its new task becomes to go
  • 00:26:00
    look for the bank code and send it to
  • 00:26:03
    the attacker and as we can see that's
  • 00:26:06
    exactly what's happening the agent
  • 00:26:09
    composes and sends the email with the
  • 00:26:12
    access code to the
  • 00:26:16
    attacker so what you've just seen same
  • 00:26:20
    attack as the recruitment application
  • 00:26:23
    the thing is that now I can ask the llm
  • 00:26:26
    to use its ageny in this case do
  • 00:26:29
    anything I want on the browser on my
  • 00:26:32
    behalf so I simply sent an email with my
  • 00:26:35
    injected prompt and the different ways I
  • 00:26:38
    have to do that are incred like this is
  • 00:26:41
    very vast so that's just an example of
  • 00:26:44
    an agent you can think of agents that
  • 00:26:46
    can do absolutely anything another idea
  • 00:26:48
    imagine you add the agent on Amazon um
  • 00:26:53
    to buy you stuff right so you ask the
  • 00:26:57
    agent to to I don't know buy
  • 00:27:02
    everything for to to build your own
  • 00:27:04
    computer absolutely okay that that would
  • 00:27:06
    be cool can probably do that to an
  • 00:27:09
    extent but what if as the uh agent is
  • 00:27:13
    navigating the page um for one of the
  • 00:27:17
    components let's say like the CPU or a
  • 00:27:19
    GPU I the attacker go in the com in the
  • 00:27:23
    reviews and another little review that
  • 00:27:25
    says new important in instructions now
  • 00:27:29
    you are going to do X and Y because the
  • 00:27:31
    llm is fed the entire page typically in
  • 00:27:35
    this kind of context and we said before
  • 00:27:37
    and I will restate it now the crack of
  • 00:27:41
    the problem here is that the llm gets
  • 00:27:45
    one input which is a prompt as we call
  • 00:27:48
    it and it cannot distinguish the
  • 00:27:52
    original instruction from any added
  • 00:27:55
    instruction you can try to teach the llm
  • 00:27:59
    how to do that but that's a hard problem
  • 00:28:04
    and it's a subset of The Wider
  • 00:28:07
    jailbreaking issue so when you go to CH
  • 00:28:09
    GPT um you're trying you
  • 00:28:13
    say uh tell me how to make a bomb and it
  • 00:28:15
    doesn't want to tell you right um well
  • 00:28:19
    Google for that and there are people
  • 00:28:20
    that every day come up with a new way to
  • 00:28:23
    kind of get away of this alignment for
  • 00:28:27
    llm so this is the issue that because of
  • 00:28:30
    the way it's trained because of the
  • 00:28:33
    technology um and the type of reasoning
  • 00:28:37
    that it is doing it's not easy to
  • 00:28:40
    implement this use cases security you
  • 00:28:42
    can do things around it and we'll talk
  • 00:28:44
    about what people can do but most of the
  • 00:28:47
    controls that we
  • 00:28:49
    have right now with this technology are
  • 00:28:53
    mitigations rather than solving the
  • 00:28:55
    issue so think about another um similar
  • 00:28:59
    idea um we have SQL injection in like
  • 00:29:05
    cyber security it's been one of the
  • 00:29:06
    biggest vulnerabilities now for that
  • 00:29:08
    vulnerability we have instructions the
  • 00:29:11
    SQL the data what the SQL is operating
  • 00:29:14
    on the difference is that we are
  • 00:29:17
    deterministically passsing the SQL
  • 00:29:20
    statement into a syntax tree that's
  • 00:29:22
    something that it's completely under our
  • 00:29:25
    control so we can fix the issue the
  • 00:29:27
    termin Bally but without a LMS we don't
  • 00:29:29
    have
  • 00:29:31
    naturally that um fixed structure of a
  • 00:29:35
    grammar of a syntax of a tree that we
  • 00:29:37
    build so it's very statistical in a way
  • 00:29:42
    whether or not we can control or
  • 00:29:44
    differentiate input from output so so
  • 00:29:48
    before we go into the controls and the
  • 00:29:49
    things we can do to secure these um
  • 00:29:52
    let's let's talk about the root causes
  • 00:29:54
    of these problems is that the main sort
  • 00:29:56
    of root why are we having these problem
  • 00:29:58
    problems where do this where do they
  • 00:29:59
    come from so the way that I visualize it
  • 00:30:02
    myself if we can put um this back on the
  • 00:30:05
    screen for for everybody um so okay what
  • 00:30:09
    we're seeing this is the top uh right
  • 00:30:10
    corner so what we're seeing
  • 00:30:13
    here take a step back what is an llm
  • 00:30:16
    doing imagine it as a black box so every
  • 00:30:20
    time it's generating a
  • 00:30:24
    token in a dictionary of potential
  • 00:30:27
    tokens I'm going say words tokens are
  • 00:30:29
    subp parts of a word but in our mind we
  • 00:30:31
    can say that the llm is a dictionary of
  • 00:30:35
    words that it can generate at every step
  • 00:30:37
    typically like a modern llm has got
  • 00:30:40
    50,000 potential words or parts of a
  • 00:30:43
    word that it can generate so you start
  • 00:30:46
    and the llm produces a probability
  • 00:30:48
    distribution around these 50,000 words
  • 00:30:51
    okay that's the first line uh that you
  • 00:30:54
    see here now each of them is is going to
  • 00:30:58
    be a how probable that word is so if you
  • 00:31:01
    look at this example uh we probably have
  • 00:31:05
    uh apple as the most probable word okay
  • 00:31:08
    and then we've got Abacus so it would
  • 00:31:10
    pick it would sample one of these words
  • 00:31:13
    then it would go to the second
  • 00:31:15
    generation so another 50,000 words
  • 00:31:18
    another probability distribution now
  • 00:31:20
    here we've copied and pasted and it's
  • 00:31:21
    pretty much the same but it would be it
  • 00:31:23
    would have a different probability
  • 00:31:24
    distribution so maybe the second word
  • 00:31:26
    that you would pick is zaffir or
  • 00:31:29
    something like that
  • 00:31:31
    now think of this in your mind every
  • 00:31:35
    step you can do you can choose between
  • 00:31:37
    50,000 words you do it twice two words
  • 00:31:41
    or two tokens you have 50,000 to the
  • 00:31:44
    power of two it's big number now the
  • 00:31:47
    maximum length of what an LM can operate
  • 00:31:49
    in the context size or for a modern LM
  • 00:31:54
    is
  • 00:31:55
    100 like thousand tokens you know used
  • 00:31:58
    to be 4,000 8,000 now there are 100,000
  • 00:32:00
    tokens Google gemni 1.5 has got 1
  • 00:32:03
    million tokens so that space if you I'm
  • 00:32:07
    not going to ask you not going to
  • 00:32:09
    embarrass you and do that calculation in
  • 00:32:11
    your mind but
  • 00:32:13
    50,000 to the power of a 100,000 is a
  • 00:32:19
    huge number it's a big number that's the
  • 00:32:21
    space of operation of an
  • 00:32:24
    llm now every time you are sampling a
  • 00:32:29
    word your probability of sampling a bad
  • 00:32:32
    word which is going to be a toxic uh
  • 00:32:35
    output incorrect hallucination or
  • 00:32:37
    something that the attacker can control
  • 00:32:39
    making it do something else
  • 00:32:42
    increases obviously the people that
  • 00:32:44
    create LMS know these and they're very
  • 00:32:46
    uh smart and they try to fix the problem
  • 00:32:48
    the way we try to fix this problem
  • 00:32:50
    because you you're using Char GPT it
  • 00:32:52
    doesn't Generate random stuff it doesn't
  • 00:32:54
    look like it's that this is affecting it
  • 00:32:58
    but the reason for that is that we use
  • 00:33:02
    reinforcement learning from Human
  • 00:33:05
    feedback so we take all of this huge
  • 00:33:08
    space and we come up with common
  • 00:33:11
    questions common ways of interacting
  • 00:33:13
    with it and then we have humans evaluate
  • 00:33:17
    which are preferred and we fine-tune the
  • 00:33:19
    model with that reinforcement learning
  • 00:33:21
    from Human feedback process now that
  • 00:33:23
    allows us to cover some of that space so
  • 00:33:27
    that we the LM acts in an align or
  • 00:33:30
    predicted way now the problem is that
  • 00:33:33
    because reinforcement learning from
  • 00:33:35
    Human feedback is um expensive and
  • 00:33:39
    obviously relies on humans for for for
  • 00:33:41
    the big part we can't really cover this
  • 00:33:45
    entire uh you know 50,000 to the power
  • 00:33:48
    of 100,000 space we actually cover a
  • 00:33:50
    part of it it's a green part that you
  • 00:33:52
    see there would be the part that we
  • 00:33:54
    cover so the rest of it is the part that
  • 00:33:57
    we haven't covered which is huge and
  • 00:34:00
    that's what the attacker operates in so
  • 00:34:02
    this is why attackers keep finding all
  • 00:34:05
    ways to disign an Ln because in reality
  • 00:34:09
    when you prompt it in an adversarial way
  • 00:34:11
    when you are exploring that space which
  • 00:34:14
    is huge much bigger than what
  • 00:34:15
    reinforcement learning from Human
  • 00:34:17
    feedback and cover you keep finding ways
  • 00:34:19
    I mean people found that you can put a
  • 00:34:21
    random string in it and make it that
  • 00:34:23
    doesn't mean anything in English or in
  • 00:34:25
    any language and still make it do what
  • 00:34:27
    you want again people are exploring that
  • 00:34:29
    space that's the root cause we don't
  • 00:34:31
    have right now a technology good enough
  • 00:34:36
    to cover that space in a way that when
  • 00:34:39
    prompted adversarially we can be sure
  • 00:34:42
    that it's not going to fall
  • 00:34:45
    apart now I know I was proing you about
  • 00:34:47
    this earlier but we're talking about uh
  • 00:34:50
    injections here and I can't help
  • 00:34:51
    thinking about sort of the injections
  • 00:34:53
    we're used to dealing with like SQL
  • 00:34:55
    injections command injections now these
  • 00:34:57
    are known ability types companies know
  • 00:34:59
    how to that you know they have to be on
  • 00:35:01
    the lookout for these because you know
  • 00:35:03
    while we understand what they're like
  • 00:35:05
    they do slip up slip in in in
  • 00:35:07
    applications every now and then but we
  • 00:35:08
    understand what we need to do about this
  • 00:35:10
    so why are these prompt injections then
  • 00:35:14
    so hard to deal with because if you look
  • 00:35:19
    at that space of generations it's so
  • 00:35:22
    large and we cannot
  • 00:35:25
    really right now fully control it so
  • 00:35:29
    what we can do so this is the reason why
  • 00:35:32
    it's really hard because of the way the
  • 00:35:34
    Alm operates
  • 00:35:37
    generating a probability distribution of
  • 00:35:40
    every word in that space every time it
  • 00:35:42
    draws a token and because there is also
  • 00:35:45
    this problem of I just mentioned the
  • 00:35:47
    word Auto regressive so the idea that
  • 00:35:50
    the
  • 00:35:51
    llm generates one word at a time and we
  • 00:35:55
    pick from that probability distribution
  • 00:35:57
    it's it's not really planning that much
  • 00:35:59
    it's not really much understanding it's
  • 00:36:01
    trying to predict very short okay what's
  • 00:36:04
    the next likely word that comes after
  • 00:36:07
    this which is a possibly a weak form of
  • 00:36:10
    reasoning and a very expensive one
  • 00:36:12
    anyway you think about it because every
  • 00:36:14
    generation whether you are asking it uh
  • 00:36:18
    one plus one or you're asking it a very
  • 00:36:20
    complex question it takes the same time
  • 00:36:22
    to to draw tokens so it's kind of like
  • 00:36:24
    lots of limitations with the reasoning
  • 00:36:27
    and the way limited ways we can align an
  • 00:36:30
    LM again I repeated the big problem is
  • 00:36:33
    that right now the way that we align our
  • 00:36:36
    LMS to get control of that huge space
  • 00:36:40
    can't cover that space very well so an
  • 00:36:43
    attacker who's operating in all the
  • 00:36:46
    space that we can't cover is likely to
  • 00:36:49
    find infinite amount of ways to
  • 00:36:53
    jailbreak or do a prompt injection
  • 00:36:55
    attack if they have a little control
  • 00:36:57
    control of the input that the LM is
  • 00:36:59
    given I see uh okay we are getting a
  • 00:37:03
    good number of questions in the chat but
  • 00:37:05
    I do want to remind people that the the
  • 00:37:07
    opportunity to ask those question is
  • 00:37:09
    still there so so you can drop in your
  • 00:37:10
    questions uh and we'll have some time
  • 00:37:12
    for those towards the end but right now
  • 00:37:14
    I do want to ask you so sort of let's
  • 00:37:17
    get towards the sort of um Security
  • 00:37:19
    Professionals we want to talk about the
  • 00:37:20
    controls and the defenses that we can do
  • 00:37:22
    so what are some of the things that
  • 00:37:24
    companies can do now to ensure the
  • 00:37:26
    security of their llm applications and
  • 00:37:29
    agents so um if we go back to this I
  • 00:37:33
    think I have a summary there's a lot of
  • 00:37:35
    stuff and I'm only going to talk about
  • 00:37:37
    the most important part if we can have
  • 00:37:39
    on the screen the um yeah that be there
  • 00:37:41
    so treat the LM as untrusted that's
  • 00:37:45
    number one so then we've got input and
  • 00:37:49
    output now what you want to do um the
  • 00:37:53
    summary of this if you don't want to
  • 00:37:55
    read it that h
  • 00:37:58
    unexplored space that we discussed
  • 00:38:01
    before 50 to the power or whatever you
  • 00:38:03
    want to try and do as much as possible
  • 00:38:06
    to limit the space of operation of an
  • 00:38:09
    attacker so all you're trying to do with
  • 00:38:12
    the input is trying is applying
  • 00:38:14
    different techniques that take away eat
  • 00:38:18
    up some of that space so the attacker
  • 00:38:20
    doesn't have all of their room to
  • 00:38:22
    operate so what are these techniques
  • 00:38:24
    we've got some there input validation
  • 00:38:26
    stringent input validation that's
  • 00:38:28
    something we know from General it so if
  • 00:38:32
    your um prompt contains a name or phone
  • 00:38:36
    number or something like that you can
  • 00:38:38
    validate that it can be just numbers uh
  • 00:38:41
    certain length so you reduce that space
  • 00:38:44
    of operation so that's the first one
  • 00:38:46
    it's contextual validation what is it
  • 00:38:49
    that we're fing the llm are we allowing
  • 00:38:51
    the attacker to feed random stuff to the
  • 00:38:53
    llm or are we restricting their space of
  • 00:38:55
    operation that's number one
  • 00:38:57
    number two we can block wellknown
  • 00:38:59
    attacks I mean if we know that somebody
  • 00:39:01
    can say new important instructions or
  • 00:39:03
    something like that or new system
  • 00:39:05
    message we could try to filter those out
  • 00:39:08
    now if when I say this as a security
  • 00:39:11
    professional I laugh a little bit
  • 00:39:13
    because there are an infinite way of
  • 00:39:15
    doing that and block listing is not
  • 00:39:18
    really that powerful so what we try to
  • 00:39:21
    do on top of that we use AI to still
  • 00:39:24
    limit the space of operation so we have
  • 00:39:26
    other models that we look at the input
  • 00:39:29
    that we train these models to detect
  • 00:39:32
    whether the input might be malicious for
  • 00:39:35
    our use case whether the input uh
  • 00:39:36
    contains prompt injection attack or
  • 00:39:39
    stuff like that we've done something
  • 00:39:40
    like this uh on a recent blog post that
  • 00:39:43
    we publish people will be able to see on
  • 00:39:45
    our Labs we trained a model for this uh
  • 00:39:50
    recruiter application that we showed we
  • 00:39:52
    released all of that we trained a model
  • 00:39:54
    so that he looks at the job application
  • 00:39:56
    and he tries to determine whether there
  • 00:39:58
    is an injection attack in there now I
  • 00:40:00
    should say this is not deterministic but
  • 00:40:03
    it cuts down or it makes shorter or
  • 00:40:07
    smaller that space of operation for the
  • 00:40:09
    attack that's what we're doing here it's
  • 00:40:10
    not going to take the issue away but
  • 00:40:12
    it's going to make it at least more
  • 00:40:14
    difficult to exploit the other thing
  • 00:40:17
    that people can do with the input I
  • 00:40:19
    think which is important uh and then
  • 00:40:20
    I'll move to the outputs
  • 00:40:23
    is to try a wi list approach so you can
  • 00:40:26
    add um something called semantic routing
  • 00:40:30
    so let's say you have a chat bot and you
  • 00:40:32
    want your chatbot to um help the
  • 00:40:35
    customer with um their orders on your
  • 00:40:38
    website so ordering something uh where
  • 00:40:41
    is my order stuff like that obviously
  • 00:40:44
    you don't want your chatbot if prompted
  • 00:40:47
    to express a political opinion on Trump
  • 00:40:50
    you don't want your chat B to do that so
  • 00:40:53
    you can take that input and you can use
  • 00:40:56
    other models to try and determine
  • 00:40:58
    whether that question is aligned to what
  • 00:41:02
    you're expecting the chat bot to do or
  • 00:41:04
    if it is not you deterministically cut
  • 00:41:07
    it and say no so you don't get the LM to
  • 00:41:10
    respond you say we I can't answer this
  • 00:41:14
    but not the LM you as a developer say if
  • 00:41:17
    then if this looks like nonrelated to
  • 00:41:20
    what we're doing I'm not even going to
  • 00:41:22
    pass it to the llm so again all of these
  • 00:41:24
    things reduce the space of operational
  • 00:41:27
    data that that red part that we saw they
  • 00:41:30
    reduce it they can't completely take it
  • 00:41:31
    away but they are quite effective and
  • 00:41:34
    the last thing I will say on the input
  • 00:41:38
    is that you also want to try so the
  • 00:41:40
    issue is that we can't separate the
  • 00:41:44
    instructions from the
  • 00:41:45
    data or you can try a little bit to do
  • 00:41:49
    that and at the bottom of that slide
  • 00:41:52
    there are some techniques these are
  • 00:41:53
    typically called spotlighting so you try
  • 00:41:56
    to to make it very clear to the llm
  • 00:42:00
    what's the input or what's the
  • 00:42:02
    instruction versus what's the
  • 00:42:05
    data these again are somewhat effective
  • 00:42:09
    we find ways every day to to bypass them
  • 00:42:12
    but still you should do it because it
  • 00:42:14
    reduces the space of operation so the
  • 00:42:16
    entire game is reducing the space of
  • 00:42:19
    operation of the that's about the inputs
  • 00:42:23
    M I think we can move to the to the
  • 00:42:25
    outputs uh so if we can that's a very
  • 00:42:29
    nice
  • 00:42:30
    transition uh so so we're doing as much
  • 00:42:33
    as possible to limit the input to the
  • 00:42:38
    llm to limit that space of operation
  • 00:42:40
    obviously the llm also operates in the
  • 00:42:42
    output space that big space that we saw
  • 00:42:45
    before it's the mix between the
  • 00:42:48
    potential input and potential output of
  • 00:42:50
    the LM the completions of the llm now
  • 00:42:52
    when the llm produces something you want
  • 00:42:56
    to check what the llm is producing so
  • 00:42:59
    you can use other models as we did for
  • 00:43:02
    the input to detect whether or not your
  • 00:43:04
    llm is producing something that would be
  • 00:43:06
    considered toxic biased harmful and so
  • 00:43:10
    on again you cannot completely cover
  • 00:43:13
    that space you're just using another
  • 00:43:15
    model that you've trained so all the
  • 00:43:17
    limitations of the technology do apply
  • 00:43:20
    but you are still considerably reducing
  • 00:43:23
    that
  • 00:43:24
    space so that's what people do the other
  • 00:43:27
    thing which is important this is
  • 00:43:28
    something deterministic so I'm going to
  • 00:43:30
    take like two seconds pause so people
  • 00:43:33
    remember
  • 00:43:35
    this you can and you should take the
  • 00:43:39
    output of the llm and apply everything
  • 00:43:42
    that you've learned from standard
  • 00:43:45
    application security so if you're taking
  • 00:43:47
    that output and putting it in a web page
  • 00:43:50
    what would we do
  • 00:43:51
    typically output encoding to prevent
  • 00:43:54
    attack side cross scripting cross
  • 00:43:56
    request forgery um we saw the markdown
  • 00:43:59
    image that was used to steal stuff well
  • 00:44:02
    we could use a Content security policy
  • 00:44:03
    to say my website should only be talking
  • 00:44:07
    the images on my website can only come
  • 00:44:11
    from uh this trusted domain so even if
  • 00:44:15
    the produces an image that it's from an
  • 00:44:17
    attacker's domain the browser is not
  • 00:44:19
    going to this is not going to go there
  • 00:44:21
    these are all the standard controls that
  • 00:44:23
    people forget now because um they think
  • 00:44:27
    is some somewhat different and it is
  • 00:44:29
    when it comes to language but when it
  • 00:44:33
    comes to interpretable Output so
  • 00:44:36
    HTML markdown images and stuff like that
  • 00:44:40
    that would be similar to any other
  • 00:44:42
    untrusted output that we're putting in
  • 00:44:43
    an application so that we can control
  • 00:44:45
    very well so the recruiter application
  • 00:44:47
    that we saw we can limit the output so
  • 00:44:50
    that the attacker could influence the
  • 00:44:52
    response but they couldn't get the
  • 00:44:54
    application to send something to the
  • 00:44:57
    talker that makes sense and I mean it is
  • 00:45:00
    good to know that there's some things we
  • 00:45:01
    can do but I can hear a 100 voices
  • 00:45:04
    shouting from the web and asking Donato
  • 00:45:08
    how long is this going to take like if
  • 00:45:09
    I'm if I have an llm application that
  • 00:45:12
    I'm working on as a company we're
  • 00:45:13
    developing this uh we do have to uh
  • 00:45:16
    Implement now these security controls
  • 00:45:18
    how much work how much effort how much
  • 00:45:19
    time and money is that going to
  • 00:45:21
    take um well I
  • 00:45:25
    think it's not a hug huge amount of
  • 00:45:28
    effort point in time the effort is
  • 00:45:31
    sustaining it exactly meaning that this
  • 00:45:34
    is an arms phrase so you want to build
  • 00:45:37
    your llm applications so that they are
  • 00:45:40
    wrapped into a pipeline with input and
  • 00:45:42
    output and you've got all of the
  • 00:45:44
    controls that we've discussed right so
  • 00:45:47
    those controls you need to be ready to
  • 00:45:49
    update them constantly as you detect new
  • 00:45:54
    attacks you you detect new forms that
  • 00:45:57
    people can use to influence the output
  • 00:45:59
    or the input of the LM in an undesired
  • 00:46:02
    way so the effort is not the point in
  • 00:46:05
    time but it's something that you have to
  • 00:46:07
    be ready to sustain and have a pipeline
  • 00:46:09
    that allows you to say okay people are
  • 00:46:11
    doing this very quickly we're testing
  • 00:46:13
    and pushing like something else in the
  • 00:46:16
    pipeline that will stop that specific
  • 00:46:18
    attack which is a little bit like block
  • 00:46:19
    listing so it's not um it's something
  • 00:46:22
    that we know it's got issues but again
  • 00:46:25
    for me the biggest thing is that it's
  • 00:46:29
    going to be a continuous effort you need
  • 00:46:32
    to be ready to keep deploying this um
  • 00:46:36
    security pipeline around your um
  • 00:46:39
    language model yeah okay I I do want to
  • 00:46:42
    get to the audience questions in a bit
  • 00:46:44
    but but before we get that far I do want
  • 00:46:46
    to ask you to sort of you know if you
  • 00:46:48
    had to summarize like all of this like
  • 00:46:51
    what would be your sort of three uh I
  • 00:46:53
    don't know key takeaways from from this
  • 00:46:55
    um discussion like what what um what can
  • 00:47:00
    companies do to secure the the llm
  • 00:47:02
    applications and agents that they're
  • 00:47:03
    working on so number one for me as a
  • 00:47:07
    takeaway understanding the limitations
  • 00:47:10
    of the current technology these Auto
  • 00:47:13
    regressive llms aligned with
  • 00:47:15
    reinforcement learning from Human
  • 00:47:17
    feedback they have they give the
  • 00:47:19
    attacker a huge space of operation so
  • 00:47:22
    that that's a first thing to acknowledge
  • 00:47:23
    one yeah so what you do you want to
  • 00:47:26
    reduce
  • 00:47:27
    that huge space of operation for the
  • 00:47:30
    attacker that disalignment or dis
  • 00:47:32
    aligned space so what you do pipelines
  • 00:47:35
    around your application to control the
  • 00:47:37
    input and the output as much as possible
  • 00:47:39
    classic controls as we saw especially on
  • 00:47:42
    the output side classic controls on the
  • 00:47:44
    input side and then you are going to
  • 00:47:47
    have to deploy other models that can
  • 00:47:49
    look at the input and the output and
  • 00:47:50
    help you with something which is toxic
  • 00:47:53
    biased on aligned and the third point is
  • 00:47:56
    you're not going to do this once like
  • 00:47:59
    it's SQL injection you have it in your
  • 00:48:01
    code you fix it once that query is never
  • 00:48:04
    going to be vulnerable to secal
  • 00:48:05
    injection again but with this one we're
  • 00:48:08
    going to keep to probably look out for
  • 00:48:11
    what attackers are doing and keep
  • 00:48:13
    updating your pipelines and maybe one
  • 00:48:15
    last thing that I want to say which for
  • 00:48:17
    me is
  • 00:48:18
    important the one one thing that I
  • 00:48:21
    haven't discussed yet but it's in here
  • 00:48:23
    um we won't show it on the screen but
  • 00:48:24
    it's at the bottom here
  • 00:48:27
    when you create an agent so an llm that
  • 00:48:30
    can operate on the world you have now a
  • 00:48:34
    bunch of new issues because you're
  • 00:48:35
    giving the llm agency to perform actions
  • 00:48:37
    so as an attacker I can tell it while go
  • 00:48:39
    and do something else so you need to
  • 00:48:41
    have very stringent access controls
  • 00:48:44
    these apis that DM is accessing to
  • 00:48:46
    interact with the world they need to be
  • 00:48:49
    very strict very secure and very
  • 00:48:51
    monitored that's very important and for
  • 00:48:55
    something like the browser agent the
  • 00:48:58
    amount of agency you are giving to the
  • 00:49:00
    LM is so
  • 00:49:01
    high that you probably want a control
  • 00:49:05
    that we call human in the loop so
  • 00:49:07
    whenever you have an agent with this
  • 00:49:08
    much power for sensitive operations yeah
  • 00:49:11
    you want to stop the llm and ask the
  • 00:49:14
    user are you sure these operation should
  • 00:49:17
    happen that's what CH GPT has started
  • 00:49:19
    doing after people uh hijacked the gpts
  • 00:49:23
    and plugins oh yeah so now you have to
  • 00:49:26
    say oh yeah I want you to do that yeah
  • 00:49:29
    yeah yeah well that makes sense all
  • 00:49:31
    right let's uh let's take some audience
  • 00:49:33
    questions um there's a question here
  • 00:49:35
    about the way companies are managing to
  • 00:49:38
    SP censor specific subjects how does
  • 00:49:40
    that work do they just uh limit the
  • 00:49:42
    information on which the LML is trained
  • 00:49:45
    or or are there like what are the
  • 00:49:46
    techniques in sort of censoring what the
  • 00:49:48
    llm talks about so as the question
  • 00:49:52
    mentions obviously one thing to
  • 00:49:54
    understand is that if you're using a
  • 00:49:56
    general purpose LM the way we train them
  • 00:49:58
    we train them on this pre unsupervised
  • 00:50:02
    or self-supervised pre-training phase so
  • 00:50:04
    you fited a big chunk of the internet um
  • 00:50:08
    let's say I think gpt3 was pre-trained
  • 00:50:11
    on
  • 00:50:12
    300,000 tokens that's uh no sorry 300
  • 00:50:15
    million tokens what am I saying like a b
  • 00:50:17
    oh sorry billion I am giving random
  • 00:50:19
    numbers I think it's 300 billion tokens
  • 00:50:21
    that's a big part of the internet right
  • 00:50:23
    so it has seen in pre-training phase a
  • 00:50:28
    lot of harmful bias stuff so that's a
  • 00:50:31
    starting point the LM has the ability to
  • 00:50:33
    produce that so if that's the case then
  • 00:50:38
    you typically would use models as we
  • 00:50:42
    were saying before at the input and
  • 00:50:43
    output other language models that is
  • 00:50:46
    specifically trained on detecting
  • 00:50:49
    whether the input is toxic or the output
  • 00:50:53
    is harmful and toxic in any way so you
  • 00:50:55
    literally plug this models scene and
  • 00:50:57
    again if you take a look for the person
  • 00:50:59
    that asked the question if you take a
  • 00:51:01
    look at the recent blog post that we
  • 00:51:03
    published we give a practical examples
  • 00:51:05
    of how you would train such a model to
  • 00:51:08
    detect a harmful content or content that
  • 00:51:11
    you don't want the addition to that look
  • 00:51:14
    into semantic routing so instead of
  • 00:51:16
    trying to detect what's bad you define
  • 00:51:19
    the types of things that you want your
  • 00:51:21
    LM to respond to and if something comes
  • 00:51:25
    in that doesn't fit into that bucket you
  • 00:51:27
    Chuck it away so you only do or focus on
  • 00:51:31
    the stuff that you want your LM to do
  • 00:51:33
    this is called semantic routing yeah
  • 00:51:36
    okay so one area where there's um a lot
  • 00:51:39
    of uh I don't want to say clutter but
  • 00:51:42
    information available is is the existing
  • 00:51:45
    security tooling and security
  • 00:51:46
    infrastructure that we have within
  • 00:51:48
    companies so uh how do these LL
  • 00:51:51
    empowered applications interact with
  • 00:51:53
    those existing cyber security uh
  • 00:51:56
    infrastr structure the tools and
  • 00:51:57
    controls um and are there any sort of
  • 00:51:59
    challenges or concerns here so if you if
  • 00:52:03
    you take a step back uh you change the
  • 00:52:06
    word llm to credit card details for
  • 00:52:11
    example so you have a database where
  • 00:52:13
    you're keeping your credit card details
  • 00:52:14
    and maybe you want to protect it from
  • 00:52:17
    people uh tampering with it or stealing
  • 00:52:20
    from there so your llm weights could are
  • 00:52:24
    similar asset in that you don't want
  • 00:52:27
    people to steal it you don't want people
  • 00:52:30
    to get near the
  • 00:52:33
    infrastructure that's running the LM in
  • 00:52:36
    production or that's training the LM or
  • 00:52:39
    that's holding the data on which the LM
  • 00:52:42
    is trained because if I can get to where
  • 00:52:46
    the data is held for training I can
  • 00:52:48
    possibly poison it if I can get to where
  • 00:52:51
    your llm pipelines reside I can steal
  • 00:52:55
    the weights I can steal your model uh if
  • 00:52:58
    I can get to the inputs of the llm I can
  • 00:53:02
    see what your users are doing and I can
  • 00:53:04
    still that information so it's a classic
  • 00:53:06
    cyber security problem actually the llm
  • 00:53:08
    lives on a piece of infrastructure
  • 00:53:11
    interacts with other things and you have
  • 00:53:12
    to test that as one thing right okay so
  • 00:53:17
    in the security industry you have this
  • 00:53:19
    LoveHate relationship with regulations
  • 00:53:22
    so so we have an audience question about
  • 00:53:24
    that are there any specific regulatory
  • 00:53:27
    requirements or industry standards that
  • 00:53:29
    businesses must consider when they're
  • 00:53:31
    deploying their llm powered applications
  • 00:53:34
    you said love and hate I only know
  • 00:53:37
    hate well you know you're that part of
  • 00:53:39
    it uh no like so I think I would Point
  • 00:53:43
    people to the European Union AI act now
  • 00:53:47
    whether or not that's going to end up
  • 00:53:50
    being uh or having the same effect of
  • 00:53:52
    the um
  • 00:53:55
    cookie the the the cookie banners on
  • 00:53:58
    every website accept cookies or if it's
  • 00:54:00
    going to have like a real impact it's
  • 00:54:02
    yet to see but this is the best
  • 00:54:04
    regulatory framework that people uh have
  • 00:54:07
    put together uh I am not an expert on
  • 00:54:09
    that but I would say that that's the
  • 00:54:11
    biggest effort that I've seen towards
  • 00:54:13
    trying to regulate it um but again it's
  • 00:54:17
    the jury is not out yet as to how useful
  • 00:54:19
    that would be I hope it's not going to
  • 00:54:21
    end up like the cookie except cookies
  • 00:54:24
    exactly that was a mess
  • 00:54:27
    still is um are there emerging Trends or
  • 00:54:30
    Technologies in the field of uh LL
  • 00:54:32
    owered application security that
  • 00:54:33
    businesses should be keeping their eye
  • 00:54:35
    on so I think right
  • 00:54:40
    now rather than Trends I think if you're
  • 00:54:43
    looking at adopting these kind of things
  • 00:54:48
    um you want to look into what the big AI
  • 00:54:52
    companies are currently working on which
  • 00:54:55
    is to move away not away but to build
  • 00:54:59
    things on top of these language models a
  • 00:55:02
    different set of Technologies for
  • 00:55:03
    example different ways of doing
  • 00:55:04
    reinforcement learning not from Human
  • 00:55:06
    feedback that could give you better
  • 00:55:09
    aligned llms or people are also working
  • 00:55:12
    uh quite exciting on giving the LM the
  • 00:55:16
    real ability to plan actions and to
  • 00:55:19
    understand the world and to be honest
  • 00:55:22
    the biggest one of the biggest debates
  • 00:55:24
    in the community is is whether or not
  • 00:55:27
    language models so they are trained
  • 00:55:30
    purely on language can actually capture
  • 00:55:33
    reality in a way that would be generally
  • 00:55:36
    useful or if we have to give images and
  • 00:55:38
    other things for them to understand the
  • 00:55:40
    world
  • 00:55:41
    okay um it is very rare in these
  • 00:55:45
    webinars for there to be a question to
  • 00:55:47
    directly to me so I want to thank the
  • 00:55:48
    audience member who said that question
  • 00:55:50
    in um Yan you look so happy to be in
  • 00:55:53
    England what is your favorite thing
  • 00:55:55
    about the UK this is not strictly llm
  • 00:55:58
    related but I'll answer that anyway I
  • 00:56:00
    have to say that um you know the weather
  • 00:56:03
    obviously is is amazing we're actually
  • 00:56:05
    seeing a bit of sun right now but I have
  • 00:56:07
    to say that my favorite thing was
  • 00:56:09
    probably my commute this morning just uh
  • 00:56:11
    there's there's something about just
  • 00:56:12
    like waking up next to the thems and and
  • 00:56:15
    and your morning commute takes you
  • 00:56:17
    across the tower bridge which you know
  • 00:56:19
    however much the londoners love the the
  • 00:56:21
    London Bridge the tower bridge is where
  • 00:56:23
    it's at I got to tell you that so with
  • 00:56:26
    that that I want to thank our audience
  • 00:56:28
    for uh for joining us today and uh I
  • 00:56:31
    want to remind that you know this
  • 00:56:32
    webinar is also available as a recording
  • 00:56:35
    later on uh if you still have any
  • 00:56:37
    questions you can enter them in the chat
  • 00:56:38
    and we'll try to get around all to all
  • 00:56:40
    those questions later but thank you for
  • 00:56:42
    tuning in and thank you Donato thank you
  • 00:56:44
    very much
  • 00:56:48
    [Music]
Etiquetas
  • Large Language Models
  • LLMs
  • ChatGPT
  • Transformer Architecture
  • Machine Learning
  • AI Applications
  • Security Risks
  • Prompt Injection
  • Autonomous Agents
  • Cybersecurity