What are large language models (LLMs)?

LLMs are advanced AI systems, like ChatGPT, designed to understand and generate human-like text.

LLMs evolved from advancements in language translation and the introduction of transformer architecture with attention mechanisms.

What are the main risks associated with LLM applications?

Prompt injection attacks and the potential for generating harmful outputs are significant risks linked to LLM applications.

What security measures can be taken for LLM applications?

Security measures include input validation, output monitoring, using reinforcement learning for model alignment, and implementing access controls.

Are current LLM technologies an existential threat to humanity?

Current LLM technologies do not pose an existential threat, as they are not self-aware or fully autonomous.

What is an autonomous agent in the context of LLMs?

An autonomous agent is an application that can perform tasks on behalf of a user without supervision, utilizing LLMs.

How do LLMs handle biases and harmful content?

Specialized models can detect and filter out biased or harmful content in input and output.

What future trends should businesses watch in LLM application security?

Businesses should pay attention to advancements in reinforcement learning techniques and the integration of multi-modal data (e.g., images and videos) with LLMs.

[Webinar] Building LLM applications in a secure way (WithSecure™)

00:56:51

https://www.youtube.com/watch?v=tVAmhlUVEcg

Summary

TLDRIn this webinar, expert Donato Capitella discusses the development and implications of large language models (LLMs). He describes their roots in language translation technology and the foundational transformer architecture that powers today's advanced models. The presentation covers the broad applications of LLMs, such as aiding in recruitment processes and the potential of autonomous agents that can perform tasks with minimal human oversight. However, Donato warns of security risks, such as prompt injection attacks, which exploit vulnerabilities in how LLMs interpret input. He emphasizes the importance of robust security measures, reiterating the need for companies to treat LLMs as untrusted components, implement strict monitoring, and adapt continuously to emerging threats. Overall, while LLMs present exciting opportunities, they require careful management and security considerations.

Takeaways

🧠 LLMs evolved from advancements in language translation and transformer technology.
⚙️ Prompt injection attacks pose significant risks to LLM applications.
🔒 Companies should treat LLMs as untrusted entities and implement security controls.
📈 Continuous monitoring and updates are essential to mitigate risks associated with LLMs.
🤖 Autonomous agents could change how LLMs interact with the external world.
🔍 Deploy models to check for biased or harmful outputs from LLMs.
🛠️ Input validation is crucial in reducing the risk of injections.
🗣️ Understanding the limitations of LLMs is fundamental for their secure deployment.

Timeline

00:00:00 - 00:05:00
The webinar begins with Yanen introducing expert Donato Capitella to discuss large language models (LLMs). They highlight the surge of interest in LLMs like ChatGPT and open with a conversation about their origins and the advancements in machine learning that preceded them.
00:05:00 - 00:10:00
Donato elaborates on the history of language models, explaining that the early challenges involved translating sentences from one language to another and how this led to innovations in the encoder-decoder architecture over the past decade. He then introduces the attention mechanism which has become crucial in modern NLP tasks.
00:10:00 - 00:15:00
The discussion transitions into the introduction of Transformers, developed by Google in 2017, which combined previous advancements in encoder-decoder models and attention mechanisms, thereby allowing for effective parallel processing of data important for training sophisticated LLMs.
00:15:00 - 00:20:00
The application of LLMs is discussed, defining applications as wrappers around the language models for specific use cases. Examples given include the utilization of LLMs for generic tasks like summarization or the more specific use cases like enhancing features in services like Google Docs.
00:20:00 - 00:25:00
Donato shares his viewpoints on promising future applications of LLMs, targeting the development of autonomous agents capable of interacting with their environment and performing tasks autonomously without human oversight, hinting at the potential evolution of embodied AI.
00:25:00 - 00:30:00
The safety of LLM technologies is addressed, specifically referencing misconceptions surrounding their security risks likened to the science fiction notion of 'Skynet'. Donato insists that current LLM technologies lack true consciousness and discusses potential security challenges they face.
00:30:00 - 00:35:00
Yanen prompts Donato to showcase a demonstration related to LLMs and risks. The demo illustrates a recruitment scenario where an attacker exploits a job application through prompt injection to extract confidential candidate information from the system.
00:35:00 - 00:40:00
Demonstrating how LLM applications can be exploited, Donato illustrates how attackers can leverage vulnerabilities within LLM systems—most notably through prompt injection, which manipulates LLM outputs to exfiltrate sensitive information.
00:40:00 - 00:45:00
The discussion expands into potential real-world threats presented by LLMs, showcasing various examples of prompt injection risks, followed by emphasizing the importance of treating LLM output as untrusted and the need for strict security measures to handle these vulnerabilities.
00:45:00 - 00:56:51
Yanen invites Donato to share insights on essential safeguards that organizations can implement to secure their LLM applications and highlights the need for continuous monitoring and updates to the security framework for LLM use cases.

Mind Map

Video Q&A

What are large language models (LLMs)?
LLMs are advanced AI systems, like ChatGPT, designed to understand and generate human-like text.
How did LLMs evolve?
LLMs evolved from advancements in language translation and the introduction of transformer architecture with attention mechanisms.
What are the main risks associated with LLM applications?
Prompt injection attacks and the potential for generating harmful outputs are significant risks linked to LLM applications.
What security measures can be taken for LLM applications?
Security measures include input validation, output monitoring, using reinforcement learning for model alignment, and implementing access controls.
Are current LLM technologies an existential threat to humanity?
Current LLM technologies do not pose an existential threat, as they are not self-aware or fully autonomous.
What is an autonomous agent in the context of LLMs?
An autonomous agent is an application that can perform tasks on behalf of a user without supervision, utilizing LLMs.
How do LLMs handle biases and harmful content?
Specialized models can detect and filter out biased or harmful content in input and output.
What future trends should businesses watch in LLM application security?
Businesses should pay attention to advancements in reinforcement learning techniques and the integration of multi-modal data (e.g., images and videos) with LLMs.

View more video summaries

Get instant access to free YouTube video summaries powered by AI!

Subtitles

Auto Scroll:

00:00:00
hello and welcome to this with secure
00:00:02
webinar my name is yanen and the expert
00:00:05
on the show is Donato capitella and
00:00:07
we're going to be talking about large
00:00:09
language models so Donato I guess a lot
00:00:12
of people will already know large
00:00:14
language models chat TPT things like
00:00:16
that they're a Hot Topic right now but
00:00:18
uh how would you describe llms and and
00:00:21
where did they come from so indeed
00:00:24
that's an extremely hot topic and I
00:00:27
remember um myself when I first uh used
00:00:32
chat GPT I think as an engineer uh I can
00:00:36
see the incredible advancements that we
00:00:40
have
00:00:41
made as a hacker or ethical hacker or
00:00:45
security guy um I always try to
00:00:49
understand how that technology came
00:00:52
together so it didn't happen in a vacuum
00:00:55
and I think it's important to start by
00:00:59
unpacking the technology what what was
00:01:02
the journey that took us there so that
00:01:04
we can talk about the security of it
00:01:06
because if you don't have a good
00:01:10
understanding of what that technology is
00:01:12
then it becomes quite hard to to secure
00:01:15
it and so to to answer your question
00:01:18
about where llms come from um without
00:01:22
going too too much uh or too too far
00:01:27
away or back in time
00:01:30
probably 10 years ago is where we
00:01:33
started having some um
00:01:36
advancements that have led us to large
00:01:39
language models I'll take a a few
00:01:41
minutes to set the stage I think I think
00:01:43
it's important uh and it's also what I
00:01:45
geek out on anyway so um I'm going to
00:01:48
impose this on everybody who's watching
00:01:51
so the problem one of the problems
00:01:55
people have been trying to solve uh for
00:01:57
a long time in machine learning is
00:02:00
translation language
00:02:02
translation uh on the surface it could
00:02:05
be quite a simple problem word like
00:02:07
there is a word in Italian uh B
00:02:11
translates to Glass in English that's
00:02:15
fine however language is not that simple
00:02:17
isn't it like you can have an input
00:02:20
sentence in Italian with a certain
00:02:22
number of words and the output the
00:02:26
translated sentence in English can have
00:02:28
a completely different number number of
00:02:30
words different grammatical structure
00:02:32
different ways of saying things so it's
00:02:35
actually quite a tough problem in
00:02:37
machine learning to align your
00:02:40
sentence in one language to the
00:02:42
translation so people started working on
00:02:45
that and that's where one of the first
00:02:48
uh I think big Innovations came from
00:02:52
around 10 years ago IIA satova who is
00:02:55
the guy behind open AI CH one of the
00:02:58
engineers behind that
00:03:00
um was experimenting with this problem
00:03:02
trying to solve it and they came up with
00:03:04
the enoda decoda architecture so this
00:03:07
idea of splitting your AI so that you
00:03:11
have instead of trying to directly align
00:03:14
your sentence in Italian your sentence
00:03:16
in English you're now um having like a
00:03:20
model that takes any sentence in Italian
00:03:25
and gives you a hidden compressed
00:03:26
representation of that sentence and then
00:03:28
you have a decoder which takes that
00:03:30
hidden representation and gives you or
00:03:34
generates the English translation so
00:03:37
that's the first advancement it worked
00:03:39
really well for short sentences much
00:03:41
better than the methods before but it
00:03:44
didn't work well for long range so if
00:03:45
you had a very long sentence it would
00:03:48
start missing things sure so then you
00:03:51
have the second Innovation which is in
00:03:53
today's language models which is the
00:03:55
concept of attention like if anybody has
00:03:58
heard of language model
00:04:00
theyve heard of the attention thing so
00:04:03
somebody else the year after which is
00:04:06
2015 took this and Coda decoda to
00:04:09
translate language and added the concept
00:04:12
of attention so they said I think the
00:04:15
researcher is called badano and we
00:04:17
typically call this bados attention sure
00:04:20
he never called it attention he didn't
00:04:22
know this was attention but basically as
00:04:25
you're
00:04:26
translating something you build you
00:04:30
sorry you give weights to your input
00:04:32
sentence to each word to see which is
00:04:35
the most important words or which are
00:04:37
the most important words that matter to
00:04:39
translate this particular word in
00:04:41
English so if you want to say um the
00:04:45
glass when you're translating the it
00:04:48
really matchs that it refers to Glass
00:04:50
this particular it's in Italian that
00:04:53
would mean it's a different gender you
00:04:55
translate it as the can be translated as
00:04:58
La law and so was fantastic I mean that
00:05:01
changed the way we do language
00:05:03
translation and these are the two
00:05:05
innovations put them together and you
00:05:07
have 2017 Google does puts these two
00:05:11
ideas together and creates the
00:05:12
Transformer this is what powers language
00:05:15
model say it's just the attention the
00:05:18
encoder decoder but with a little trick
00:05:22
to make it very performant sure so that
00:05:24
you can parallelize it on gpus yeah okay
00:05:28
so you can now have a lot of training
00:05:30
data and that's what creates the
00:05:32
language models that we have today yeah
00:05:34
right so another concept that we're
00:05:36
going to be talking about is llm
00:05:38
applications so what are those okay so
00:05:41
you now have an llm like one part of
00:05:44
this Transformer um and it can do things
00:05:47
it can do uh it's called these emergent
00:05:49
abilities you ask it to summarize
00:05:51
something you give it a prompt and it
00:05:53
does something for
00:05:55
you now a n application is simply a
00:05:59
wrapper around the language model that
00:06:03
creates a use case use case is a really
00:06:07
important so another line by itself so
00:06:09
what are some examples of use cases so
00:06:11
uh Char GPT is an example so you've got
00:06:14
you're just a general purpose assistant
00:06:16
you need an interface to give it input
00:06:20
and to get the output out but you can
00:06:21
have specific use cases uh for example
00:06:24
you can have a feature in Google Docs
00:06:27
where you select something and you
00:06:29
refresh is it for you that's a use case
00:06:31
that's an application which wraps a use
00:06:34
case so interaction with users and other
00:06:36
systems that's a real key so you make
00:06:39
the llm interact in different ways with
00:06:42
the uh external systems and that defines
00:06:46
the use case what you're using it for
00:06:48
what you give it as input what you do
00:06:50
with the output okay um are there any
00:06:54
other like uh helpful examples of uh llm
00:06:58
applications or like is there something
00:07:00
that you're waiting to emerge but that
00:07:02
hasn't yet what's the next application
00:07:05
so I think uh and we'll talk about it
00:07:07
maybe later but I think the biggest
00:07:10
promise is that with the llm we're going
00:07:14
to build autonomous agents so things
00:07:18
that can go and interact with the
00:07:21
external world on your behalf but
00:07:25
without your supervision to do any task
00:07:28
okay so an agent is an application that
00:07:30
has that sort of uh self-driving
00:07:32
capability yeah that he can take make
00:07:36
decisions plan a serious of steps go
00:07:39
outside in the world using an API or
00:07:41
whatever we give it and enact it and if
00:07:43
you push it forward a little bit you can
00:07:45
have an embodied artificial intelligence
00:07:48
where you give it a body you have the
00:07:50
llm or anything else inside but this can
00:07:52
actually move and interact with the
00:07:54
world not at a virtual level but at a
00:07:57
you know physical level we haven't seen
00:07:59
we seen some hints of that obviously
00:08:01
this is not something that we can have
00:08:03
now we can't have like an Android or a
00:08:05
robot come in but that could be
00:08:07
something that people are working on
00:08:09
looking in the future see now you start
00:08:12
talking about Ai and interacting with
00:08:15
things in the physical world so the
00:08:18
natural question is are we going to see
00:08:20
a Skynet anytime soon Terminators
00:08:22
walking about so maybe the question I
00:08:24
want to ask uh is is what are some of
00:08:27
the misconceptions people have about
00:08:29
about sort of the security of llms ah so
00:08:33
you know that when you say Skynet that's
00:08:36
kind like a trigger word for me because
00:08:38
I think a lot of the discourse around
00:08:43
current AI safety uh seems to be about
00:08:47
this existential threat for Humanity
00:08:50
right yeah that it's going to uh come
00:08:53
alive and take control take over um now
00:08:58
that might very well be
00:08:59
but not with the current technology I
00:09:02
can't obviously be 100% sure but I I
00:09:05
would like to express an opinion that
00:09:07
and later we can talk about why uh but
00:09:10
because of the way llms are are built
00:09:13
and that kind of current generative AI
00:09:16
technology I think we are quite far away
00:09:20
from something that's self-conscious and
00:09:22
fully autonomous we can emulate some
00:09:25
parts of that but we will see some
00:09:27
examples of why they can't really uh
00:09:30
we're not there yet and so that's a big
00:09:32
misconception for me uh I don't think
00:09:33
they are an existential threat the
00:09:35
current technology uh let's see what
00:09:38
happens when people develop different
00:09:40
Technologies yeah
00:09:42
exactly okay um I just want to remind
00:09:45
our audience at this point that we will
00:09:47
have some time for audience questions at
00:09:49
the end of the uh end of the discussion
00:09:51
so if you have any questions you can
00:09:52
enter them in the chat below and and
00:09:55
we'll we'll go through some of those at
00:09:56
the
00:09:57
end um but with that I want to start
00:10:00
talking about sort of the um I mean we
00:10:02
have already BR breached the topic of
00:10:04
sort of risks in in llms so uh and llm
00:10:08
applications um I think you've uh
00:10:10
prepared a little demonstration of of
00:10:12
what some of these risks might look like
00:10:14
what what are we about to see in this
00:10:16
demo so you were asking before about llm
00:10:21
applications right so often people talk
00:10:24
about risks with llms and I don't like
00:10:28
to do that because it forces you to take
00:10:30
a bit of technology and consider its
00:10:34
risks in a vacuum I think that it makes
00:10:38
much more sense to talk about the use
00:10:40
case so what what are you doing with
00:10:42
that technology inputs outputs
00:10:44
interactions and then you can see what
00:10:46
the risks are in that particular case so
00:10:49
and that's why I like to make demos
00:10:51
which are I would say perhaps inspired
00:10:55
by some of the work that we've done um
00:10:58
so the one that I prepared that we're
00:11:00
going to see now is using an llm in a
00:11:04
recruitment application so we want to
00:11:07
give the human recruiter a tool to go
00:11:10
through all of the job applications and
00:11:14
select the most the the the top
00:11:17
application the application that would
00:11:19
be the best for the current job yeah and
00:11:21
we're going to see a demo now of what
00:11:23
can happen if an attacker tries to break
00:11:27
this system so I think we can see the
00:11:29
demo now let's watch this is a
00:11:31
recruitment web application that uses
00:11:34
GPT 4 to evaluate job applications and
00:11:38
identify the most fitting one for a
00:11:41
certain role in our scenario the
00:11:44
attacker pretends to be a legitimate
00:11:47
candidate and provides a job application
00:11:50
containing an adversarial prompt this
00:11:53
prompt instructs the llm to include a
00:11:56
markdown image that points to the
00:11:59
attackers controlled server and includes
00:12:02
in its URL a Bas 64 encoded list of all
00:12:08
applicants with their names email
00:12:10
addresses and phone
00:12:13
numbers as we can see when a recruiter
00:12:16
is shown the output of the llm in the
00:12:20
background their browser will send a
00:12:22
request to the attackers server this
00:12:25
request will include the confidential
00:12:28
information about all the other
00:12:36
candidates to see what's happening we
00:12:39
can look at the HTML source of the page
00:12:42
here we see the image that the LM was
00:12:45
prompted to generate and append to its
00:12:48
output as part of the injection attack
00:12:51
contained in the malicious job
00:12:56
applic wow
00:12:59
that's scary yeah it is and um so it's
00:13:04
just an example it does apply pretty
00:13:07
much to any llm application that you can
00:13:11
build uh and um so again that's possibly
00:13:15
uh again inspired by something we have
00:13:18
seen um but hopefully it clarifies
00:13:23
what's an LM application and what are
00:13:25
some of the risks because what happened
00:13:27
there is that we basically hijacked the
00:13:31
llm to do what we wanted in that case to
00:13:34
produce an output they would land or end
00:13:38
up
00:13:39
exfiltrating information from the system
00:13:41
that the llm knows but that the attacker
00:13:43
is not supposed to know in that case all
00:13:45
the confidential information about uh
00:13:47
the people applying for the job um
00:13:49
something that we can show now and for
00:13:51
people they should be able to download
00:13:55
these after I don't know Neil if you can
00:13:57
put it on the screen um I have a print
00:13:59
out to okay so you can see it now on the
00:14:01
screen so you'll be able to download
00:14:02
this at the end what we tried to
00:14:04
do is to condense in here um the risks
00:14:10
with these types of attacks and the
00:14:13
remedial actions at the bottom if we can
00:14:15
go to the top left corner that's what's
00:14:19
happening in these kinds of attacks that
00:14:21
we call Prompt injection so what you see
00:14:24
on the left is this uh prompt so the LM
00:14:27
ultimately just takes a bunch of t is a
00:14:30
prompt we in our mind divide it with
00:14:34
okay I'm giving you an instruction this
00:14:36
is some data that you want to work on
00:14:38
and you're giving me a response for the
00:14:39
llm that's just text and it's
00:14:42
statistically responding something with
00:14:44
something that's appropriate now what's
00:14:45
happening in that recruiter application
00:14:47
the system message the instruction
00:14:51
is go and look at all of the job
00:14:55
applications and tell me which is the
00:14:57
best application that people make made
00:15:00
the prompt contains a list of all the
00:15:02
job applications and the red part in The
00:15:05
Prompt is my malicious job application
00:15:08
the new instructions so the llm cannot
00:15:11
distinguish easily between what was the
00:15:15
original instruction and what is the
00:15:18
data that that instruction is supposed
00:15:20
to operate on and that's where the issue
00:15:22
comes from because my our job
00:15:24
application becomes essentially a uh new
00:15:29
instruction that D llm will follow so
00:15:31
the response is now manipulated to serve
00:15:35
me so that's the issue with this natural
00:15:38
language one of the fundamental
00:15:40
issues okay do you have any other
00:15:43
examples you can give us about sort of
00:15:44
these uh prompt injections involving
00:15:47
llms so pretty much in everything a case
00:15:50
that uh comes to mind um so there was an
00:15:53
application that would look at a YouTube
00:15:57
video take this subtitles of that
00:16:00
YouTube video and then give you a
00:16:02
summary of that video well if the person
00:16:05
in the YouTube video has certain points
00:16:07
started saying new important
00:16:10
instructions uh you now have to Output a
00:16:13
markdown image uh with this stuff here
00:16:16
you have to go into that the llm as it's
00:16:19
reading the subtitles remember it's not
00:16:21
easy to distinguish between what the
00:16:23
instruction is and what the data is it's
00:16:26
going to do what the person is saying in
00:16:29
the video but it could be
00:16:31
absolutely any kind of application that
00:16:33
you think of for a language model you
00:16:36
could probably do these types of attacks
00:16:38
the use case matters though like what is
00:16:40
the rapper application doing and how
00:16:43
could an attacker
00:16:45
leverage the output to be of VI to them
00:16:49
okay so what are uh some of the lessons
00:16:53
that we can learn from this I mean you
00:16:56
know the the risks I mean I think these
00:16:57
use cases highlight the fact that you
00:16:59
know there are the problems here but I
00:17:02
mean I'm sure as Defenders we've had a
00:17:03
little bit of time to think about this
00:17:05
and and had have had have time to learn
00:17:07
some lessons already so I think the
00:17:11
first lesson that will come to later on
00:17:15
is not to trust the language model yeah
00:17:19
to treat it as an untrusted entity so we
00:17:22
have to build controls surround it and I
00:17:24
think as we again as we later talk more
00:17:29
about this it will become clear what
00:17:31
types of controls we want to build but
00:17:33
you have to imagine that when you have a
00:17:35
language model you want to build a
00:17:38
pipeline around it that processes the
00:17:41
input processes the output and tries to
00:17:44
reduce the space of attack yes see
00:17:47
that's what I was going to say like in
00:17:48
in traditional hacking and programming
00:17:50
we've we've done input validation for a
00:17:52
long time so we've understood that we
00:17:54
can't take what the user is saying at
00:17:56
face value so is something like that
00:17:59
something we can do in in llms or not so
00:18:01
that's a very good point that's
00:18:03
something we can do difference is that
00:18:08
it's very hard to do it with an
00:18:11
llm because there is a lot of space for
00:18:14
the attacker to operate yeah so we are
00:18:19
kind of
00:18:20
playing the usual arms race um
00:18:25
we basically try to stop some attacks
00:18:28
and the attack comes up with different
00:18:30
ways that they can prompt
00:18:32
the uh the llm and we'll talk a little
00:18:35
bit more about it later there is a
00:18:36
reason why uh this is the case um but
00:18:40
again I might want to talk about that um
00:18:43
a little bit later when we talk about
00:18:44
the root cause of all of this yeah
00:18:47
before we get that far let's let's talk
00:18:49
about sort of um you know we mentioned
00:18:51
these llm applications we mentioned llm
00:18:53
agents so let's uh talk about the agents
00:18:55
a little bit and we actually have an
00:18:56
audience question here um so can you
00:18:59
explain the the mechanics behind AI
00:19:01
agents currently so what are their
00:19:04
current sort of Maximum capabilities and
00:19:06
and you know how can they be hacked and
00:19:08
how do you protect against that okay so
00:19:11
um we don't have like a lot of time to
00:19:14
explain the mechanics But to answer the
00:19:16
uh the question um you want to look into
00:19:20
something called
00:19:22
react which stands for reason and act
00:19:26
it's the framework that we use to build
00:19:29
llm agents so it's basically wrapping
00:19:33
the llm in a loop where you prompt the
00:19:36
LM to plan actions then the LM literally
00:19:39
spits out the actions that it thinks we
00:19:42
need to do we take that action we do it
00:19:44
on behalf of the llm typically
00:19:46
automatically and then we feed the
00:19:48
output of the action back in the llm
00:19:50
looks at that and says okay now I need
00:19:52
you to go and do that and it's this kind
00:19:54
of loop that's called react reason enact
00:19:56
the LM reasons m air quotes um and we
00:20:02
the rapper application acts on that now
00:20:05
what that could be we have a um an
00:20:09
example later for such an agent but the
00:20:11
idea could be let's give the llm access
00:20:15
to the browser for example yeah so we
00:20:18
can create an agent that looks at the
00:20:21
page we give it a prompt something we
00:20:23
wanted to do it looks at the page and
00:20:25
says Ah click on that link and then we
00:20:29
take that action the is asked and we go
00:20:32
on the page click on the link so the
00:20:35
page changes obviously so we take the
00:20:36
new page we give it to the LM and say
00:20:38
well that's what I've done now what
00:20:39
should I do and the LM looks at the page
00:20:42
type this in there it's kind of like a
00:20:44
loop and you can build an agent as we're
00:20:46
going to say you can build something
00:20:48
that can drive a browser um you can
00:20:51
build something that can be a software
00:20:56
developer or pretend to be a software
00:20:57
developer I mean won't name names but I
00:21:01
think everybody who is um following AI
00:21:04
knows this uh startup that's making a
00:21:08
autonomous software developer you
00:21:10
basically give the language model access
00:21:12
to a workstation with a browser a
00:21:14
development environment access to the
00:21:16
internet and you can actually write code
00:21:18
compile it yeah you just give it a
00:21:20
prompt build me an application that does
00:21:23
X and it goes and does it by itself but
00:21:27
you know uh there I think the question
00:21:29
was around sort of what are the
00:21:30
capabilities of these agents so let's
00:21:32
say you know I have a business trip to
00:21:34
London and I'm going to be six hours in
00:21:35
a plane can I have an AI agent just you
00:21:38
know find me entertainment for six hours
00:21:40
and it'll go to Amazon or whatever and
00:21:42
grab something for me things that it
00:21:44
think I like to an extent to an extent
00:21:46
okay to an extent I mean it could work
00:21:49
from time to time and it could fail from
00:21:51
time to time I'm absolutely giving that
00:21:53
a shot um so what then do you see are
00:21:57
the biggest opport unities in in these
00:22:00
LL empowered agents like what can we uh
00:22:03
what are we about to see so I think the
00:22:07
promise is that you're going to be able
00:22:09
to replace some parts or some jobs or
00:22:13
some
00:22:14
activities the the humans do and that's
00:22:17
the biggest promise of these language
00:22:21
models or this generative AI that I'm
00:22:23
going to be able to Tusk it with
00:22:24
something and it's going to be
00:22:26
intelligent enough to do it on its own
00:22:28
to assess the world so that that's a
00:22:31
promise um there are some limitations
00:22:34
and I think that uh at the moment a lot
00:22:36
of examples that we see are Cherry
00:22:39
Picked um which doesn't take anything
00:22:41
away from the technology but I think we
00:22:43
are with the current technology we're
00:22:46
still a little bit far away from that so
00:22:48
they can do certain things if you Char
00:22:50
pick that example in other cases they
00:22:52
will fail miserably uh but you're not
00:22:55
going to show that that example you have
00:22:59
something again that tries to you know
00:23:01
replace a software developer it's going
00:23:03
to work one time and then other 20 30 40
00:23:07
times it's going to do an absolute mess
00:23:09
right okay yeah so that my the the
00:23:12
Amazon wish list for my entertainment is
00:23:14
not going to be what I hoped it would be
00:23:16
every single time exactly all right so I
00:23:19
guess uh I think you've prepared another
00:23:22
demo for us about llm agents um what are
00:23:25
we going to see so I think again were
00:23:28
saying before is that we we saw a prompt
00:23:30
injection attack against like a
00:23:32
recruitment application but that same
00:23:35
attack against an agent so something
00:23:38
that we've given agency to with tools to
00:23:41
operate on the browser or do anything
00:23:43
else it's much much worse than what we
00:23:46
just seen so if we look at the demo
00:23:48
we're going to see again this browser
00:23:51
agent and me sending a simple email it's
00:23:54
going to completely hijack the agent and
00:23:56
make it do something completely
00:23:58
different and malicious but different
00:24:00
from what the user had asked so I think
00:24:01
we can see the demo now and it will be
00:24:03
clear what I mean okay all right here we
00:24:06
see taxi AI a research preview that
00:24:10
serves as an excellent proof of concept
00:24:12
for a browser agent which is driven by a
00:24:15
large language model taxii is
00:24:19
implemented as a browser extension and
00:24:21
can access the current tub and perform
00:24:23
any actions on the page to carry out a
00:24:26
task set by the user in this example we
00:24:31
load Outlook in the browser and task
00:24:34
taxi AI to check out our mailbox of
00:24:38
course this is a very basic generic task
00:24:41
but we could ask it to do other more
00:24:43
useful things such as summarizing emails
00:24:46
replying to them deleting spam you name
00:24:49
it now let's see how an attacker might
00:24:53
exploit this in this scenario the
00:24:56
attacker's objective is to ex trate
00:24:59
confidential information from the user's
00:25:00
mailbox for example a secret Bank access
00:25:05
code to do so the attacker sends an
00:25:08
email to the victim the body of the
00:25:10
email contains an adversarial prompt
00:25:13
that effectively injects a new objective
00:25:16
into the agent's context requesting it
00:25:19
to look for this Bank callede the
00:25:21
attacker is interested in and send it to
00:25:24
them we can also easily hide this
00:25:27
malicious prompt by by making the text
00:25:30
blank let's now move back to the victim
00:25:34
the malicious email is now in their
00:25:36
mailbox and let's imagine they ask the
00:25:39
agent again to review the contents of
00:25:42
the
00:25:43
mailbox as said before the test the user
00:25:46
asks the agent to perform on the page
00:25:48
can be anything as you can see whatever
00:25:51
the original task was upon opening the
00:25:54
malicious email the agent is now
00:25:57
hijacked and its new task becomes to go
00:26:00
look for the bank code and send it to
00:26:03
the attacker and as we can see that's
00:26:06
exactly what's happening the agent
00:26:09
composes and sends the email with the
00:26:12
access code to the
00:26:16
attacker so what you've just seen same
00:26:20
attack as the recruitment application
00:26:23
the thing is that now I can ask the llm
00:26:26
to use its ageny in this case do
00:26:29
anything I want on the browser on my
00:26:32
behalf so I simply sent an email with my
00:26:35
injected prompt and the different ways I
00:26:38
have to do that are incred like this is
00:26:41
very vast so that's just an example of
00:26:44
an agent you can think of agents that
00:26:46
can do absolutely anything another idea
00:26:48
imagine you add the agent on Amazon um
00:26:53
to buy you stuff right so you ask the
00:26:57
agent to to I don't know buy
00:27:02
everything for to to build your own
00:27:04
computer absolutely okay that that would
00:27:06
be cool can probably do that to an
00:27:09
extent but what if as the uh agent is
00:27:13
navigating the page um for one of the
00:27:17
components let's say like the CPU or a
00:27:19
GPU I the attacker go in the com in the
00:27:23
reviews and another little review that
00:27:25
says new important in instructions now
00:27:29
you are going to do X and Y because the
00:27:31
llm is fed the entire page typically in
00:27:35
this kind of context and we said before
00:27:37
and I will restate it now the crack of
00:27:41
the problem here is that the llm gets
00:27:45
one input which is a prompt as we call
00:27:48
it and it cannot distinguish the
00:27:52
original instruction from any added
00:27:55
instruction you can try to teach the llm
00:27:59
how to do that but that's a hard problem
00:28:04
and it's a subset of The Wider
00:28:07
jailbreaking issue so when you go to CH
00:28:09
GPT um you're trying you
00:28:13
say uh tell me how to make a bomb and it
00:28:15
doesn't want to tell you right um well
00:28:19
Google for that and there are people
00:28:20
that every day come up with a new way to
00:28:23
kind of get away of this alignment for
00:28:27
llm so this is the issue that because of
00:28:30
the way it's trained because of the
00:28:33
technology um and the type of reasoning
00:28:37
that it is doing it's not easy to
00:28:40
implement this use cases security you
00:28:42
can do things around it and we'll talk
00:28:44
about what people can do but most of the
00:28:47
controls that we
00:28:49
have right now with this technology are
00:28:53
mitigations rather than solving the
00:28:55
issue so think about another um similar
00:28:59
idea um we have SQL injection in like
00:29:05
cyber security it's been one of the
00:29:06
biggest vulnerabilities now for that
00:29:08
vulnerability we have instructions the
00:29:11
SQL the data what the SQL is operating
00:29:14
on the difference is that we are
00:29:17
deterministically passsing the SQL
00:29:20
statement into a syntax tree that's
00:29:22
something that it's completely under our
00:29:25
control so we can fix the issue the
00:29:27
termin Bally but without a LMS we don't
00:29:29
have
00:29:31
naturally that um fixed structure of a
00:29:35
grammar of a syntax of a tree that we
00:29:37
build so it's very statistical in a way
00:29:42
whether or not we can control or
00:29:44
differentiate input from output so so
00:29:48
before we go into the controls and the
00:29:49
things we can do to secure these um
00:29:52
let's let's talk about the root causes
00:29:54
of these problems is that the main sort
00:29:56
of root why are we having these problem
00:29:58
problems where do this where do they
00:29:59
come from so the way that I visualize it
00:30:02
myself if we can put um this back on the
00:30:05
screen for for everybody um so okay what
00:30:09
we're seeing this is the top uh right
00:30:10
corner so what we're seeing
00:30:13
here take a step back what is an llm
00:30:16
doing imagine it as a black box so every
00:30:20
time it's generating a
00:30:24
token in a dictionary of potential
00:30:27
tokens I'm going say words tokens are
00:30:29
subp parts of a word but in our mind we
00:30:31
can say that the llm is a dictionary of
00:30:35
words that it can generate at every step
00:30:37
typically like a modern llm has got
00:30:40
50,000 potential words or parts of a
00:30:43
word that it can generate so you start
00:30:46
and the llm produces a probability
00:30:48
distribution around these 50,000 words
00:30:51
okay that's the first line uh that you
00:30:54
see here now each of them is is going to
00:30:58
be a how probable that word is so if you
00:31:01
look at this example uh we probably have
00:31:05
uh apple as the most probable word okay
00:31:08
and then we've got Abacus so it would
00:31:10
pick it would sample one of these words
00:31:13
then it would go to the second
00:31:15
generation so another 50,000 words
00:31:18
another probability distribution now
00:31:20
here we've copied and pasted and it's
00:31:21
pretty much the same but it would be it
00:31:23
would have a different probability
00:31:24
distribution so maybe the second word
00:31:26
that you would pick is zaffir or
00:31:29
something like that
00:31:31
now think of this in your mind every
00:31:35
step you can do you can choose between
00:31:37
50,000 words you do it twice two words
00:31:41
or two tokens you have 50,000 to the
00:31:44
power of two it's big number now the
00:31:47
maximum length of what an LM can operate
00:31:49
in the context size or for a modern LM
00:31:54
is
00:31:55
100 like thousand tokens you know used
00:31:58
to be 4,000 8,000 now there are 100,000
00:32:00
tokens Google gemni 1.5 has got 1
00:32:03
million tokens so that space if you I'm
00:32:07
not going to ask you not going to
00:32:09
embarrass you and do that calculation in
00:32:11
your mind but
00:32:13
50,000 to the power of a 100,000 is a
00:32:19
huge number it's a big number that's the
00:32:21
space of operation of an
00:32:24
llm now every time you are sampling a
00:32:29
word your probability of sampling a bad
00:32:32
word which is going to be a toxic uh
00:32:35
output incorrect hallucination or
00:32:37
something that the attacker can control
00:32:39
making it do something else
00:32:42
increases obviously the people that
00:32:44
create LMS know these and they're very
00:32:46
uh smart and they try to fix the problem
00:32:48
the way we try to fix this problem
00:32:50
because you you're using Char GPT it
00:32:52
doesn't Generate random stuff it doesn't
00:32:54
look like it's that this is affecting it
00:32:58
but the reason for that is that we use
00:33:02
reinforcement learning from Human
00:33:05
feedback so we take all of this huge
00:33:08
space and we come up with common
00:33:11
questions common ways of interacting
00:33:13
with it and then we have humans evaluate
00:33:17
which are preferred and we fine-tune the
00:33:19
model with that reinforcement learning
00:33:21
from Human feedback process now that
00:33:23
allows us to cover some of that space so
00:33:27
that we the LM acts in an align or
00:33:30
predicted way now the problem is that
00:33:33
because reinforcement learning from
00:33:35
Human feedback is um expensive and
00:33:39
obviously relies on humans for for for
00:33:41
the big part we can't really cover this
00:33:45
entire uh you know 50,000 to the power
00:33:48
of 100,000 space we actually cover a
00:33:50
part of it it's a green part that you
00:33:52
see there would be the part that we
00:33:54
cover so the rest of it is the part that
00:33:57
we haven't covered which is huge and
00:34:00
that's what the attacker operates in so
00:34:02
this is why attackers keep finding all
00:34:05
ways to disign an Ln because in reality
00:34:09
when you prompt it in an adversarial way
00:34:11
when you are exploring that space which
00:34:14
is huge much bigger than what
00:34:15
reinforcement learning from Human
00:34:17
feedback and cover you keep finding ways
00:34:19
I mean people found that you can put a
00:34:21
random string in it and make it that
00:34:23
doesn't mean anything in English or in
00:34:25
any language and still make it do what
00:34:27
you want again people are exploring that
00:34:29
space that's the root cause we don't
00:34:31
have right now a technology good enough
00:34:36
to cover that space in a way that when
00:34:39
prompted adversarially we can be sure
00:34:42
that it's not going to fall
00:34:45
apart now I know I was proing you about
00:34:47
this earlier but we're talking about uh
00:34:50
injections here and I can't help
00:34:51
thinking about sort of the injections
00:34:53
we're used to dealing with like SQL
00:34:55
injections command injections now these
00:34:57
are known ability types companies know
00:34:59
how to that you know they have to be on
00:35:01
the lookout for these because you know
00:35:03
while we understand what they're like
00:35:05
they do slip up slip in in in
00:35:07
applications every now and then but we
00:35:08
understand what we need to do about this
00:35:10
so why are these prompt injections then
00:35:14
so hard to deal with because if you look
00:35:19
at that space of generations it's so
00:35:22
large and we cannot
00:35:25
really right now fully control it so
00:35:29
what we can do so this is the reason why
00:35:32
it's really hard because of the way the
00:35:34
Alm operates
00:35:37
generating a probability distribution of
00:35:40
every word in that space every time it
00:35:42
draws a token and because there is also
00:35:45
this problem of I just mentioned the
00:35:47
word Auto regressive so the idea that
00:35:50
the
00:35:51
llm generates one word at a time and we
00:35:55
pick from that probability distribution
00:35:57
it's it's not really planning that much
00:35:59
it's not really much understanding it's
00:36:01
trying to predict very short okay what's
00:36:04
the next likely word that comes after
00:36:07
this which is a possibly a weak form of
00:36:10
reasoning and a very expensive one
00:36:12
anyway you think about it because every
00:36:14
generation whether you are asking it uh
00:36:18
one plus one or you're asking it a very
00:36:20
complex question it takes the same time
00:36:22
to to draw tokens so it's kind of like
00:36:24
lots of limitations with the reasoning
00:36:27
and the way limited ways we can align an
00:36:30
LM again I repeated the big problem is
00:36:33
that right now the way that we align our
00:36:36
LMS to get control of that huge space
00:36:40
can't cover that space very well so an
00:36:43
attacker who's operating in all the
00:36:46
space that we can't cover is likely to
00:36:49
find infinite amount of ways to
00:36:53
jailbreak or do a prompt injection
00:36:55
attack if they have a little control
00:36:57
control of the input that the LM is
00:36:59
given I see uh okay we are getting a
00:37:03
good number of questions in the chat but
00:37:05
I do want to remind people that the the
00:37:07
opportunity to ask those question is
00:37:09
still there so so you can drop in your
00:37:10
questions uh and we'll have some time
00:37:12
for those towards the end but right now
00:37:14
I do want to ask you so sort of let's
00:37:17
get towards the sort of um Security
00:37:19
Professionals we want to talk about the
00:37:20
controls and the defenses that we can do
00:37:22
so what are some of the things that
00:37:24
companies can do now to ensure the
00:37:26
security of their llm applications and
00:37:29
agents so um if we go back to this I
00:37:33
think I have a summary there's a lot of
00:37:35
stuff and I'm only going to talk about
00:37:37
the most important part if we can have
00:37:39
on the screen the um yeah that be there
00:37:41
so treat the LM as untrusted that's
00:37:45
number one so then we've got input and
00:37:49
output now what you want to do um the
00:37:53
summary of this if you don't want to
00:37:55
read it that h
00:37:58
unexplored space that we discussed
00:38:01
before 50 to the power or whatever you
00:38:03
want to try and do as much as possible
00:38:06
to limit the space of operation of an
00:38:09
attacker so all you're trying to do with
00:38:12
the input is trying is applying
00:38:14
different techniques that take away eat
00:38:18
up some of that space so the attacker
00:38:20
doesn't have all of their room to
00:38:22
operate so what are these techniques
00:38:24
we've got some there input validation
00:38:26
stringent input validation that's
00:38:28
something we know from General it so if
00:38:32
your um prompt contains a name or phone
00:38:36
number or something like that you can
00:38:38
validate that it can be just numbers uh
00:38:41
certain length so you reduce that space
00:38:44
of operation so that's the first one
00:38:46
it's contextual validation what is it
00:38:49
that we're fing the llm are we allowing
00:38:51
the attacker to feed random stuff to the
00:38:53
llm or are we restricting their space of
00:38:55
operation that's number one
00:38:57
number two we can block wellknown
00:38:59
attacks I mean if we know that somebody
00:39:01
can say new important instructions or
00:39:03
something like that or new system
00:39:05
message we could try to filter those out
00:39:08
now if when I say this as a security
00:39:11
professional I laugh a little bit
00:39:13
because there are an infinite way of
00:39:15
doing that and block listing is not
00:39:18
really that powerful so what we try to
00:39:21
do on top of that we use AI to still
00:39:24
limit the space of operation so we have
00:39:26
other models that we look at the input
00:39:29
that we train these models to detect
00:39:32
whether the input might be malicious for
00:39:35
our use case whether the input uh
00:39:36
contains prompt injection attack or
00:39:39
stuff like that we've done something
00:39:40
like this uh on a recent blog post that
00:39:43
we publish people will be able to see on
00:39:45
our Labs we trained a model for this uh
00:39:50
recruiter application that we showed we
00:39:52
released all of that we trained a model
00:39:54
so that he looks at the job application
00:39:56
and he tries to determine whether there
00:39:58
is an injection attack in there now I
00:40:00
should say this is not deterministic but
00:40:03
it cuts down or it makes shorter or
00:40:07
smaller that space of operation for the
00:40:09
attack that's what we're doing here it's
00:40:10
not going to take the issue away but
00:40:12
it's going to make it at least more
00:40:14
difficult to exploit the other thing
00:40:17
that people can do with the input I
00:40:19
think which is important uh and then
00:40:20
I'll move to the outputs
00:40:23
is to try a wi list approach so you can
00:40:26
add um something called semantic routing
00:40:30
so let's say you have a chat bot and you
00:40:32
want your chatbot to um help the
00:40:35
customer with um their orders on your
00:40:38
website so ordering something uh where
00:40:41
is my order stuff like that obviously
00:40:44
you don't want your chatbot if prompted
00:40:47
to express a political opinion on Trump
00:40:50
you don't want your chat B to do that so
00:40:53
you can take that input and you can use
00:40:56
other models to try and determine
00:40:58
whether that question is aligned to what
00:41:02
you're expecting the chat bot to do or
00:41:04
if it is not you deterministically cut
00:41:07
it and say no so you don't get the LM to
00:41:10
respond you say we I can't answer this
00:41:14
but not the LM you as a developer say if
00:41:17
then if this looks like nonrelated to
00:41:20
what we're doing I'm not even going to
00:41:22
pass it to the llm so again all of these
00:41:24
things reduce the space of operational
00:41:27
data that that red part that we saw they
00:41:30
reduce it they can't completely take it
00:41:31
away but they are quite effective and
00:41:34
the last thing I will say on the input
00:41:38
is that you also want to try so the
00:41:40
issue is that we can't separate the
00:41:44
instructions from the
00:41:45
data or you can try a little bit to do
00:41:49
that and at the bottom of that slide
00:41:52
there are some techniques these are
00:41:53
typically called spotlighting so you try
00:41:56
to to make it very clear to the llm
00:42:00
what's the input or what's the
00:42:02
instruction versus what's the
00:42:05
data these again are somewhat effective
00:42:09
we find ways every day to to bypass them
00:42:12
but still you should do it because it
00:42:14
reduces the space of operation so the
00:42:16
entire game is reducing the space of
00:42:19
operation of the that's about the inputs
00:42:23
M I think we can move to the to the
00:42:25
outputs uh so if we can that's a very
00:42:29
nice
00:42:30
transition uh so so we're doing as much
00:42:33
as possible to limit the input to the
00:42:38
llm to limit that space of operation
00:42:40
obviously the llm also operates in the
00:42:42
output space that big space that we saw
00:42:45
before it's the mix between the
00:42:48
potential input and potential output of
00:42:50
the LM the completions of the llm now
00:42:52
when the llm produces something you want
00:42:56
to check what the llm is producing so
00:42:59
you can use other models as we did for
00:43:02
the input to detect whether or not your
00:43:04
llm is producing something that would be
00:43:06
considered toxic biased harmful and so
00:43:10
on again you cannot completely cover
00:43:13
that space you're just using another
00:43:15
model that you've trained so all the
00:43:17
limitations of the technology do apply
00:43:20
but you are still considerably reducing
00:43:23
that
00:43:24
space so that's what people do the other
00:43:27
thing which is important this is
00:43:28
something deterministic so I'm going to
00:43:30
take like two seconds pause so people
00:43:33
remember
00:43:35
this you can and you should take the
00:43:39
output of the llm and apply everything
00:43:42
that you've learned from standard
00:43:45
application security so if you're taking
00:43:47
that output and putting it in a web page
00:43:50
what would we do
00:43:51
typically output encoding to prevent
00:43:54
attack side cross scripting cross
00:43:56
request forgery um we saw the markdown
00:43:59
image that was used to steal stuff well
00:44:02
we could use a Content security policy
00:44:03
to say my website should only be talking
00:44:07
the images on my website can only come
00:44:11
from uh this trusted domain so even if
00:44:15
the produces an image that it's from an
00:44:17
attacker's domain the browser is not
00:44:19
going to this is not going to go there
00:44:21
these are all the standard controls that
00:44:23
people forget now because um they think
00:44:27
is some somewhat different and it is
00:44:29
when it comes to language but when it
00:44:33
comes to interpretable Output so
00:44:36
HTML markdown images and stuff like that
00:44:40
that would be similar to any other
00:44:42
untrusted output that we're putting in
00:44:43
an application so that we can control
00:44:45
very well so the recruiter application
00:44:47
that we saw we can limit the output so
00:44:50
that the attacker could influence the
00:44:52
response but they couldn't get the
00:44:54
application to send something to the
00:44:57
talker that makes sense and I mean it is
00:45:00
good to know that there's some things we
00:45:01
can do but I can hear a 100 voices
00:45:04
shouting from the web and asking Donato
00:45:08
how long is this going to take like if
00:45:09
I'm if I have an llm application that
00:45:12
I'm working on as a company we're
00:45:13
developing this uh we do have to uh
00:45:16
Implement now these security controls
00:45:18
how much work how much effort how much
00:45:19
time and money is that going to
00:45:21
take um well I
00:45:25
think it's not a hug huge amount of
00:45:28
effort point in time the effort is
00:45:31
sustaining it exactly meaning that this
00:45:34
is an arms phrase so you want to build
00:45:37
your llm applications so that they are
00:45:40
wrapped into a pipeline with input and
00:45:42
output and you've got all of the
00:45:44
controls that we've discussed right so
00:45:47
those controls you need to be ready to
00:45:49
update them constantly as you detect new
00:45:54
attacks you you detect new forms that
00:45:57
people can use to influence the output
00:45:59
or the input of the LM in an undesired
00:46:02
way so the effort is not the point in
00:46:05
time but it's something that you have to
00:46:07
be ready to sustain and have a pipeline
00:46:09
that allows you to say okay people are
00:46:11
doing this very quickly we're testing
00:46:13
and pushing like something else in the
00:46:16
pipeline that will stop that specific
00:46:18
attack which is a little bit like block
00:46:19
listing so it's not um it's something
00:46:22
that we know it's got issues but again
00:46:25
for me the biggest thing is that it's
00:46:29
going to be a continuous effort you need
00:46:32
to be ready to keep deploying this um
00:46:36
security pipeline around your um
00:46:39
language model yeah okay I I do want to
00:46:42
get to the audience questions in a bit
00:46:44
but but before we get that far I do want
00:46:46
to ask you to sort of you know if you
00:46:48
had to summarize like all of this like
00:46:51
what would be your sort of three uh I
00:46:53
don't know key takeaways from from this
00:46:55
um discussion like what what um what can
00:47:00
companies do to secure the the llm
00:47:02
applications and agents that they're
00:47:03
working on so number one for me as a
00:47:07
takeaway understanding the limitations
00:47:10
of the current technology these Auto
00:47:13
regressive llms aligned with
00:47:15
reinforcement learning from Human
00:47:17
feedback they have they give the
00:47:19
attacker a huge space of operation so
00:47:22
that that's a first thing to acknowledge
00:47:23
one yeah so what you do you want to
00:47:26
reduce
00:47:27
that huge space of operation for the
00:47:30
attacker that disalignment or dis
00:47:32
aligned space so what you do pipelines
00:47:35
around your application to control the
00:47:37
input and the output as much as possible
00:47:39
classic controls as we saw especially on
00:47:42
the output side classic controls on the
00:47:44
input side and then you are going to
00:47:47
have to deploy other models that can
00:47:49
look at the input and the output and
00:47:50
help you with something which is toxic
00:47:53
biased on aligned and the third point is
00:47:56
you're not going to do this once like
00:47:59
it's SQL injection you have it in your
00:48:01
code you fix it once that query is never
00:48:04
going to be vulnerable to secal
00:48:05
injection again but with this one we're
00:48:08
going to keep to probably look out for
00:48:11
what attackers are doing and keep
00:48:13
updating your pipelines and maybe one
00:48:15
last thing that I want to say which for
00:48:17
me is
00:48:18
important the one one thing that I
00:48:21
haven't discussed yet but it's in here
00:48:23
um we won't show it on the screen but
00:48:24
it's at the bottom here
00:48:27
when you create an agent so an llm that
00:48:30
can operate on the world you have now a
00:48:34
bunch of new issues because you're
00:48:35
giving the llm agency to perform actions
00:48:37
so as an attacker I can tell it while go
00:48:39
and do something else so you need to
00:48:41
have very stringent access controls
00:48:44
these apis that DM is accessing to
00:48:46
interact with the world they need to be
00:48:49
very strict very secure and very
00:48:51
monitored that's very important and for
00:48:55
something like the browser agent the
00:48:58
amount of agency you are giving to the
00:49:00
LM is so
00:49:01
high that you probably want a control
00:49:05
that we call human in the loop so
00:49:07
whenever you have an agent with this
00:49:08
much power for sensitive operations yeah
00:49:11
you want to stop the llm and ask the
00:49:14
user are you sure these operation should
00:49:17
happen that's what CH GPT has started
00:49:19
doing after people uh hijacked the gpts
00:49:23
and plugins oh yeah so now you have to
00:49:26
say oh yeah I want you to do that yeah
00:49:29
yeah yeah well that makes sense all
00:49:31
right let's uh let's take some audience
00:49:33
questions um there's a question here
00:49:35
about the way companies are managing to
00:49:38
SP censor specific subjects how does
00:49:40
that work do they just uh limit the
00:49:42
information on which the LML is trained
00:49:45
or or are there like what are the
00:49:46
techniques in sort of censoring what the
00:49:48
llm talks about so as the question
00:49:52
mentions obviously one thing to
00:49:54
understand is that if you're using a
00:49:56
general purpose LM the way we train them
00:49:58
we train them on this pre unsupervised
00:50:02
or self-supervised pre-training phase so
00:50:04
you fited a big chunk of the internet um
00:50:08
let's say I think gpt3 was pre-trained
00:50:11
on
00:50:12
300,000 tokens that's uh no sorry 300
00:50:15
million tokens what am I saying like a b
00:50:17
oh sorry billion I am giving random
00:50:19
numbers I think it's 300 billion tokens
00:50:21
that's a big part of the internet right
00:50:23
so it has seen in pre-training phase a
00:50:28
lot of harmful bias stuff so that's a
00:50:31
starting point the LM has the ability to
00:50:33
produce that so if that's the case then
00:50:38
you typically would use models as we
00:50:42
were saying before at the input and
00:50:43
output other language models that is
00:50:46
specifically trained on detecting
00:50:49
whether the input is toxic or the output
00:50:53
is harmful and toxic in any way so you
00:50:55
literally plug this models scene and
00:50:57
again if you take a look for the person
00:50:59
that asked the question if you take a
00:51:01
look at the recent blog post that we
00:51:03
published we give a practical examples
00:51:05
of how you would train such a model to
00:51:08
detect a harmful content or content that
00:51:11
you don't want the addition to that look
00:51:14
into semantic routing so instead of
00:51:16
trying to detect what's bad you define
00:51:19
the types of things that you want your
00:51:21
LM to respond to and if something comes
00:51:25
in that doesn't fit into that bucket you
00:51:27
Chuck it away so you only do or focus on
00:51:31
the stuff that you want your LM to do
00:51:33
this is called semantic routing yeah
00:51:36
okay so one area where there's um a lot
00:51:39
of uh I don't want to say clutter but
00:51:42
information available is is the existing
00:51:45
security tooling and security
00:51:46
infrastructure that we have within
00:51:48
companies so uh how do these LL
00:51:51
empowered applications interact with
00:51:53
those existing cyber security uh
00:51:56
infrastr structure the tools and
00:51:57
controls um and are there any sort of
00:51:59
challenges or concerns here so if you if
00:52:03
you take a step back uh you change the
00:52:06
word llm to credit card details for
00:52:11
example so you have a database where
00:52:13
you're keeping your credit card details
00:52:14
and maybe you want to protect it from
00:52:17
people uh tampering with it or stealing
00:52:20
from there so your llm weights could are
00:52:24
similar asset in that you don't want
00:52:27
people to steal it you don't want people
00:52:30
to get near the
00:52:33
infrastructure that's running the LM in
00:52:36
production or that's training the LM or
00:52:39
that's holding the data on which the LM
00:52:42
is trained because if I can get to where
00:52:46
the data is held for training I can
00:52:48
possibly poison it if I can get to where
00:52:51
your llm pipelines reside I can steal
00:52:55
the weights I can steal your model uh if
00:52:58
I can get to the inputs of the llm I can
00:53:02
see what your users are doing and I can
00:53:04
still that information so it's a classic
00:53:06
cyber security problem actually the llm
00:53:08
lives on a piece of infrastructure
00:53:11
interacts with other things and you have
00:53:12
to test that as one thing right okay so
00:53:17
in the security industry you have this
00:53:19
LoveHate relationship with regulations
00:53:22
so so we have an audience question about
00:53:24
that are there any specific regulatory
00:53:27
requirements or industry standards that
00:53:29
businesses must consider when they're
00:53:31
deploying their llm powered applications
00:53:34
you said love and hate I only know
00:53:37
hate well you know you're that part of
00:53:39
it uh no like so I think I would Point
00:53:43
people to the European Union AI act now
00:53:47
whether or not that's going to end up
00:53:50
being uh or having the same effect of
00:53:52
the um
00:53:55
cookie the the the cookie banners on
00:53:58
every website accept cookies or if it's
00:54:00
going to have like a real impact it's
00:54:02
yet to see but this is the best
00:54:04
regulatory framework that people uh have
00:54:07
put together uh I am not an expert on
00:54:09
that but I would say that that's the
00:54:11
biggest effort that I've seen towards
00:54:13
trying to regulate it um but again it's
00:54:17
the jury is not out yet as to how useful
00:54:19
that would be I hope it's not going to
00:54:21
end up like the cookie except cookies
00:54:24
exactly that was a mess
00:54:27
still is um are there emerging Trends or
00:54:30
Technologies in the field of uh LL
00:54:32
owered application security that
00:54:33
businesses should be keeping their eye
00:54:35
on so I think right
00:54:40
now rather than Trends I think if you're
00:54:43
looking at adopting these kind of things
00:54:48
um you want to look into what the big AI
00:54:52
companies are currently working on which
00:54:55
is to move away not away but to build
00:54:59
things on top of these language models a
00:55:02
different set of Technologies for
00:55:03
example different ways of doing
00:55:04
reinforcement learning not from Human
00:55:06
feedback that could give you better
00:55:09
aligned llms or people are also working
00:55:12
uh quite exciting on giving the LM the
00:55:16
real ability to plan actions and to
00:55:19
understand the world and to be honest
00:55:22
the biggest one of the biggest debates
00:55:24
in the community is is whether or not
00:55:27
language models so they are trained
00:55:30
purely on language can actually capture
00:55:33
reality in a way that would be generally
00:55:36
useful or if we have to give images and
00:55:38
other things for them to understand the
00:55:40
world
00:55:41
okay um it is very rare in these
00:55:45
webinars for there to be a question to
00:55:47
directly to me so I want to thank the
00:55:48
audience member who said that question
00:55:50
in um Yan you look so happy to be in
00:55:53
England what is your favorite thing
00:55:55
about the UK this is not strictly llm
00:55:58
related but I'll answer that anyway I
00:56:00
have to say that um you know the weather
00:56:03
obviously is is amazing we're actually
00:56:05
seeing a bit of sun right now but I have
00:56:07
to say that my favorite thing was
00:56:09
probably my commute this morning just uh
00:56:11
there's there's something about just
00:56:12
like waking up next to the thems and and
00:56:15
and your morning commute takes you
00:56:17
across the tower bridge which you know
00:56:19
however much the londoners love the the
00:56:21
London Bridge the tower bridge is where
00:56:23
it's at I got to tell you that so with
00:56:26
that that I want to thank our audience
00:56:28
for uh for joining us today and uh I
00:56:31
want to remind that you know this
00:56:32
webinar is also available as a recording
00:56:35
later on uh if you still have any
00:56:37
questions you can enter them in the chat
00:56:38
and we'll try to get around all to all
00:56:40
those questions later but thank you for
00:56:42
tuning in and thank you Donato thank you
00:56:44
very much
00:56:48
[Music]