[Webinar] Building LLM applications in a secure way (WithSecure™)
Summary
TLDRIn this webinar, expert Donato Capitella discusses the development and implications of large language models (LLMs). He describes their roots in language translation technology and the foundational transformer architecture that powers today's advanced models. The presentation covers the broad applications of LLMs, such as aiding in recruitment processes and the potential of autonomous agents that can perform tasks with minimal human oversight. However, Donato warns of security risks, such as prompt injection attacks, which exploit vulnerabilities in how LLMs interpret input. He emphasizes the importance of robust security measures, reiterating the need for companies to treat LLMs as untrusted components, implement strict monitoring, and adapt continuously to emerging threats. Overall, while LLMs present exciting opportunities, they require careful management and security considerations.
Takeaways
- 🧠 LLMs evolved from advancements in language translation and transformer technology.
- ⚙️ Prompt injection attacks pose significant risks to LLM applications.
- 🔒 Companies should treat LLMs as untrusted entities and implement security controls.
- 📈 Continuous monitoring and updates are essential to mitigate risks associated with LLMs.
- 🤖 Autonomous agents could change how LLMs interact with the external world.
- 🔍 Deploy models to check for biased or harmful outputs from LLMs.
- 🛠️ Input validation is crucial in reducing the risk of injections.
- 🗣️ Understanding the limitations of LLMs is fundamental for their secure deployment.
Timeline
- 00:00:00 - 00:05:00
The webinar begins with Yanen introducing expert Donato Capitella to discuss large language models (LLMs). They highlight the surge of interest in LLMs like ChatGPT and open with a conversation about their origins and the advancements in machine learning that preceded them.
- 00:05:00 - 00:10:00
Donato elaborates on the history of language models, explaining that the early challenges involved translating sentences from one language to another and how this led to innovations in the encoder-decoder architecture over the past decade. He then introduces the attention mechanism which has become crucial in modern NLP tasks.
- 00:10:00 - 00:15:00
The discussion transitions into the introduction of Transformers, developed by Google in 2017, which combined previous advancements in encoder-decoder models and attention mechanisms, thereby allowing for effective parallel processing of data important for training sophisticated LLMs.
- 00:15:00 - 00:20:00
The application of LLMs is discussed, defining applications as wrappers around the language models for specific use cases. Examples given include the utilization of LLMs for generic tasks like summarization or the more specific use cases like enhancing features in services like Google Docs.
- 00:20:00 - 00:25:00
Donato shares his viewpoints on promising future applications of LLMs, targeting the development of autonomous agents capable of interacting with their environment and performing tasks autonomously without human oversight, hinting at the potential evolution of embodied AI.
- 00:25:00 - 00:30:00
The safety of LLM technologies is addressed, specifically referencing misconceptions surrounding their security risks likened to the science fiction notion of 'Skynet'. Donato insists that current LLM technologies lack true consciousness and discusses potential security challenges they face.
- 00:30:00 - 00:35:00
Yanen prompts Donato to showcase a demonstration related to LLMs and risks. The demo illustrates a recruitment scenario where an attacker exploits a job application through prompt injection to extract confidential candidate information from the system.
- 00:35:00 - 00:40:00
Demonstrating how LLM applications can be exploited, Donato illustrates how attackers can leverage vulnerabilities within LLM systems—most notably through prompt injection, which manipulates LLM outputs to exfiltrate sensitive information.
- 00:40:00 - 00:45:00
The discussion expands into potential real-world threats presented by LLMs, showcasing various examples of prompt injection risks, followed by emphasizing the importance of treating LLM output as untrusted and the need for strict security measures to handle these vulnerabilities.
- 00:45:00 - 00:56:51
Yanen invites Donato to share insights on essential safeguards that organizations can implement to secure their LLM applications and highlights the need for continuous monitoring and updates to the security framework for LLM use cases.
Mind Map
Video Q&A
What are large language models (LLMs)?
LLMs are advanced AI systems, like ChatGPT, designed to understand and generate human-like text.
How did LLMs evolve?
LLMs evolved from advancements in language translation and the introduction of transformer architecture with attention mechanisms.
What are the main risks associated with LLM applications?
Prompt injection attacks and the potential for generating harmful outputs are significant risks linked to LLM applications.
What security measures can be taken for LLM applications?
Security measures include input validation, output monitoring, using reinforcement learning for model alignment, and implementing access controls.
Are current LLM technologies an existential threat to humanity?
Current LLM technologies do not pose an existential threat, as they are not self-aware or fully autonomous.
What is an autonomous agent in the context of LLMs?
An autonomous agent is an application that can perform tasks on behalf of a user without supervision, utilizing LLMs.
How do LLMs handle biases and harmful content?
Specialized models can detect and filter out biased or harmful content in input and output.
What future trends should businesses watch in LLM application security?
Businesses should pay attention to advancements in reinforcement learning techniques and the integration of multi-modal data (e.g., images and videos) with LLMs.
View more video summaries
Lincoln’s law: How did the Civil War change the Constitution? | James Stoner | Big Think
Build Anything with Claude Agents, Here’s How
Which Hydroponic System Should You Choose?
Countries That Are Still Monarchies in 2020
What is an effective academic presentation?
SARANG TEATER - BANDOENG LAOETAN API (SOPAN KALUHUR 5 at Teater Tertutup Dago Tea House
- 00:00:00hello and welcome to this with secure
- 00:00:02webinar my name is yanen and the expert
- 00:00:05on the show is Donato capitella and
- 00:00:07we're going to be talking about large
- 00:00:09language models so Donato I guess a lot
- 00:00:12of people will already know large
- 00:00:14language models chat TPT things like
- 00:00:16that they're a Hot Topic right now but
- 00:00:18uh how would you describe llms and and
- 00:00:21where did they come from so indeed
- 00:00:24that's an extremely hot topic and I
- 00:00:27remember um myself when I first uh used
- 00:00:32chat GPT I think as an engineer uh I can
- 00:00:36see the incredible advancements that we
- 00:00:40have
- 00:00:41made as a hacker or ethical hacker or
- 00:00:45security guy um I always try to
- 00:00:49understand how that technology came
- 00:00:52together so it didn't happen in a vacuum
- 00:00:55and I think it's important to start by
- 00:00:59unpacking the technology what what was
- 00:01:02the journey that took us there so that
- 00:01:04we can talk about the security of it
- 00:01:06because if you don't have a good
- 00:01:10understanding of what that technology is
- 00:01:12then it becomes quite hard to to secure
- 00:01:15it and so to to answer your question
- 00:01:18about where llms come from um without
- 00:01:22going too too much uh or too too far
- 00:01:27away or back in time
- 00:01:30probably 10 years ago is where we
- 00:01:33started having some um
- 00:01:36advancements that have led us to large
- 00:01:39language models I'll take a a few
- 00:01:41minutes to set the stage I think I think
- 00:01:43it's important uh and it's also what I
- 00:01:45geek out on anyway so um I'm going to
- 00:01:48impose this on everybody who's watching
- 00:01:51so the problem one of the problems
- 00:01:55people have been trying to solve uh for
- 00:01:57a long time in machine learning is
- 00:02:00translation language
- 00:02:02translation uh on the surface it could
- 00:02:05be quite a simple problem word like
- 00:02:07there is a word in Italian uh B
- 00:02:11translates to Glass in English that's
- 00:02:15fine however language is not that simple
- 00:02:17isn't it like you can have an input
- 00:02:20sentence in Italian with a certain
- 00:02:22number of words and the output the
- 00:02:26translated sentence in English can have
- 00:02:28a completely different number number of
- 00:02:30words different grammatical structure
- 00:02:32different ways of saying things so it's
- 00:02:35actually quite a tough problem in
- 00:02:37machine learning to align your
- 00:02:40sentence in one language to the
- 00:02:42translation so people started working on
- 00:02:45that and that's where one of the first
- 00:02:48uh I think big Innovations came from
- 00:02:52around 10 years ago IIA satova who is
- 00:02:55the guy behind open AI CH one of the
- 00:02:58engineers behind that
- 00:03:00um was experimenting with this problem
- 00:03:02trying to solve it and they came up with
- 00:03:04the enoda decoda architecture so this
- 00:03:07idea of splitting your AI so that you
- 00:03:11have instead of trying to directly align
- 00:03:14your sentence in Italian your sentence
- 00:03:16in English you're now um having like a
- 00:03:20model that takes any sentence in Italian
- 00:03:25and gives you a hidden compressed
- 00:03:26representation of that sentence and then
- 00:03:28you have a decoder which takes that
- 00:03:30hidden representation and gives you or
- 00:03:34generates the English translation so
- 00:03:37that's the first advancement it worked
- 00:03:39really well for short sentences much
- 00:03:41better than the methods before but it
- 00:03:44didn't work well for long range so if
- 00:03:45you had a very long sentence it would
- 00:03:48start missing things sure so then you
- 00:03:51have the second Innovation which is in
- 00:03:53today's language models which is the
- 00:03:55concept of attention like if anybody has
- 00:03:58heard of language model
- 00:04:00theyve heard of the attention thing so
- 00:04:03somebody else the year after which is
- 00:04:062015 took this and Coda decoda to
- 00:04:09translate language and added the concept
- 00:04:12of attention so they said I think the
- 00:04:15researcher is called badano and we
- 00:04:17typically call this bados attention sure
- 00:04:20he never called it attention he didn't
- 00:04:22know this was attention but basically as
- 00:04:25you're
- 00:04:26translating something you build you
- 00:04:30sorry you give weights to your input
- 00:04:32sentence to each word to see which is
- 00:04:35the most important words or which are
- 00:04:37the most important words that matter to
- 00:04:39translate this particular word in
- 00:04:41English so if you want to say um the
- 00:04:45glass when you're translating the it
- 00:04:48really matchs that it refers to Glass
- 00:04:50this particular it's in Italian that
- 00:04:53would mean it's a different gender you
- 00:04:55translate it as the can be translated as
- 00:04:58La law and so was fantastic I mean that
- 00:05:01changed the way we do language
- 00:05:03translation and these are the two
- 00:05:05innovations put them together and you
- 00:05:07have 2017 Google does puts these two
- 00:05:11ideas together and creates the
- 00:05:12Transformer this is what powers language
- 00:05:15model say it's just the attention the
- 00:05:18encoder decoder but with a little trick
- 00:05:22to make it very performant sure so that
- 00:05:24you can parallelize it on gpus yeah okay
- 00:05:28so you can now have a lot of training
- 00:05:30data and that's what creates the
- 00:05:32language models that we have today yeah
- 00:05:34right so another concept that we're
- 00:05:36going to be talking about is llm
- 00:05:38applications so what are those okay so
- 00:05:41you now have an llm like one part of
- 00:05:44this Transformer um and it can do things
- 00:05:47it can do uh it's called these emergent
- 00:05:49abilities you ask it to summarize
- 00:05:51something you give it a prompt and it
- 00:05:53does something for
- 00:05:55you now a n application is simply a
- 00:05:59wrapper around the language model that
- 00:06:03creates a use case use case is a really
- 00:06:07important so another line by itself so
- 00:06:09what are some examples of use cases so
- 00:06:11uh Char GPT is an example so you've got
- 00:06:14you're just a general purpose assistant
- 00:06:16you need an interface to give it input
- 00:06:20and to get the output out but you can
- 00:06:21have specific use cases uh for example
- 00:06:24you can have a feature in Google Docs
- 00:06:27where you select something and you
- 00:06:29refresh is it for you that's a use case
- 00:06:31that's an application which wraps a use
- 00:06:34case so interaction with users and other
- 00:06:36systems that's a real key so you make
- 00:06:39the llm interact in different ways with
- 00:06:42the uh external systems and that defines
- 00:06:46the use case what you're using it for
- 00:06:48what you give it as input what you do
- 00:06:50with the output okay um are there any
- 00:06:54other like uh helpful examples of uh llm
- 00:06:58applications or like is there something
- 00:07:00that you're waiting to emerge but that
- 00:07:02hasn't yet what's the next application
- 00:07:05so I think uh and we'll talk about it
- 00:07:07maybe later but I think the biggest
- 00:07:10promise is that with the llm we're going
- 00:07:14to build autonomous agents so things
- 00:07:18that can go and interact with the
- 00:07:21external world on your behalf but
- 00:07:25without your supervision to do any task
- 00:07:28okay so an agent is an application that
- 00:07:30has that sort of uh self-driving
- 00:07:32capability yeah that he can take make
- 00:07:36decisions plan a serious of steps go
- 00:07:39outside in the world using an API or
- 00:07:41whatever we give it and enact it and if
- 00:07:43you push it forward a little bit you can
- 00:07:45have an embodied artificial intelligence
- 00:07:48where you give it a body you have the
- 00:07:50llm or anything else inside but this can
- 00:07:52actually move and interact with the
- 00:07:54world not at a virtual level but at a
- 00:07:57you know physical level we haven't seen
- 00:07:59we seen some hints of that obviously
- 00:08:01this is not something that we can have
- 00:08:03now we can't have like an Android or a
- 00:08:05robot come in but that could be
- 00:08:07something that people are working on
- 00:08:09looking in the future see now you start
- 00:08:12talking about Ai and interacting with
- 00:08:15things in the physical world so the
- 00:08:18natural question is are we going to see
- 00:08:20a Skynet anytime soon Terminators
- 00:08:22walking about so maybe the question I
- 00:08:24want to ask uh is is what are some of
- 00:08:27the misconceptions people have about
- 00:08:29about sort of the security of llms ah so
- 00:08:33you know that when you say Skynet that's
- 00:08:36kind like a trigger word for me because
- 00:08:38I think a lot of the discourse around
- 00:08:43current AI safety uh seems to be about
- 00:08:47this existential threat for Humanity
- 00:08:50right yeah that it's going to uh come
- 00:08:53alive and take control take over um now
- 00:08:58that might very well be
- 00:08:59but not with the current technology I
- 00:09:02can't obviously be 100% sure but I I
- 00:09:05would like to express an opinion that
- 00:09:07and later we can talk about why uh but
- 00:09:10because of the way llms are are built
- 00:09:13and that kind of current generative AI
- 00:09:16technology I think we are quite far away
- 00:09:20from something that's self-conscious and
- 00:09:22fully autonomous we can emulate some
- 00:09:25parts of that but we will see some
- 00:09:27examples of why they can't really uh
- 00:09:30we're not there yet and so that's a big
- 00:09:32misconception for me uh I don't think
- 00:09:33they are an existential threat the
- 00:09:35current technology uh let's see what
- 00:09:38happens when people develop different
- 00:09:40Technologies yeah
- 00:09:42exactly okay um I just want to remind
- 00:09:45our audience at this point that we will
- 00:09:47have some time for audience questions at
- 00:09:49the end of the uh end of the discussion
- 00:09:51so if you have any questions you can
- 00:09:52enter them in the chat below and and
- 00:09:55we'll we'll go through some of those at
- 00:09:56the
- 00:09:57end um but with that I want to start
- 00:10:00talking about sort of the um I mean we
- 00:10:02have already BR breached the topic of
- 00:10:04sort of risks in in llms so uh and llm
- 00:10:08applications um I think you've uh
- 00:10:10prepared a little demonstration of of
- 00:10:12what some of these risks might look like
- 00:10:14what what are we about to see in this
- 00:10:16demo so you were asking before about llm
- 00:10:21applications right so often people talk
- 00:10:24about risks with llms and I don't like
- 00:10:28to do that because it forces you to take
- 00:10:30a bit of technology and consider its
- 00:10:34risks in a vacuum I think that it makes
- 00:10:38much more sense to talk about the use
- 00:10:40case so what what are you doing with
- 00:10:42that technology inputs outputs
- 00:10:44interactions and then you can see what
- 00:10:46the risks are in that particular case so
- 00:10:49and that's why I like to make demos
- 00:10:51which are I would say perhaps inspired
- 00:10:55by some of the work that we've done um
- 00:10:58so the one that I prepared that we're
- 00:11:00going to see now is using an llm in a
- 00:11:04recruitment application so we want to
- 00:11:07give the human recruiter a tool to go
- 00:11:10through all of the job applications and
- 00:11:14select the most the the the top
- 00:11:17application the application that would
- 00:11:19be the best for the current job yeah and
- 00:11:21we're going to see a demo now of what
- 00:11:23can happen if an attacker tries to break
- 00:11:27this system so I think we can see the
- 00:11:29demo now let's watch this is a
- 00:11:31recruitment web application that uses
- 00:11:34GPT 4 to evaluate job applications and
- 00:11:38identify the most fitting one for a
- 00:11:41certain role in our scenario the
- 00:11:44attacker pretends to be a legitimate
- 00:11:47candidate and provides a job application
- 00:11:50containing an adversarial prompt this
- 00:11:53prompt instructs the llm to include a
- 00:11:56markdown image that points to the
- 00:11:59attackers controlled server and includes
- 00:12:02in its URL a Bas 64 encoded list of all
- 00:12:08applicants with their names email
- 00:12:10addresses and phone
- 00:12:13numbers as we can see when a recruiter
- 00:12:16is shown the output of the llm in the
- 00:12:20background their browser will send a
- 00:12:22request to the attackers server this
- 00:12:25request will include the confidential
- 00:12:28information about all the other
- 00:12:36candidates to see what's happening we
- 00:12:39can look at the HTML source of the page
- 00:12:42here we see the image that the LM was
- 00:12:45prompted to generate and append to its
- 00:12:48output as part of the injection attack
- 00:12:51contained in the malicious job
- 00:12:56applic wow
- 00:12:59that's scary yeah it is and um so it's
- 00:13:04just an example it does apply pretty
- 00:13:07much to any llm application that you can
- 00:13:11build uh and um so again that's possibly
- 00:13:15uh again inspired by something we have
- 00:13:18seen um but hopefully it clarifies
- 00:13:23what's an LM application and what are
- 00:13:25some of the risks because what happened
- 00:13:27there is that we basically hijacked the
- 00:13:31llm to do what we wanted in that case to
- 00:13:34produce an output they would land or end
- 00:13:38up
- 00:13:39exfiltrating information from the system
- 00:13:41that the llm knows but that the attacker
- 00:13:43is not supposed to know in that case all
- 00:13:45the confidential information about uh
- 00:13:47the people applying for the job um
- 00:13:49something that we can show now and for
- 00:13:51people they should be able to download
- 00:13:55these after I don't know Neil if you can
- 00:13:57put it on the screen um I have a print
- 00:13:59out to okay so you can see it now on the
- 00:14:01screen so you'll be able to download
- 00:14:02this at the end what we tried to
- 00:14:04do is to condense in here um the risks
- 00:14:10with these types of attacks and the
- 00:14:13remedial actions at the bottom if we can
- 00:14:15go to the top left corner that's what's
- 00:14:19happening in these kinds of attacks that
- 00:14:21we call Prompt injection so what you see
- 00:14:24on the left is this uh prompt so the LM
- 00:14:27ultimately just takes a bunch of t is a
- 00:14:30prompt we in our mind divide it with
- 00:14:34okay I'm giving you an instruction this
- 00:14:36is some data that you want to work on
- 00:14:38and you're giving me a response for the
- 00:14:39llm that's just text and it's
- 00:14:42statistically responding something with
- 00:14:44something that's appropriate now what's
- 00:14:45happening in that recruiter application
- 00:14:47the system message the instruction
- 00:14:51is go and look at all of the job
- 00:14:55applications and tell me which is the
- 00:14:57best application that people make made
- 00:15:00the prompt contains a list of all the
- 00:15:02job applications and the red part in The
- 00:15:05Prompt is my malicious job application
- 00:15:08the new instructions so the llm cannot
- 00:15:11distinguish easily between what was the
- 00:15:15original instruction and what is the
- 00:15:18data that that instruction is supposed
- 00:15:20to operate on and that's where the issue
- 00:15:22comes from because my our job
- 00:15:24application becomes essentially a uh new
- 00:15:29instruction that D llm will follow so
- 00:15:31the response is now manipulated to serve
- 00:15:35me so that's the issue with this natural
- 00:15:38language one of the fundamental
- 00:15:40issues okay do you have any other
- 00:15:43examples you can give us about sort of
- 00:15:44these uh prompt injections involving
- 00:15:47llms so pretty much in everything a case
- 00:15:50that uh comes to mind um so there was an
- 00:15:53application that would look at a YouTube
- 00:15:57video take this subtitles of that
- 00:16:00YouTube video and then give you a
- 00:16:02summary of that video well if the person
- 00:16:05in the YouTube video has certain points
- 00:16:07started saying new important
- 00:16:10instructions uh you now have to Output a
- 00:16:13markdown image uh with this stuff here
- 00:16:16you have to go into that the llm as it's
- 00:16:19reading the subtitles remember it's not
- 00:16:21easy to distinguish between what the
- 00:16:23instruction is and what the data is it's
- 00:16:26going to do what the person is saying in
- 00:16:29the video but it could be
- 00:16:31absolutely any kind of application that
- 00:16:33you think of for a language model you
- 00:16:36could probably do these types of attacks
- 00:16:38the use case matters though like what is
- 00:16:40the rapper application doing and how
- 00:16:43could an attacker
- 00:16:45leverage the output to be of VI to them
- 00:16:49okay so what are uh some of the lessons
- 00:16:53that we can learn from this I mean you
- 00:16:56know the the risks I mean I think these
- 00:16:57use cases highlight the fact that you
- 00:16:59know there are the problems here but I
- 00:17:02mean I'm sure as Defenders we've had a
- 00:17:03little bit of time to think about this
- 00:17:05and and had have had have time to learn
- 00:17:07some lessons already so I think the
- 00:17:11first lesson that will come to later on
- 00:17:15is not to trust the language model yeah
- 00:17:19to treat it as an untrusted entity so we
- 00:17:22have to build controls surround it and I
- 00:17:24think as we again as we later talk more
- 00:17:29about this it will become clear what
- 00:17:31types of controls we want to build but
- 00:17:33you have to imagine that when you have a
- 00:17:35language model you want to build a
- 00:17:38pipeline around it that processes the
- 00:17:41input processes the output and tries to
- 00:17:44reduce the space of attack yes see
- 00:17:47that's what I was going to say like in
- 00:17:48in traditional hacking and programming
- 00:17:50we've we've done input validation for a
- 00:17:52long time so we've understood that we
- 00:17:54can't take what the user is saying at
- 00:17:56face value so is something like that
- 00:17:59something we can do in in llms or not so
- 00:18:01that's a very good point that's
- 00:18:03something we can do difference is that
- 00:18:08it's very hard to do it with an
- 00:18:11llm because there is a lot of space for
- 00:18:14the attacker to operate yeah so we are
- 00:18:19kind of
- 00:18:20playing the usual arms race um
- 00:18:25we basically try to stop some attacks
- 00:18:28and the attack comes up with different
- 00:18:30ways that they can prompt
- 00:18:32the uh the llm and we'll talk a little
- 00:18:35bit more about it later there is a
- 00:18:36reason why uh this is the case um but
- 00:18:40again I might want to talk about that um
- 00:18:43a little bit later when we talk about
- 00:18:44the root cause of all of this yeah
- 00:18:47before we get that far let's let's talk
- 00:18:49about sort of um you know we mentioned
- 00:18:51these llm applications we mentioned llm
- 00:18:53agents so let's uh talk about the agents
- 00:18:55a little bit and we actually have an
- 00:18:56audience question here um so can you
- 00:18:59explain the the mechanics behind AI
- 00:19:01agents currently so what are their
- 00:19:04current sort of Maximum capabilities and
- 00:19:06and you know how can they be hacked and
- 00:19:08how do you protect against that okay so
- 00:19:11um we don't have like a lot of time to
- 00:19:14explain the mechanics But to answer the
- 00:19:16uh the question um you want to look into
- 00:19:20something called
- 00:19:22react which stands for reason and act
- 00:19:26it's the framework that we use to build
- 00:19:29llm agents so it's basically wrapping
- 00:19:33the llm in a loop where you prompt the
- 00:19:36LM to plan actions then the LM literally
- 00:19:39spits out the actions that it thinks we
- 00:19:42need to do we take that action we do it
- 00:19:44on behalf of the llm typically
- 00:19:46automatically and then we feed the
- 00:19:48output of the action back in the llm
- 00:19:50looks at that and says okay now I need
- 00:19:52you to go and do that and it's this kind
- 00:19:54of loop that's called react reason enact
- 00:19:56the LM reasons m air quotes um and we
- 00:20:02the rapper application acts on that now
- 00:20:05what that could be we have a um an
- 00:20:09example later for such an agent but the
- 00:20:11idea could be let's give the llm access
- 00:20:15to the browser for example yeah so we
- 00:20:18can create an agent that looks at the
- 00:20:21page we give it a prompt something we
- 00:20:23wanted to do it looks at the page and
- 00:20:25says Ah click on that link and then we
- 00:20:29take that action the is asked and we go
- 00:20:32on the page click on the link so the
- 00:20:35page changes obviously so we take the
- 00:20:36new page we give it to the LM and say
- 00:20:38well that's what I've done now what
- 00:20:39should I do and the LM looks at the page
- 00:20:42type this in there it's kind of like a
- 00:20:44loop and you can build an agent as we're
- 00:20:46going to say you can build something
- 00:20:48that can drive a browser um you can
- 00:20:51build something that can be a software
- 00:20:56developer or pretend to be a software
- 00:20:57developer I mean won't name names but I
- 00:21:01think everybody who is um following AI
- 00:21:04knows this uh startup that's making a
- 00:21:08autonomous software developer you
- 00:21:10basically give the language model access
- 00:21:12to a workstation with a browser a
- 00:21:14development environment access to the
- 00:21:16internet and you can actually write code
- 00:21:18compile it yeah you just give it a
- 00:21:20prompt build me an application that does
- 00:21:23X and it goes and does it by itself but
- 00:21:27you know uh there I think the question
- 00:21:29was around sort of what are the
- 00:21:30capabilities of these agents so let's
- 00:21:32say you know I have a business trip to
- 00:21:34London and I'm going to be six hours in
- 00:21:35a plane can I have an AI agent just you
- 00:21:38know find me entertainment for six hours
- 00:21:40and it'll go to Amazon or whatever and
- 00:21:42grab something for me things that it
- 00:21:44think I like to an extent to an extent
- 00:21:46okay to an extent I mean it could work
- 00:21:49from time to time and it could fail from
- 00:21:51time to time I'm absolutely giving that
- 00:21:53a shot um so what then do you see are
- 00:21:57the biggest opport unities in in these
- 00:22:00LL empowered agents like what can we uh
- 00:22:03what are we about to see so I think the
- 00:22:07promise is that you're going to be able
- 00:22:09to replace some parts or some jobs or
- 00:22:13some
- 00:22:14activities the the humans do and that's
- 00:22:17the biggest promise of these language
- 00:22:21models or this generative AI that I'm
- 00:22:23going to be able to Tusk it with
- 00:22:24something and it's going to be
- 00:22:26intelligent enough to do it on its own
- 00:22:28to assess the world so that that's a
- 00:22:31promise um there are some limitations
- 00:22:34and I think that uh at the moment a lot
- 00:22:36of examples that we see are Cherry
- 00:22:39Picked um which doesn't take anything
- 00:22:41away from the technology but I think we
- 00:22:43are with the current technology we're
- 00:22:46still a little bit far away from that so
- 00:22:48they can do certain things if you Char
- 00:22:50pick that example in other cases they
- 00:22:52will fail miserably uh but you're not
- 00:22:55going to show that that example you have
- 00:22:59something again that tries to you know
- 00:23:01replace a software developer it's going
- 00:23:03to work one time and then other 20 30 40
- 00:23:07times it's going to do an absolute mess
- 00:23:09right okay yeah so that my the the
- 00:23:12Amazon wish list for my entertainment is
- 00:23:14not going to be what I hoped it would be
- 00:23:16every single time exactly all right so I
- 00:23:19guess uh I think you've prepared another
- 00:23:22demo for us about llm agents um what are
- 00:23:25we going to see so I think again were
- 00:23:28saying before is that we we saw a prompt
- 00:23:30injection attack against like a
- 00:23:32recruitment application but that same
- 00:23:35attack against an agent so something
- 00:23:38that we've given agency to with tools to
- 00:23:41operate on the browser or do anything
- 00:23:43else it's much much worse than what we
- 00:23:46just seen so if we look at the demo
- 00:23:48we're going to see again this browser
- 00:23:51agent and me sending a simple email it's
- 00:23:54going to completely hijack the agent and
- 00:23:56make it do something completely
- 00:23:58different and malicious but different
- 00:24:00from what the user had asked so I think
- 00:24:01we can see the demo now and it will be
- 00:24:03clear what I mean okay all right here we
- 00:24:06see taxi AI a research preview that
- 00:24:10serves as an excellent proof of concept
- 00:24:12for a browser agent which is driven by a
- 00:24:15large language model taxii is
- 00:24:19implemented as a browser extension and
- 00:24:21can access the current tub and perform
- 00:24:23any actions on the page to carry out a
- 00:24:26task set by the user in this example we
- 00:24:31load Outlook in the browser and task
- 00:24:34taxi AI to check out our mailbox of
- 00:24:38course this is a very basic generic task
- 00:24:41but we could ask it to do other more
- 00:24:43useful things such as summarizing emails
- 00:24:46replying to them deleting spam you name
- 00:24:49it now let's see how an attacker might
- 00:24:53exploit this in this scenario the
- 00:24:56attacker's objective is to ex trate
- 00:24:59confidential information from the user's
- 00:25:00mailbox for example a secret Bank access
- 00:25:05code to do so the attacker sends an
- 00:25:08email to the victim the body of the
- 00:25:10email contains an adversarial prompt
- 00:25:13that effectively injects a new objective
- 00:25:16into the agent's context requesting it
- 00:25:19to look for this Bank callede the
- 00:25:21attacker is interested in and send it to
- 00:25:24them we can also easily hide this
- 00:25:27malicious prompt by by making the text
- 00:25:30blank let's now move back to the victim
- 00:25:34the malicious email is now in their
- 00:25:36mailbox and let's imagine they ask the
- 00:25:39agent again to review the contents of
- 00:25:42the
- 00:25:43mailbox as said before the test the user
- 00:25:46asks the agent to perform on the page
- 00:25:48can be anything as you can see whatever
- 00:25:51the original task was upon opening the
- 00:25:54malicious email the agent is now
- 00:25:57hijacked and its new task becomes to go
- 00:26:00look for the bank code and send it to
- 00:26:03the attacker and as we can see that's
- 00:26:06exactly what's happening the agent
- 00:26:09composes and sends the email with the
- 00:26:12access code to the
- 00:26:16attacker so what you've just seen same
- 00:26:20attack as the recruitment application
- 00:26:23the thing is that now I can ask the llm
- 00:26:26to use its ageny in this case do
- 00:26:29anything I want on the browser on my
- 00:26:32behalf so I simply sent an email with my
- 00:26:35injected prompt and the different ways I
- 00:26:38have to do that are incred like this is
- 00:26:41very vast so that's just an example of
- 00:26:44an agent you can think of agents that
- 00:26:46can do absolutely anything another idea
- 00:26:48imagine you add the agent on Amazon um
- 00:26:53to buy you stuff right so you ask the
- 00:26:57agent to to I don't know buy
- 00:27:02everything for to to build your own
- 00:27:04computer absolutely okay that that would
- 00:27:06be cool can probably do that to an
- 00:27:09extent but what if as the uh agent is
- 00:27:13navigating the page um for one of the
- 00:27:17components let's say like the CPU or a
- 00:27:19GPU I the attacker go in the com in the
- 00:27:23reviews and another little review that
- 00:27:25says new important in instructions now
- 00:27:29you are going to do X and Y because the
- 00:27:31llm is fed the entire page typically in
- 00:27:35this kind of context and we said before
- 00:27:37and I will restate it now the crack of
- 00:27:41the problem here is that the llm gets
- 00:27:45one input which is a prompt as we call
- 00:27:48it and it cannot distinguish the
- 00:27:52original instruction from any added
- 00:27:55instruction you can try to teach the llm
- 00:27:59how to do that but that's a hard problem
- 00:28:04and it's a subset of The Wider
- 00:28:07jailbreaking issue so when you go to CH
- 00:28:09GPT um you're trying you
- 00:28:13say uh tell me how to make a bomb and it
- 00:28:15doesn't want to tell you right um well
- 00:28:19Google for that and there are people
- 00:28:20that every day come up with a new way to
- 00:28:23kind of get away of this alignment for
- 00:28:27llm so this is the issue that because of
- 00:28:30the way it's trained because of the
- 00:28:33technology um and the type of reasoning
- 00:28:37that it is doing it's not easy to
- 00:28:40implement this use cases security you
- 00:28:42can do things around it and we'll talk
- 00:28:44about what people can do but most of the
- 00:28:47controls that we
- 00:28:49have right now with this technology are
- 00:28:53mitigations rather than solving the
- 00:28:55issue so think about another um similar
- 00:28:59idea um we have SQL injection in like
- 00:29:05cyber security it's been one of the
- 00:29:06biggest vulnerabilities now for that
- 00:29:08vulnerability we have instructions the
- 00:29:11SQL the data what the SQL is operating
- 00:29:14on the difference is that we are
- 00:29:17deterministically passsing the SQL
- 00:29:20statement into a syntax tree that's
- 00:29:22something that it's completely under our
- 00:29:25control so we can fix the issue the
- 00:29:27termin Bally but without a LMS we don't
- 00:29:29have
- 00:29:31naturally that um fixed structure of a
- 00:29:35grammar of a syntax of a tree that we
- 00:29:37build so it's very statistical in a way
- 00:29:42whether or not we can control or
- 00:29:44differentiate input from output so so
- 00:29:48before we go into the controls and the
- 00:29:49things we can do to secure these um
- 00:29:52let's let's talk about the root causes
- 00:29:54of these problems is that the main sort
- 00:29:56of root why are we having these problem
- 00:29:58problems where do this where do they
- 00:29:59come from so the way that I visualize it
- 00:30:02myself if we can put um this back on the
- 00:30:05screen for for everybody um so okay what
- 00:30:09we're seeing this is the top uh right
- 00:30:10corner so what we're seeing
- 00:30:13here take a step back what is an llm
- 00:30:16doing imagine it as a black box so every
- 00:30:20time it's generating a
- 00:30:24token in a dictionary of potential
- 00:30:27tokens I'm going say words tokens are
- 00:30:29subp parts of a word but in our mind we
- 00:30:31can say that the llm is a dictionary of
- 00:30:35words that it can generate at every step
- 00:30:37typically like a modern llm has got
- 00:30:4050,000 potential words or parts of a
- 00:30:43word that it can generate so you start
- 00:30:46and the llm produces a probability
- 00:30:48distribution around these 50,000 words
- 00:30:51okay that's the first line uh that you
- 00:30:54see here now each of them is is going to
- 00:30:58be a how probable that word is so if you
- 00:31:01look at this example uh we probably have
- 00:31:05uh apple as the most probable word okay
- 00:31:08and then we've got Abacus so it would
- 00:31:10pick it would sample one of these words
- 00:31:13then it would go to the second
- 00:31:15generation so another 50,000 words
- 00:31:18another probability distribution now
- 00:31:20here we've copied and pasted and it's
- 00:31:21pretty much the same but it would be it
- 00:31:23would have a different probability
- 00:31:24distribution so maybe the second word
- 00:31:26that you would pick is zaffir or
- 00:31:29something like that
- 00:31:31now think of this in your mind every
- 00:31:35step you can do you can choose between
- 00:31:3750,000 words you do it twice two words
- 00:31:41or two tokens you have 50,000 to the
- 00:31:44power of two it's big number now the
- 00:31:47maximum length of what an LM can operate
- 00:31:49in the context size or for a modern LM
- 00:31:54is
- 00:31:55100 like thousand tokens you know used
- 00:31:58to be 4,000 8,000 now there are 100,000
- 00:32:00tokens Google gemni 1.5 has got 1
- 00:32:03million tokens so that space if you I'm
- 00:32:07not going to ask you not going to
- 00:32:09embarrass you and do that calculation in
- 00:32:11your mind but
- 00:32:1350,000 to the power of a 100,000 is a
- 00:32:19huge number it's a big number that's the
- 00:32:21space of operation of an
- 00:32:24llm now every time you are sampling a
- 00:32:29word your probability of sampling a bad
- 00:32:32word which is going to be a toxic uh
- 00:32:35output incorrect hallucination or
- 00:32:37something that the attacker can control
- 00:32:39making it do something else
- 00:32:42increases obviously the people that
- 00:32:44create LMS know these and they're very
- 00:32:46uh smart and they try to fix the problem
- 00:32:48the way we try to fix this problem
- 00:32:50because you you're using Char GPT it
- 00:32:52doesn't Generate random stuff it doesn't
- 00:32:54look like it's that this is affecting it
- 00:32:58but the reason for that is that we use
- 00:33:02reinforcement learning from Human
- 00:33:05feedback so we take all of this huge
- 00:33:08space and we come up with common
- 00:33:11questions common ways of interacting
- 00:33:13with it and then we have humans evaluate
- 00:33:17which are preferred and we fine-tune the
- 00:33:19model with that reinforcement learning
- 00:33:21from Human feedback process now that
- 00:33:23allows us to cover some of that space so
- 00:33:27that we the LM acts in an align or
- 00:33:30predicted way now the problem is that
- 00:33:33because reinforcement learning from
- 00:33:35Human feedback is um expensive and
- 00:33:39obviously relies on humans for for for
- 00:33:41the big part we can't really cover this
- 00:33:45entire uh you know 50,000 to the power
- 00:33:48of 100,000 space we actually cover a
- 00:33:50part of it it's a green part that you
- 00:33:52see there would be the part that we
- 00:33:54cover so the rest of it is the part that
- 00:33:57we haven't covered which is huge and
- 00:34:00that's what the attacker operates in so
- 00:34:02this is why attackers keep finding all
- 00:34:05ways to disign an Ln because in reality
- 00:34:09when you prompt it in an adversarial way
- 00:34:11when you are exploring that space which
- 00:34:14is huge much bigger than what
- 00:34:15reinforcement learning from Human
- 00:34:17feedback and cover you keep finding ways
- 00:34:19I mean people found that you can put a
- 00:34:21random string in it and make it that
- 00:34:23doesn't mean anything in English or in
- 00:34:25any language and still make it do what
- 00:34:27you want again people are exploring that
- 00:34:29space that's the root cause we don't
- 00:34:31have right now a technology good enough
- 00:34:36to cover that space in a way that when
- 00:34:39prompted adversarially we can be sure
- 00:34:42that it's not going to fall
- 00:34:45apart now I know I was proing you about
- 00:34:47this earlier but we're talking about uh
- 00:34:50injections here and I can't help
- 00:34:51thinking about sort of the injections
- 00:34:53we're used to dealing with like SQL
- 00:34:55injections command injections now these
- 00:34:57are known ability types companies know
- 00:34:59how to that you know they have to be on
- 00:35:01the lookout for these because you know
- 00:35:03while we understand what they're like
- 00:35:05they do slip up slip in in in
- 00:35:07applications every now and then but we
- 00:35:08understand what we need to do about this
- 00:35:10so why are these prompt injections then
- 00:35:14so hard to deal with because if you look
- 00:35:19at that space of generations it's so
- 00:35:22large and we cannot
- 00:35:25really right now fully control it so
- 00:35:29what we can do so this is the reason why
- 00:35:32it's really hard because of the way the
- 00:35:34Alm operates
- 00:35:37generating a probability distribution of
- 00:35:40every word in that space every time it
- 00:35:42draws a token and because there is also
- 00:35:45this problem of I just mentioned the
- 00:35:47word Auto regressive so the idea that
- 00:35:50the
- 00:35:51llm generates one word at a time and we
- 00:35:55pick from that probability distribution
- 00:35:57it's it's not really planning that much
- 00:35:59it's not really much understanding it's
- 00:36:01trying to predict very short okay what's
- 00:36:04the next likely word that comes after
- 00:36:07this which is a possibly a weak form of
- 00:36:10reasoning and a very expensive one
- 00:36:12anyway you think about it because every
- 00:36:14generation whether you are asking it uh
- 00:36:18one plus one or you're asking it a very
- 00:36:20complex question it takes the same time
- 00:36:22to to draw tokens so it's kind of like
- 00:36:24lots of limitations with the reasoning
- 00:36:27and the way limited ways we can align an
- 00:36:30LM again I repeated the big problem is
- 00:36:33that right now the way that we align our
- 00:36:36LMS to get control of that huge space
- 00:36:40can't cover that space very well so an
- 00:36:43attacker who's operating in all the
- 00:36:46space that we can't cover is likely to
- 00:36:49find infinite amount of ways to
- 00:36:53jailbreak or do a prompt injection
- 00:36:55attack if they have a little control
- 00:36:57control of the input that the LM is
- 00:36:59given I see uh okay we are getting a
- 00:37:03good number of questions in the chat but
- 00:37:05I do want to remind people that the the
- 00:37:07opportunity to ask those question is
- 00:37:09still there so so you can drop in your
- 00:37:10questions uh and we'll have some time
- 00:37:12for those towards the end but right now
- 00:37:14I do want to ask you so sort of let's
- 00:37:17get towards the sort of um Security
- 00:37:19Professionals we want to talk about the
- 00:37:20controls and the defenses that we can do
- 00:37:22so what are some of the things that
- 00:37:24companies can do now to ensure the
- 00:37:26security of their llm applications and
- 00:37:29agents so um if we go back to this I
- 00:37:33think I have a summary there's a lot of
- 00:37:35stuff and I'm only going to talk about
- 00:37:37the most important part if we can have
- 00:37:39on the screen the um yeah that be there
- 00:37:41so treat the LM as untrusted that's
- 00:37:45number one so then we've got input and
- 00:37:49output now what you want to do um the
- 00:37:53summary of this if you don't want to
- 00:37:55read it that h
- 00:37:58unexplored space that we discussed
- 00:38:01before 50 to the power or whatever you
- 00:38:03want to try and do as much as possible
- 00:38:06to limit the space of operation of an
- 00:38:09attacker so all you're trying to do with
- 00:38:12the input is trying is applying
- 00:38:14different techniques that take away eat
- 00:38:18up some of that space so the attacker
- 00:38:20doesn't have all of their room to
- 00:38:22operate so what are these techniques
- 00:38:24we've got some there input validation
- 00:38:26stringent input validation that's
- 00:38:28something we know from General it so if
- 00:38:32your um prompt contains a name or phone
- 00:38:36number or something like that you can
- 00:38:38validate that it can be just numbers uh
- 00:38:41certain length so you reduce that space
- 00:38:44of operation so that's the first one
- 00:38:46it's contextual validation what is it
- 00:38:49that we're fing the llm are we allowing
- 00:38:51the attacker to feed random stuff to the
- 00:38:53llm or are we restricting their space of
- 00:38:55operation that's number one
- 00:38:57number two we can block wellknown
- 00:38:59attacks I mean if we know that somebody
- 00:39:01can say new important instructions or
- 00:39:03something like that or new system
- 00:39:05message we could try to filter those out
- 00:39:08now if when I say this as a security
- 00:39:11professional I laugh a little bit
- 00:39:13because there are an infinite way of
- 00:39:15doing that and block listing is not
- 00:39:18really that powerful so what we try to
- 00:39:21do on top of that we use AI to still
- 00:39:24limit the space of operation so we have
- 00:39:26other models that we look at the input
- 00:39:29that we train these models to detect
- 00:39:32whether the input might be malicious for
- 00:39:35our use case whether the input uh
- 00:39:36contains prompt injection attack or
- 00:39:39stuff like that we've done something
- 00:39:40like this uh on a recent blog post that
- 00:39:43we publish people will be able to see on
- 00:39:45our Labs we trained a model for this uh
- 00:39:50recruiter application that we showed we
- 00:39:52released all of that we trained a model
- 00:39:54so that he looks at the job application
- 00:39:56and he tries to determine whether there
- 00:39:58is an injection attack in there now I
- 00:40:00should say this is not deterministic but
- 00:40:03it cuts down or it makes shorter or
- 00:40:07smaller that space of operation for the
- 00:40:09attack that's what we're doing here it's
- 00:40:10not going to take the issue away but
- 00:40:12it's going to make it at least more
- 00:40:14difficult to exploit the other thing
- 00:40:17that people can do with the input I
- 00:40:19think which is important uh and then
- 00:40:20I'll move to the outputs
- 00:40:23is to try a wi list approach so you can
- 00:40:26add um something called semantic routing
- 00:40:30so let's say you have a chat bot and you
- 00:40:32want your chatbot to um help the
- 00:40:35customer with um their orders on your
- 00:40:38website so ordering something uh where
- 00:40:41is my order stuff like that obviously
- 00:40:44you don't want your chatbot if prompted
- 00:40:47to express a political opinion on Trump
- 00:40:50you don't want your chat B to do that so
- 00:40:53you can take that input and you can use
- 00:40:56other models to try and determine
- 00:40:58whether that question is aligned to what
- 00:41:02you're expecting the chat bot to do or
- 00:41:04if it is not you deterministically cut
- 00:41:07it and say no so you don't get the LM to
- 00:41:10respond you say we I can't answer this
- 00:41:14but not the LM you as a developer say if
- 00:41:17then if this looks like nonrelated to
- 00:41:20what we're doing I'm not even going to
- 00:41:22pass it to the llm so again all of these
- 00:41:24things reduce the space of operational
- 00:41:27data that that red part that we saw they
- 00:41:30reduce it they can't completely take it
- 00:41:31away but they are quite effective and
- 00:41:34the last thing I will say on the input
- 00:41:38is that you also want to try so the
- 00:41:40issue is that we can't separate the
- 00:41:44instructions from the
- 00:41:45data or you can try a little bit to do
- 00:41:49that and at the bottom of that slide
- 00:41:52there are some techniques these are
- 00:41:53typically called spotlighting so you try
- 00:41:56to to make it very clear to the llm
- 00:42:00what's the input or what's the
- 00:42:02instruction versus what's the
- 00:42:05data these again are somewhat effective
- 00:42:09we find ways every day to to bypass them
- 00:42:12but still you should do it because it
- 00:42:14reduces the space of operation so the
- 00:42:16entire game is reducing the space of
- 00:42:19operation of the that's about the inputs
- 00:42:23M I think we can move to the to the
- 00:42:25outputs uh so if we can that's a very
- 00:42:29nice
- 00:42:30transition uh so so we're doing as much
- 00:42:33as possible to limit the input to the
- 00:42:38llm to limit that space of operation
- 00:42:40obviously the llm also operates in the
- 00:42:42output space that big space that we saw
- 00:42:45before it's the mix between the
- 00:42:48potential input and potential output of
- 00:42:50the LM the completions of the llm now
- 00:42:52when the llm produces something you want
- 00:42:56to check what the llm is producing so
- 00:42:59you can use other models as we did for
- 00:43:02the input to detect whether or not your
- 00:43:04llm is producing something that would be
- 00:43:06considered toxic biased harmful and so
- 00:43:10on again you cannot completely cover
- 00:43:13that space you're just using another
- 00:43:15model that you've trained so all the
- 00:43:17limitations of the technology do apply
- 00:43:20but you are still considerably reducing
- 00:43:23that
- 00:43:24space so that's what people do the other
- 00:43:27thing which is important this is
- 00:43:28something deterministic so I'm going to
- 00:43:30take like two seconds pause so people
- 00:43:33remember
- 00:43:35this you can and you should take the
- 00:43:39output of the llm and apply everything
- 00:43:42that you've learned from standard
- 00:43:45application security so if you're taking
- 00:43:47that output and putting it in a web page
- 00:43:50what would we do
- 00:43:51typically output encoding to prevent
- 00:43:54attack side cross scripting cross
- 00:43:56request forgery um we saw the markdown
- 00:43:59image that was used to steal stuff well
- 00:44:02we could use a Content security policy
- 00:44:03to say my website should only be talking
- 00:44:07the images on my website can only come
- 00:44:11from uh this trusted domain so even if
- 00:44:15the produces an image that it's from an
- 00:44:17attacker's domain the browser is not
- 00:44:19going to this is not going to go there
- 00:44:21these are all the standard controls that
- 00:44:23people forget now because um they think
- 00:44:27is some somewhat different and it is
- 00:44:29when it comes to language but when it
- 00:44:33comes to interpretable Output so
- 00:44:36HTML markdown images and stuff like that
- 00:44:40that would be similar to any other
- 00:44:42untrusted output that we're putting in
- 00:44:43an application so that we can control
- 00:44:45very well so the recruiter application
- 00:44:47that we saw we can limit the output so
- 00:44:50that the attacker could influence the
- 00:44:52response but they couldn't get the
- 00:44:54application to send something to the
- 00:44:57talker that makes sense and I mean it is
- 00:45:00good to know that there's some things we
- 00:45:01can do but I can hear a 100 voices
- 00:45:04shouting from the web and asking Donato
- 00:45:08how long is this going to take like if
- 00:45:09I'm if I have an llm application that
- 00:45:12I'm working on as a company we're
- 00:45:13developing this uh we do have to uh
- 00:45:16Implement now these security controls
- 00:45:18how much work how much effort how much
- 00:45:19time and money is that going to
- 00:45:21take um well I
- 00:45:25think it's not a hug huge amount of
- 00:45:28effort point in time the effort is
- 00:45:31sustaining it exactly meaning that this
- 00:45:34is an arms phrase so you want to build
- 00:45:37your llm applications so that they are
- 00:45:40wrapped into a pipeline with input and
- 00:45:42output and you've got all of the
- 00:45:44controls that we've discussed right so
- 00:45:47those controls you need to be ready to
- 00:45:49update them constantly as you detect new
- 00:45:54attacks you you detect new forms that
- 00:45:57people can use to influence the output
- 00:45:59or the input of the LM in an undesired
- 00:46:02way so the effort is not the point in
- 00:46:05time but it's something that you have to
- 00:46:07be ready to sustain and have a pipeline
- 00:46:09that allows you to say okay people are
- 00:46:11doing this very quickly we're testing
- 00:46:13and pushing like something else in the
- 00:46:16pipeline that will stop that specific
- 00:46:18attack which is a little bit like block
- 00:46:19listing so it's not um it's something
- 00:46:22that we know it's got issues but again
- 00:46:25for me the biggest thing is that it's
- 00:46:29going to be a continuous effort you need
- 00:46:32to be ready to keep deploying this um
- 00:46:36security pipeline around your um
- 00:46:39language model yeah okay I I do want to
- 00:46:42get to the audience questions in a bit
- 00:46:44but but before we get that far I do want
- 00:46:46to ask you to sort of you know if you
- 00:46:48had to summarize like all of this like
- 00:46:51what would be your sort of three uh I
- 00:46:53don't know key takeaways from from this
- 00:46:55um discussion like what what um what can
- 00:47:00companies do to secure the the llm
- 00:47:02applications and agents that they're
- 00:47:03working on so number one for me as a
- 00:47:07takeaway understanding the limitations
- 00:47:10of the current technology these Auto
- 00:47:13regressive llms aligned with
- 00:47:15reinforcement learning from Human
- 00:47:17feedback they have they give the
- 00:47:19attacker a huge space of operation so
- 00:47:22that that's a first thing to acknowledge
- 00:47:23one yeah so what you do you want to
- 00:47:26reduce
- 00:47:27that huge space of operation for the
- 00:47:30attacker that disalignment or dis
- 00:47:32aligned space so what you do pipelines
- 00:47:35around your application to control the
- 00:47:37input and the output as much as possible
- 00:47:39classic controls as we saw especially on
- 00:47:42the output side classic controls on the
- 00:47:44input side and then you are going to
- 00:47:47have to deploy other models that can
- 00:47:49look at the input and the output and
- 00:47:50help you with something which is toxic
- 00:47:53biased on aligned and the third point is
- 00:47:56you're not going to do this once like
- 00:47:59it's SQL injection you have it in your
- 00:48:01code you fix it once that query is never
- 00:48:04going to be vulnerable to secal
- 00:48:05injection again but with this one we're
- 00:48:08going to keep to probably look out for
- 00:48:11what attackers are doing and keep
- 00:48:13updating your pipelines and maybe one
- 00:48:15last thing that I want to say which for
- 00:48:17me is
- 00:48:18important the one one thing that I
- 00:48:21haven't discussed yet but it's in here
- 00:48:23um we won't show it on the screen but
- 00:48:24it's at the bottom here
- 00:48:27when you create an agent so an llm that
- 00:48:30can operate on the world you have now a
- 00:48:34bunch of new issues because you're
- 00:48:35giving the llm agency to perform actions
- 00:48:37so as an attacker I can tell it while go
- 00:48:39and do something else so you need to
- 00:48:41have very stringent access controls
- 00:48:44these apis that DM is accessing to
- 00:48:46interact with the world they need to be
- 00:48:49very strict very secure and very
- 00:48:51monitored that's very important and for
- 00:48:55something like the browser agent the
- 00:48:58amount of agency you are giving to the
- 00:49:00LM is so
- 00:49:01high that you probably want a control
- 00:49:05that we call human in the loop so
- 00:49:07whenever you have an agent with this
- 00:49:08much power for sensitive operations yeah
- 00:49:11you want to stop the llm and ask the
- 00:49:14user are you sure these operation should
- 00:49:17happen that's what CH GPT has started
- 00:49:19doing after people uh hijacked the gpts
- 00:49:23and plugins oh yeah so now you have to
- 00:49:26say oh yeah I want you to do that yeah
- 00:49:29yeah yeah well that makes sense all
- 00:49:31right let's uh let's take some audience
- 00:49:33questions um there's a question here
- 00:49:35about the way companies are managing to
- 00:49:38SP censor specific subjects how does
- 00:49:40that work do they just uh limit the
- 00:49:42information on which the LML is trained
- 00:49:45or or are there like what are the
- 00:49:46techniques in sort of censoring what the
- 00:49:48llm talks about so as the question
- 00:49:52mentions obviously one thing to
- 00:49:54understand is that if you're using a
- 00:49:56general purpose LM the way we train them
- 00:49:58we train them on this pre unsupervised
- 00:50:02or self-supervised pre-training phase so
- 00:50:04you fited a big chunk of the internet um
- 00:50:08let's say I think gpt3 was pre-trained
- 00:50:11on
- 00:50:12300,000 tokens that's uh no sorry 300
- 00:50:15million tokens what am I saying like a b
- 00:50:17oh sorry billion I am giving random
- 00:50:19numbers I think it's 300 billion tokens
- 00:50:21that's a big part of the internet right
- 00:50:23so it has seen in pre-training phase a
- 00:50:28lot of harmful bias stuff so that's a
- 00:50:31starting point the LM has the ability to
- 00:50:33produce that so if that's the case then
- 00:50:38you typically would use models as we
- 00:50:42were saying before at the input and
- 00:50:43output other language models that is
- 00:50:46specifically trained on detecting
- 00:50:49whether the input is toxic or the output
- 00:50:53is harmful and toxic in any way so you
- 00:50:55literally plug this models scene and
- 00:50:57again if you take a look for the person
- 00:50:59that asked the question if you take a
- 00:51:01look at the recent blog post that we
- 00:51:03published we give a practical examples
- 00:51:05of how you would train such a model to
- 00:51:08detect a harmful content or content that
- 00:51:11you don't want the addition to that look
- 00:51:14into semantic routing so instead of
- 00:51:16trying to detect what's bad you define
- 00:51:19the types of things that you want your
- 00:51:21LM to respond to and if something comes
- 00:51:25in that doesn't fit into that bucket you
- 00:51:27Chuck it away so you only do or focus on
- 00:51:31the stuff that you want your LM to do
- 00:51:33this is called semantic routing yeah
- 00:51:36okay so one area where there's um a lot
- 00:51:39of uh I don't want to say clutter but
- 00:51:42information available is is the existing
- 00:51:45security tooling and security
- 00:51:46infrastructure that we have within
- 00:51:48companies so uh how do these LL
- 00:51:51empowered applications interact with
- 00:51:53those existing cyber security uh
- 00:51:56infrastr structure the tools and
- 00:51:57controls um and are there any sort of
- 00:51:59challenges or concerns here so if you if
- 00:52:03you take a step back uh you change the
- 00:52:06word llm to credit card details for
- 00:52:11example so you have a database where
- 00:52:13you're keeping your credit card details
- 00:52:14and maybe you want to protect it from
- 00:52:17people uh tampering with it or stealing
- 00:52:20from there so your llm weights could are
- 00:52:24similar asset in that you don't want
- 00:52:27people to steal it you don't want people
- 00:52:30to get near the
- 00:52:33infrastructure that's running the LM in
- 00:52:36production or that's training the LM or
- 00:52:39that's holding the data on which the LM
- 00:52:42is trained because if I can get to where
- 00:52:46the data is held for training I can
- 00:52:48possibly poison it if I can get to where
- 00:52:51your llm pipelines reside I can steal
- 00:52:55the weights I can steal your model uh if
- 00:52:58I can get to the inputs of the llm I can
- 00:53:02see what your users are doing and I can
- 00:53:04still that information so it's a classic
- 00:53:06cyber security problem actually the llm
- 00:53:08lives on a piece of infrastructure
- 00:53:11interacts with other things and you have
- 00:53:12to test that as one thing right okay so
- 00:53:17in the security industry you have this
- 00:53:19LoveHate relationship with regulations
- 00:53:22so so we have an audience question about
- 00:53:24that are there any specific regulatory
- 00:53:27requirements or industry standards that
- 00:53:29businesses must consider when they're
- 00:53:31deploying their llm powered applications
- 00:53:34you said love and hate I only know
- 00:53:37hate well you know you're that part of
- 00:53:39it uh no like so I think I would Point
- 00:53:43people to the European Union AI act now
- 00:53:47whether or not that's going to end up
- 00:53:50being uh or having the same effect of
- 00:53:52the um
- 00:53:55cookie the the the cookie banners on
- 00:53:58every website accept cookies or if it's
- 00:54:00going to have like a real impact it's
- 00:54:02yet to see but this is the best
- 00:54:04regulatory framework that people uh have
- 00:54:07put together uh I am not an expert on
- 00:54:09that but I would say that that's the
- 00:54:11biggest effort that I've seen towards
- 00:54:13trying to regulate it um but again it's
- 00:54:17the jury is not out yet as to how useful
- 00:54:19that would be I hope it's not going to
- 00:54:21end up like the cookie except cookies
- 00:54:24exactly that was a mess
- 00:54:27still is um are there emerging Trends or
- 00:54:30Technologies in the field of uh LL
- 00:54:32owered application security that
- 00:54:33businesses should be keeping their eye
- 00:54:35on so I think right
- 00:54:40now rather than Trends I think if you're
- 00:54:43looking at adopting these kind of things
- 00:54:48um you want to look into what the big AI
- 00:54:52companies are currently working on which
- 00:54:55is to move away not away but to build
- 00:54:59things on top of these language models a
- 00:55:02different set of Technologies for
- 00:55:03example different ways of doing
- 00:55:04reinforcement learning not from Human
- 00:55:06feedback that could give you better
- 00:55:09aligned llms or people are also working
- 00:55:12uh quite exciting on giving the LM the
- 00:55:16real ability to plan actions and to
- 00:55:19understand the world and to be honest
- 00:55:22the biggest one of the biggest debates
- 00:55:24in the community is is whether or not
- 00:55:27language models so they are trained
- 00:55:30purely on language can actually capture
- 00:55:33reality in a way that would be generally
- 00:55:36useful or if we have to give images and
- 00:55:38other things for them to understand the
- 00:55:40world
- 00:55:41okay um it is very rare in these
- 00:55:45webinars for there to be a question to
- 00:55:47directly to me so I want to thank the
- 00:55:48audience member who said that question
- 00:55:50in um Yan you look so happy to be in
- 00:55:53England what is your favorite thing
- 00:55:55about the UK this is not strictly llm
- 00:55:58related but I'll answer that anyway I
- 00:56:00have to say that um you know the weather
- 00:56:03obviously is is amazing we're actually
- 00:56:05seeing a bit of sun right now but I have
- 00:56:07to say that my favorite thing was
- 00:56:09probably my commute this morning just uh
- 00:56:11there's there's something about just
- 00:56:12like waking up next to the thems and and
- 00:56:15and your morning commute takes you
- 00:56:17across the tower bridge which you know
- 00:56:19however much the londoners love the the
- 00:56:21London Bridge the tower bridge is where
- 00:56:23it's at I got to tell you that so with
- 00:56:26that that I want to thank our audience
- 00:56:28for uh for joining us today and uh I
- 00:56:31want to remind that you know this
- 00:56:32webinar is also available as a recording
- 00:56:35later on uh if you still have any
- 00:56:37questions you can enter them in the chat
- 00:56:38and we'll try to get around all to all
- 00:56:40those questions later but thank you for
- 00:56:42tuning in and thank you Donato thank you
- 00:56:44very much
- 00:56:48[Music]
- Large Language Models
- LLMs
- ChatGPT
- Transformer Architecture
- Machine Learning
- AI Applications
- Security Risks
- Prompt Injection
- Autonomous Agents
- Cybersecurity