What is prompt engineering?

Prompt engineering involves crafting input instructions to guide large language models (LLMs) effectively for desired responses.

What are some key best practices for prompt engineering?

Best practices include clarity and specificity, role prompting, few-shot prompting, chain-of-thought prompting, and leveraging XML tags for structure.

What is Few-Shot Prompting?

Few-shot prompting provides multiple examples to the model to guide its behavior and improve accuracy.

What is Chain-of-Thought Prompting?

Chain-of-thought prompting involves instructing the LLM to think step by step for better reasoning capabilities.

How does Amazon Bedrock assist with LLM tasks?

Amazon Bedrock simplifies the use of multiple LLMs while offering features like Knowledge Bases for retrieval-augmented generation and agents for external API interactions.

What are XML tags used for in prompting?

XML tags are used to structure and organize the input prompts provided to models like Anthropic Claude.

What is retrieval-augmented generation?

This method combines external knowledge bases with LLMs by dynamically injecting relevant information into prompts based on user queries.

What are agents in Amazon Bedrock?

Agents enable the integration of external tools and APIs in conjunction with LLMs using function-based instructions.

How do you reduce LLM hallucinations?

Encourage LLMs to say 'I don't know' when uncertain, limit outputs to pre-defined contexts, and use retrieval-augmented generation for reliable context.

How can malicious prompts be mitigated?

Use harmlessness screens or guardrails like Amazon Bedrock's built-in features to filter harmful prompts and responses.

AWS Summit Berlin 2024 - Prompt engineering best practices for LLMs on Amazon Bedrock (AIM302)

00:39:35

https://www.youtube.com/watch?v=L77pbuKymEU

Zusammenfassung

TLDRAWS experts detailed best practices for prompt engineering with Amazon Bedrock and Anthropic Claude. They emphasized clarity, role prompting, few-shot prompting, and chain-of-thought prompting to guide LLM responses effectively. Techniques like XML tags improve structural prompts, while retrieval-augmented generation enhances model accuracy using external knowledge bases. Amazon Bedrock simplifies deployment, supporting retrieval-augmented generation, guardrails, and agents for function integration. Examples demonstrated improving LLM behavior for tasks like generating JSON files, classifying tweets, and solving puzzles. Guidance for mitigating hallucinations and malicious prompts, alongside prompt templates, supported responsible and efficient LLM use. Tools like Amazon Bedrock's Knowledge Bases and pre-designed agents streamline workflows.

Mitbringsel

🧠 Prompt engineering requires clarity and creativity for effective guidance.
📘 Role prompting improves context relevance for models.
💡 Few-shot and chain-of-thought techniques enhance response accuracy.
📜 XML tags organize structured prompt formats.
📈 Amazon Bedrock simplifies deploying retrieval-augmented generation solutions.
🔍 Retrieval-augmented generation provides dynamically updated knowledge.
⚙️ Agents for Amazon Bedrock enable API integrations via modeled tools.
🔒 Guardrails protect against malicious or harmful prompts.
🛠️ LLM hallucinations can be reduced by context control and retrieval techniques.
📲 Amazon Bedrock accelerates the adoption of structured LLM solutions.

Zeitleiste

00:00:00 - 00:05:00
Introduction to the session on prompt engineering best practices for large language models on Amazon Bedrock, highlighting the creative aspect of prompt engineering and its role compared to traditional engineering disciplines. A simple example is provided to illustrate how different prompts can lead to varied responses from a language model.
00:05:00 - 00:10:00
Exploration of techniques like 'one-shot prompting', where giving a clear example helps guide the model to the desired response. The session explains this technique with practical examples, demonstrating how setting initial conditions can influence the model's output.
00:10:00 - 00:15:00
Further discussion on 'few-shot prompting' and 'Chain of Thought prompting,' where providing several examples and encouraging a step-by-step thought process can significantly enhance the model's accuracy and reliability. The combination of these techniques to yield precise outputs is illustrated.
00:15:00 - 00:20:00
Introduction to more advanced prompting strategies, such as role prompting and the use of XML tags in structuring prompts, especially for complex outputs like formatted email responses. The focus is on achieving clarity and specificity to direct the model's behavior effectively.
00:20:00 - 00:25:00
Explanation of using structured prompt templates and large context handling in models like CLA with up to 100,000 tokens. The segment emphasizes the importance of pre-filling expected output and using XML tags to manage prompt complexity and improve response accuracy.
00:25:00 - 00:30:00
Coverage of advanced concepts like 'retrieval augmented generation', where additional context is dynamically integrated into the model’s response process, and the system prompt template for setting initial conversational context for different use cases. These are tied into practical applications like building a career coach assistant.
00:30:00 - 00:39:35
Insight into implementing function calls and agent frameworks within LLMs to manage user input and extend functionalities. Examples are given on setting up API-like interactions and safeguards against malicious input, reinforcing the robustness of prompt engineering in creating responsible AI applications.

Mind Map

Video-Fragen und Antworten

What is prompt engineering?
Prompt engineering involves crafting input instructions to guide large language models (LLMs) effectively for desired responses.
What are some key best practices for prompt engineering?
Best practices include clarity and specificity, role prompting, few-shot prompting, chain-of-thought prompting, and leveraging XML tags for structure.
What is Few-Shot Prompting?
Few-shot prompting provides multiple examples to the model to guide its behavior and improve accuracy.
What is Chain-of-Thought Prompting?
Chain-of-thought prompting involves instructing the LLM to think step by step for better reasoning capabilities.
How does Amazon Bedrock assist with LLM tasks?
Amazon Bedrock simplifies the use of multiple LLMs while offering features like Knowledge Bases for retrieval-augmented generation and agents for external API interactions.
What are XML tags used for in prompting?
XML tags are used to structure and organize the input prompts provided to models like Anthropic Claude.
What is retrieval-augmented generation?
This method combines external knowledge bases with LLMs by dynamically injecting relevant information into prompts based on user queries.
What are agents in Amazon Bedrock?
Agents enable the integration of external tools and APIs in conjunction with LLMs using function-based instructions.
How do you reduce LLM hallucinations?
Encourage LLMs to say 'I don't know' when uncertain, limit outputs to pre-defined contexts, and use retrieval-augmented generation for reliable context.
How can malicious prompts be mitigated?
Use harmlessness screens or guardrails like Amazon Bedrock's built-in features to filter harmful prompts and responses.

Weitere Video-Zusammenfassungen anzeigen

Erhalten Sie sofortigen Zugang zu kostenlosen YouTube-Videozusammenfassungen, die von AI unterstützt werden!

Untertitel

Automatisches Blättern:

00:00:03
so hello and welcome to aw Summit Berlin
00:00:06
and I hope you had some great session so
00:00:08
far my name is conine Gonzalez I'm a
00:00:10
principal Solutions architect with AWS
00:00:13
and my name is Elina lek and I'm an
00:00:15
associate Solutions architect with AWS
00:00:18
right so today we're going to talk about
00:00:20
prompt engineering best practices for
00:00:22
large language models on Amazon Bedrock
00:00:25
so let's dive in we are going to start
00:00:27
by setting the scene a little bit then
00:00:30
then we are going to look at some useful
00:00:31
techniques that Elina will then give you
00:00:35
some examples for on how to apply to
00:00:37
anthropic Cloud which is one of our
00:00:39
favorite models we'll take a quick look
00:00:42
Beyond prompting and finally we'll give
00:00:44
you some resources that you can use when
00:00:46
you start your own prompt engineering
00:00:49
Journey so prompt engineering really is
00:00:52
a bit of an art form and um to
00:00:55
understand this let's take a look at
00:00:56
what a prompt really means a prompt is
00:00:59
the information you pass into a large
00:01:00
language model to get some kind of
00:01:03
response so you write some text which is
00:01:05
the prompt you put it into the large
00:01:07
language model and you get a res
00:01:10
response back and um this is a bit of an
00:01:14
uncertain science so far right if we
00:01:16
look at a traditional job like an
00:01:19
engineer Engineers they can depend on
00:01:22
the rules of physics um so they always
00:01:25
know what they're getting similarly
00:01:27
software Engineers as a software
00:01:29
engineer you can depend on some
00:01:31
syntactical and semantic rules and
00:01:33
everything is very precise but as a
00:01:35
prompt engineer you're a bit more like
00:01:37
an artist and you have to try out
00:01:41
different things find your way around be
00:01:43
creative which is a good thing so why am
00:01:47
I saying that let's take a look at a
00:01:48
simple example if the prompt is what is
00:01:50
10 + 10 you might get as a response 20
00:01:54
if you think this is a math problem but
00:01:57
you might also get something like 10
00:01:59
plus 10 is an addition problem so this
00:02:02
is more like a
00:02:04
classification answer right or you can
00:02:06
get something like Quant SDS Ms which is
00:02:10
it would be a translation use case so
00:02:12
one prompt three different
00:02:15
responses uh there are other ways you
00:02:17
can have fun with prompting for example
00:02:19
in this case we're instructing the model
00:02:21
to uh with uh the beginning phrase of
00:02:24
you are a high school physics teacher
00:02:26
answer questions in one sentence and
00:02:29
then we give in a question like explain
00:02:32
quantum entanglement and we get
00:02:34
something that makes sense from a point
00:02:36
of view of a high school teacher quantum
00:02:37
entanglement is a phenomenon in which
00:02:39
two or more particles y y y that sort of
00:02:41
thing but if we start slightly
00:02:43
differently by telling the model you are
00:02:45
an excited three-year-old who ate a lot
00:02:47
of sugar answer questions in one
00:02:49
sentence and give it the same question
00:02:52
explain quantum entanglement we get a
00:02:54
very different response woo woo bash bam
00:02:58
particles go Zippy zappy when they're
00:02:59
together together and even if you take
00:03:00
them really far apart they still know
00:03:02
what their friend is doing cuz they're
00:03:04
magic connected same question different
00:03:08
answers and the difference really comes
00:03:10
down to prompt
00:03:12
engineering so instructions matter and
00:03:16
um when you put together your own
00:03:19
prompts think about Clarity and
00:03:22
specificity here are two ways of
00:03:25
prompting one way is tell me a story
00:03:26
about chaos but there are so many ways
00:03:29
you could tell a story about cows or you
00:03:31
can be much more specific and clear tell
00:03:33
me a story about cows it should be
00:03:35
roughly 2,000 words long and appropriate
00:03:37
for 5th graders it should be
00:03:38
entertaining but with a moral message
00:03:40
about the value of loyalty make it
00:03:42
amazing and
00:03:44
memorable so the key Point here is just
00:03:47
like humans llms cannot read your mind
00:03:50
as prompt Engineers it is our job to
00:03:54
bring our mind into a text prompt that
00:03:56
tells the model how to behave so here
00:03:59
are some useful techniques that you can
00:04:01
use one of the earliest discovered
00:04:04
techniques is called onot prompting
00:04:07
which essentially boils down to giving
00:04:08
the model an example of what you're
00:04:10
looking for for example here we want to
00:04:13
generate airpod codes out of text so we
00:04:17
might start with an example of what we
00:04:19
are really looking for and the example
00:04:20
is I want to F fly from Los Angeles to
00:04:22
Miami provide the airpods code only and
00:04:25
we are already giving the expected
00:04:27
results for this particular example
00:04:28
which is l ax and Mia for those two um
00:04:32
airport codes and then we follow up with
00:04:34
the actual question we want some answer
00:04:36
for I want to fly from Dallas to San
00:04:38
Francisco we start the assistance
00:04:40
response with airport Cod and square
00:04:43
bracket and now the model knows exactly
00:04:45
what we're expecting and it completes
00:04:47
our sentence with DFW and SFO so this is
00:04:50
an example of oneshot prompting we give
00:04:53
the model one example and use that to
00:04:57
illustrate what we are looking for so
00:04:59
that the model is kind of guided into
00:05:02
giving us the the right kind of response
00:05:04
we uh response we're looking for you can
00:05:07
also do F short prompting and F short
00:05:09
prompting is what you would expect we
00:05:11
give it multiple examples for example in
00:05:13
this case we want to use a large
00:05:15
language model to classify tweets we
00:05:18
give it three different examples of
00:05:20
three different classifications that we
00:05:21
are looking for and then the fourth one
00:05:23
is the actual thing we wanted to do
00:05:25
where we are kind of pasting in some
00:05:27
tweet and then we get uh hopefully what
00:05:29
we are looking for because now we gave
00:05:31
it some really good examples that tell
00:05:33
the model what we what we
00:05:36
want you can
00:05:38
also use a technique called Chain of
00:05:41
Thought prompting so with Chain of
00:05:44
Thought prompting we want to make sure
00:05:46
that the model
00:05:48
really puts some energy into thinking
00:05:51
clearly and we can actually do this
00:05:54
really easily by telling it let's think
00:05:56
step by step so on the left hand side
00:05:58
you see an example example where we're
00:06:00
asking the model to solve a puzzle for
00:06:02
us a juggler can juggle 16 balls half of
00:06:05
the balls are golf balls and half of
00:06:06
them are blue blah blah blah and we're
00:06:08
getting a wrong answer because the model
00:06:11
gets kind of confused right but when we
00:06:13
are adding the simple sentence let's
00:06:15
think step by step now we're getting a
00:06:17
good answer out of the same model this
00:06:20
is a surprising trick that researchers
00:06:22
found out in the early days of llms and
00:06:25
it still works so think about when
00:06:27
you're not getting the right results ask
00:06:29
simply ask it to to uh think step by
00:06:32
step uh which is called Chain of Thought
00:06:35
prompting and now we can combine these
00:06:38
two things we can use examples and use
00:06:41
that example to teach the model how to
00:06:44
think step by step so left hand side
00:06:47
again uh an example that doesn't work we
00:06:49
are using Roger has five tennis balls
00:06:52
that's the the the first puzzle and
00:06:54
we're giving it the answer we want like
00:06:56
the answer is 11 but the model doesn't
00:06:58
quite understand how to get to that
00:07:00
answer right on the right hand side same
00:07:04
example but now in our example answer
00:07:07
we're kind of telling we're guiding the
00:07:09
model along our thinking process for
00:07:12
example in this tennis ball example the
00:07:13
thinking process is Roger started with
00:07:16
five balls two cans of three tennis
00:07:17
balls each is six tennis balls 5 + 6
00:07:20
equal 11 the answer is 11 so we adding
00:07:23
the thinking process which is implicit
00:07:26
Chain of Thought prompting and now we we
00:07:28
can get when we paste the actual
00:07:30
question that we want some answer for
00:07:32
like the cafeteria had 23 apples and so
00:07:34
on now we're getting the right model
00:07:36
output including the thinking process
00:07:38
which also has the capability for us to
00:07:40
debug what the model was thinking so it
00:07:43
tells us how it arrived at the answer
00:07:45
and surprise it gets to the right answer
00:07:48
because it knows how to think so when
00:07:51
you craft examples think about using
00:07:54
examples where you're actually telling
00:07:56
the model how exactly to
00:07:58
think so so with those initial examples
00:08:02
out of way the way let's get way more
00:08:04
practical and I'd like to introduce you
00:08:07
to Elina who is going to show you how to
00:08:09
prompt anthropic cloth which is my
00:08:11
personal favorite model
00:08:13
here so are you looking for some
00:08:16
concrete examples of the best practices
00:08:19
that we recommend impr prompt
00:08:21
engineering this section will uh cover
00:08:25
nine best practices and I would actually
00:08:29
start was introducing myself so my name
00:08:31
is Elina lassik and I joined AWS two
00:08:34
years ago and well I wanted to follow my
00:08:37
passion in scaling ideas as well as
00:08:39
built machine learning products I have
00:08:42
had an honor to work with customers from
00:08:44
different segments of different company
00:08:46
sizes and they all have had one
00:08:51
question that question was what is that
00:08:55
model that I should start experimenting
00:08:58
with in many cases the answer was indeed
00:09:01
entropic models that is exactly why we
00:09:04
have decided to bring concrete examples
00:09:06
of prompt engineering with entropic
00:09:10
clot I would like to start
00:09:15
with introducing you to the latest
00:09:17
edition of entropic CLA model family
00:09:21
namely clae 3 with the corresponding API
00:09:25
namely messages API an important hint to
00:09:28
prompt engine engering is to think a
00:09:31
step back and consider how the llms were
00:09:35
trained in the case of CLA we have the
00:09:38
alternating dialogue between user and
00:09:42
assistant that is built in the form of
00:09:45
history of the conversation which in
00:09:47
this case would be
00:09:49
messages and that is an important hint
00:09:51
for us to also consider once we are
00:09:53
doing prompt
00:09:55
engineering so if you have previously
00:09:58
used uh models lower than three then you
00:10:01
would uh see something like completion
00:10:03
API and if you would like to use clot
00:10:06
three models you would need to use
00:10:08
messages API and on the right hand side
00:10:11
right now you can see the example of how
00:10:13
the input into messages API can look
00:10:15
like and that is following this idea of
00:10:18
the alternating dialogue between user
00:10:20
and assistant that are given as rowes
00:10:23
within messages and we also have
00:10:25
something called system prompt that we
00:10:27
will cover in several minutes minutes in
00:10:29
more
00:10:31
detail now I would like to start with
00:10:34
the best practice number one and you
00:10:36
won't be surprised by it because it is
00:10:40
the tip on indeed being very clear and
00:10:43
Direct in our
00:10:45
prompts clot definitely responds better
00:10:48
to clear and direct
00:10:50
instructions at the same time I am
00:10:53
personally still a big victim of using
00:10:55
you know this polite words like could
00:10:57
you please
00:10:59
claw can we consider and uh similar
00:11:03
things that are rather wake in case of
00:11:06
prompt engineering so it is helpful for
00:11:09
me to always recall the principle of
00:11:11
clarity here as well as something that
00:11:14
we could consider as rubber duck
00:11:16
principle in software development but in
00:11:19
case of building um LMS LM applications
00:11:23
we can think about Golden Rule of clear
00:11:26
prompting the idea here is that once we
00:11:28
are in doubt we can go with our prompt
00:11:31
to our friend or to our colleague and
00:11:33
indeed ask them to follow something that
00:11:37
we are pretty much prompting for if they
00:11:40
are able to follow llm is also likely to
00:11:43
follow on the right right uh hand side
00:11:46
of the slide you can see our example and
00:11:49
well if we are willing to have a hio on
00:11:52
a certain topic and we would like to
00:11:54
skip the Preamble that is in the form of
00:11:56
well here is this hu the idea would be
00:11:59
to indeed be very clear in that and
00:12:01
instruct the model to skip this
00:12:04
Preamble so having the clarity in mind
00:12:07
we would proceed and do you recall
00:12:10
Constantine's example on um the
00:12:12
difference between explanations of
00:12:16
quantum entanglement between PhD student
00:12:19
and this over glucose
00:12:21
kit well the idea here is indeed to
00:12:25
utilize
00:12:26
roles uh also known as Ro prompt
00:12:30
and it is helping us to give the context
00:12:33
to claw to understand what role it
00:12:35
should
00:12:37
inhibit it won't be big secret that LMS
00:12:40
are not the best ones when it comes to
00:12:42
math they generally don't calculate
00:12:45
responses they generate responses at the
00:12:48
same time when we're using Ro prompting
00:12:51
we can help CLA to be more close to math
00:12:54
world or the world of solving
00:12:57
puzzles thus imp proving
00:13:01
accuracy at the same time how I'm
00:13:04
thinking about role prompting is the
00:13:05
sort of dress code so if we are thinking
00:13:08
about the parties that we should join we
00:13:09
are checking what is the style that is
00:13:11
expected there should it be casual
00:13:13
should it be formal that is also the
00:13:15
concept that we can utilize here and
00:13:18
think about this way of speaking to llms
00:13:22
that can change the tone and potential
00:13:25
style of the conversation that we will
00:13:27
be going through and on on the right
00:13:29
hand side of the slide you can see the
00:13:31
example of the puzzle that without any
00:13:34
additional role is likely to be solved
00:13:37
in a wrong way however the simple trick
00:13:40
of saying you are a master solver of all
00:13:43
puzzles in the world can help to improve
00:13:45
the
00:13:48
accuracy Constantine has introduced us
00:13:50
to the concept of fusure prompting and
00:13:54
in context of entropic models how this
00:13:57
can be put into action
00:13:59
well we can use examples and we see that
00:14:03
examples are indeed very effective in
00:14:06
generally speaking it is also very
00:14:08
helpful to consider examples that are
00:14:11
coming from different edge
00:14:14
cases we know that the more examples the
00:14:18
better the response is of course with
00:14:20
certain trade-offs that we can see when
00:14:22
it comes to the number of tokens and on
00:14:25
the slide you can see the ex you can see
00:14:27
the example and if our goal is indeed to
00:14:30
get the concrete name name and surname
00:14:34
of the author of a certain passage
00:14:35
without anything extra we can pretty
00:14:38
much prompt llm to give us exactly this
00:14:43
by using
00:14:46
example we have also thought about uh
00:14:48
the concept of Chain of Thought and how
00:14:50
to use it with claw and it works in a
00:14:54
way that we can just give clot time to
00:14:57
think if we would look into the example
00:15:00
of the input that we are passing into
00:15:03
llm we are using thinking as our XML TX
00:15:08
and we are letting the model to put
00:15:11
chain of thought into action and
00:15:13
consider this stinking stepbystep
00:15:15
process within thinking
00:15:17
steps how the output looks like
00:15:22
Well normally we would get the response
00:15:25
from the assistant that would start with
00:15:27
thinking and then with answering and
00:15:29
what is interesting is that exactly
00:15:31
something that we would find within
00:15:33
thinking XML Tex is helpful for us to
00:15:36
debug or to
00:15:39
troubleshoot you can already see here
00:15:42
some peculiar feature of CLA models what
00:15:46
is it well it is indeed using XML Tex
00:15:49
and that is the tip number five that we
00:15:51
would like to
00:15:53
give XML texts are helpful to structure
00:15:57
the prompts that we are sending to l
00:15:59
LS
00:16:01
and once we would like to have certain
00:16:03
sections within our prompts that is
00:16:05
where XML text can help us to have the
00:16:08
structure if we would look into the
00:16:11
example of us willing to have the email
00:16:14
of well if we are the person that should
00:16:16
ask our team to show up at some early uh
00:16:20
time of the day as uh 6:00 a.m. and
00:16:23
draft this email in some formal response
00:16:25
if we will just throw this into model in
00:16:27
this form of well just show up at 600
00:16:29
a.m. because I say so the model can
00:16:33
sometimes misjudge and not understand
00:16:36
that we are actually speaking about the
00:16:38
email that we would like to send to uh
00:16:40
to our team and it will start responding
00:16:42
as if it was sent to
00:16:45
it and once we are introducing XML TXS
00:16:49
of well here is the context of the email
00:16:52
we can get the response that we were
00:16:53
looking
00:16:55
for we can already see here that we are
00:16:59
explicitly talking about the output that
00:17:01
we are willing to see and indeed we can
00:17:04
help clot with this output by
00:17:08
speaking and prefilling of our assistant
00:17:12
for getting the output that we are
00:17:16
expecting how can it be done well it is
00:17:18
done very often in the form of preing
00:17:22
within assistant so very common use case
00:17:25
that we observe from our customers is
00:17:27
some Json files or some y files that are
00:17:30
having certain structure and we would
00:17:31
like to get exactly files in this
00:17:33
structure once we are getting the
00:17:35
response from llms and we are here in
00:17:38
the example preing for Json files just
00:17:41
by using a first curly bracket which is
00:17:44
of course incre increasing the
00:17:46
likelihood of us getting Json file as
00:17:48
the
00:17:51
response another very popular feature of
00:17:54
cloud models is Big context base of two
00:17:59
100,000
00:18:01
tokens we would really encourage you to
00:18:03
use this context base efficiently
00:18:06
because well what is 200,000 tokens it
00:18:10
can be a
00:18:11
book it can be a book that we will just
00:18:14
pass to llm
00:18:17
directly and to utilize this way of uh
00:18:22
using context base effectively what we
00:18:24
recommend is again to consider XML text
00:18:27
to separate what we are passing and
00:18:31
instructions we would also think about
00:18:34
using quotes for us to have the response
00:18:37
that is closer to the content that we
00:18:39
are passing as well as well as it also
00:18:42
works with us as humans we can prompt
00:18:45
for Claude to read the document
00:18:48
carefully because it will be asked
00:18:50
questions
00:18:51
later and the last but not the least tip
00:18:54
here is again to reconsider fusure
00:18:57
prompting and to use the
00:18:59
questions and answer pairs in the form
00:19:01
of
00:19:05
examples when it comes to the prompts
00:19:07
that are having certain steps in between
00:19:11
what is also helpful is to consider
00:19:13
prompt
00:19:14
chaining I will uh uh jump straight into
00:19:17
the example so if we would be having the
00:19:20
task of extracting the names from
00:19:23
certain text and we would like to have
00:19:25
them in alphabetical order we can
00:19:27
definitely start with was saying well
00:19:29
just extract those names in alphabetical
00:19:31
order or we can consider splitting this
00:19:34
into two
00:19:35
prompts so that in the first one we
00:19:38
would have the extraction of names first
00:19:42
and in the second prompt we would have
00:19:45
them
00:19:47
alphabetized with this way of chaining
00:19:51
prompts we are increasing the likelihood
00:19:53
of getting the output that we are
00:19:55
looking for and you can also recall here
00:19:58
the example of prefilling the assistant
00:20:00
was this names XML
00:20:03
tag so the last but not the least tip
00:20:07
from us today would be on using
00:20:09
structured prom templates if we are
00:20:12
considering the example with the SL
00:20:14
documents that can be as I mentioned
00:20:17
books it is very helpful to put this
00:20:20
input Data before our
00:20:24
instruction if we are looking into a
00:20:26
different example of using certain input
00:20:30
data that is having the effect of
00:20:33
iteration or certain um dictionary or
00:20:36
List uh and in this case in the example
00:20:38
it would be the um types of different
00:20:42
animals the idea here would be to also
00:20:44
structure our uh prompt in a way that we
00:20:47
would have the iteration uh within this
00:20:49
input data and thus our output would be
00:20:53
having also different types of animals
00:20:57
put into it
00:21:00
so we have started talking here about
00:21:02
some structures that we are giving to
00:21:03
our prompts and now we would like to
00:21:07
cover something different we would like
00:21:10
to cover system prompts so those prompts
00:21:13
that are going into llm as initial ones
00:21:18
those ones that are setting the scene
00:21:20
for the
00:21:22
conversation you have already seen
00:21:24
different beats and pieces of them
00:21:26
within our previous best practices
00:21:29
at the same time now we would like to
00:21:30
bring them all together into one system
00:21:33
prompt template so to say you will see
00:21:36
nine sections here and again you can
00:21:39
definitely avoid using some of them if
00:21:41
they don't if that doesn't fit uh your
00:21:43
use case at the same time what we uh
00:21:46
strongly recommend here is for you to
00:21:48
follow the order that we are having
00:21:52
here I will jump straight into the
00:21:55
example of how this system uh prompt can
00:21:58
be built and our use case would be to
00:22:01
have a certain career coach that would
00:22:03
be helping users with their career
00:22:06
aspirations so we start with the element
00:22:09
of our system uh prom template which is
00:22:12
in giving the
00:22:13
role so here the F first things first we
00:22:16
are uh explicitly saying that uh you
00:22:19
will be acting as an AI career coach
00:22:22
with certain name and we would probably
00:22:24
like to maintain the same name for this
00:22:26
career coach during our conversation
00:22:29
and at the same time what is your goal
00:22:31
your goal is to give career advice to
00:22:33
users the second element would be here
00:22:36
to use certain tone context or style of
00:22:39
the conversation so you should maintain
00:22:41
a friendly customer service
00:22:45
tone the third tip is utilizing the
00:22:48
context base and is passing certain
00:22:51
background data for our career coach to
00:22:54
process and again it can be included
00:22:57
within XML tax of in this case
00:23:01
guide the next element would be to give
00:23:05
more detailed in task description as
00:23:08
well as certain rules and here uh you
00:23:10
can also pay attention to how we are
00:23:12
letting Claud say that well if you don't
00:23:15
know you can tell that as well as uh if
00:23:19
uh someone is asking something that is
00:23:20
not relevant to the topic you can
00:23:22
also say that well I am here only to
00:23:25
give career advice
00:23:28
The Next Step would be to use
00:23:31
examples of potential
00:23:34
common cases as well as edge cases as
00:23:37
well as give some immediate data in
00:23:40
this case it can be probably the history
00:23:42
of the conversation or for example the
00:23:44
information of the profile of the user
00:23:45
that our career coach is talking to the
00:23:49
seventh step would be to indeed
00:23:51
reiterate on what is the immediate task
00:23:54
here and allow Claude or another LLM of
00:23:58
your choice to think step by step and
00:24:01
even take a deep breath because we've
00:24:03
heard that it works effectively from
00:24:05
scientific
00:24:07
papers as the last but not the least
00:24:09
element here would be to set certain
00:24:14
output format and again we are utilizing XML
00:24:17
tags here and those have been the
00:24:20
suggested elements of our system prompt
00:24:22
template let me bring them all together
00:24:25
for
00:24:26
you and that would be the overview and I see
00:24:30
that some of you are willing to take the
00:24:34
picture so now we are having the system
00:24:37
prompt
00:24:38
template how can we move even further
00:24:41
and how can we get even more
00:24:44
professional and consider what can be
00:24:46
done after we have gone through
00:24:49
prompting well we do consider prompt
00:24:53
engineering to be an art form at the
00:24:56
same time with Constantine we have
00:24:57
decided to try to add science to this
00:25:00
art form and with simple steps help to
00:25:04
make art work for your use
00:25:07
cases so the first step normally
00:25:10
in our journey would be to of course
00:25:13
understand the use case as well as
00:25:15
develop certain test
00:25:17
cases I live in Munich so I will walk
00:25:20
you through the example that would be
00:25:22
helping me with understanding Bavarian
00:25:24
culture or it can be also the culture of
00:25:26
Baden-Württemberg so let's think about this
00:25:28
use case and develop the
00:25:32
test case that well maybe I will be
00:25:35
asking about certain dishes that are
00:25:37
relevant to a certain area of
00:25:41
Germany thus my preliminary prompt
00:25:46
would be on cooking a certain dish so
00:25:48
how do I make Käsespätzle or how do
00:25:52
I cook
00:25:54
Obatzda and of course afterwards I'm going
00:25:57
into the iterative phases with
00:26:01
evaluations that would be then going
00:26:03
into Loops in this case the first Loop
00:26:06
would be for example to consider adding
00:26:08
the role such as well you are an
00:26:10
experienced ethnographer or you are an
00:26:12
experienced
00:26:13
cook and thus we will be refining the
00:26:16
prompt that will be coming to the point
00:26:18
of being polished to the extent that it
00:26:21
fulfills our
00:26:23
goals a tricky part here is indeed in
00:26:27
setting evaluation
00:26:30
and if we have some test cases that are
00:26:33
closed-ended questions having a yes or no
00:26:36
answer it is possible to evaluate that
00:26:40
without many issues but if it is an
00:26:43
open-ended
00:26:45
question let's again look into the
00:26:47
example of Käsespätzle and if we would have
00:26:50
again this first prompt of how do I
00:26:52
make Käsespätzle we would get of course an LLM
00:26:55
response that would be telling us what
00:26:57
elements and what ingredients we
00:26:59
would have in our dish what is
00:27:01
interesting here is that we can utilize
00:27:05
rubrics and Define what we would like to
00:27:07
include in the response for it to be
00:27:09
evaluated as positive or negative so I
00:27:12
would prefer to have certain ingredients
00:27:15
and once we have the response that
00:27:18
fulfills the criteria we would have the
00:27:21
positive result and we would be
00:27:23
knowing that well our prompt engineering
00:27:26
and LLM response were indeed right for
00:27:31
us many customers come to us
00:27:34
and ask about what can be done for
00:27:38
reducing hallucinations and that is also
00:27:40
something that
00:27:42
we would like to encourage you to
00:27:45
do when dealing with hallucinations and what
00:27:49
can be done is indeed to give LLMs
00:27:51
permission to say I don't
00:27:53
know as well as answer only if it is
00:27:56
very confident in its response
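As a minimal sketch of this tip, the permission to say "I don't know" can be appended to any user question before it is sent to the model. The function name `with_uncertainty_permission` and the exact instruction wording are assumptions made for illustration, not text from the talk's slides.

```python
def with_uncertainty_permission(question: str) -> str:
    """Append an explicit permission to abstain to a user question."""
    # The instruction wording below is a hypothetical paraphrase of the tip.
    instruction = (
        "Answer only if you are very confident in your response. "
        "If you are not sure, say \"I don't know\"."
    )
    return f"{question}\n\n{instruction}"


if __name__ == "__main__":
    print(with_uncertainty_permission("What is the capital of Bavaria?"))
```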
00:28:00
it is also possible to ask for
00:28:03
relevant quotes from the context that we
00:28:06
have passed and once our context is
00:28:09
growing once we are having certain
00:28:12
iterations and changes to the context
00:28:14
that we would still like to pass to an
00:28:17
LLM we can consider using something
00:28:20
called retrieval augmented
00:28:23
generation retrieval augmented
00:28:26
generation would look in a certain way
00:28:28
that instead of having the typical
00:28:31
flow of user asking the question from
00:28:33
the LLM and getting the response
00:28:36
we would be also passing a certain
00:28:39
additional
00:28:40
context that would be in the form of
00:28:43
dynamically changing knowledge bases or
00:28:45
certain
00:28:47
internal sources we can consider product
00:28:50
descriptions we can consider FAQ pages
00:28:53
and in this way the flow would be
00:28:56
changing in a way that
00:28:59
all the documents would be passed
00:29:02
into some vector database and once
00:29:05
the user is sending the question we
00:29:08
first get everything relevant from the
00:29:12
context let's say three pieces of a
00:29:16
certain FAQ description and then that
00:29:19
would be passed as the context together
00:29:21
with the question to the LLM thus creating
00:29:25
the answer for
00:29:26
us and well there are different moving
00:29:29
blocks in this architecture so
00:29:32
Constantine shall we help the audience
00:29:35
to consider different tools for building
00:29:37
retrieval augmented generation sure
00:29:39
thank you Elina so I like to think of
00:29:42
retrieval augmented generation as
00:29:44
similar to giving the LLM a cheat sheet
00:29:48
right so as you know from school
00:29:50
probably when you don't know the
00:29:52
answer you tend to make up something
00:29:54
but if you have a cheat sheet it's very
00:29:56
easy to solve the question so what we're
00:29:58
really doing here is we're using the
00:29:59
knowledge base and some way of searching
00:30:01
the knowledge base like a vector
00:30:03
database or search engine or anything
00:30:05
you can use to provide the relevant
00:30:07
information and then we generate a cheat
00:30:09
sheet that we inject into the prompt as
00:30:12
part of the context so that increases
00:30:14
the likelihood of the LLM giving the
00:30:16
right answer because now it has all the
00:30:18
data it needs you can set up this
00:30:22
thing on your own by putting together
00:30:25
the components like the knowledge base
00:30:27
and setting up the orchestration and
00:30:29
all that that is fun but it also can
00:30:32
become old quickly especially when your
00:30:35
manager is breathing down your neck
00:30:36
and saying hey when is my solution ready
00:30:38
right so our job at AWS is to make it
00:30:41
easy for you so first a while ago we
00:30:44
introduced Amazon Bedrock which makes
00:30:46
it super easy to use a selection of
00:30:48
different LLMs with an easy-to-use
00:30:51
API in a secure way so that you stay
00:30:54
in control of your data and then we
00:30:56
added an additional feature called
00:30:58
Knowledge Bases for Amazon Bedrock which
00:31:00
essentially gives you the retrieval
00:31:02
augmented generation architecture that
00:31:03
we looked at before as a ready-to-use
00:31:07
feature of Bedrock so all you need to do
00:31:09
is you bring your knowledge base in the
00:31:11
form of some sort of documents you can
00:31:14
import them into Knowledge Bases
00:31:17
for Amazon Bedrock you can choose which
00:31:20
of the vector search engines or
00:31:22
databases you would like to use from a
00:31:24
collection of choices you can choose
00:31:26
which LLM you want to use as part of
00:31:29
your architecture and then Bedrock does
00:31:31
everything else automatically for you
00:31:33
including the prompt engineering bit
00:31:36
if you want to control the prompt
00:31:38
you can actually add your own
00:31:40
variation of the prompt or you can
00:31:42
simply use what is shipped inside Amazon
00:31:44
Bedrock so Bedrock and Knowledge Bases
00:31:47
make it really easy for you to build
00:31:48
your own knowledge base architecture
00:31:51
your own retrieval augmented generation
00:31:53
and retrieval augmented generation is
00:31:55
one of the most popular use cases we see
00:31:57
with customers
00:31:58
because it generates immediate
00:32:01
value for your business no more
00:32:03
searching and hunting through long
00:32:05
product manuals long boring
00:32:07
documentation you can simply ask
00:32:09
questions get relevant answers that are
00:32:11
right from your documentation including
00:32:13
citations so that you know that you're
00:32:15
getting good answers
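The retrieval augmented generation flow described above (store documents, fetch the few most relevant ones for a question, inject them into the prompt as context) can be sketched in Python as follows. This is a minimal sketch under loud assumptions: a simple word-overlap score stands in for a real vector database or search engine, the function names and the `<context>` and `<document>` tags are illustrative choices, and no call to Amazon Bedrock or any LLM is made.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, k: int = 3) -> list:
    """Return up to k documents sharing the most words with the question.

    Word overlap is a toy stand-in for vector similarity search.
    """
    question_tokens = tokenize(question)
    scored = [(len(question_tokens & tokenize(doc)), doc) for doc in documents]
    # Keep only documents with at least one shared word, best match first.
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_rag_prompt(question: str, documents: list, k: int = 3) -> str:
    """Inject the retrieved passages into the prompt as context."""
    passages = retrieve(question, documents, k)
    context = "\n".join(f"<document>{p}</document>" for p in passages)
    return (
        f"<context>\n{context}\n</context>\n\n"
        "Using only the context above, answer the question. "
        "If the context does not contain the answer, say \"I don't know\".\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical FAQ entries used only for illustration.
    faq = [
        "Returns are accepted within 30 days of purchase.",
        "Shipping takes 3 to 5 business days.",
        "Our office is closed on public holidays.",
    ]
    print(build_rag_prompt("How long does shipping take?", faq))
```

In a managed setup such as the knowledge base feature described in the talk, the retrieval and prompt assembly steps would be handled by the service rather than by hand-written code like this.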
00:32:18
another thing that our customers
00:32:20
like to use with LLMs is called agents
00:32:22
so agents and function calling how
00:32:25
does it work well instead of just
00:32:28
injecting documents from a search engine
00:32:31
like with retrieval augmented generation
00:32:33
you can go one step further and you can
00:32:35
tell the model hey model Claude in this
00:32:38
case right you actually have access to
00:32:40
some tools let me explain to you how those
00:32:42
tools work and then you're giving it
00:32:44
like a prompt engineered version of an
00:32:46
API specification so what you're doing
00:32:49
here is you're telling the model you
00:32:51
have access to a weather service you
00:32:53
have access to Wikipedia you have
00:32:54
access to some other tool that you
00:32:56
choose how it works and then Claude does
00:32:59
not really call those tools on its own
00:33:02
but it will tell you hey I would like to
00:33:04
use a weather service now with these
00:33:06
parameters and then you can engineer or
00:33:08
you can set up the actual function call
00:33:11
do the operation and give the result
00:33:14
back as part of the prompt so how does
00:33:16
it work well first of all you start
00:33:19
by putting together a prompt where you
00:33:21
describe to the model here are the tools
00:33:23
you have access to think of this as
00:33:26
you put together your answer put them
00:33:28
into Claude as part of the prompt Claude
00:33:30
now decides whether it can answer the
00:33:32
question right away or whether it would
00:33:34
like to use one of these functions that
00:33:36
you gave it and in the case of no it
00:33:38
will probably answer with some
00:33:41
definitive answer like I don't know or
00:33:43
it says Okay I want to use those tools
00:33:45
and Claude will actually output the
00:33:47
function call in the specification that
00:33:50
you told it so if you told it that
00:33:52
please use XML calls that are called
00:33:54
function calls invoke with the
00:33:55
parameters here and there it'll give you
00:33:57
the kind of XML that you expect and now
00:34:00
you can go and execute that so the
00:34:02
execution step looks like this in more
00:34:04
detail right so you get this function
00:34:06
call XML from Claude as a response you know
00:34:09
you can actually grab or you can
00:34:11
detect this with your traditional code
00:34:12
like a Lambda function hey Claude wants
00:34:14
to call something you can use this XML
00:34:18
maybe validate it and then you actually
00:34:20
implement your own client in your own
00:34:23
code could be Lambda could be a
00:34:25
container whatever that does the
00:34:27
function call in this example it would
00:34:28
call the weather function and then you
00:34:31
inject the results back into their
00:34:34
own XML tags like function results
00:34:36
and then you send the whole thing back
00:34:38
to Claude right the system prompt the user
00:34:41
question the function call the function
00:34:43
results and then you let Claude decide
00:34:46
what to do next and then Claude sees oh I
00:34:48
have everything I know I want I have the
00:34:50
weather data I can now give a great
00:34:52
answer and then you get your answer
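The execution step just described (detect the function call XML in the model's response, validate it, run your own client code, and wrap the result in XML to send back) can be sketched in Python as follows. This is a sketch under stated assumptions, not the speakers' implementation: the tag names `function_calls`, `invoke`, `tool_name`, `parameters`, and `function_results` are illustrative, `get_weather` is a hypothetical stub rather than a real weather API, and the model response is a hard-coded string instead of an actual call to Claude.

```python
import re
import xml.etree.ElementTree as ET


def get_weather(location: str) -> str:
    """Hypothetical stub standing in for a real weather service client."""
    return f"stub forecast for {location}"


# Registry of tools the model was told about in the prompt (assumed names).
TOOLS = {"get_weather": get_weather}


def extract_function_call(model_output: str):
    """Return (tool_name, params) if the output contains a call, else None."""
    match = re.search(
        r"<function_calls>.*?</function_calls>", model_output, re.DOTALL
    )
    if match is None:
        return None  # The model answered directly without requesting a tool.
    root = ET.fromstring(match.group(0))  # Raises ParseError on malformed XML.
    invoke = root.find("invoke")
    if invoke is None:
        raise ValueError("function_calls block has no invoke element")
    name = invoke.findtext("tool_name")
    params_el = invoke.find("parameters")
    params = {}
    if params_el is not None:
        params = {p.tag: (p.text or "").strip() for p in params_el}
    return name, params


def run_function_call(model_output: str):
    """Execute the requested tool and wrap its result for the next prompt."""
    call = extract_function_call(model_output)
    if call is None:
        return None
    name, params = call
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    result = TOOLS[name](**params)
    return (
        "<function_results>\n"
        f"<result>\n<tool_name>{name}</tool_name>\n"
        f"<stdout>{result}</stdout>\n</result>\n"
        "</function_results>"
    )


if __name__ == "__main__":
    # Simulated model response requesting the weather tool.
    simulated = (
        "<function_calls><invoke><tool_name>get_weather</tool_name>"
        "<parameters><location>Berlin</location></parameters>"
        "</invoke></function_calls>"
    )
    print(run_function_call(simulated))
```

The returned `function_results` block would then be appended to the conversation, together with the system prompt, the user question, and the original function call, before sending everything back to the model.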
00:34:55
that's how you implement your own
00:34:56
functions in the context of a large
00:34:58
language model by telling the model
00:35:00
these are the functions you can use
00:35:02
giving it an API specification in an
00:35:04
easy-to-use language such as XML tags and
00:35:07
then you let the model decide when to
00:35:09
use which tool you are in control in how
00:35:12
you implement those function
00:35:14
calls and then you give everything back
00:35:15
for the model to process and give a
00:35:17
definitive
00:35:19
answer now when you implement something
00:35:21
like this explain the function
00:35:25
capabilities in great detail right this
00:35:27
is the same as explaining to a human
00:35:30
like your colleague how does this API
00:35:32
work you can also provide a diverse
00:35:35
set of examples here is an example of
00:35:37
using this call to do X here is an
00:35:39
example of how the parameters might look
00:35:40
like for y and everything and you can
00:35:43
actually use the stop tag or the end tag
00:35:47
of your function specification as a
00:35:49
stop sequence which tells the Bedrock
00:35:52
service okay stop after this sequence
00:35:55
here after this XML tag here because now
00:35:58
the XML part is over and you have a
00:36:00
definitive stop condition and if
00:36:04
you're not getting reliable
00:36:06
results think about the prompt chaining
00:36:09
tip in the beginning don't make your
00:36:11
task too complicated break them down
00:36:13
into simple function calls and then do
00:36:16
them one by
00:36:17
one so here's an example on how it looks
00:36:20
in practice this is how you would
00:36:22
describe the tool tool description tool
00:36:25
name is get weather this is the
00:36:27
description these are the
00:36:29
parameters location which is a string
00:36:31
you can actually add type declarations
00:36:33
there as well and all that stuff and and
00:36:35
you can use that as part of your system
00:36:37
prompt again you can do this all on your
00:36:40
own and it's fun the first time but
00:36:42
you can also use a feature called Agents
00:36:45
for Amazon Bedrock which allows you to
00:36:47
either programmatically set up these
00:36:49
functions with your own function code in
00:36:52
Lambda functions or you can actually
00:36:55
go through the console and click
00:36:56
together your own chatbot using
00:36:58
functions and thereby reduce the
00:37:01
development time and the time to results
00:37:04
to just a day or so instead of
00:37:07
weeks of trying out and figuring out
00:37:09
stuff and and prompt engineering and
00:37:11
everything else lastly let's take a look
00:37:14
at a different problem what to do with
00:37:17
malicious users who want to inject
00:37:20
something into the prompt to get it to
00:37:22
do something that you don't want it to
00:37:25
what if you have some bad user behavior
00:37:27
that you want to mitigate against the
00:37:29
good news is that Anthropic Claude is
00:37:31
already very resistant to jailbreaks and
00:37:33
other bad behavior which is one
00:37:35
reason why at Amazon we like to partner
00:37:38
very much with Anthropic because they
00:37:41
focus a lot on responsible use of AI but
00:37:44
again you can also get one step further
00:37:47
by adding a harmlessness screen to
00:37:50
evaluate the appropriateness of the
00:37:52
input prompt or the results right think
00:37:54
of this like a firewall that you're
00:37:56
putting in front of the LLM that checks
00:37:59
whether input or output are really
00:38:02
compliant with your own company rules and
00:38:06
if a harmful prompt is detected you
00:38:08
can filter it out based on that
00:38:11
example surprise you can use a different
00:38:13
LLM or another LLM to do that screen for
00:38:16
you so here's how a prompt might look
00:38:18
like for a harmlessness screen a human
00:38:21
user would like you to continue a piece
00:38:23
of content here is the content so far if
00:38:25
the content refers to harmful graphic or
00:38:28
illegal activities reply with Y so
00:38:30
you're essentially using an LLM as a
00:38:32
classifier to classify whether this is a
00:38:34
malicious prompt or not and then you can
00:38:37
use that to filter it out again you can
00:38:39
do it on your own or you can use a
00:38:40
feature from Amazon Bedrock called
00:38:43
Guardrails for Amazon Bedrock that lets you
00:38:45
set up those guardrails either from
00:38:47
predefined rules or from your own rules
00:38:49
that you bring into your
00:38:53
application so we hope this was useful
00:38:55
to you we hope you learned a lot
00:38:58
no need to take so many photos you
00:39:00
can actually go to our helpful prompting
00:39:02
resources page that we prepared for you
00:39:04
maybe take one more photo of this QR
00:39:07
code here and that'll guide you to a
00:39:09
page that we prepared for you with a
00:39:11
white paper with some prompting
00:39:13
resources some links to useful
00:39:14
documentation and even a workshop that
00:39:16
you can use to try out some things and
00:39:19
learn at your own pace and build your
00:39:22
own applications so with that thank you
00:39:25
very much for coming and enjoy the
00:39:27
evening of the summit thank you