Everything I'm about to cover in this video is the basic information and terminology, like a glossary, of what you need to know as an author who wants to use AI. Even a lot of experienced people who write with AI don't understand many of the terms I'm going to share with you today. So I want to make sure you have all of these systems and terms in mind so that you can understand how AI works and how you can help it best serve you as an author. So let's dive in.
The first thing I wish more people understood is the difference between a large language model, or LLM, and a tool like a chatbot or what we call a wrapper tool. To help with this, I like to use the analogy of electricity and appliances. You can use electricity in a variety of ways. If you're in your kitchen and you have a food processor, a microwave, and a blender, all of them use the same electricity, but they use it in different ways. In this analogy, the large language model is the electricity: it's the raw power that creates the results you want. A tool is like the appliance that uses that power in its own way, and one tool might be more useful for a writer than for a coder, or vice versa. So you might have a wrapper tool built specifically around a coding workflow, and another that uses the same large language model for writing, or whatever the case may be.

Even chatbots, which we often think of as synonymous with large language models, things like ChatGPT or Claude, are actually wrapper tools themselves. They're really simple wrapper tools: all they have is a chat interface and maybe a few other bells and whistles. If I go to ChatGPT, you can see up in the corner that I can select between different large language models. ChatGPT is not the same as a large language model; it's simply a tool that incorporates a large language model, the way an appliance uses electricity.

Now, where this metaphor breaks down a little is that there are multiple LLMs, and as far as I know, there aren't multiple types of electricity. Each large language model has its own strengths and weaknesses, but by itself, a large language model needs some kind of tool built around it for you to interface with it effectively. Otherwise, it's just a simple prompt and a response: it doesn't have memory, and it doesn't have many other features.
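To make that concrete, here's a minimal sketch of what a chatbot wrapper actually does around the raw model. It assumes the OpenAI Python SDK with an OPENAI_API_KEY in your environment, and the model name is just a placeholder. The point is that the "memory" is nothing more than a list of messages the wrapper re-sends on every turn.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment
messages = []      # this list IS the chatbot's "memory"; the raw model has none

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    messages.append({"role": "user", "content": user_input})
    # Placeholder model name; swap in whichever LLM your wrapper "plugs into"
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print("AI:", reply)
```

Strip away ChatGPT's interface and this loop is roughly what's left: the model itself only ever sees one prompt and returns one response.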
All right, the second thing I want to get across is the difference between regular models and reasoning models. This is a relatively new development, at least new in the world of AI: reasoning models think before they give you an answer, and by doing so they usually give you higher-quality answers. They're particularly good for any kind of task that involves reasoning, anything a human would actually spend time thinking about.

In the writing world, the best cases for reasoning models are things like editing. Before, a large language model wouldn't necessarily give you good advice about your book. If you gave it your book and said, "Hey, please edit this book for me," it wouldn't necessarily do a good job. It would give you stuff that sounds like what an editor would say, but it wasn't actually able to analyze, think, and make decisions based on the book it was reading; it just gave you stuff that sounded correct. A thinking model, while not perfect in that regard, is much better at actually working through what may or may not be a problem with your book, and it's able to give you much better answers. Reasoning models are also usually good for brainstorming and outlining; anything that requires heavier thought, they're particularly good at.

But there are still really good use cases for non-reasoning models. In particular, for some reason, the actual writing of prose tends to be better with them, or at least the same quality at a cheaper price than the reasoning models. So something I recommend for pretty much all authors is to test out which large language models do what, because not only are they all different with different strengths, but you also have reasoning versus non-reasoning. For every prompt you use regularly, you should definitely test which kind of model handles it better.
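Here's a minimal sketch of that kind of side-by-side test, again using the OpenAI SDK. The two model IDs are placeholders standing in for a non-reasoning model and a reasoning model; swap in whatever pair you actually want to compare, then judge the outputs (and the prices) yourself.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

prompt = "Suggest three plot twists for a small-town mystery novel."

# Placeholder IDs: the first stands in for a non-reasoning model,
# the second for a reasoning model.
for model in ("gpt-4o-mini", "o3-mini"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```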
All right, the next thing I want to be clear on is what a context window and a token are. These are terms you'll hear a lot around AI, so let me explain them a little.

First, if we come into OpenRouter, a tool you'll see me use (it's not really important for this video, but at openrouter.ai you can look at pretty much all of the large language models that are publicly available on the market; not everything, but the vast majority). If we look at one of them, whatever's here, you'll see it has a 96k context window. If you look at some of the big ones, Llama 4 Maverick has a 1.05 million token context window. Now, it just so happens that with Llama 4, I wouldn't trust that number based on reports I'm hearing, but technically it has that big of a context window. What that means is that it can process, in this case, a little over one million tokens in its prompt. You can give it an enormous prompt with a million tokens, and in theory it will be able to read and understand everything within that prompt. A one million token context window is actually really large. In theory, you could have it read all of your organization's documents, all of your books, whatever, and it would be able to understand all of it.

Now, in practice, that isn't always the case. There's this thing called the needle-in-the-haystack problem: if you give a model a whole bunch of stuff, it actually gets worse at identifying small bits of information, kind of like a human would, honestly. But regardless, you should be able to give it a massive amount of context and have it understand that.

But first, we have to understand what exactly a token is, because a token is not the same as a word; a million-token window is not the same as a million words it can understand. A token is a specific unit that an AI uses to read text, and not all large language models use tokens in the same way. If we go to this tokenizer tool developed by OpenAI, you can see how the OpenAI models look at tokens. If I copy this text and paste it in, we can see exactly what these tokens look like. A lot of them do translate to single words: in "language models process text using tokens," those individual words are each processed as one token. But the comma is also a single token; you'll see punctuation is often its own token. You also have words like "OpenAI's," which is actually three tokens: "Open," "AI," and the apostrophe-s. Other words sometimes get split up too; over here, "tokenized" is two tokens. So that's just the way it works: sometimes it splits words up, sometimes a single word is one token, sometimes punctuation functions as a single token, and so on. The rule of thumb that's pretty widely used in AI circles is that 100 tokens is roughly equivalent to 75 words.
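If you want to see this for yourself outside of OpenAI's web tool, their open-source tiktoken library does the same thing. A short sketch; the encoding name here matches many recent OpenAI models, but it's a detail worth checking for whichever model you use.

```python
import tiktoken  # pip install tiktoken (OpenAI's open-source tokenizer)

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models

text = "OpenAI's language models process text using tokens."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
# Decode each token individually to see how the text was split:
# whole words, word pieces, and punctuation each get their own token.
print([enc.decode([t]) for t in token_ids])

# Rule of thumb: 100 tokens is roughly 75 words
print("approx. words in a 1M-token context window:", int(1_000_000 * 0.75))
```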
So if you look at that million-token context window, you can assume you can fit roughly 750,000 words into it. That's enough for several books that you could put in there and have it read. Now, once again, the needle-in-the-haystack problem means that if you give it too much context, it actually performs worse in general. I'm sure that will get better over time; it already has gotten better over time. A standard context window that you'll see on a lot of models is about 200,000 tokens. That is plenty for any of the use cases I have. Usually my prompts do not exceed 15,000 to 20,000 words, so all I would need is a good, I don't know, 50,000-token context window to feel safe. I definitely don't need a million-token context window for my needs as a writer.

And if you are putting entire books into an AI to read, I would be skeptical of the results you're going to get from that. You will get better results by having AI summarize each chapter individually and then using that summary of your book in your context, because it'll be fewer words but still get the point of your book across if you want the model to understand the context. So say you're writing a series and you're working on book two, and you want to make sure it understands what happened in book one. Don't put your entire book one into your context window. Not only will that dilute the prompt a little, it will also cost you a whole lot more money. Instead, summarize each chapter of that book individually and provide that summary in the context. Little tricks like that make it very easy.
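Here's a rough sketch of that chapter-by-chapter summarization trick, again with the OpenAI SDK and a placeholder model name. The file layout is purely illustrative, assuming you've saved each chapter of book one as its own text file.

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def summarize_chapter(chapter_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {"role": "system", "content": "Summarize the chapter you are given in one paragraph."},
            {"role": "user", "content": chapter_text},
        ],
    )
    return resp.choices[0].message.content

# Hypothetical layout: book one's chapters saved as book1/chapter_01.txt, etc.
chapters = sorted(Path("book1").glob("chapter_*.txt"))
summaries = [summarize_chapter(p.read_text()) for p in chapters]

# This compact summary, not the full manuscript, goes into your book-two context.
book_one_summary = "\n\n".join(summaries)
```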
All right, the next thing I want to talk about is temperature and other AI parameters. If I'm in the OpenAI playground, this is something you cannot do inside of ChatGPT or most other chatbots; you have to do it through the provider's API, which you don't really need to understand right now. I will do more videos about APIs later, but you can think of an API as the cord that connects the appliance to the plug and the electricity behind it. So we have ChatGPT, but then we also have OpenAI's playground. One of the benefits of the playground is that you can select a bunch of other models that are not available in ChatGPT. If we select one and then open these little settings right here, you'll see it gives us temperature, max tokens, and top P. If we go back to OpenRouter and set up a chat (let's open a new chat, pull in Gemini 2.5 Pro, click these three dots, and go to sampling parameters), you'll see even more: max tokens, temperature, top P, top K, frequency penalty, presence penalty, repetition penalty, min P, and top A.

Most of these you do not need to worry about. They don't really have much of an effect, or at least a desirable effect, on your words. But there are some you should probably play around with, especially if you're not getting the results you want out of the prompts you're giving AI.

First of all, max tokens. This one is the most straightforward: it caps how many tokens the model will generate in a single response. Say your model has a million-token window, and in this case it does, because Gemini 2.5 Pro has a million-token context window, but you don't really need responses anywhere near that long. You can bring the cap down to, say, 100,000 or 75,000; you can mess with that.

Chat memory: you won't see this everywhere, but here in OpenRouter it controls how far back the chat goes. This will save you money if you are having really, really long chats, but it will result in the AI forgetting some of the older messages if you set it too low. You can set it all the way up to 420, which is the top here, so your chat can go as far back as 420 responses before it starts to forget.
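Wrapper tools handle that trimming for you, but if you ever build your own, the idea is simple. Here's a sketch of a hypothetical trim_history helper (my own illustration, not any library's API): keep the system prompt, drop the oldest turns.

```python
def trim_history(messages: list[dict], keep_last: int = 40) -> list[dict]:
    """Keep the system prompt plus only the most recent turns.

    Older user/assistant messages are dropped, which saves money on long
    chats at the cost of the model "forgetting" them. OpenRouter's chat
    memory setting does essentially this for you.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-keep_last:]
```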
Then we have temperature. This is probably the most important one you'll look at. A lot of people call temperature the creativity meter. That's not an exactly fair comparison, because it's a little more nuanced than that, but you can essentially think of it that way: the more we turn this up, the more creative it gets, and the more we turn it down, the less creative and more predictable it gets. Now, sometimes you want predictability. If you have a certain automation and you want the results to always be the same kind of response, you can turn the temperature down and it will get more predictable in its responses. But sometimes, especially as authors, when we're trying to write something, we want it to be a little more creative, and so tweaking this up just a little can be worth testing, to see whether it actually writes better when you do that.

The problem is that if you raise it too high, the output starts to become gibberish. What's happening is that large language models are predictive models: they use probability to determine what word should come next, and the higher you push this temperature dial, the less predictable those choices become. It will start to throw in words that weren't necessarily the most logical ones, and in some cases you might want that. But if you turn it up too high, it just turns into gibberish, and you don't want that either. So it's something to play around with. Most of the time, I see people keeping it within about 0.3 of the default. Most large language models default to 1, so you can bring it down to around 0.7 or up to around 1.3 and stay more or less safe without it turning into gibberish or anything like that.
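An easy way to get a feel for temperature is to run the same prompt at several values and read the results side by side. A small sketch, with the model name again a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

prompt = "Describe a lighthouse at dusk in two sentences."

# Low values give safe, predictable prose; high values get looser,
# and past roughly 1.3 many models drift toward gibberish.
for temp in (0.2, 0.7, 1.0, 1.3):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
    )
    print(f"--- temperature={temp} ---")
    print(resp.choices[0].message.content)
```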
Top P is similar. If you lower top P, the responses get a little more predictable, because it's actually restricting the pool of tokens the model can choose from. If I brought it down to 0.5, that means the least likely tokens, the ones making up the bottom half of the probability, are no longer allowed. Usually I keep this at the top; maybe you'd bring it down just a little so you start avoiding some of those really flowery, overused words in favor of slightly more predictable ones. But that's just another thing to experiment with. Top K I don't really deal with. Frequency, presence, and repetition penalty are all somewhat useful, because they determine how frequently your AI will reuse certain words or ideas, so playing around with them can help if you want to reduce repetition. But some of these are very subtle, and I wouldn't play around with them too much unless you're really an expert at AI and know what you're doing. The one you're most likely to use is temperature, and I do recommend you play around with temperature just to see which setting works best.
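For reference, here's how those sampling parameters look when they're all set explicitly in a single API call. This uses the OpenAI SDK; the model name and the values are placeholders to experiment from, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

resp = client.chat.completions.create(
    model="gpt-4o-mini",    # placeholder model
    messages=[{"role": "user", "content": "Write a paragraph of a cozy mystery scene."}],
    temperature=1.0,        # roughly 0.7 to 1.3 is the "safe" creative range discussed above
    top_p=0.95,             # trims only the least likely tokens
    max_tokens=800,         # cap on the length of this response
    frequency_penalty=0.3,  # discourages repeating the same words
    presence_penalty=0.0,   # discourages reusing the same topics
)
print(resp.choices[0].message.content)
```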
All right, last but not least, I want to talk about the difference between a system prompt, an AI response, and a user input. We often talk about prompts; for instance, in my StoryHacker Silver group, I have a bunch of prompts that I've given people, and most of them are a single prompt: you enter it into a chatbot and it gives you a response. There's actually a lot more nuance to prompting than that, and sometimes your prompts can get a little more complex. Among the prompts in my Silver group (which, by the way, you can check out below) are some for Novelcrafter, and Novelcrafter splits things up into multiple parts. So it's not just one prompt; it's one prompt split into three groups: system prompt, AI response, and user prompt.

To show you that here in OpenRouter, let's pull up Gemini 2.5 Pro again. When you click on these three dots, you'll get this little box that says system prompt. Likewise, if we go to OpenAI's dashboard, you can see there's a box for the system message there. And if you're using ChatGPT or other chatbots like it, there's a feature in most of them that gives you similar results to a system prompt: you can create a custom GPT, or you can go to "Customize ChatGPT" and enter information where it asks what traits ChatGPT should have. You can put style information and other things in there, and it functions much the same as a system prompt.

But let's go back to OpenRouter so I can show this off. A system prompt is essentially the things you want the model to always know. If you have a regular task that you want AI to perform, you put it in here. For instance, a really simple version of a good system prompt that I use all the time is just: when I give you text, summarize it in one sentence (or one paragraph, or whatever). In fact, let's do that right now. I enter "When I give you text, summarize it in one sentence" and click out of the box. Now, anytime I give it text, it will automatically do what I asked, because it always remembers that this is the way it must behave. The system prompt establishes the parameters it must always follow.

Now, let's go to my website, pick one of the articles, and copy the entire thing, a nice long article, and paste the whole article into the chat. Because it has the system prompt, it knows it now needs to summarize this in one sentence. It so happens that Gemini 2.5 Pro is a reasoning model, so it did a little bit of reasoning here that you can look through, but here it is, the single-sentence summary: "This guide provides writers with a detailed four-act beat sheet for plotting cozy mysteries, covering essential story structure, character development, and genre conventions to craft a compelling whodunit." So it did what I asked, right? That's the system prompt, and it's probably the most important on this list.
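In API terms, that demo is just a two-message conversation. A sketch, with the model name a placeholder and the article loading hypothetical:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

article_text = open("article.txt").read()  # hypothetical file holding the pasted article

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; in the OpenRouter demo this was Gemini 2.5 Pro
    messages=[
        # The system prompt: what the model must always do
        {"role": "system", "content": "When I give you text, summarize it in one sentence."},
        # The user prompt: the text for this particular request
        {"role": "user", "content": article_text},
    ],
)
print(resp.choices[0].message.content)  # the one-sentence summary
```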
However, there are two other prompt components. Let's get rid of this system prompt for a minute. Say we don't have a specific task that we want the AI to do all of the time; we just want to ask it a simple question and have it answer us in a simple way. So there's no system prompt at work here, and we just say, "Give me the lyrics to Mary Had a Little Lamb." What I've just entered is the user prompt, and it's the kind of prompting we're most familiar with: the thing we ask it to do, where we give it a task. A lot of the time we put all of our data and all of our info into that single prompt, when it might be better to split portions of it out into the system prompt. For instance, if I'm writing a book and I want the style to be relatively consistent throughout everything I do, I put the style prompt, and maybe examples of the type of writing I want, into the system prompt, because that's the stuff I want it to remember all the time. Then, in the user prompt, I give it specific instructions for the particular part of the scene that I want it to write next.
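As a sketch, that split looks like this. The style rules, sample passage, and scene details are all made-up placeholders standing in for your own material.

```python
messages = [
    # System prompt: the stuff the model should remember for EVERY request,
    # i.e. style rules plus an example of the voice you want.
    {
        "role": "system",
        "content": (
            "You are drafting scenes for a cozy mystery novel. Write in first person, "
            "past tense, with a warm, wry voice. Match the style of this sample:\n\n"
            "The bakery smelled of cinnamon and secrets, and I intended to get to "
            "the bottom of both."
        ),
    },
    # User prompt: instructions for just this part of the book.
    {
        "role": "user",
        "content": "Write the scene where the protagonist finds the second clue behind the flour bins.",
    },
]
```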
So that's the user prompt. It's the most straightforward; I think we all understand how that works. And now the response it gave me: this is the third component of a prompt, the AI response. Now, you might say, "That's not a prompt, Jason, that's a response." And while that's true, in some cases, for instance here in OpenRouter, I can actually go through and edit it. Say I don't like the style of the response it gave me; I can rewrite it myself, and after I hit save, the model will think that my edited version is what it said. So one of the unique things a lot of people don't take advantage of with AI is that if you put data into the AI response, or edit the data from an AI response, then what it gives you next, say when you're continuing on with the next part of your scene, will better match what you saw in its first response. So let's say that rather than asking for the lyrics of Mary Had a Little Lamb, I asked it to write part of my scene, but there were some bits in there I didn't like. I changed the wording; I edited it pretty heavily to make sure it sounded the way I wanted it to sound. Then I asked it to write the next part of the scene after that. It will look at what it wrote in the past, and because it thinks it wrote that edited version, that helps it be more effective at writing the next bit. That's just something to keep in mind.
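In the API, this trick is just placing your edited text in an assistant-role message before your next request. A sketch with placeholder content:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# My heavily edited version of what the model originally wrote.
edited_opening = "Rain needled the bakery windows as I weighed the note in my hand..."

messages = [
    {"role": "user", "content": "Write the opening paragraph of the scene."},
    # The model treats this assistant turn as its OWN earlier output,
    # so it will match this edited style when it continues.
    {"role": "assistant", "content": edited_opening},
    {"role": "user", "content": "Continue the scene from there."},
]

resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)  # placeholder model
print(resp.choices[0].message.content)
```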
Anyway, I hope these tips have been super useful for you. Let me know in the comments if you want to see anything else like this, any other things about AI that confuse you, and I'll be sure to take a look at those. In the meantime, go ahead and check out my groups down below. My Silver group is really low cost, a one-time fee; you get all my prompts and all my frameworks in there, plus access to a really thriving community with thousands of members at this point. And then there's also my Gold group down below, which is on a waitlist right now, but I'll be opening that up pretty soon.