Have you ever wished for a single package that you could easily install that has everything you need for local AI? Well, I have good news for you today, because I have exactly what you are looking for. I have actually never been so excited to make a video on something before. Today I'm going to show you an incredible package for local AI developed by the n8n team, and this thing has it all: it's got Ollama for the LLMs, Qdrant for the vector database, Postgres for the SQL database, and then n8n to tie it all together with workflow automations. This thing is absolutely incredible, and I'm going to show you how to set it up in just minutes. Then I'll even show you how to extend it to make it better, and use it to create a full RAG AI agent in n8n. So stick around, because I have a lot of value for you today.

Running your own AI infrastructure is the way of the future, especially because of how accessible it's becoming, and because open-source models like Llama are getting to the point where they're so powerful that they can actually compete with closed-source models like GPT and Claude. So now is the time to jump on this, and what I'm about to show you is an excellent start to doing so. At the end of this video, I'll even talk about how I'm going to extend this package in the near future to make it even better.
All right, so here we are in the GitHub repository for the self-hosted AI starter kit by n8n. This repo is really basic, and I love it. There are basically just two files we have to care about: the environment variable file, where we'll set credentials for things like Postgres, and the Docker Compose YAML file, where we'll bring everything together, like Postgres, Qdrant, and Ollama, into a single package for our local AI.

The first thing I want to mention is that this README has instructions for installing everything yourself, but honestly it's quite lacking, and there are a couple of holes I want to fill in with ways to extend it to really make it what I think you need. So I'll go through that a little bit, and we'll actually get this installed on our computer.

There are a couple of dependencies before you start: basically, you just need Git and Docker. I'd recommend installing GitHub Desktop and then Docker Desktop as well, because Docker Desktop also comes with Docker Compose, which is what we need to bring everything together into one package. With that, we can go ahead and get started downloading this onto our computer.
The first thing you want to do to download this code is copy the git clone command with the URL of the repository. You'll then go into a terminal and paste in that command. For me, I've already cloned this, which is why I get an error message, but you're going to get the code downloaded onto your computer, and then you can change your directory into the new repository you've pulled. With this, we can now go and edit the files in any editor of our choice. I like using VS Code, so if you have VS Code as well, you can just type in `code .`, and this is going to pull up everything in Visual Studio Code.
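For reference, the whole sequence looks something like this; the URL here is for n8n's self-hosted AI starter kit repo, but double-check it against the clone button on the repo page:

```bash
# Clone the starter kit and move into it
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit

# Optional: open the project in VS Code
code .
```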
Now, the official instructions in the README that we just saw would tell you at this point to run everything with the docker compose command. That's not actually the right next step; I'm not really sure why they say that, because we first have to edit a couple of things in the code to customize it for us, and that starts with the .env file. So you're going to want to go into your .env file. I've just made a .env.example file in this case, because I already have my credentials set up, but you'll go into your .env and set up your Postgres username and password, the database name, and then also a couple of n8n secrets. These can be whatever you want; just make sure they're very secure, basically just long alphanumeric strings.
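As a rough sketch, the finished .env looks something like this; the variable names follow the kit's .env.example, and the values are placeholders you should replace with your own:

```
# Postgres credentials (also used later for the agent's chat memory)
POSTGRES_USER=myuser
POSTGRES_PASSWORD=replace-with-a-long-alphanumeric-string
POSTGRES_DB=n8n

# n8n secrets - long random alphanumeric strings
N8N_ENCRYPTION_KEY=replace-with-a-long-random-string
N8N_USER_MANAGEMENT_JWT_SECRET=replace-with-another-long-random-string
```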
With that, we can go into our Docker Compose file, and here's where I want to make a couple of extensions to really fill in the gaps, because a couple of things were missing in the original Docker Compose file. First of all, for some reason the Postgres container doesn't have its port exposed by default, so you can't actually use Postgres as your database in an n8n workflow. I think n8n uses Postgres internally, which is why it's set up like that initially, but we want to be able to use Postgres for the chat memory for our agents, so I'm going to show you how to do that. Basically, all you have to do is go down to the Postgres service and add two lines: a `ports` key, and then a single item that maps port 5432 on the host to port 5432 inside the container. That way we can go to localhost:5432 and access Postgres. This is super important; otherwise we won't be able to access it from within an n8n workflow, which we'll be doing later when we build the RAG AI agent.
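In the Compose file, the change is small; only the `ports` lines below are new, and everything else in the service stays exactly as the kit ships it:

```yaml
  postgres:
    # ... existing image, environment, and volume settings from the starter kit ...
    ports:
      - "5432:5432"  # publish Postgres so n8n workflow credentials can reach it on localhost:5432
```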
The other thing we want to do is use Ollama for the embeddings for our vector database as well. The base command when we initialize Ollama is just one pull: we sleep for 3 seconds and then pull Llama 3.1 with Ollama. That's why we have Llama 3.1 available to us by default. What I've added is another line to pull one of the Ollama embedding models, and we need this if we want to be able to use Ollama for our RAG, so that added line is very key. That is literally everything you have to change in the code to get this to work, and I'll have a link in the description of this video to my version of it, so you can pull that directly if you want all the customizations we just went over.
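For context, here's roughly what the Ollama init service's startup command ends up looking like; the embedding model name below is just an example (use whichever embedding model you prefer from the Ollama library, and my linked repo shows the exact one I picked):

```yaml
  ollama-pull-llama:
    # ... existing image and entrypoint settings from the starter kit ...
    command:
      - "-c"
      # the first pull is what the kit ships with; the second is the extra line for embeddings
      - "sleep 3; ollama pull llama3.1; ollama pull nomic-embed-text"
```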
With that, we can go ahead and actually start it with Docker Compose. The installation instructions in the README are actually kind of useful here, because there's a slightly different Docker Compose command to run based on your architecture. If you have an Nvidia GPU, you can follow those instructions, which are a bit more involved, and then run with the Nvidia GPU profile. If you're a Mac user, you follow the Mac command in the README. For everyone else, which is what I'm going to use in this case even though I have an Nvidia GPU, just keep it simple with `docker compose --profile cpu up`.
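Side by side, the two profile commands I just mentioned look like this (per the kit's README; check your copy in case the profile names have changed):

```bash
# With an Nvidia GPU (after the extra driver/toolkit setup the README describes)
docker compose --profile gpu-nvidia up

# CPU-only, which is what I'm running here to keep things simple
docker compose --profile cpu up
```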
So we'll copy this command, go into our terminal, and paste it in. In my case, I've already created all of these containers, so it's going to run really fast for me, but in your case it will have to pull each of the images for Ollama, Postgres, n8n, and Qdrant, and then start them all up. It'll take a little while, because it also has to do things like pulling Llama 3.1 for the Ollama container. In my case it's going to blast through this pretty quickly, because it has already done a lot of this; I did that on purpose so this can be a quicker walkthrough for you. But you can see all the different containers, in the different colors here, running everything to set me up for each of the different services. For example, right here it pulled Llama 3.1, and right here it pulled the embedding model I chose from Ollama as well. At this point it's basically done, so I'm going to pause here and come back when everything is ready.
All right, everything is good to go, and now I'm going to take you into Docker so we can see all of this running live. You'll want to open up Docker Desktop, and you'll see one record here for the self-hosted AI starter kit. You can click the button on the left-hand side to expand it, and then we can see every container that is currently running, or that ran, for the setup. There are going to be four containers, each running one of our different local AI services. We can actually click into each one of them, which is super cool, because we can see the output of each container and even go to the Exec tab to run Linux commands inside each container. So you can do things in real time without having to restart the containers. You can go into the Postgres container and run commands to query your tables, for example. Actually, I'll show you this really quick: you can go into the Ollama container and pull models in real time. If I go to Exec here, I can do `ollama pull llama3.1:70b` (if I can spell it right). So I can pull models in real time and have them updated and available to me in n8n without having to restart anything, which is super cool.
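For reference, that's this one command, run from the Ollama container's Exec tab; keep in mind the 70B model is a huge download and needs serious hardware to run, so treat it as an example:

```bash
ollama pull llama3.1:70b
```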
All right, so now is the really fun part, because we get to use all the local infrastructure we just spun up to create a fully local RAG AI agent within n8n. To access your new self-hosted n8n, you can just go to localhost:5678. The way you know this is the URL is either through the Docker logs for your n8n container or in the README in the GitHub repository we cloned. With that, we can dive into this workflow I created, which uses Postgres for the chat memory, Qdrant for RAG, and Ollama for the LLM and the embedding model. This is a full RAG AI agent that I've already built out. I don't want to build it from scratch, just because I want this to be a quicker, smoother walkthrough for you, but I'll still go step by step through everything I set up here so you can understand it for yourself, and also just steal it from me, because I'm going to have this in the description link as well; you can pull this workflow and bring it into your own n8n instance.
With that, we can go ahead and get started. There are two parts to this workflow. First, we have the agent itself, with the chat interaction here; this chat widget is how we interact with our agent. Then we also have the workflow that brings files from Google Drive into our knowledge base in Qdrant. I'll show the agent first, and then I'll dive quickly into how I have this pipeline set up to pull files from a Google Drive folder into my knowledge base.

We have the trigger I just mentioned, where we have our chat input, and that is fed directly into this AI agent, where we hook up all the different local pieces. First of all, we have our Ollama chat model, where I'm referencing llama3.1:latest, which is the 8-billion-parameter model. But if you do an `ollama pull` within the container, like I showed you, you can use literally any Ollama LLM here; it is just so simple to set up.
Then for the credentials, it's very easy: you just have to put in the base URL. It is so important that for this URL you use http, and that instead of localhost you reference host.docker.internal; otherwise it will not work. The port for Ollama, if you don't change it, is 11434, and you can get this port either from the Docker Compose file or from the logs for the Ollama container; you'll see it in a lot of places.
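So the base URL for the Ollama credential in n8n, assuming the default port, is:

```
http://host.docker.internal:11434
```

The reason host.docker.internal works where localhost doesn't is that n8n itself runs inside a container, so localhost there points at the n8n container rather than at your machine.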
With that, we've got our LLM set up for this agent. Then for the memory, of course, we're going to use Postgres, so I'll click into this. You can use any table name here, and n8n will create it automatically in your Postgres database; it'll get the session ID from the previous node. Then for the credentials, this is based on what you set in your .env file. We have our host, which is host.docker.internal again, just like with Ollama, and then the database name, user, and password, all three of which you defined in the .env file we went over earlier. The port for Postgres is 5432.
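Put together, the Postgres credential in n8n looks something like this, with the actual values coming from your .env:

```
Host:     host.docker.internal
Database: <POSTGRES_DB from your .env>
User:     <POSTGRES_USER from your .env>
Password: <POSTGRES_PASSWORD from your .env>
Port:     5432
```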
With that, we've got our local chat memory set up; it is that simple. So we can move on to the last part of this agent, which is the tool for RAG. We have the Vector Store Tool that we attach to our agent, and then we hook our Qdrant vector store into it, so we'll retrieve documents based on the query that comes into our agent. For the Qdrant credentials, we just have an API key, which was filled in for me by default, so I hope it is for you as well; I think it's just the password for the n8n instance. Then for the Qdrant URL, this should look very familiar: http://host.docker.internal, and the port for Qdrant is 6333. Again, you can get this from the Docker Compose file, because that port has to be exposed to make it available, or you can get it from the Qdrant logs as well.
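To keep the two ways of reaching Qdrant straight, from inside n8n versus from your own browser:

```
Qdrant URL (n8n credential):   http://host.docker.internal:6333
Qdrant dashboard (in browser): http://localhost:6333/dashboard
```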
One other thing I want to show that is so cool about hosting Qdrant locally: if you go to localhost:6333/dashboard, like I have right here, it takes you to your very own self-hosted Qdrant dashboard, where you can see all your collections (your knowledge base, basically) and all the different vectors you have in there. You can click into Visualize and actually see all the different vectors; what's in there now is a document I already inserted while I was testing things, and you can see the metadata and the contents of each chunk. It is so cool. We'll come back to this in a little bit, but just know that you have so much visibility into your own Qdrant instance, and you can even run your own queries to get collections, delete vectors, or do a search. It's just really awesome; self-hosting Qdrant is a beautiful thing.

And so with that, we have our Qdrant vector store. Then we're using Ollama for the embeddings, with that embedding model I pulled, the one I added to the Docker Compose file, and then we're using Llama 3.1 again to parse the responses we get from RAG when we do our lookups. So that is everything for our agent.
We'll test this in a little bit, but first I want to show you the workflow for ingesting files into our knowledge base. The way that works is we have two triggers here: whenever a file is created in a specific folder in Google Drive, or whenever a file is updated in that same folder, we run this pipeline to download the file and put it into our Qdrant vector database running locally. The folder I have right here is this meeting-notes folder in my Google Drive, and the document I'm going to use for testing purposes is these fake meeting notes I made. I just generated something really silly about a company that is selling robotic pets, an AI startup, and we're going to use this document for our RAG. I'm not going to use a bunch of different documents, because I want to keep this really simple right now, but you can definitely do that, and the Qdrant vector database can handle it; for now I'm just using this single document.
So I'll walk through, step by step, what this flow actually does to ingest the document into the vector database. First, I'm going to fetch a test event, which is the creation of the meeting-notes file I just showed you. Then we feed that into this node, which extracts a couple of key pieces of information, including the file ID and the folder ID. Once we have that, we move on to the next step right here, and this is a very, very important one.

Okay, let me just stop here for a second. There are a lot of RAG tutorials with n8n on YouTube that miss this. When you have this insert step at the end here (I'm just going to skip to the end really quick), whether it's Supabase, Qdrant, or Pinecone, it doesn't matter: this inserter is not an upsert, it is just an insert. What that means is that if you reinsert a document, you're actually going to get duplicate vectors for that document. So if I update a document in Google Drive and it reinserts the vectors into my Qdrant vector database, I'm going to have the old vectors from the first time I ingested the document and then new vectors from when I updated the file. It does not get rid of the old vectors or update them in place. That is so important to keep in mind.
So I'm giving you a lot of value right here by including this node. It's actually custom code, because there isn't a way to do this without code in n8n, but that's all good, because you can just copy it from me; I'm going to have a link to this workflow in the description, like I said, so you can download it and bring it into your own n8n. My code here basically uses LangChain to connect to my Qdrant vector store, get all of the vector IDs where the metadata file ID equals the ID of the file I'm currently ingesting, and then delete those vectors. So basically we clear everything that's currently in the vector database for this file, so we can reinsert it and be sure we have zero duplicates. That is so important, because you don't want different versions of your file existing at the same time in your knowledge base; that will confuse the heck out of your LLM. So this is a very, very important step.
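If you want to see the idea behind my node rather than read the code, the same delete-by-metadata operation can be expressed directly against Qdrant's REST API, something like this; the collection name and the metadata key here are assumptions, so match them to whatever your insert step actually writes:

```bash
# Delete every vector whose metadata file ID matches the document being re-ingested
curl -X POST "http://host.docker.internal:6333/collections/documents/points/delete" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "must": [
        { "key": "metadata.file_id", "match": { "value": "<google-drive-file-id>" } }
      ]
    }
  }'
```

My workflow node does the equivalent through LangChain, but the principle is the same: filter on the file ID stamped into each vector's metadata, delete the matches, and then insert the fresh chunks.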
So I'll run this node as well, and that's going to delete everything. I can even go back to Qdrant, go to my collections, and you can see that this number was nine when I first showed the Qdrant dashboard, and now it is zero; it's going to go back up to nine when I finish this workflow. Next up, we download the Google Drive file, nice and simple. Then we extract the text from it; it doesn't matter whether it's a PDF, a CSV, or a Google Doc, this node takes the file and gets the raw text out of it. Then we insert it into our Qdrant vector store. So now I'll run this test step, and we'll go back to the UI after it's done with the insertions. You can see nine items here, because it chunked up my document. So we go back to the dashboard, where right now it's zero, I'll refresh, and there we go, boom, we're back up to nine chunks.
The reason there are so many chunks for such a small document is that, if we look at the chunk size in my recursive character text splitter, I have a chunk size of 100, so every document I put in gets split up into 100-character chunks. I want to keep the chunks small because I'm running Llama 3.1 locally and I don't have the most powerful computer, so I want my prompts to be small: I keep the context lower by using smaller chunk sizes and by not returning a lot of documents when I perform RAG.
The other thing I wanted to show really quickly is my default data loader. I'm adding two pieces of metadata here: the file ID and the folder ID. The more important one is the file ID, because that is how I know a vector is tied to a specific document; I use it in that other step to delete the old document vectors before I insert the new ones. That's how the connection is made.
That's kind of the most in-depth part of this walkthrough: how that all works, and why the custom code is there. Just know that it is so important, so take it from me; I hope it makes sense to an extent, and I spent a lot of time making it work for you.

With that, that is everything: we've got our agent fully set up and everything ingested, and the document is currently in the knowledge base, since I ran through that step by step. So now we can go ahead and test this thing. I'm going to go to the chat widget (actually, I'll save first, and then go to the chat widget), and then I'll ask a question it can only answer if it actually has the document in the knowledge base and can retrieve it. So I'll say: what is the ad campaign focusing on? Because this is Llama 3.1 running locally, it's going to take a little while to get a response, since I don't have the most powerful computer, so I'm going to pause and come back when it has an answer for me.
All right, so we got an answer from Llama 3.1, and this is looking pretty good. It's a little awkward at the start of the response, but this is just the raw output, without any instructions from me to the model on how to format a response. You can very easily fix this by adding to the system prompt for the LLM and telling it how to respond with the information it's given from RAG. But overall it does have the right answer, and it's talking about robotic pets, which it is obviously only going to get if it's using RAG on the meeting-notes document I uploaded through my Google Drive. So this is working absolutely beautifully. Now, I would probably want to do a lot more testing with this whole setup, but just to keep things simple, I'm going to leave it at this as a simple example. I'd encourage you to take this forward and keep working on this agent; it's fully local, and it is just a beautiful thing.
So I hope this whole local AI setup is just as cool for you as it is for me, because I have been having a blast with it, and I will continue to as I keep expanding on it. Just as I promised at the start of the video, I want to talk a little bit about how I'm planning to expand this in the future to make it even better. Because here's the thing: this whole stack I showed here is a really good starting point, but there are some things I want to add on to make it even more robust; things like Redis for caching, or a self-hosted Supabase instead of the vanilla Postgres, since it can handle things like authentication as well. Maybe even turning this into a whole local AI tech stack that would include things like the frontend, or baking in best practices for RAG and LLMs, or n8n workflows for that, to make this more of a template that makes it really, really easy to get started with local AI. So I hope you're excited about that. If you are, or if you found this video helpful in general for getting set up with your local AI tech stack, I would really appreciate a like and a subscribe, and with that, I will see you in the next video.