Have you ever wished for a single package that you could easily install that has everything you need for local AI? Well, I have good news for you today, because I have exactly what you are looking for. I have actually never been so excited to make a video on something before. Today I'm going to show you an incredible package for local AI developed by the n8n team, and this thing has it all: it's got Ollama for the LLMs, Qdrant for the vector database, Postgres for the SQL database, and then n8n to tie it all together with workflow automations. This thing is absolutely incredible, and I'm going to show you how to set it up in just minutes. Then I'll even show you how to extend it to make it better, and use it to create a full RAG AI agent in n8n. So stick around, because I have a lot of value for you today.

Running your own AI infrastructure is the way of the future, especially because of how accessible it's becoming, and because open-source models like Llama are getting to the point where they're so powerful that they can actually compete with closed-source models like GPT and Claude. So now is the time to jump on this, and what I'm about to show you is an excellent start to doing so. At the end of this video, I'll even talk about how I'm going to extend this package in the near future to make it even better.
All right, so here we are in the GitHub repository for the self-hosted AI starter kit by n8n. This repo is really basic, and I love it. There are basically just two files we have to care about: the environment variable file, where we'll set credentials for things like Postgres, and the Docker Compose YAML file, where we'll bring everything together, like Postgres, Qdrant, and Ollama, into a single package for our local AI.

The first thing I want to mention is that this README has instructions for installing everything yourself, but honestly it's quite lacking, and there are a couple of holes I want to fill in with ways to extend it to really make it what I think you need. So I'll go through that a little bit, and we'll actually get this installed on our computer.

There are a couple of dependencies before you start: basically, you just need Git and Docker. I'd recommend installing GitHub Desktop and then Docker Desktop as well, because Docker Desktop also comes with Docker Compose, which is what we need to bring everything together into one package. With that, we can go ahead and get started downloading this onto our computer.
The first thing you want to do to download this code is copy the git clone command with the URL of the repository. You'll then go into a terminal and paste in that command. For me, I've already cloned this, which is why I get an error message, but you're going to get the code downloaded onto your computer, and then you can change your directory into the new repository you've pulled. With this, we can now go and edit the files in any editor of our choice. I like using VS Code, so if you have VS Code as well, you can just type in `code .`, and this is going to pull up everything in Visual Studio Code.
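For reference, the whole sequence looks something like this; the URL here is for n8n's self-hosted AI starter kit repo, but double-check it against the clone button on the repo page:

```bash
# Clone the starter kit and move into it
git clone https://github.com/n8n-io/self-hosted-ai-starter-kit.git
cd self-hosted-ai-starter-kit

# Optional: open the project in VS Code
code .
```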
Now, the official instructions in the README that we just saw would tell you at this point to run everything with the docker compose command. That's not actually the right next step; I'm not really sure why they say that, because we first have to edit a couple of things in the code to customize it for us, and that starts with the .env file. So you're going to want to go into your .env file. I've just made a .env.example file in this case, because I already have my credentials set up, but you'll go into your .env and set up your Postgres username and password, the database name, and then also a couple of n8n secrets. These can be whatever you want; just make sure they're very secure, basically just long alphanumeric strings.
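As a rough sketch, the finished .env looks something like this; the variable names follow the kit's .env.example, and the values are placeholders you should replace with your own:

```
# Postgres credentials (also used later for the agent's chat memory)
POSTGRES_USER=myuser
POSTGRES_PASSWORD=replace-with-a-long-alphanumeric-string
POSTGRES_DB=n8n

# n8n secrets - long random alphanumeric strings
N8N_ENCRYPTION_KEY=replace-with-a-long-random-string
N8N_USER_MANAGEMENT_JWT_SECRET=replace-with-another-long-random-string
```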
With that, we can go into our Docker Compose file, and here's where I want to make a couple of extensions to really fill in the gaps, because a couple of things were missing in the original Docker Compose file. First of all, for some reason the Postgres container doesn't have its port exposed by default, so you can't actually use Postgres as your database in an n8n workflow. I think n8n uses Postgres internally, which is why it's set up like that initially, but we want to be able to use Postgres for the chat memory for our agents, so I'm going to show you how to do that. Basically, all you have to do is go down to the Postgres service and add two lines: a `ports` key, and then a single item that maps port 5432 on the host to port 5432 inside the container. That way we can go to localhost:5432 and access Postgres. This is super important; otherwise we won't be able to access it from within an n8n workflow, which we'll be doing later when we build the RAG AI agent.
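In the Compose file, the change is small; only the `ports` lines below are new, and everything else in the service stays exactly as the kit ships it:

```yaml
  postgres:
    # ... existing image, environment, and volume settings from the starter kit ...
    ports:
      - "5432:5432"  # publish Postgres so n8n workflow credentials can reach it on localhost:5432
```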
The other thing we want to do is use Ollama for the embeddings for our vector database as well. The base command when we initialize Ollama is just one pull: we sleep for 3 seconds and then pull Llama 3.1 with Ollama. That's why we have Llama 3.1 available to us by default. What I've added is another line to pull one of the Ollama embedding models, and we need this if we want to be able to use Ollama for our RAG, so that added line is very key. That is literally everything you have to change in the code to get this to work, and I'll have a link in the description of this video to my version of it, so you can pull that directly if you want all the customizations we just went over.
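For context, here's roughly what the Ollama init service's startup command ends up looking like; the embedding model name below is just an example (use whichever embedding model you prefer from the Ollama library, and my linked repo shows the exact one I picked):

```yaml
  ollama-pull-llama:
    # ... existing image and entrypoint settings from the starter kit ...
    command:
      - "-c"
      # the first pull is what the kit ships with; the second is the extra line for embeddings
      - "sleep 3; ollama pull llama3.1; ollama pull nomic-embed-text"
```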
With that, we can go ahead and actually start it with Docker Compose. The installation instructions in the README are actually kind of useful here, because there's a slightly different Docker Compose command to run based on your architecture. If you have an Nvidia GPU, you can follow those instructions, which are a bit more involved, and then run with the Nvidia GPU profile. If you're a Mac user, you follow the Mac command in the README. For everyone else, which is what I'm going to use in this case even though I have an Nvidia GPU, just keep it simple with `docker compose --profile cpu up`.
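Side by side, the two profile commands I just mentioned look like this (per the kit's README; check your copy in case the profile names have changed):

```bash
# With an Nvidia GPU (after the extra driver/toolkit setup the README describes)
docker compose --profile gpu-nvidia up

# CPU-only, which is what I'm running here to keep things simple
docker compose --profile cpu up
```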
So we'll copy this command, go into our terminal, and paste it in. In my case, I've already created all of these containers, so it's going to run really fast for me, but in your case it will have to pull each of the images for Ollama, Postgres, n8n, and Qdrant, and then start them all up. It'll take a little while, because it also has to do things like pulling Llama 3.1 for the Ollama container. In my case it's going to blast through this pretty quickly, because it has already done a lot of this; I did that on purpose so this can be a quicker walkthrough for you. But you can see all the different containers, in the different colors here, running everything to set me up for each of the different services. For example, right here it pulled Llama 3.1, and right here it pulled the embedding model I chose from Ollama as well. At this point it's basically done, so I'm going to pause here and come back when everything is ready.
All right, everything is good to go, and now I'm going to take you into Docker so we can see all of this running live. You'll want to open up Docker Desktop, and you'll see one record here for the self-hosted AI starter kit. You can click the button on the left-hand side to expand it, and then we can see every container that is currently running, or that ran, for the setup. There are going to be four containers, each running one of our different local AI services. We can actually click into each one of them, which is super cool, because we can see the output of each container and even go to the Exec tab to run Linux commands inside each container. So you can do things in real time without having to restart the containers. You can go into the Postgres container and run commands to query your tables, for example. Actually, I'll show you this really quick: you can go into the Ollama container and pull models in real time. If I go to Exec here, I can do `ollama pull llama3.1:70b` (if I can spell it right). So I can pull models in real time and have them updated and available to me in n8n without having to restart anything, which is super cool.
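For reference, that's this one command, run from the Ollama container's Exec tab; keep in mind the 70B model is a huge download and needs serious hardware to run, so treat it as an example:

```bash
ollama pull llama3.1:70b
```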
All right, so now is the really fun part, because we get to use all the local infrastructure we just spun up to create a fully local RAG AI agent within n8n. To access your new self-hosted n8n, you can just go to localhost:5678. The way you know this is the URL is either through the Docker logs for your n8n container or in the README in the GitHub repository we cloned. With that, we can dive into this workflow I created, which uses Postgres for the chat memory, Qdrant for RAG, and Ollama for the LLM and the embedding model. This is a full RAG AI agent that I've already built out. I don't want to build it from scratch, just because I want this to be a quicker, smoother walkthrough for you, but I'll still go step by step through everything I set up here so you can understand it for yourself, and also just steal it from me, because I'm going to have this in the description link as well; you can pull this workflow and bring it into your own n8n instance.
With that, we can go ahead and get started. There are two parts to this workflow. First, we have the agent itself, with the chat interaction here; this chat widget is how we interact with our agent. Then we also have the workflow that brings files from Google Drive into our knowledge base in Qdrant. I'll show the agent first, and then I'll dive quickly into how I have this pipeline set up to pull files from a Google Drive folder into my knowledge base.

We have the trigger I just mentioned, where we have our chat input, and that is fed directly into this AI agent, where we hook up all the different local pieces. First of all, we have our Ollama chat model, where I'm referencing llama3.1:latest, which is the 8-billion-parameter model. But if you do an `ollama pull` within the container, like I showed you, you can use literally any Ollama LLM here; it is just so simple to set up.
Then for the credentials, it's very easy: you just have to put in the base URL. It is so important that for this URL you use http, and that instead of localhost you reference host.docker.internal; otherwise it will not work. The port for Ollama, if you don't change it, is 11434, and you can get this port either from the Docker Compose file or from the logs for the Ollama container; you'll see it in a lot of places.
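So the base URL for the Ollama credential in n8n, assuming the default port, is:

```
http://host.docker.internal:11434
```

The reason host.docker.internal works where localhost doesn't is that n8n itself runs inside a container, so localhost there points at the n8n container rather than at your machine.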
With that, we've got our LLM set up for this agent. Then for the memory, of course, we're going to use Postgres, so I'll click into this. You can use any table name here, and n8n will create it automatically in your Postgres database; it'll get the session ID from the previous node. Then for the credentials, this is based on what you set in your .env file. We have our host, which is host.docker.internal again, just like with Ollama, and then the database name, user, and password, all three of which you defined in the .env file we went over earlier. The port for Postgres is 5432.
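Put together, the Postgres credential in n8n looks something like this, with the actual values coming from your .env:

```
Host:     host.docker.internal
Database: <POSTGRES_DB from your .env>
User:     <POSTGRES_USER from your .env>
Password: <POSTGRES_PASSWORD from your .env>
Port:     5432
```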
With that, we've got our local chat memory set up; it is that simple. So we can move on to the last part of this agent, which is the tool for RAG. We have the Vector Store Tool that we attach to our agent, and then we hook our Qdrant vector store into it, so we'll retrieve documents based on the query that comes into our agent. For the Qdrant credentials, we just have an API key, which was filled in for me by default, so I hope it is for you as well; I think it's just the password for the n8n instance. Then for the Qdrant URL, this should look very familiar: http://host.docker.internal, and the port for Qdrant is 6333. Again, you can get this from the Docker Compose file, because that port has to be exposed to make it available, or you can get it from the Qdrant logs as well.
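To keep the two ways of reaching Qdrant straight, from inside n8n versus from your own browser:

```
Qdrant URL (n8n credential):   http://host.docker.internal:6333
Qdrant dashboard (in browser): http://localhost:6333/dashboard
```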
One other thing I want to show that is so cool about hosting Qdrant locally: if you go to localhost:6333/dashboard, like I have right here, it takes you to your very own self-hosted Qdrant dashboard, where you can see all your collections (your knowledge base, basically) and all the different vectors you have in there. You can click into Visualize and actually see all the different vectors; what's in there now is a document I already inserted while I was testing things, and you can see the metadata and the contents of each chunk. It is so cool. We'll come back to this in a little bit, but just know that you have so much visibility into your own Qdrant instance, and you can even run your own queries to get collections, delete vectors, or do a search. It's just really awesome; self-hosting Qdrant is a beautiful thing.

And so with that, we have our Qdrant vector store. Then we're using Ollama for the embeddings, with that embedding model I pulled, the one I added to the Docker Compose file, and then we're using Llama 3.1 again to parse the responses we get from RAG when we do our lookups. So that is everything for our agent.
We'll test this in a little bit, but first I want to show you the workflow for ingesting files into our knowledge base. The way that works is we have two triggers here: whenever a file is created in a specific folder in Google Drive, or whenever a file is updated in that same folder, we run this pipeline to download the file and put it into our Qdrant vector database running locally. The folder I have right here is this meeting-notes folder in my Google Drive, and the document I'm going to use for testing purposes is these fake meeting notes I made. I just generated something really silly about a company that is selling robotic pets, an AI startup, and we're going to use this document for our RAG. I'm not going to use a bunch of different documents, because I want to keep this really simple right now, but you can definitely do that, and the Qdrant vector database can handle it; for now I'm just using this single document.
So I'll walk through, step by step, what this flow actually does to ingest the document into the vector database. First, I'm going to fetch a test event, which is the creation of the meeting-notes file I just showed you. Then we feed that into this node, which extracts a couple of key pieces of information, including the file ID and the folder ID. Once we have that, we move on to the next step right here, and this is a very, very important one.

Okay, let me just stop here for a second. There are a lot of RAG tutorials with n8n on YouTube that miss this. When you have this insert step at the end here (I'm just going to skip to the end really quick), whether it's Supabase, Qdrant, or Pinecone, it doesn't matter: this inserter is not an upsert, it is just an insert. What that means is that if you reinsert a document, you're actually going to get duplicate vectors for that document. So if I update a document in Google Drive and it reinserts the vectors into my Qdrant vector database, I'm going to have the old vectors from the first time I ingested the document and then new vectors from when I updated the file. It does not get rid of the old vectors or update them in place. That is so important to keep in mind.
So I'm giving you a lot of value right here by including this node. It's actually custom code, because there isn't a way to do this without code in n8n, but that's all good, because you can just copy it from me; I'm going to have a link to this workflow in the description, like I said, so you can download it and bring it into your own n8n. My code here basically uses LangChain to connect to my Qdrant vector store, get all of the vector IDs where the metadata file ID equals the ID of the file I'm currently ingesting, and then delete those vectors. So basically we clear everything that's currently in the vector database for this file, so we can reinsert it and be sure we have zero duplicates. That is so important, because you don't want different versions of your file existing at the same time in your knowledge base; that will confuse the heck out of your LLM. So this is a very, very important step.
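If you want to see the idea behind my node rather than read the code, the same delete-by-metadata operation can be expressed directly against Qdrant's REST API, something like this; the collection name and the metadata key here are assumptions, so match them to whatever your insert step actually writes:

```bash
# Delete every vector whose metadata file ID matches the document being re-ingested
curl -X POST "http://host.docker.internal:6333/collections/documents/points/delete" \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "must": [
        { "key": "metadata.file_id", "match": { "value": "<google-drive-file-id>" } }
      ]
    }
  }'
```

My workflow node does the equivalent through LangChain, but the principle is the same: filter on the file ID stamped into each vector's metadata, delete the matches, and then insert the fresh chunks.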
So I'll run this node as well, and that's going to delete everything. I can even go back to Qdrant, go to my collections, and you can see that this number was nine when I first showed the Qdrant dashboard, and now it is zero; it's going to go back up to nine when I finish this workflow. Next up, we download the Google Drive file, nice and simple. Then we extract the text from it; it doesn't matter whether it's a PDF, a CSV, or a Google Doc, this node takes the file and gets the raw text out of it. Then we insert it into our Qdrant vector store. So now I'll run this test step, and we'll go back to the UI after it's done with the insertions. You can see nine items here, because it chunked up my document. So we go back to the dashboard, where right now it's zero, I'll refresh, and there we go, boom, we're back up to nine chunks.
The reason there are so many chunks for such a small document is that, if we look at the chunk size in my recursive character text splitter, I have a chunk size of 100, so every document I put in gets split up into 100-character chunks. I want to keep the chunks small because I'm running Llama 3.1 locally and I don't have the most powerful computer, so I want my prompts to be small: I keep the context lower by using smaller chunk sizes and by not returning a lot of documents when I perform RAG.
The other thing I wanted to show really quickly is my default data loader. I'm adding two pieces of metadata here: the file ID and the folder ID. The more important one is the file ID, because that is how I know a vector is tied to a specific document; I use it in that other step to delete the old document vectors before I insert the new ones. That's how the connection is made.
That's kind of the most in-depth part of this walkthrough: how that all works, and why the custom code is there. Just know that it is so important, so take it from me; I hope it makes sense to an extent, and I spent a lot of time making it work for you.

With that, that is everything: we've got our agent fully set up and everything ingested, and the document is currently in the knowledge base, since I ran through that step by step. So now we can go ahead and test this thing. I'm going to go to the chat widget (actually, I'll save first, and then go to the chat widget), and then I'll ask a question it can only answer if it actually has the document in the knowledge base and can retrieve it. So I'll say: what is the ad campaign focusing on? Because this is Llama 3.1 running locally, it's going to take a little while to get a response, since I don't have the most powerful computer, so I'm going to pause and come back when it has an answer for me.
All right, so we got an answer from Llama 3.1, and this is looking pretty good. It's a little awkward at the start of the response, but this is just the raw output, without any instructions from me to the model on how to format a response. You can very easily fix this by adding to the system prompt for the LLM and telling it how to respond with the information it's given from RAG. But overall it does have the right answer, and it's talking about robotic pets, which it is obviously only going to get if it's using RAG on the meeting-notes document I uploaded through my Google Drive. So this is working absolutely beautifully. Now, I would probably want to do a lot more testing with this whole setup, but just to keep things simple, I'm going to leave it at this as a simple example. I'd encourage you to take this forward and keep working on this agent; it's fully local, and it is just a beautiful thing.
So I hope this whole local AI setup is just as cool for you as it is for me, because I have been having a blast with it, and I will continue to as I keep expanding on it. Just as I promised at the start of the video, I want to talk a little bit about how I'm planning to expand this in the future to make it even better. Because here's the thing: this whole stack I showed here is a really good starting point, but there are some things I want to add on to make it even more robust; things like Redis for caching, or a self-hosted Supabase instead of the vanilla Postgres, since it can handle things like authentication as well. Maybe even turning this into a whole local AI tech stack that would include things like the frontend, or baking in best practices for RAG and LLMs, or n8n workflows for that, to make this more of a template that makes it really, really easy to get started with local AI. So I hope you're excited about that. If you are, or if you found this video helpful in general for getting set up with your local AI tech stack, I would really appreciate a like and a subscribe, and with that, I will see you in the next video.