Run ALL Your AI Locally in Minutes (LLMs, RAG, and more)

00:20:19
https://www.youtube.com/watch?v=V_0dNE-H2gw

Ringkasan

TLDRThe video showcases a comprehensive package for setting up a local AI ecosystem, developed by the n8n team. This package combines several open-source tools: Llama for large language models (LLMs), Quadrant for vector database management, PostgreSQL for SQL databases, and n8n for workflow automation. The video provides step-by-step guidance on installing and customizing this AI stack using Docker and Git. It highlights the benefits of running AI locally, such as increased accessibility, control, and the growing competitiveness of open-source models like Llama against proprietary ones. Additionally, the video covers how to extend the tool with further configurations, manage workflow automation with n8n, and discusses future development plans to enhance the tool's capabilities. The video emphasizes the potential of local AI setups and encourages viewers to adopt these technologies for better data control and flexibility.

Takeaways

  • 💡 The package offers a complete self-hosted AI stack.
  • 🚀 Easy setup with Git and Docker into a local environment.
  • 🔧 Includes Llama, Quadrant, PostgreSQL, and n8n tools.
  • 📈 Provides competitive local AI solutions with open-source models.
  • 🛠️ Extensible with additional settings and custom code.
  • 🌐 Integrates Google Drive for document processing with n8n.
  • 🔍 Features a self-hosted Quadrant dashboard for easy data management.
  • 🔄 Workflow automation is simplified with n8n.
  • 🌟 Plans to incorporate Redis and Supabase for enhanced functionality.
  • 📅 Future expansions aim to create a robust local AI tech stack.

Garis waktu

  • 00:00:00 - 00:05:00

    An introduction is given about an exciting new package developed by the n8n team that simplifies local AI setup. It includes essential components like llama for language models, Quadrant for a vector database, and Postgres for SQL, all integrated with workflow automation by n8n. The goal is to demonstrate how to set up and extend it for creating a full AI agent.

  • 00:05:00 - 00:10:00

    The process of setting up the package is described. It involves downloading from GitHub, setting environment variables, and modifying the Docker compose file to add critical extensions, such as exposing the Postgres port and adding Olama's embedding model. Instructions are given for starting Docker according to different system architectures.

  • 00:10:00 - 00:15:00

    The video explains how to run and interact with Docker and its containers, including executing commands and pulling models in real time. It also describes setting up a RAG AI agent using n8n, leveraging various local services. The configuration involves using Olama for models and embeddings, Postgres for memory, and Quadrant for vector storage.

  • 00:15:00 - 00:20:19

    Details are provided on creating a workflow for ingesting files into a vector database and setting up a fully local RAG AI agent. It includes handling file updates uniquely by clearing previous vectors to avoid duplication, thus ensuring effective data management in the AI model. The response of the agent is tested and demonstrated as successful.

Tampilkan lebih banyak

Peta Pikiran

Video Tanya Jawab

  • What is included in the local AI package?

    The package includes Llama for LLMs, Quadrant for the vector database, PostgreSQL for the SQL database, and n8n for workflow automations.

  • How can I set up this local AI infrastructure?

    You need Git and Docker to start. Clone the repository from GitHub, edit the environment variable and Docker compose files, and run the setup using Docker Compose.

  • What are some of the benefits of local AI infrastructure?

    Local AI allows control over data, better customization, and can be more cost-effective compared to using external services. Open-source models like Llama are becoming competitive with closed-source models, making this a good time to adopt local AI solutions.

  • How do I extend the package?

    You can extend the package by adjusting Docker compose settings, adding additional models for Llama embeddings, or configuring the PostgreSQL ports according to your use case.

  • Can Quadrant be accessed through its own dashboard?

    Yes, once set up, Quadrant can be accessed through its local dashboard using localhost on the specified port, allowing you to manage and visualize your vector data.

  • How can I integrate Google Drive with this solution?

    You can set up an n8n workflow to pull documents from a specified folder in Google Drive and ingest them into the Quadrant Vector store.

  • What command should I use to start the Docker setup?

    The specific Docker Compose command depends on your system's architecture. There are separate commands for Nvidia GPU users, Mac users, and others.

  • How does the AI agent use the local setup?

    The AI agent uses Llama for language models, Quadrant for vector search, and PostgreSQL for memory storage, interacting through n8n workflows.

  • Is there a way to prevent duplicate vectors in the Quadrant database?

    Yes, using a custom script in n8n, you can delete old vectors associated with a document before inserting updated ones, preventing duplication.

  • What future expansions are planned for the package?

    Plans include adding components like Redis for caching and a self-hosted Supabase to enhance functionality beyond the basic AI stack.

Lihat lebih banyak ringkasan video

Dapatkan akses instan ke ringkasan video YouTube gratis yang didukung oleh AI!
Teks
en
Gulir Otomatis:
  • 00:00:00
    have you ever wished for a single
  • 00:00:01
    package that you could easily install
  • 00:00:03
    that has everything you need for local
  • 00:00:05
    AI well I have good news for you today
  • 00:00:08
    because I have exactly what you are
  • 00:00:09
    looking for I have actually never been
  • 00:00:12
    so excited to make a video on something
  • 00:00:14
    before today I'm going to show you an
  • 00:00:15
    incredible package for local AI
  • 00:00:18
    developed by the n8n team and this thing
  • 00:00:20
    has it all it's got old llama for the
  • 00:00:23
    llms quadrant for the vector database
  • 00:00:26
    postgress for the SQL database and then
  • 00:00:28
    n8n to tie it Al together with workflow
  • 00:00:31
    automations this thing is absolutely
  • 00:00:33
    incredible and I'm going to show you how
  • 00:00:35
    to set it up in just minutes then I'll
  • 00:00:37
    even show you how to extend it to make
  • 00:00:39
    it better and use it to create a full
  • 00:00:42
    rag AI agent in n8n so stick around
  • 00:00:45
    because I have a lot of value for you
  • 00:00:47
    today running your own AI infrastructure
  • 00:00:49
    is the way of the future especially
  • 00:00:51
    because of how accessible is becoming
  • 00:00:54
    and because open-source models like
  • 00:00:56
    llama are getting to the point where
  • 00:00:58
    they're so powerful that they're
  • 00:01:00
    actually able to compete with close
  • 00:01:02
    Source models like GPT and clad so now
  • 00:01:05
    is the time to jump on this and what I'm
  • 00:01:07
    about to show you is an excellent start
  • 00:01:09
    to doing so and at the end of this video
  • 00:01:11
    I'll even talk about how I'm going to
  • 00:01:13
    extend this package in the near future
  • 00:01:15
    just for you to make it even better all
  • 00:01:18
    right so here we are in the GitHub
  • 00:01:19
    repository for the self-hosted AI
  • 00:01:21
    starter kit by n8n now this repo is
  • 00:01:24
    really really basic and I love it there
  • 00:01:27
    are basically just two files that we
  • 00:01:28
    have to care about here we have our
  • 00:01:30
    environment variable file where we'll
  • 00:01:32
    set credentials for things like
  • 00:01:33
    postgress and then we have Docker
  • 00:01:35
    compose the caml file here where we'll
  • 00:01:38
    basically be bringing in everything
  • 00:01:39
    together like postgress quadrant and
  • 00:01:41
    olama to have a single package for our
  • 00:01:43
    local AI now the first thing that I want
  • 00:01:46
    to mention here is that this read me has
  • 00:01:48
    instructions for how to install
  • 00:01:50
    everything yourself but honestly it's
  • 00:01:52
    quite lacking and there's a couple of
  • 00:01:53
    holes that I want to fill in here with
  • 00:01:55
    ways to extend it to really make it what
  • 00:01:57
    I think that you need and so I'll go
  • 00:01:59
    through that a little bit and we'll
  • 00:02:00
    actually get this installed on our
  • 00:02:02
    computer now there are a couple of
  • 00:02:03
    dependencies before you start basically
  • 00:02:06
    you just need git and Docker so I'd
  • 00:02:07
    recommend installing GitHub desktop and
  • 00:02:10
    then Docker desktop as well because this
  • 00:02:11
    also has Docker compose with it which is
  • 00:02:13
    what we need to bring everything
  • 00:02:15
    together for one package so with that we
  • 00:02:18
    can go ahead and get started downloading
  • 00:02:19
    this on our computer so the first thing
  • 00:02:21
    you want to do to download this code is
  • 00:02:23
    copy the get clone command here with the
  • 00:02:25
    URL of the repository you'll go into a
  • 00:02:28
    terminal then and then paste in this
  • 00:02:30
    command for me I've already cloned this
  • 00:02:32
    that's why I get this error message but
  • 00:02:33
    you're going to get this code downloaded
  • 00:02:35
    on your computer and then you can change
  • 00:02:37
    your directory into this new repository
  • 00:02:40
    that you've pulled and so with this we
  • 00:02:42
    can now go and edit the files in any
  • 00:02:44
    editor of our choice I like using VSS
  • 00:02:46
    code and so if you have VSS code as well
  • 00:02:48
    you can just type in code Dot and this
  • 00:02:50
    is going to pull up everything in Visual
  • 00:02:53
    Studio code now the official
  • 00:02:54
    instructions in the readme that we just
  • 00:02:56
    saw would tell you at this point to run
  • 00:02:58
    everything with the docker compos post
  • 00:03:00
    command now that is not actually the
  • 00:03:02
    right Next Step I'm not really sure why
  • 00:03:04
    they say that cuz we have to actually go
  • 00:03:05
    and edit a couple of things in the code
  • 00:03:07
    to make it customized for us and that
  • 00:03:10
    starts with the EnV file so you're going
  • 00:03:12
    to want to go into your EnV file I've
  • 00:03:14
    just made a env. example file in this
  • 00:03:17
    case because I already have my
  • 00:03:18
    credentials set up so you'll go into
  • 00:03:20
    your EnV and then set up your postgress
  • 00:03:22
    username and password the database name
  • 00:03:25
    and then also a couple of n8n Secrets
  • 00:03:27
    these can be whatever you want just make
  • 00:03:28
    sure that they are very very here and
  • 00:03:30
    basically just a long alpha numeric
  • 00:03:32
    string and then with that we can go into
  • 00:03:34
    our Docker compose file and here's where
  • 00:03:36
    I want to make a couple of extensions to
  • 00:03:38
    really fill in the gaps so the couple of
  • 00:03:40
    things that were missing in the original
  • 00:03:42
    Docker compose file first of all for
  • 00:03:45
    some reason the postgress container
  • 00:03:47
    doesn't have the port exposed by default
  • 00:03:49
    so you can't actually go and use
  • 00:03:51
    postgress as your database in an NN
  • 00:03:53
    workflow I think n uses postgress
  • 00:03:56
    internally which is why it's set up like
  • 00:03:58
    that initially but we want to actually
  • 00:03:59
    be a to use postgress for our chat
  • 00:04:02
    memory for our agents and so I'm going
  • 00:04:03
    to show you how to do that basically all
  • 00:04:05
    you have to do is go down to the
  • 00:04:08
    postgress service here and then just add
  • 00:04:10
    these two lines of code right here ports
  • 00:04:12
    and then just a single item where we
  • 00:04:14
    have 5432 map to the port 5432 inside
  • 00:04:18
    the container and that way we can go
  • 00:04:19
    Local Host 5432 and access postgress so
  • 00:04:23
    that is super super important otherwise
  • 00:04:25
    we won't actually be able to access it
  • 00:04:26
    within an NA end workflow we're going to
  • 00:04:28
    be doing that later when we build the
  • 00:04:29
    rag AI agent now the other thing that we
  • 00:04:32
    want to do is we want to use olama for
  • 00:04:34
    our embeddings for our Vector database
  • 00:04:36
    as well now the base command when we
  • 00:04:39
    initialize olama is just this part right
  • 00:04:41
    here so we sleep for 3 seconds and then
  • 00:04:45
    we pull llama 3.1 with oama so that's
  • 00:04:47
    why we have llama 3.1 Available To Us by
  • 00:04:50
    default but what I've added here is
  • 00:04:52
    another line to pull one of the olama
  • 00:04:55
    embedding models and we need this if we
  • 00:04:57
    want to be able to use AMA for our
  • 00:05:00
    Rag and so I've added this line as well
  • 00:05:02
    that is very very key so that is
  • 00:05:04
    literally everything that you have to
  • 00:05:06
    change in the code to get this to work
  • 00:05:07
    and I'll even have a link in the
  • 00:05:09
    description of this video to my version
  • 00:05:11
    of this you can pull that directly if
  • 00:05:12
    you want to have all the customizations
  • 00:05:14
    that we just went over here and with
  • 00:05:16
    that we can go ahead and actually start
  • 00:05:18
    it with Docker compose and so the
  • 00:05:20
    installation instructions in the readme
  • 00:05:22
    are actually kind of useful here because
  • 00:05:23
    there's a slightly different Docker
  • 00:05:25
    compose command that you want to run
  • 00:05:26
    based on your architecture so if you
  • 00:05:28
    have a Nvidia G
  • 00:05:30
    you can follow these instructions which
  • 00:05:31
    are a bit more complicated but if you
  • 00:05:33
    want to you can and then you can run
  • 00:05:35
    with a GPU Nvidia profile and then if
  • 00:05:38
    you are a Mac User you follow this
  • 00:05:40
    Command right here and then for everyone
  • 00:05:41
    else like what I'm going to use in this
  • 00:05:43
    case even though I have a Nvidia GPU
  • 00:05:45
    I'll just keep it simple with Docker
  • 00:05:47
    compose d-profile CPU up and so we'll
  • 00:05:50
    copy this command go into our terminal
  • 00:05:52
    here and paste it in and in my case I
  • 00:05:55
    have already created all these
  • 00:05:56
    containers and so it's going to run
  • 00:05:58
    really really fast for me but in your
  • 00:06:00
    case it's going to have to pull each of
  • 00:06:01
    the images for olama postgress n8n and
  • 00:06:05
    quadrant and then start them all up and
  • 00:06:07
    it'll take a little bit because I also
  • 00:06:08
    have to do things like pulling llama 3.1
  • 00:06:10
    for the old llama container and so in my
  • 00:06:13
    case it's going to blast through this
  • 00:06:14
    pretty quick because it's already done a
  • 00:06:15
    lot of this I did that on purpose so it
  • 00:06:17
    can be a quicker walkthrough for you
  • 00:06:19
    here um but you can see all the
  • 00:06:21
    different containers the different
  • 00:06:22
    colors here that are running everything
  • 00:06:24
    to set me up for each of the different
  • 00:06:26
    services and so like right here for
  • 00:06:27
    example it pulled llama 3.1 and then
  • 00:06:30
    right here it pulled the embedding model
  • 00:06:32
    that I chose from AMA as well um and so
  • 00:06:35
    at this point it's basically done so I'm
  • 00:06:36
    going to pause here and come back when
  • 00:06:38
    everything is ready all right so
  • 00:06:39
    everything is good to go and now I'm
  • 00:06:41
    going to actually take you in a Docker
  • 00:06:42
    so we can see all of this running live
  • 00:06:45
    so you're going to want to open up your
  • 00:06:46
    Docker desktop and then you'll see one
  • 00:06:48
    record here for the self-hosted AI
  • 00:06:51
    starter kit you can click on this button
  • 00:06:52
    on the left hand side to expand it and
  • 00:06:54
    then we can see every container that is
  • 00:06:56
    currently running or ran for the setup
  • 00:06:59
    so they're going to four containers each
  • 00:07:00
    running for one of our different local
  • 00:07:02
    AI services and we can actually click
  • 00:07:04
    into each one of them which is super
  • 00:07:06
    cool because we can see the output of
  • 00:07:08
    each container and even go to the exec
  • 00:07:10
    tab to run Linux commands within each of
  • 00:07:13
    these containers and so you can actually
  • 00:07:14
    do things in real time as well without
  • 00:07:16
    having to restart the containers you can
  • 00:07:18
    go into the postgress container and run
  • 00:07:21
    commands to query your tables and stuff
  • 00:07:23
    you can go into actually I'll show you
  • 00:07:24
    this really quick you can go into the
  • 00:07:26
    olama container and you can pull in real
  • 00:07:29
    time like if I want to go to exec here I
  • 00:07:31
    can do AMA pull llama
  • 00:07:35
    3.1 if I can spell it right 70b so I can
  • 00:07:38
    pull models in real time and have those
  • 00:07:40
    updated and available to me in n8n
  • 00:07:42
    without having to actually restart
  • 00:07:44
    anything which is super super cool all
  • 00:07:46
    right so now is the really really fun
  • 00:07:48
    part because we get to use all the local
  • 00:07:50
    infrastructure that we spun up just now
  • 00:07:52
    to create a fully local rag AI agent
  • 00:07:55
    within n8n and so to access your new
  • 00:07:57
    self-hosted n8n you can just go to Local
  • 00:08:00
    Host Port 5678 and the way that you know
  • 00:08:03
    that this is the URL is either through
  • 00:08:05
    the docker logs for your n container or
  • 00:08:08
    in the readme that we went over um that
  • 00:08:10
    was in the GitHub repository we cloned
  • 00:08:12
    and with that we can dive into this
  • 00:08:14
    workflow that I created to use postgress
  • 00:08:16
    for the chat memory quadrant for Rag and
  • 00:08:19
    olama for the llm and the embedding
  • 00:08:21
    model and so this is a full rag AI agent
  • 00:08:24
    that I've already built out I don't want
  • 00:08:25
    to build it from scratch just because I
  • 00:08:27
    want this to be a quicker smooth walk
  • 00:08:29
    through for you but I'll still go step
  • 00:08:31
    by step through everything that I set up
  • 00:08:32
    here and so that you can understand it
  • 00:08:34
    for yourself and also just steal this
  • 00:08:36
    from me CU I'm going to have this in the
  • 00:08:38
    description link as well so you can pull
  • 00:08:40
    this workflow and bring it into your own
  • 00:08:41
    n8n instance and so with that we can go
  • 00:08:44
    ahead and get started so there are two
  • 00:08:46
    parts to this workflow first of all we
  • 00:08:48
    have the agent itself with the chat
  • 00:08:50
    interaction here so this chat widget is
  • 00:08:52
    how we can interact with our agent and
  • 00:08:54
    then we also have the workflow that is
  • 00:08:56
    going to bring files from Google Drive
  • 00:08:58
    into our knowledge base with quadrant
  • 00:09:01
    and so I'll show the agent first and
  • 00:09:03
    then I'll dive very quickly into how I
  • 00:09:04
    have this pipeline set up to pull files
  • 00:09:07
    in from a Google drive folder into my
  • 00:09:09
    knowledge base so we have the trigger
  • 00:09:11
    that I just mentioned there where we
  • 00:09:12
    have our chat input and that is fed
  • 00:09:15
    directly into this AI agent where we
  • 00:09:17
    hook up all the different local stuff
  • 00:09:19
    and so first of all we have our olama
  • 00:09:21
    chat model and so I'm referencing llama
  • 00:09:23
    3.1 colon latest which is the 8 billion
  • 00:09:26
    parameter model but if you want to do an
  • 00:09:28
    AMA PLL Within the container like I
  • 00:09:30
    showed you how to do you can use
  • 00:09:31
    literally any olama llm right here it is
  • 00:09:34
    just so so simple to set up and then for
  • 00:09:36
    the credentials here it is very easy you
  • 00:09:39
    just have to put in this base URL right
  • 00:09:41
    here it is so important that for the URL
  • 00:09:44
    you use
  • 00:09:45
    HTTP and instead of Local Host you
  • 00:09:48
    reference host. doer. internal otherwise
  • 00:09:51
    it will not work and then the port for
  • 00:09:53
    Alama is if you don't change it
  • 00:09:56
    11434 and you can get this port either
  • 00:09:59
    in the Docker compost file or in the
  • 00:10:01
    logs for the AMA container you'll see
  • 00:10:03
    this in a lot of places and so with that
  • 00:10:05
    we've got our llm set up for this agent
  • 00:10:08
    and then for the memory of course we're
  • 00:10:09
    going to use postgress and so I'll click
  • 00:10:11
    into this and we're just going to have
  • 00:10:13
    any kind of table name you have here and
  • 00:10:14
    N will create this automatically in your
  • 00:10:16
    postgress database and it'll get the
  • 00:10:19
    session ID from the previous node and
  • 00:10:21
    then for the credentials here this is
  • 00:10:23
    going to be based on what you set in
  • 00:10:25
    yourb file so we have our host which is
  • 00:10:28
    host. doer. internal again just like
  • 00:10:30
    with AMA and then the database name user
  • 00:10:33
    and password all three of those you
  • 00:10:35
    defined in your EnV file that we went
  • 00:10:37
    over earlier and the port for postgress
  • 00:10:40
    is
  • 00:10:41
    5432 and so with that we've got our
  • 00:10:43
    local chat memory set up it is that
  • 00:10:45
    simple and so we can move on to the last
  • 00:10:46
    part of this agent which is the tool for
  • 00:10:49
    rag so we have the vector store tool
  • 00:10:52
    that we attach to our agent and then we
  • 00:10:54
    hook in our quadrant Vector store for
  • 00:10:56
    this and so we're just going to retrieve
  • 00:10:58
    any documents based on the query that
  • 00:11:00
    comes into our agent and then for the
  • 00:11:02
    credentials for Quadrant we just have an
  • 00:11:04
    API key which this was filled in for me
  • 00:11:06
    by default so I hope it is for you as
  • 00:11:08
    well I think it's just the password for
  • 00:11:10
    the NN instance and then for the
  • 00:11:12
    quadrant URL this should look very very
  • 00:11:15
    familiar HTTP host. doer. internal and
  • 00:11:19
    then the port for Quadrant is 6333 again
  • 00:11:21
    you can get this from the docker compose
  • 00:11:23
    file because we have to expose that Port
  • 00:11:25
    make it available or you can get it from
  • 00:11:27
    the quadrant logs as well
  • 00:11:30
    and so one other thing that I want to
  • 00:11:31
    show that is so so cool with hosting
  • 00:11:34
    quadrant locally is if you go to local
  • 00:11:36
    hostport
  • 00:11:38
    6333 like I have right here you can see
  • 00:11:40
    in the top left slash dashboard it's
  • 00:11:43
    going to take you to your very own
  • 00:11:45
    self-hosted quadrant dashboard where you
  • 00:11:48
    can see all your collections your
  • 00:11:50
    knowledge base basically and you can see
  • 00:11:52
    all the different vectors that you have
  • 00:11:53
    in there you can click into visualize
  • 00:11:56
    and I can actually go and see all my
  • 00:11:58
    different vectors which this is a
  • 00:12:00
    document that I already have inserted as
  • 00:12:02
    I was testing things um so you can see
  • 00:12:03
    all the metadata the contents of each
  • 00:12:05
    chunk it is so so cool so we'll go back
  • 00:12:08
    to this in a little bit here but just
  • 00:12:10
    know that like you have so much
  • 00:12:11
    visibility into your own quadrant
  • 00:12:12
    instance and you can even go and like
  • 00:12:14
    run your own queries to uh get
  • 00:12:16
    collections or delete vectors or do a
  • 00:12:18
    search uh it's just really awesome so
  • 00:12:21
    yeah hosting quadrant is a beautiful
  • 00:12:23
    thing um and so with that we have our
  • 00:12:25
    quadrant Vector store and then we're
  • 00:12:27
    using olama for embeddings using that
  • 00:12:29
    embedding model that I pulled that I
  • 00:12:31
    added to the docker compost file and
  • 00:12:34
    then we're just going to use llama 3.1
  • 00:12:36
    again to parse the responses that we get
  • 00:12:38
    from rag when we do our lookups so that
  • 00:12:40
    is everything for our agent and so we'll
  • 00:12:43
    test this in a little bit but first I
  • 00:12:45
    want to actually show you the workflow
  • 00:12:47
    for ingesting files into our knowledge
  • 00:12:49
    base and so the way that works is we
  • 00:12:51
    have two triggers here basically
  • 00:12:53
    whenever a file is created in a specific
  • 00:12:56
    folder in Google Drive or if a file is
  • 00:12:59
    updated in that same folder we want to
  • 00:13:02
    run this pipeline to download the file
  • 00:13:04
    and put it into our quadrant Vector
  • 00:13:05
    database running locally and so that
  • 00:13:08
    folder that I have right here is this
  • 00:13:10
    meeting notes folder in my Google Drive
  • 00:13:12
    and specifically the document that I'm
  • 00:13:14
    going to use for testing purposes here
  • 00:13:17
    is these fake meeting notes that I made
  • 00:13:19
    I just generated something really really
  • 00:13:20
    silly here about a company that is
  • 00:13:23
    selling robotic pets and AI startup um
  • 00:13:26
    and so we're going to use this document
  • 00:13:27
    for our rag I'm not going to do a bunch
  • 00:13:29
    bunch of different documents um because
  • 00:13:31
    I want to keep this really simple right
  • 00:13:32
    now but you can definitely do that and
  • 00:13:34
    the quadrant Vector database can handle
  • 00:13:36
    that but for now I'm just using this
  • 00:13:38
    single document and so I'll walk through
  • 00:13:40
    step by step what this flow actually
  • 00:13:42
    looks like to ingest this into the
  • 00:13:43
    vector database and so first of all I'm
  • 00:13:46
    going to fetch a test event which is
  • 00:13:48
    going to be the creation of this meeting
  • 00:13:50
    Note file that I just showed you and
  • 00:13:52
    then we're going to feed that into this
  • 00:13:53
    node here which is going to extrapolate
  • 00:13:55
    a couple of key pieces of information
  • 00:13:58
    including the file ID and the folder ID
  • 00:14:01
    and so once we have that I'm going to go
  • 00:14:03
    on to this next step right here and this
  • 00:14:05
    is a very very important step okay let
  • 00:14:08
    me just stop here for a second there are
  • 00:14:10
    a lot of rag tutorials with n8n on
  • 00:14:13
    YouTube that miss this when you have
  • 00:14:16
    this Step at the end here I'm just going
  • 00:14:17
    to skip to the end really quick whether
  • 00:14:20
    this is super base quadrant pine cone it
  • 00:14:22
    doesn't matter when you have this
  • 00:14:23
    inserter it is not an upsert it is just
  • 00:14:27
    an insert and so what that means means
  • 00:14:29
    is if you reinsert a document you're
  • 00:14:31
    actually going to have duplicate vectors
  • 00:14:33
    for that document so if I update a
  • 00:14:35
    document in Google Drive and it
  • 00:14:37
    reinserts the vectors into my quadrant
  • 00:14:40
    Vector database I'm going to have the
  • 00:14:42
    old vectors for the first time I
  • 00:14:44
    ingested my document and then new
  • 00:14:46
    vectors for when I updated the file it
  • 00:14:48
    does not get rid of the old files or
  • 00:14:51
    update the vectors in place that is so
  • 00:14:53
    important to keep in mind and so I'm
  • 00:14:56
    giving a lot of value to you right here
  • 00:14:58
    by including this node and it's actually
  • 00:15:00
    custom code because there's not a way to
  • 00:15:02
    do it without code in n8n but it is all
  • 00:15:05
    good because you can just copy this from
  • 00:15:07
    me I'm going to have a link to this
  • 00:15:08
    workflow in the description like I said
  • 00:15:10
    so you can just download this and bring
  • 00:15:12
    it into your own n8n take my code here
  • 00:15:14
    which basically just uses Lang chain to
  • 00:15:16
    connect to my quadrant Vector store get
  • 00:15:19
    all of the vector IDs where the metadata
  • 00:15:22
    file ID is equal to the ID of the file
  • 00:15:25
    I'm currently ingesting and then it just
  • 00:15:27
    deletes those vectors so basically we
  • 00:15:29
    clear everything that's currently in the
  • 00:15:31
    vector database for this file so that we
  • 00:15:34
    can reinsert it and make sure that we
  • 00:15:35
    have zero duplicates that is so so
  • 00:15:38
    important because you don't want
  • 00:15:40
    different versions of your file existing
  • 00:15:42
    at the same time in your knowledge base
  • 00:15:44
    that will confuse the heck out of your
  • 00:15:45
    llm and so this is a very very important
  • 00:15:49
    step and so I'll run this as well and
  • 00:15:51
    that's going to delete everything so I
  • 00:15:52
    can even go back to quadrant here go to
  • 00:15:55
    my Collections and you can see now that
  • 00:15:57
    this number was nine when I first showed
  • 00:15:59
    this quadrant dashboard and now it is
  • 00:16:00
    zero but it's going to go back up to 9
  • 00:16:03
    when I finish this workflow so next up
  • 00:16:05
    we're going to download this Google
  • 00:16:07
    drive
  • 00:16:08
    file nice and simple uh then we're going
  • 00:16:11
    to extract the text from it and so this
  • 00:16:13
    it doesn't matter if it's a PDF a CSP a
  • 00:16:15
    Google doc it'll take the file and get
  • 00:16:18
    the raw text from it and then we're
  • 00:16:21
    going to insert it into our quadrant
  • 00:16:23
    Vector store and so now I'm going to run
  • 00:16:25
    test step here and we're going to go
  • 00:16:26
    back to the UI after it's done doing
  • 00:16:29
    these insertions you can see here nine
  • 00:16:30
    items because it chunked up my document
  • 00:16:33
    so we go back here and I'll refresh it
  • 00:16:35
    right now it's zero I'll refresh and
  • 00:16:38
    there we go boom we're back up to nine
  • 00:16:39
    chunks and the reason there's so many
  • 00:16:41
    chunks for such a small document is
  • 00:16:44
    because if we go to my chunk size here
  • 00:16:46
    in my recursive character text splitter
  • 00:16:49
    I have a chunk size of 100 so every
  • 00:16:51
    single time I put in a document it's
  • 00:16:53
    going to get split up into 100 character
  • 00:16:55
    chunks so I want to keep it small just
  • 00:16:57
    because I'm running llama 3 .1 locally I
  • 00:17:00
    don't have the most powerful computer
  • 00:17:01
    and so I want my prompts to be small so
  • 00:17:03
    I'm keeping my context lower by having
  • 00:17:05
    smaller chunk sizes and not returning a
  • 00:17:07
    lot of documents when I perform Rag and
  • 00:17:10
    so the other thing that I wanted to show
  • 00:17:12
    really quickly here is my document data
  • 00:17:15
    loader and so or my default data loader
  • 00:17:17
    I'm adding two pieces of metadata here
  • 00:17:19
    the file ID and the folder ID the more
  • 00:17:22
    important one right here is the file ID
  • 00:17:24
    because that is how I know that a vector
  • 00:17:26
    is tied to a specific document so I use
  • 00:17:29
    that in that other step right here to
  • 00:17:32
    delete the old document vectors before I
  • 00:17:34
    insert the new one so that's how I make
  • 00:17:36
    that connection there so that's kind of
  • 00:17:38
    the most in-depth part of this
  • 00:17:39
    walkthrough is how that all works and
  • 00:17:41
    having this custom code here but just
  • 00:17:42
    know that this is so so important so
  • 00:17:44
    just take this from me I hope that it
  • 00:17:46
    makes sense to an extent I spent a lot
  • 00:17:48
    of time making this work for you um so
  • 00:17:50
    yeah with that that is everything we've
  • 00:17:53
    got our agent fully set up everything
  • 00:17:55
    ingested in uh we have the document
  • 00:17:58
    currently in the knowledge base CU I ran
  • 00:17:59
    through that step by step and so now we
  • 00:18:02
    can go ahead and test this thing so I'm
  • 00:18:03
    going to go to the chat widget here
  • 00:18:05
    actually I'm going to save it first and
  • 00:18:06
    then go to the chat widget and then I'll
  • 00:18:08
    ask it a question that it can only
  • 00:18:09
    answer if it actually has the document
  • 00:18:11
    in the knowledge base and can retrieve
  • 00:18:13
    it so I'll say what is the ad campaign
  • 00:18:16
    focusing on and because this is llama
  • 00:18:19
    3.1 running locally it's going to
  • 00:18:21
    actually take a little bit to get a
  • 00:18:22
    response because I don't have the BPS
  • 00:18:24
    computer so I'm going to pause and come
  • 00:18:26
    back when it has an answer for me all
  • 00:18:29
    right so we got an answer from llama 3.1
  • 00:18:31
    and this is looking pretty good it's a
  • 00:18:33
    little bit awkward at the start of the
  • 00:18:35
    response here uh but this is just the
  • 00:18:37
    raw output without any instructions from
  • 00:18:39
    me to the model on how to format a
  • 00:18:41
    response and so you can very very easily
  • 00:18:43
    fix this by just adding to the system
  • 00:18:45
    prompt for the llm and telling it how to
  • 00:18:47
    respond with the information it's given
  • 00:18:49
    from rag but overall it does have the
  • 00:18:51
    right answer and it's talking about
  • 00:18:52
    robotic pets which obviously it is only
  • 00:18:54
    going to get that if it's using regag on
  • 00:18:56
    the meaning notes document that I have
  • 00:18:58
    uploaded through my Google Drive so this
  • 00:19:00
    is working absolutely beautifully now I
  • 00:19:02
    would probably want to do a lot more
  • 00:19:04
    testing with this whole setup U but just
  • 00:19:06
    to keep things simple right now I'm
  • 00:19:08
    going to leave it at this as a simple
  • 00:19:09
    example um but yeah I would encourage
  • 00:19:11
    you to just take this forward keep
  • 00:19:13
    working on this agent and um yeah it's
  • 00:19:16
    fully fully local it is just a beautiful
  • 00:19:18
    thing so I hope that this whole local AI
  • 00:19:20
    setup is just as cool for you as it is
  • 00:19:22
    for me because I have been having a
  • 00:19:24
    blast with this and I will continue to
  • 00:19:26
    as I keep expanding on it so just just
  • 00:19:28
    as I promised in the start of the video
  • 00:19:31
    I want to talk a little bit about how
  • 00:19:32
    I'm planning on expanding this in the
  • 00:19:33
    future to make it even better cuz here's
  • 00:19:35
    the thing this whole stack that I showed
  • 00:19:38
    here is a really good starting point but
  • 00:19:39
    there's some things I want to add on to
  • 00:19:41
    it as well to make it even more robust
  • 00:19:43
    things like redis for caching or a
  • 00:19:44
    self-hosted super base instead of the
  • 00:19:47
    vanilla postgress CU then it can handle
  • 00:19:48
    things like authentication as well maybe
  • 00:19:50
    even turning this into a whole local AI
  • 00:19:52
    Tech stack that would even include
  • 00:19:54
    things like the front end as well or
  • 00:19:56
    maybe baking in best practices for red
  • 00:19:58
    and llms or na end workflows for that to
  • 00:20:01
    make this more of like a template as
  • 00:20:03
    well to actually make it really really
  • 00:20:04
    easy to get started with local AI so I
  • 00:20:07
    hope that you're excited about that if
  • 00:20:08
    you are or if you found this video just
  • 00:20:10
    helpful in general getting you set up
  • 00:20:12
    with your local AI Tech stack I would
  • 00:20:14
    really appreciate a like and a subscribe
  • 00:20:16
    and with that I will see you in the next
  • 00:20:18
    video
Tags
  • local AI
  • n8n
  • open-source
  • Llama
  • Quadrant
  • PostgreSQL
  • Docker
  • workflow automation
  • AI stack
  • self-hosted