00:00:07
Today I want to talk about Frigate, and about using Double Take, a unified API that lets you train and process images for use in facial recognition. I've been using it for a couple of days, and this is going to be a slightly different type of video from what I normally do: I'm going to talk more about my experience, and maybe later on we'll do more of a how-to on how I set it up. I've been playing with it for the last couple of days, and it hasn't been something I'm really excited about. I was on the fence about making a video at all, because I'm having issues with it, or it's just not living up to my expectations of what I want it to do. With AI and facial recognition that's not uncommon: there's going to be some learning involved, and it's not going to be perfect. So let's first look at what Double Take is.
00:01:00
This is the Double Take GitHub page, so let's go through a few of the features. As I mentioned, it's a unified UI and API for processing and training images for facial recognition. It doesn't actually do the recognition work itself; it relies on detectors for that, but it gives you a very nice, convenient way to train images and work with them. So the question is: why would you use this? There's a lot of great software that performs the actual facial recognition, but it all behaves differently. Double Take was created to take the complexities of those systems and combine them into an easy-to-use UI. Basically, Double Take is a UI for detection services. The page lists a lot of the features, and one you'll notice is a Home Assistant add-on.
00:01:51
I tried and tried to use the Home Assistant add-on on my Odroid N2+, and I had an issue where it caused my Odroid to slow down to a crawl, so much so that I had to log in over SSH and kill the process because I couldn't get back into the box. I don't know that that's everybody's experience, and I don't really understand what's going on, because a lot of the forums I've read describe Double Take as a very low-consumption application that shouldn't do that. So either something's going on with my box, or I've simply got too many things running on my Odroid already. So I've detached from the add-on and run Double Take in a Docker container instead. I'm running it in an Ubuntu virtual machine in VirtualBox on my Windows desktop, with Docker inside that VM. You can also run Docker under Windows directly if you want, or use a number of other approaches, and you may well have success with the add-on.
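For anyone who wants to go the same route, here's a rough sketch of a docker-compose service for Double Take on its own. The image name and port come from the project's Docker documentation; the volume path is just a placeholder you'd adjust:

```yaml
version: "3"
services:
  double-take:
    image: jakowenko/double-take   # Double Take's published image
    container_name: double-take
    restart: unless-stopped
    volumes:
      - ./double-take:/.storage    # config, database, and trained images live here
    ports:
      - "3000:3000"                # web UI and API
```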
00:02:54
The project page lists the architecture support. The detectors it can use are DeepStack, CompreFace, and Facebox, and it supports the Frigate NVR, which is what I'm running. I have a number of videos on Frigate and how I've got it set up; Frigate has been running well for a while and does object detection very well, so I don't have to do anything for that. What I wanted to try was layering some face detection, or person recognition, on top of what Frigate is already doing.
00:03:24
The documentation walks through how you set things up. Double Take can send notifications through Home Assistant for various events, and it publishes its results to sensors via MQTT, so you need to have MQTT up and running. Finally, you can use the Gotify service, which I also have up and running. I would pronounce it "go-tih-fy," and someone will correct me, as they have in my other videos. This gives you the ability to send notifications to a Gotify server. I'm running Gotify in a Docker container on the VM too, so DeepStack, CompreFace, Gotify, and Double Take are all running in the VM on my Windows desktop under VirtualBox.
00:04:18
documentation here on all of this um but
00:04:21
it's very easy to get up and running
00:04:23
again we're not going to talk so much
00:04:24
about getting it up and running this
00:04:26
video on a follow-up video if there's
00:04:28
enough interest in this I'll go through
00:04:29
through how I set up my Docker
00:04:30
containers and everything else I just
00:04:32
want to kind of talk through briefly how
00:04:35
uh I perceive this to be working for me
00:04:38
This is the interface for Double Take, and this is what you see as images are detected. It only shows an entry if a face is picked up in the image. I could have Frigate notify me whenever there's activity or an object is detected, and it will do that, but if no face is detected by one of the detectors, CompreFace or, in my case, DeepStack, nothing shows up here. These events did pick up faces, and you can see them here.
00:05:11
One of the first things I want to talk about is the size of the image. If I walk into the room, my face is really tiny in the frame, and you'll notice this box is pink. What that tells me is that even though it might have detected me, the confidence is too low for a specific reason: one of the checks Double Take performs is on the size of the box area. A teeny tiny box just isn't going to be as accurate as a box of a decent size. So here it says the box area is too low, and it doesn't pass the event through as a detection, even though it shows it here.
00:05:59
When you have a green box with a name and a percentage, you get an actual detection. For these two faces here, because they were up close, I'm getting myself detected.
00:06:14
What this means is that for most of my cameras in Frigate, these are the image sizes I'm going to get, and even though I have this particular camera set to crop these images at a size of 500, the picture just isn't good enough for the system to reliably do anything with. And this is face detection while walking into a room: if you're doing presence detection in a room, you'd have to get up close and personal to the camera, or adjust the camera so it only sees a small area and the face is bigger. That's my first issue with this kind of face detection.
00:06:54
If we look at my cameras in Frigate, you'll see that all of them cover a wide area, and even though Double Take pulls in snapshots of those areas, so far it just isn't picking faces up well enough to be useful. I'm going to continue playing with it and see if I can train it to do a little better. There are also ways to change the required box size: the default box area is 10,000, and I could set it to 1,000 and have it pick up smaller faces, but then you're going to get a lot of false positives. So that's part one.
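For reference, the box-area check lives under the detect section of the Double Take config. This is a sketch of how I understand that knob; 10,000 is the documented default, and 1,000 is the looser value I experimented with:

```yaml
detect:
  match:
    min_area: 10000   # default minimum box area (width x height in pixels) for a match to count
    # min_area: 1000  # looser value I tried; picks up smaller faces but invites false positives
```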
00:07:33
I brought this up a bit out of order because I wanted to show you this screen while I'm already in the interface, which is really nice; it's a very nice interface for training. If I want to train this image, or any image for that matter, all I have to do is click on this box, pull down one of these names, for instance my own, and click Train. It asks whether I want to train this file, and if I click Yes, it goes in and starts training on my files. This is a super easy way to tell your detectors which images to use for which person. You create a folder for each person or thing you want to train. For example, if I had a delivery person who was always the same and I knew their name, I would create a delivery folder and then just pick these images and train them into that folder.
00:08:30
Another way to train images in the Double Take interface is to upload files. I pick a folder, click Upload, and then select some images from this list. I'll just pick a couple of my own images and click Open, and it starts training automatically. I picked four images, but you see eight here because I have two detectors configured right now: a DeepStack detector and a CompreFace detector. I'm using both at the same time because I want to see which one is better. There's been a lot of discussion saying that CompreFace is actually better than DeepStack, and so far that's what I'm seeing; I think CompreFace does a better job of detecting faces.
00:09:28
Now, the training time will depend on how many images you put in and how busy your system is, so we'll let that train for a few minutes and come back to it. CompreFace is a free and open-source face recognition system; it's also a Docker container, and there's an official website for it as well. It can easily be integrated into any system, and Double Take lets you use it alongside Facebox and DeepStack. Once the images are trained, you'll see them in your trained folder, and Double Take will start using them to match against what's coming in. These are the images I chose for that.
00:10:21
Now, DeepStack has been having issues; even now it has some timeout problems, so you'll notice these images are trained for CompreFace but not for DeepStack. Having played with this over the last couple of days, I'm pretty sure I'm just going to get rid of DeepStack and let CompreFace do all the work, because it seems to be the best. If you have photos that come up as unknown, you can train them and then reprocess the images, and the reprocessed images will hopefully come back matched to the person you just trained, or to someone already in your trained folder.
00:10:59
So we have the main interface for matches, and it does live updates, which you can enable or disable. You can also enable or disable filtering, and choose what to filter on: matches or misses, which detector was used, which camera the events came from, and all these other settings.
00:11:23
te is done through the UI as well
00:11:25
typically you'll do that first I'm kind
00:11:27
of doing everything backwards cuz I'm
00:11:28
showing you what's happening as we go
00:11:30
along in the wrong order but you would
00:11:32
come to config first and in the
00:11:34
configuration once you're in the
00:11:35
configuration you would set your mqtt
00:11:38
host username and password you would set
00:11:40
your frig URL with its Port make sure
00:11:43
you're using the right port and there
00:11:45
are some settings that you can set
00:11:47
within here that are talked about within
00:11:49
this configuration page both on GitHub
00:11:51
and the docker
00:11:53
page you can leave everything as default
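As a sketch, the top of my config looks roughly like this; the IPs are placeholders standing in for my internal addresses:

```yaml
mqtt:
  host: 192.168.1.10        # placeholder for my broker's internal IP
  username: mqtt-user
  password: mqtt-pass

frigate:
  url: http://192.168.1.10:5000   # double-check this matches your Frigate port
```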
00:11:55
to start with and then come back later
00:11:57
on and do stuff with it if you need to
00:11:59
change it I added a couple of things
00:12:01
here under events I wanted
00:12:04
attempts uh to be no latest snapshot
00:12:08
because I was getting multiple notices
00:12:09
for the snapshot I wanted I mean the the
00:12:12
latest image so you get latest image
00:12:14
snapshots and mqtt if you have it
00:12:16
enabled the latest image is what shows
00:12:20
up as the frig is sending it in and then
00:12:24
you have the snapshot which is the
00:12:26
snapshot that you get from frig and then
00:12:28
you have m qtt which is then sent once
00:12:31
everything is processed the thing about
00:12:33
mqtt is there's quite a delay because it
00:12:35
has to finish the event before it sends
00:12:37
it in so if you want to do snapshots um
00:12:40
or if you don't want to do a latest at
00:12:42
least do snapshots and I've commented it
00:12:44
out because it does a default of 10 and
00:12:46
again all this is set in here and it
00:12:48
talks about these types of changes that
00:12:51
you can make and these changes actually
00:12:53
allow you to set things like the latest
00:12:55
so this the number of times devil te
00:12:57
will request a frigate image for facial
00:12:59
recogition number of times it will
00:13:01
request a frig snapshot and number of
00:13:03
times or um whether it processes images
00:13:06
from the mqtt topic and then you can add
00:13:09
delay Express in seconds between each
00:13:11
detection Loop right now it's set the
00:13:12
default of one these are all defaults so
00:13:15
mqtt is false by default and then it'll
00:13:17
do the last five snapshots the last five
00:13:20
uh latest photos from frig to try to
00:13:22
process the image change those as you
00:13:25
need to but it's a lot of tweaking and a
00:13:27
lot of of that kind of thing you need to
00:13:28
do to make sure that you get it just the
00:13:30
way you want it and then you set up your
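Here's roughly what my events tweak looks like; treat the values as illustrative, since the exact defaults may differ between Double Take versions:

```yaml
frigate:
  url: http://192.168.1.10:5000   # placeholder IP
  attempts:
    latest: 0      # stop requesting the "latest" image (this is what cut my duplicate notices)
    snapshot: 10   # keep requesting the Frigate snapshot
    mqtt: false    # don't also process the image from the Frigate MQTT topic
    delay: 1       # seconds between each detection loop
```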
00:13:32
Then you set up your detectors. I've got my CompreFace detector; again, I'm running these on a VM, so there's my local IP with the port number, and then you generate a key for CompreFace. Since these services are only accessible inside my network, I'm not worried about the keys and tokens being exposed in this video; I can change them at will and probably will after filming. DeepStack is the same: you can set a key and a timeout, but keys aren't used by default, so I'd leave that blank unless you really need it. And then I've got my Gotify URL, running with a token, which is what sends my notifications to the Gotify service I'm listening to in a browser and on my phone. So that's my configuration; it's super simple to set up.
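Put together, my detectors and notify sections look roughly like this; the IPs, keys, and tokens are all placeholders:

```yaml
detectors:
  compreface:
    url: http://192.168.1.20:8000   # CompreFace container on my VM (placeholder)
    key: xxxxxxxx                   # recognition API key generated in the CompreFace UI
  deepstack:
    url: http://192.168.1.20:5000   # DeepStack container (placeholder)
    timeout: 15                     # seconds; no key, since DeepStack doesn't use one by default

notify:
  gotify:
    url: http://192.168.1.20:8080   # Gotify server (placeholder)
    token: xxxxxxxx                 # application token created in Gotify
```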
00:14:20
a Secrets file here this Secrets file
00:14:24
will allow you to set Secrets within
00:14:26
this configuration file and it also Al
00:14:29
pulls your secrets file if you do this
00:14:30
as a home assistant add-on now there's a
00:14:33
home assistant add-on for Double Take
00:14:36
and comper face and deep stack and of
00:14:38
course mqtt and frig both have add-ons
00:14:41
you can run all of those on your home
00:14:42
assistant device if you have enough
00:14:44
horsepower I just don't have enough
00:14:46
horsepower to do everything and by the
00:14:49
By the way, while I'm thinking about it: you can see the instant status of all your connections up at the top, so everything you have configured shows up there. My Double Take is running, my MQTT is running, but DeepStack keeps getting a timeout, which is what we talked about. I've seen some discussion in recent comments that there's a timeout issue with DeepStack for some reason. I'm running both detectors right now, but I'll probably get rid of DeepStack and just run CompreFace.
00:15:15
So, as images come in from Frigate, Double Take grabs them, sends them to the detectors for processing, and then comes back with the display you see here: the images it picked up with faces in them, and an indication of whether or not each one actually matched. You can set up notifications in Home Assistant for this as well; I'm still playing with that and haven't quite got it tuned, so stay tuned for another video.
00:15:45
I keep saying I'm going to show you all the configuration later, and I probably will, but first I want to give you an idea of this and hear your comments. Is this even useful to you? Do you have uses for facial recognition? Does your environment let you set up cameras close enough to get a good image for facial recognition? If I'm looking at people walking down the street on my Frigate cameras, for example, and someone is walking down this street right here, it's not going to pick them up; there's no way for my camera to zoom in and pick up people on the street. They'd have to walk right up into this area for it to be useful, and even out here may not be enough. It really depends on your angle and what the camera is looking at; this camera here may not work.
00:16:38
So again, I want to hear your thoughts on whether this is useful. Do you want to see me show my config and how I set up the Docker containers? I'll run through all of that from scratch if you'd like. Let me know what you think in the comments below: whether you're using Double Take, DeepStack, or CompreFace, whether you're using the Home Assistant add-ons or running standalone, and in what kind of environment. Throw all that in the comments and let's have a discussion, because if I'm doing this wrong, and it's likely I might be, I want to know how I can tweak and tune it to make it useful.
00:17:13
I think one very simple use case for me: I don't want to be alerted when my family is walking around in any of these images; I want to be alerted when an unknown face pops up. If, with reliable results, it picks up my family, or whoever I choose, as people I know, and everyone else comes back as unknown, that would be fine; I know how to alert on that. But I just can't get it to detect reliably, because the image sizes are too small, and maybe I haven't trained it enough. I'm training with high-quality images from selfies or photos, not camera captures, because if you use camera images and they're too small, you're going to get unreliable results as well. So, tell me about it.
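The alert I have in mind would look something like this in Home Assistant. It's only a sketch, since I haven't got it working yet; it assumes Double Take publishes unmatched faces on an MQTT topic like double-take/unknown, so check your broker for the actual topic names your version uses:

```yaml
automation:
  - alias: "Alert on unknown face"
    trigger:
      - platform: mqtt
        topic: double-take/unknown        # assumed topic; verify against your broker
    action:
      - service: notify.mobile_app_my_phone   # placeholder notify service
        data:
          message: "Unknown face on {{ trigger.payload_json['camera'] }}"
```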
00:18:00
know this wasn't a big how-to video it's
00:18:02
just a quick talk about what frig uh and
00:18:05
Double Take and deep stack on com forace
00:18:08
have done for me in the last couple of
00:18:09
days I love the work that's going on I
00:18:12
don't want to minimize the amount of
00:18:13
work that's been going on I think this
00:18:15
is an excellent uh start for this I just
00:18:18
need to know how to make it better for
00:18:20
my use cases and maybe I have to have a
00:18:23
zoom uh zoomable camera maybe I have to
00:18:25
have a different type of camera uh all
00:18:28
that kind of stuff just to see if I get
00:18:29
reliable results or maybe my use case
00:18:32
isn't valid for what this is used for so
00:18:34
with that let me go and end the video
00:18:36
here again it's a little bit different
00:18:38
than what I'm used to to filming for you
00:18:40
it's not a big how-to but if there's
00:18:42
enough interest I will build the entire
00:18:45
configuration of what I'm using from
00:18:46
start to finish on a separate video If
00:18:49
you're not a subscriber uh make sure you
00:18:50
hit that subscribe button uh thank you
00:18:52
for watching if you like the video hit
00:18:54
that thumbs up and uh Channel
00:18:56
memberships are available for you as
00:18:57
well if you would like to contribute and
00:18:59
support the channel and we'll see you on
00:19:00
the next one