Understanding 3D Reconstruction with COLMAP
Summary
TL;DR: In this episode of Computer Vision Decoded, the hosts discuss Structure from Motion (SfM) and 3D reconstruction using the COLMAP software. They explain the workflow from feature extraction to camera pose estimation and incremental reconstruction. Jared Heinly, a computer vision expert, elaborates on the importance of camera models, feature matching strategies, and geometric verification. The episode also contrasts incremental and global reconstruction methods, emphasizing the efficiency of COLMAP and the newer GLOMAP software. Listeners are encouraged to experiment with COLMAP to gain practical experience in 3D reconstruction.
Key takeaways
- 📸 COLMAP is an open-source tool for 3D reconstruction.
- 🔍 Feature extraction identifies unique landmarks in images.
- 🔗 Feature matching connects similar features across images.
- 🧮 Geometric verification ensures accurate matches between images.
- 🔄 Incremental reconstruction adds images one at a time.
- 🌍 Global reconstruction estimates poses for all images simultaneously.
- ⚙️ Bundle adjustment refines camera poses and 3D points.
- 🖼️ Good imagery is crucial for successful reconstruction.
- 🛠️ Experimenting with COLMAP helps understand 3D reconstruction better.
- 📚 Tutorials and documentation are available on COLMAP's website.
Timeline
- 00:00:00 - 00:05:00
In this episode, the hosts introduce the topic of structure from motion and 3D reconstruction using COLMAP, a free and open-source software. They aim to demystify the process of 3D reconstruction from imagery, with expert Jared Heinly explaining the workflow involved in obtaining camera poses and creating 3D models.
- 00:05:00 - 00:10:00
Jared shares his background with COLMAP, detailing its origins and development by Johannes Schönberger during his time at UNC Chapel Hill. The discussion highlights the evolution of COLMAP from earlier software focused on aerial photography to a more generalized tool for 3D reconstruction from various image collections.
- 00:10:00 - 00:15:00
The hosts discuss the initial steps in using COLMAP, emphasizing the importance of understanding camera positions and the process of extracting images from a video or taking multiple photos from different angles to create a 3D model.
- 00:15:00 - 00:20:00
Jared explains the significance of feature extraction, where unique landmarks in photographs are identified to establish 2D relationships between images. This step is crucial for later 3D reconstruction, as it allows the software to track points across multiple images.
- 00:20:00 - 00:25:00
The conversation moves to the matching process, where the software identifies correspondences between features in different images. This involves geometric verification to ensure that matches make sense in the context of camera motion and scene geometry.
- 00:25:00 - 00:30:00
The hosts discuss the various matching algorithms available in COLMAP, such as exhaustive, sequential, and vocab tree matching, and how to choose the right one based on the dataset and image collection strategy.
- 00:30:00 - 00:35:00
Incremental reconstruction is introduced, where the software builds a 3D model by adding images one at a time. Jared explains the initialization process and how the software determines which images to use based on feature matches and camera motion.
- 00:35:00 - 00:40:00
The episode covers the iterative loop of image registration, triangulation, and bundle adjustment, which refines both the 3D points and camera poses as new images are added to the reconstruction.
- 00:40:00 - 00:45:00
Jared clarifies the role of bundle adjustment in optimizing the alignment of 3D points and camera positions, and how it can be performed locally or globally depending on the reconstruction size and complexity.
- 00:45:00 - 00:50:00
The hosts briefly touch on GLOMAP, a newer software that offers a global reconstruction approach, allowing for faster processing by estimating camera poses for all images simultaneously, contrasting it with COLMAP's incremental method.
- 00:50:00 - 00:57:02
The episode concludes with encouragement for listeners to experiment with COLMAP, emphasizing the importance of taking sharp images and understanding the reconstruction process to achieve better results.
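The registration, triangulation, and bundle adjustment loop mentioned in the timeline rests on intersecting viewing rays from multiple cameras. As an illustrative sketch only (COLMAP itself triangulates by linear least squares over all observing images, and all names below are ours, not COLMAP's), here is a minimal two-ray midpoint triangulation in plain Python:

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of closest approach between rays p = c1 + t1*d1 and p = c2 + t2*d2.

    c1, c2 are camera centers; d1, d2 are viewing-ray directions (need not be unit).
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b              # ~0 when the rays are (near) parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique triangulation")
    t1 = (b * e - c * d) / denom       # parameter of closest point on ray 1
    t2 = (a * e - b * d) / denom       # parameter of closest point on ray 2
    p1 = [ci + t1 * di for ci, di in zip(c1, d1)]
    p2 = [ci + t2 * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

With noise-free rays that truly intersect, the midpoint coincides with the 3D point both cameras observed; with noisy feature matches, it lands between the two rays.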
Video Q&A
What is COLMAP?
COLMAP is an open-source software for 3D reconstruction from images, allowing users to estimate camera poses and create 3D models.
What is the first step in 3D reconstruction using COLMAP?
The first step is feature extraction, where unique landmarks in the images are identified.
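For readers who want to follow along outside the GUI, the stages covered in this Q&A map onto COLMAP's command-line tools. A minimal sketch (paths are placeholders; see COLMAP's CLI documentation for the full option list):

```shell
# Hypothetical layout: photos in ./images; ./project.db and ./sparse are created below.
colmap feature_extractor \
    --database_path ./project.db \
    --image_path ./images \
    --ImageReader.camera_model SIMPLE_RADIAL \
    --ImageReader.single_camera 1

colmap exhaustive_matcher \
    --database_path ./project.db

mkdir -p ./sparse
colmap mapper \
    --database_path ./project.db \
    --image_path ./images \
    --output_path ./sparse
```

`exhaustive_matcher` can be swapped for `sequential_matcher` or `vocab_tree_matcher` depending on how the images were captured, as discussed in the episode.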
How does COLMAP handle feature matching?
COLMAP uses various algorithms for feature matching, including exhaustive, sequential, and vocabulary tree methods, depending on the nature of the image dataset.
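As a toy illustration of the nearest-neighbour matching idea (COLMAP matches 128-dimensional SIFT descriptors, typically on the GPU; this sketch uses tiny made-up vectors and Lowe's ratio test, with names of our own invention):

```python
def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its two nearest neighbours in desc_b
    and keep the match only if the best is clearly better than the second best."""
    def dist2(u, v):
        # squared Euclidean distance between two descriptor vectors
        return sum((x - y) ** 2 for x, y in zip(u, v))

    matches = []
    for i, da in enumerate(desc_a):
        scored = sorted((dist2(da, db), j) for j, db in enumerate(desc_b))
        if len(scored) >= 2:
            best, second = scored[0], scored[1]
            # squared distances, so compare against ratio**2
            if best[0] < (ratio ** 2) * second[0]:
                matches.append((i, best[1]))
    return matches
```

The ratio test discards ambiguous features (e.g. repeated bricks on a wall) whose nearest and second-nearest neighbours look almost equally similar, which is exactly the failure mode described later in the transcript.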
What is geometric verification in COLMAP?
Geometric verification is a process that ensures the matches between features in different images make sense geometrically, filtering out incorrect matches.
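The filtering idea behind geometric verification can be sketched with RANSAC. COLMAP verifies matches against epipolar geometry (fundamental or essential matrices, or a homography); the toy below uses a plain 2D translation model instead, just to show the hypothesize-and-count-inliers loop. All names are illustrative:

```python
import random

def ransac_translation(pts_a, pts_b, iters=200, tol=1.0, seed=0):
    """Toy geometric verification: find the 2D translation mapping pts_a -> pts_b
    that is consistent with the most correspondences, and return its inlier indices."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        i = rng.randrange(len(pts_a))            # minimal sample: one match
        dx = pts_b[i][0] - pts_a[i][0]           # hypothesized translation
        dy = pts_b[i][1] - pts_a[i][1]
        inliers = [
            j for j, (a, b) in enumerate(zip(pts_a, pts_b))
            if abs(a[0] + dx - b[0]) <= tol and abs(a[1] + dy - b[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

Matches that disagree with the dominant model (the outliers) are dropped, which is what cleans up wrong correspondences like two different treetops matched to each other.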
What is the difference between incremental and global reconstruction?
Incremental reconstruction adds images one at a time, while global reconstruction estimates the 3D poses of all images simultaneously.
Can I use COLMAP on a standard computer?
Yes, COLMAP can run on standard computers, although performance may vary based on hardware specifications.
What is bundle adjustment?
Bundle adjustment is an optimization process that refines the 3D points and camera poses to improve the accuracy of the reconstruction.
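As a toy, single-parameter illustration of that refinement idea (real bundle adjustment jointly optimizes all camera poses, intrinsics, and 3D points, e.g. with the Ceres solver that COLMAP uses; this sketch refines one hypothetical camera offset by gradient descent):

```python
def refine_offset(points_x, observed_u, t0=0.0, lr=0.1, steps=100):
    """Toy 1-parameter 'bundle adjustment': refine a camera offset t so that the
    predicted projections (x - t) best fit the observations, by gradient descent
    on the summed squared reprojection error."""
    t = t0
    for _ in range(steps):
        # d/dt of sum((u - (x - t))**2) is 2 * sum(u - x + t)
        grad = 2 * sum(u - x + t for x, u in zip(points_x, observed_u))
        t -= lr * grad
    return t
```

The real problem is the same shape, just vastly larger: thousands of parameters adjusted simultaneously so that every 3D point reprojects close to where its feature was observed in every image.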
What should I consider when taking images for 3D reconstruction?
Ensure to take sharp images with good features and varied angles to improve the quality of the 3D reconstruction.
Is there a learning curve for using COLMAP?
Yes, while COLMAP is powerful, understanding its various options and workflows may require some time and experimentation.
Where can I find tutorials for COLMAP?
COLMAP's official website provides tutorials and documentation to help users get started with the software.
- 00:00:00Welcome to another episode of computer
- 00:00:02vision decoded. I'm really excited about
- 00:00:04this episode because it's going to solve
- 00:00:06a lot of questions that we get about
- 00:00:10structure from motion and 3D
- 00:00:12reconstruction when it comes to COLMAP
- 00:00:15and just figuring out how to do some of
- 00:00:17the basics of 3D reconstruction from
- 00:00:20imagery. And as always, I have Jared
- 00:00:22Heinly, our in-house computer vision
- 00:00:25expert, to walk us through what happens
- 00:00:28when you run software like COLMAP to
- 00:00:31get camera poses, 3D reconstruction, and
- 00:00:34kind of break down how that all works at
- 00:00:36a tangible level. So when you walk away
- 00:00:38from this episode, you should have a
- 00:00:40better understanding of this black box
- 00:00:43of COLMAP and other 3D reconstruction
- 00:00:46software that follows the same workflow.
- 00:00:48So, as always, Jared, thanks for joining
- 00:00:50me and welcome to the episode. Yeah,
- 00:00:52thank you. Let's just get to what we're
- 00:00:54all here for. Let's let's learn about
- 00:00:56COLMAP. And I don't want to say
- 00:00:59specifically COLMAP, but we're going
- 00:01:00to use it as the basis for this episode
- 00:01:04to have something for someone to follow
- 00:01:06along. And since it's open-source and
- 00:01:09free, they can download COLMAP and do
- 00:01:13this on their own PC without, you know,
- 00:01:16have to pay for some third party
- 00:01:18software that they won't learn as much
- 00:01:20through. So Jared, let's just start off
- 00:01:22with I'm going to share my screen. I
- 00:01:25have some images and we want to turn
- 00:01:27these images into a 3D model or just at
- 00:01:31least know where these cameras are in
- 00:01:33relation to each other. I'm going to be
- 00:01:35doing some screen shares. If you're
- 00:01:37listening to the audio only, I'll do my
- 00:01:39best to talk about what we have on the
- 00:01:42screen. But, uh, if I start out here, I
- 00:01:45have a picture of a well, it was a
- 00:01:48fountain that used to work in front of
- 00:01:50the Oregon State Capitol. I took this
- 00:01:52one sunny day last year. And if I flip
- 00:01:55through the images, I basically walked
- 00:01:58around this fountain and got a bunch of
- 00:02:02good angles. In fact, I believe I used a
- 00:02:04video and extracted a bunch of images
- 00:02:07and at some points there's some sun
- 00:02:09issues, things like that. But it was
- 00:02:12good enough for me to get a 3D model. So
- 00:02:15Jared, what's what's the first step
- 00:02:17someone would take then to turn this
- 00:02:20into a 3D model? Know where the cameras
- 00:02:22are, things like that? Yeah. Yeah. Well,
- 00:02:24you just you hinted it right there at
- 00:02:26the very end. Know where the cameras
- 00:02:27are. And I guess and to try to refine
- 00:02:29some of my language. A lot of times when
- 00:02:30I say camera, sometimes I mean, you
- 00:02:32know, image and camera. I'll use those
- 00:02:34words interchangeably sometimes, you
- 00:02:36know, but you said that you walked
- 00:02:37around with a single camera, you know,
- 00:02:39your phone or a DSLR or whatever it may
- 00:02:42be. And from that video, maybe you
- 00:02:44extract frames, you know, images or you
- 00:02:46took photos yourself. And so you have
- 00:02:49multiple images taken by a single
- 00:02:51physical camera, but you were moving
- 00:02:54around that scene, moving around that
- 00:02:56object. And so that camera was occupying
- 00:02:58different physical 3D points in space
- 00:03:01and then these images were captured from
- 00:03:02those different 3D points and those from
- 00:03:04those different 3D 3D perspectives. So
- 00:03:07you know as humans we just do this
- 00:03:09naturally like as you just flipped
- 00:03:10through those photos there and you know
- 00:03:12uh you know and as you kind of orbited
- 00:03:14around that fountain it's like yeah our
- 00:03:17brains are immediately like oh yeah okay
- 00:03:19I can see that the ground is a little
- 00:03:21bit closer. Here's this foreground
- 00:03:23fountain. I see the trees in the
- 00:03:24background. I see some other structures
- 00:03:26in the background and I'm immediately I
- 00:03:29can see that yep you were moving to the
- 00:03:31left and sort of this clockwise motion
- 00:03:33this thing's you know near that other
- 00:03:35things are far and our brains are
- 00:03:37immediately doing all of that 3D
- 00:03:38reasoning but in order to have software
- 00:03:42do this in order to have a computer
- 00:03:43generate a 3D reconstruction or 3D
- 00:03:46representation of what's in these photos
- 00:03:49it has to figure out it has to do all of
- 00:03:51that math and it doesn't know how to do
- 00:03:52that reasoning by default. It has to figure
- 00:03:54out well where were you standing when
- 00:03:56that photo was taken? Where was the
- 00:03:58camera positioned? How was it angled?
- 00:04:00What was the zoom level uh of of the
- 00:04:02current lens? And so it's doing all has
- 00:04:04to figure out where everything was
- 00:04:07oriented. And that's typically one of
- 00:04:08the first processes is trying to figure
- 00:04:09out how how are things related to each
- 00:04:11other, you know, and once we kind of
- 00:04:12know how they're related, then figure
- 00:04:14out what is the 3D 3D geometry that that
- 00:04:17uh describes describes that
- 00:04:19relationship.
- 00:04:21And so it goes through. So, I I don't
- 00:04:23have it on my screen, but I will pull it
- 00:04:26up in a second, but COLMAP has in
- 00:04:28their tutorial information a good kind
- 00:04:32of diagram. I'll let me bring that up,
- 00:04:35but it it it basically shows the
- 00:04:36workflow that it goes through. So, if I
- 00:04:39go to the actual website for COLMAP
- 00:04:43and you go look at their tutorial, you
- 00:04:45can see that. So, let's just pull that
- 00:04:47up on my screen as well. While he's
- 00:04:50pulling that up. Um, just jump in with a
- 00:04:53little bit of personal history about
- 00:04:54COLMAP. So, I did my PhD back at UNC
- 00:04:59Chapel Hill. So, I was there from 2010
- 00:05:00to 2015. And while I was there, Johannes
- 00:05:03Schönberger, he came to UNC for two
- 00:05:05years to do his masters. And so,
- 00:05:07Johannes, he's the author of COLMAP.
- 00:05:10Um, but at the time when he was there,
- 00:05:11COLMAP didn't exist. Johannes had
- 00:05:14worked on uh previous structural motion
- 00:05:16software and had built he'd worked with
- 00:05:18some uh I believe it was drones so
- 00:05:20aerial photography 3D reconstruction and
- 00:05:23so he had built a pipeline that he had
- 00:05:24called MAVMAP. I I'll probably get this
- 00:05:27wrong but I think like you know mobile
- 00:05:28aerial vehicle MAVMAP like map or
- 00:05:31mobile vehicle mapper and so but he was
- 00:05:34looking to generalize that to move
- 00:05:36beyond just aerial photography and to do
- 00:05:39more general purpose image collections
- 00:05:42and So it was this idea of image
- 00:05:44collections you know where he came up
- 00:05:46with COLMAP, collection mapper, um to
- 00:05:49say I want to take a collection of
- 00:05:51images and generate a 3D reconstruction
- 00:05:53from it. So he was working on that while
- 00:05:54he was at UNC. I may have been one of
- 00:05:57the first people to actually use
- 00:05:59COLMAP um in in my final uh PhD project. I
- 00:06:04had uh processed a 100 million images on
- 00:06:07a single PC and I was doing the you know
- 00:06:10feature matching, extraction, but then I
- 00:06:12needed some way to reconstruct them and
- 00:06:14our lab had some other software that
- 00:06:16could do 3D reconstruction but Johannes
- 00:06:17had just written this first version of
- 00:06:19COLMAP and so I said great let's use
- 00:06:21that and that that was efficient that
- 00:06:24was fast and it did did exactly what we
- 00:06:26needed to do and so that that helped uh
- 00:06:28helped get my paper across the goal line
- 00:06:29there at the very end so nice and since
- 00:06:32then Johannes has gone off you know at
- 00:06:33ETH Zurich and now uh at other companies
- 00:06:36and continued to you know open source
- 00:06:38COLMAP and now it's used all over the
- 00:06:40world and has won him some awards
- 00:06:42for it. So you know interestingly the
- 00:06:45glow map what came out last year and he
- 00:06:48had his fingers in that as well. Yep. So
- 00:06:50it's not over. I still see coal map
- 00:06:52being updated on a semi-regular basis as
- 00:06:56well. So although it came out a few
- 00:07:00several years ago, it's it's not static.
- 00:07:02No. No. because it because it is such a
- 00:07:05an important step the the task that
- 00:07:08COLMAP solves and and similarly GLOMAP
- 00:07:10you know figuring out the 3D pose you
- 00:07:13know pose is position plus orientation
- 00:07:16figuring out the 3D pose of images is a
- 00:07:19key step in so many uh 3D pipelines you
- 00:07:22if you want to understand the world in
- 00:07:243D you got to figure out where these
- 00:07:25images were taken from you know and
- 00:07:27that's the key task that that COLMAP
- 00:07:30uh solves for a lot of people Okay, that
- 00:07:33that makes that makes sense. I had no
- 00:07:34idea also that COLMAP stood for
- 00:07:36collection mapper. I'm I mean it makes
- 00:07:39sense, but I thought maybe it was a long
- 00:07:41acronym. So, um so, okay. Well, I have
- 00:07:45this diagram up then. If you're
- 00:07:47watching, you can see it on the screen,
- 00:07:48but if you're listening, it's basically
- 00:07:51a workflow of how images go from just a
- 00:07:55collection of images to a 3D
- 00:07:58reconstruction. And you got camera
- 00:08:00poses. And I'm going to show this in
- 00:08:01COLMAP on my screen as well. But this
- 00:08:04diagram just shows the different phases
- 00:08:06or rather the steps that you go
- 00:08:08through to get from pictures to 3D. And
- 00:08:11it starts out with feature extraction.
- 00:08:14And if you actually go to the tutorial
- 00:08:16as well. So if I share the just the
- 00:08:17tutorial page, that diagram makes sense.
- 00:08:20But the minute you start diving into it,
- 00:08:22you have a wall of text that most
- 00:08:25people won't make it through very well
- 00:08:28unless they are perhaps a computer
- 00:08:30science major, someone like Jared who
- 00:08:32does this academically or for a job. I
- 00:08:36look at this and I'm like, okay, some of
- 00:08:38this makes sense. A lot of this is
- 00:08:40beyond me. So, we're going to break that
- 00:08:42down. So, yeah. Okay, starting out with
- 00:08:43feature extraction. So, what is that
- 00:08:45step? So, what do we We're taking the
- 00:08:46images and sounds like something's
- 00:08:48happening there with features. Yeah.
- 00:08:50Yeah. Absolutely. So, uh, and just to
- 00:08:52take a step back here too. So, like we
- 00:08:54like you said, this is, you know, sort
- 00:08:55of a workflow, a sequence of steps that
- 00:08:57goes into generating a reconstruction.
- 00:08:59So, you had those images input. There's
- 00:09:00a sort of first block of steps that's
- 00:09:02labeled correspondence search. After
- 00:09:04that, we have incremental reconstruction
- 00:09:06and then finally we end up with a final
- 00:09:08reconstruction. But yeah, so within
- 00:09:09correspondence search, our goal for
- 00:09:11correspondence search is to figure out
- 00:09:14the 2D relationship between that
- 00:09:17collection of images. So, we're not even
- 00:09:18talking about, you know, real 3D yet.
- 00:09:20There might be some hints at 3D uh in
- 00:09:23these steps, but we haven't done any
- 00:09:25reasoning to really understand which
- 00:09:27photos, you know, are where in 3D space.
- 00:09:30Um, so it's just about 2D understanding,
- 00:09:332D matching, 2D correspondence between
- 00:09:35this this collection of images. So, with
- 00:09:38that in mind, first step is feature
- 00:09:41extraction. So the goal there is to
- 00:09:45identify unique landmarks within a
- 00:09:48photograph. And these unique unique
- 00:09:51landmarks, the intent of that is if I
- 00:09:54can identify a unique landmark, you
- 00:09:56know, a 2D point in one photo, hopefully
- 00:09:59I can identify that same point in
- 00:10:00another photo and another photo and
- 00:10:02another photo. And if I can identify and
- 00:10:05follow or you know or track that 2D
- 00:10:07point between multiple images now I can
- 00:10:10use that as a constraint later on when I
- 00:10:13do the 3D reconstruction. I can say, "Hey,
- 00:10:15hey, however these images are
- 00:10:18positioned, that point that they saw,
- 00:10:21that pixel should converge to a common
- 00:10:243D position in space." And so it's
- 00:10:26adding a sort of a viewing constraint
- 00:10:28saying, you know, each image saw a 2D
- 00:10:30point. I don't know the depth of that
- 00:10:31point. So it all it sort of gives me is
- 00:10:33a viewing ray. So along this direction
- 00:10:36out into the scene, I saw this unique
- 00:10:38landmark. Now, I've seen that same
- 00:10:40landmark in many other photos. I want to
- 00:10:42identify that and add that as a
- 00:10:44constraint because that is like most
- 00:10:45likely you know a 3D point. So feature
- 00:10:48extraction is the automatic
- 00:10:50identification of typically
- 00:10:52thousands or tens of thousands
- 00:10:54of these unique landmarks in an image. A
- 00:10:57lot of times there are different flavors
- 00:10:58of feature detection. The one used in
- 00:11:01COLMAP is SIFT, scale-invariant
- 00:11:03feature transform. What it does is it
- 00:11:05looks for I call it a blob style
- 00:11:08detector where it's looking for a patch
- 00:11:11of pixels that has high contrast to its
- 00:11:13its background. So it could be something
- 00:11:15that's you know light colored surrounded
- 00:11:17by dark or vice versa something that's
- 00:11:19dark surrounded by light. You know it's
- 00:11:21going to look at look for these at
- 00:11:23multiple scales. That's why it's scale
- 00:11:25invariant. So multiple resolutions. So
- 00:11:27this could be something that's you know
- 00:11:28very small or something that's larger in
- 00:11:31the image. Mhm. But once it's
- 00:11:34found that sort of high contrast
- 00:11:36landmark, it now will then extract some
- 00:11:40representation of uh the appearance, you
- 00:11:44know, of of the area around that
- 00:11:45landmark. So it'll say, "Hey, I found
- 00:11:47something interesting." So maybe it's
- 00:11:48the um you know, a door knob on a
- 00:11:51door, you know. So it'll say, "Hey, that
- 00:11:53that door knob is a different color
- 00:11:55than the background, the rest of the
- 00:11:57door." And so now I want to describe
- 00:11:59that door knob. And so I'm going to look
- 00:12:01I don't want to look just at the
- 00:12:02door knob itself. I'm going to look
- 00:12:03around it and say here's my door knob
- 00:12:05and then oh there's this wood pattern on
- 00:12:08the door around it. And so it's going to
- 00:12:09come up with a representation for that.
- 00:12:12And so what sift actually does or what
- 00:12:14different feature representations are
- 00:12:16that could be a whole podcast in and of
- 00:12:18itself. But at a conceptual level you
- 00:12:20just think about it. It sort of
- 00:12:23summarizes what that looks like at at a
- 00:12:25rough level. It says, "Okay, I saw
- 00:12:27something dark in the middle and then
- 00:12:28there was this, you know, rough pattern
- 00:12:30around its vicinity." Mhm. Okay. So then
- 00:12:34I'm bringing up COLMAP and this is
- 00:12:36I've unfortunately had already run the
- 00:12:38project because I didn't want us to have
- 00:12:40to sit and watch things go and a lot of
- 00:12:42these things run really fast. So sift is
- 00:12:44fast if you can run it on GPU. I don't
- 00:12:47can't necessarily show what 10 what say
- 00:12:49I think it maxes at 10,000 by default
- 00:12:52but if you have coal map and you kind of
- 00:12:54want to follow along the first thing you
- 00:12:56do is set up a new project and that
- 00:12:59part's pretty easy but then you just go
- 00:13:01to processing and hit feature
- 00:13:03extraction and you get to pick a camera
- 00:13:06model. Why is that important? Why is
- 00:13:07picking a camera model important for
- 00:13:09this? Well, this is important and and
- 00:13:11this this ends up being really important
- 00:13:13later on when we start thinking about
- 00:13:14the geometry of these images and and
- 00:13:17what kind of camera and lens was used
- 00:13:20because these camera models are it is
- 00:13:23defining the geometry of that camera. So
- 00:13:27this right now you have a simple radial
- 00:13:29camera you know selected and so
- 00:13:30underneath of it sort of in grayscale
- 00:13:34are some parameters listed. It says,
- 00:13:35"Oh, simple radial has f, cx, c y, and
- 00:13:39k." Mhm. And so you kind of have to know
- 00:13:41from a computer vision literature that f
- 00:13:44is your focal length. CX and CY, that's
- 00:13:46the principal point. So that's defined,
- 00:13:48well, where is the center uh of my image
- 00:13:50or where is the optical axis of my my
- 00:13:54lens and how is that aligned with the
- 00:13:56image center? So a lot of times I just
- 00:13:57kind of say, hey, hand wavy, it's you
- 00:13:59know, what's the center of my image? And
- 00:14:01then that K is a a single radial
- 00:14:05distortion term. So it's assuming a lot
- 00:14:07of times lenses introduce a little bit
- 00:14:10of curvature effect, you know, curvature
- 00:14:12distortion to them. And so we're going
- 00:14:13to use a single mathematical term, a
- 00:14:16single, you know, polynomial term to
- 00:14:19represent the distortion in that lens.
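The SIMPLE_RADIAL model just described (f, cx, cy, k) can be sketched as a projection function. This is written from the verbal description above as an illustration of what each parameter does, not as COLMAP's exact code:

```python
def project_simple_radial(X, Y, Z, f, cx, cy, k):
    """Project a 3D point in camera coordinates with a SIMPLE_RADIAL-style model:
    one focal length f, principal point (cx, cy), one radial distortion term k."""
    x, y = X / Z, Y / Z                 # normalized image coordinates
    r2 = x * x + y * y                  # squared radius from the optical axis
    scale = 1.0 + k * r2                # the single polynomial distortion term
    return f * x * scale + cx, f * y * scale + cy
```

With k = 0 the model reduces to an ideal pinhole camera; the RADIAL model discussed next adds a second term (k1, k2) to the same polynomial.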
- 00:14:22This might be great. This is great for a
- 00:14:24lot of just, you know, general cameras.
- 00:14:26But if you know that your lens has a
- 00:14:29little bit more distortion, maybe you're
- 00:14:31using, you know, a wide-angle camera, a
- 00:14:33GoPro or a drone that has uh a wider
- 00:14:36field of view and some distortion. If
- 00:14:38you have a really wide angle camera,
- 00:14:40something where you can see a lot of
- 00:14:42distortion, then you might want one of
- 00:14:43these one of these fisheye versions.
- 00:14:45They have simple radial fisheye or the
- 00:14:47normal fisheye. There's even I think at
- 00:14:49the very bottom of the list, there's one
- 00:14:50called FOV. That's one that's really
- 00:14:53great for super wide angle. Mhm. You
- 00:14:56know, a lot of times for a normal camera
- 00:14:58like your iPhone in your pocket or your
- 00:14:59DSLR or your point and shoot or whatever
- 00:15:01it ends up being or your simple radial
- 00:15:04or your radial models um are nice
- 00:15:07because they assume that you've um
- 00:15:12you've got a single focal length. You
- 00:15:14know, your pixels are square. So, I
- 00:15:16don't need more than one f term. You
- 00:15:18want to model your principal point with
- 00:15:20cx, cy. And here the radial model added an
- 00:15:24extra lens distortion. So now instead of
- 00:15:25just K, now we have K1 and K2. So that's
- 00:15:28two radial distortion terms. So we can
- 00:15:30do a little better job of estimating the
- 00:15:33distortion of our lens. Okay. And so
- 00:15:35COLMAP asks for this right away
- 00:15:37because what it's doing is it has that,
- 00:15:39you know, part of that project creation
- 00:15:41process is you create a database. And so
- 00:15:43that's going to be, you know, a
- 00:15:44collection of data stored on disk. And
- 00:15:46so this process of feature extraction uh
- 00:15:49is when COLMAP goes through all of
- 00:15:51your images, extracts features, but then
- 00:15:54also creates those image entries in the
- 00:15:56database. And so it needs to know what
- 00:15:58style of camera is going to be
- 00:16:00associated with that image. Mhm. And and
- 00:16:03we could go and deepen a bunch of
- 00:16:05buttons on here. Yeah. I don't want if
- 00:16:07you just run this in default and simple
- 00:16:09radio and using smartphone or something,
- 00:16:12you'll be okay. But, you know, like here
- 00:16:14is thinking I have all these different
- 00:16:15cameras. There's options where you can
- 00:16:17say use it's always one camera. So, it
- 00:16:20just assumes then everyone's the same
- 00:16:21camera, which is great. Yeah, that's
- 00:16:23good. There's options for masks. I just
- 00:16:27bring up a mask on my screen. This is me
- 00:16:30masked. This is a mask. Not necessarily
- 00:16:33the mask you would um use, but basically
- 00:16:36there's a picture of me. This might be
- 00:16:38the wrong picture. And I've been with
- 00:16:40the mask as a separate file. And then if
- 00:16:42you kind of like combine the two, you
- 00:16:43end up with me masked out. And that's
- 00:16:46like a way to say you want me not to be
- 00:16:48in this result. You can mask out things.
- 00:16:51Specifically, if you want perhaps just
- 00:16:52an object to be reconstructed, you want
- 00:16:54to mask out a background, things like
- 00:16:56that we could go deep into. But there's
- 00:16:59all these options, right, to help get
- 00:17:01the right key points. So, if I go to
- 00:17:03this database, so I ran this already and
- 00:17:05I have this database manager where I can
- 00:17:07kind of jump into things and I pick one
- 00:17:09of these and I'm just going to hit show
- 00:17:11image, it's going to bring up the image
- 00:17:13and I can make this nice and big on my
- 00:17:15screen. What we're seeing now is an
- 00:17:16image of the fountain. I'm on the back
- 00:17:20side of it right now with all these red
- 00:17:22circles which are key points. Not
- 00:17:24necessarily all the features, right?
- 00:17:25It's just some of the ones that I think
- 00:17:27it matched on. Is that wrong or am I on
- 00:17:29the wrong I'm not I'm not entirely sure.
- 00:17:31Yeah. Yeah. In some software packages,
- 00:17:33they may show you all of them or may
- 00:17:35show you just just the ones that have
- 00:17:37been matched. I'm not sure with this
- 00:17:39spec specific viewer right now. So,
- 00:17:41yeah. And I'm not 100% clear either. I
- 00:17:44haven't read the documentation. All I
- 00:17:45know is visualizing. So, this is an idea
- 00:17:47of key points where you'll notice
- 00:17:49there's no key points where you have a
- 00:17:50lot of low contrast, not a lot of visual
- 00:17:53variation. So, I'm on my screen. And
- 00:17:55there's a part where it shows the street
- 00:17:57and there's just not much going on there
- 00:18:00versus there's a lot of points on the
- 00:18:01fountain which has all these ornate
- 00:18:03decorations on it. In the background
- 00:18:05there's trees and buildings that it's
- 00:18:07latching onto. So it makes sense that
- 00:18:09where you have less variation you're
- 00:18:11going to have less features that it's
- 00:18:14it's oh the sky is also another one
- 00:18:16where you
- 00:18:18this nice tree behind this thing it
- 00:18:19caught a lot on. So it doesn't mean it
- 00:18:21matched on those because you might not
- 00:18:23see those. So if I then I'm going to
- 00:18:25close this and then you can look at show
- 00:18:27overlapping images. So you know if I
- 00:18:29click here you can look at the the
- 00:18:30matches. You're going to see then this
- 00:18:32kind of correspondence matches where
- 00:18:34it's finding key points between two
- 00:18:37images and they show these green lines
- 00:18:39basically saying these two images have
- 00:18:41matching features that it it believes
- 00:18:43are the same points. Right. Is that what
- 00:18:45we're seeing? Exactly. Exactly. So that
- 00:18:47this is now sort of moved to the second
- 00:18:50and third bubbles within that
- 00:18:52correspondent search block. So back to
- 00:18:54that correspondent search. The first
- 00:18:55step was the feature extraction which
- 00:18:57was just the identification of these key
- 00:19:00points in each of the images. So it
- 00:19:02wasn't even trying to compare images
- 00:19:03yet. We're just saying for each image
- 00:19:05let me find those key points. And as as
- 00:19:07Jonathan said, by default, if you've got
- 00:19:09a GPU-enabled version of COLMAP and
- 00:19:12you've got a nice GPU in your computer,
- 00:19:14uh it will use the GPU implementation,
- 00:19:15that graphics processor, which makes it
- 00:19:17go a lot faster. So once we've extracted
- 00:19:20those key points and those or or
- 00:19:22features, again, I use those terms
- 00:19:24interchangeably a lot, the key point and
- 00:19:25the feature. Now, we want to match
- 00:19:28images together, and that's to discover
- 00:19:31which images show similar content. And
- 00:19:34so the result of that is going to be the
- 00:19:37set of correspondences, the set of uh
- 00:19:39features saying the features in this
- 00:19:40image matched to the features in this
- 00:19:42image. And those were those green lines
- 00:19:43that Jonathan had shown up uh just prior
- 00:19:46saying that you know not all of the key
- 00:19:48points from one image matched to the
- 00:19:49other. There was some subset but um
- 00:19:52we're trying to discover what those
- 00:19:54matches are. In this diagram we said
- 00:19:57that you know we had feature extraction,
- 00:19:58matching and then geometric
- 00:20:00verification. Matching and geometric
- 00:20:02verification uh a lot of times will go
- 00:20:04hand in hand, you know, so you run matching
- 00:20:06and then you immediately run geometric
- 00:20:08verification after that. So the
- 00:20:10intention there is your matching is just
- 00:20:14trying to figure out which features look
- 00:20:18similar between two images but it's not
- 00:20:20trying to do any sort of 2D or 3D
- 00:20:23reasoning. So, it may think that, oh,
- 00:20:26the the top of the tree in one image
- 00:20:29looks like the top of another tree in
- 00:20:31another image, but they're in completely
- 00:20:32different parts of the image, and it
- 00:20:33doesn't even make sense. Like, it may
- 00:20:35confuse things or especially if you
- 00:20:37have, you know, a building with some
- 00:20:38sort of repetitive pattern on it. You
- 00:20:40know, the same brick repeated over and
- 00:20:42over again, but you have some sort of
- 00:20:43unique windows or unique artwork, you
- 00:20:46know, that appears, you know, on that
- 00:20:47wall. For feature matching, it may end
- 00:20:49up matching incorrect parts of the image
- 00:20:51to each other. So matching does its best
- 00:20:55to try to figure out what matches, but
- 00:20:56it might be wrong. It's geometric
- 00:20:58verification's job to come in and clean
- 00:21:01those up to figure out, well, now that I
- 00:21:03have this initial set of matching key
- 00:21:05points between my two images, which ones
- 00:21:08actually make sense based on our
- 00:21:10knowledge of geometry and how cameras
- 00:21:12move. And so that's where sometimes you
- 00:21:15can leverage, you know, knowing what
- 00:21:16kind of camera model you have can be
- 00:21:18helpful. Knowing if you expect a lot
- 00:21:20of distortion or if it's a fisheye lens,
- 00:21:22that can help. But sometimes um some
- 00:21:25methods don't even try to use that
- 00:21:27information. We'll just look at the 2D-2D
- 00:21:29relationships. Mhm. And so there are
- 00:21:31some key words that you might see: it's,
- 00:21:33you know, estimating a homography,
- 00:21:36like a perspective transform,
- 00:21:38or an essential matrix or a fundamental
- 00:21:40matrix. So each of these sort of
- 00:21:42relationships, each of these matrices is
- 00:21:44a way to describe how a point in one
- 00:21:47image matches to a location in another
- 00:21:50image or a set of locations in another
- 00:21:52image. And and so we're trying to
- 00:21:54estimate, you know, is there a valid
- 00:21:57camera motion that we can imagine to get
- 00:22:01a set of points in one image to move to
- 00:22:03the set of points in the other image.
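To make that check concrete, here is a toy numpy sketch of the epipolar test (an illustration of the idea only, not COLMAP's implementation, which estimates these models robustly inside RANSAC; the camera motion and points below are invented): with relative motion (R, t), the essential matrix E = [t]x R makes correct matches satisfy x2ᵀ E x1 = 0, while wrong matches generally don't.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative motion between two views: no rotation, a sideways step.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
E = skew(t) @ R  # essential matrix for that motion

# Synthetic 3D points in front of the first camera.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))

# Homogeneous normalized image coordinates in each view (calibrated cameras).
x1 = X / X[:, 2:]
X2 = (R @ X.T).T + t
x2 = X2 / X2[:, 2:]

# A geometrically consistent match satisfies x2^T E x1 = 0.
err = np.abs(np.einsum('ij,jk,ik->i', x2, E, x1))
inliers = err < 1e-9          # all true for these perfect matches

# Corrupt the first 10 "matches", as a confused feature matcher might.
x2_bad = x2.copy()
x2_bad[:10, :2] += 0.3
err_bad = np.abs(np.einsum('ij,jk,ik->i', x2_bad, E, x1))
inliers_bad = err_bad < 1e-6  # the corrupted matches now fail the check
```

The verified matches that survive this kind of test are the "inliers" mentioned later in the episode.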
- 00:22:04That's what geometric verification is
- 00:22:06doing. Just figuring out those those 2D
- 00:22:09relationships uh between images. And
- 00:22:12somewhere in my logs, you can see some
- 00:22:14hints of that. So, as this is running, it's
- 00:22:16showing all kinds of text on your screen
- 00:22:18and it's I'm sure some of that when it's
- 00:22:21well, it's showing bundle adjustment on
- 00:22:23my screen right now, but at one point
- 00:22:25it's talking about some of the
- 00:22:27matches and running different algorithms
- 00:22:30in the background to get that. Um, so
- 00:22:32and then if I click on, like, one of
- 00:22:34these points that it created, it
- 00:22:36shows you where you have multiple
- 00:22:37matches on a specific point and things
- 00:22:40you can do to kind of get different
- 00:22:42views and get hints of what we're
- 00:22:44talking about here. But one thing we
- 00:22:46didn't really talk about when you're
- 00:22:47matching these images, too, is that
- 00:22:50there are different options as well. So
- 00:22:52when I go through here, I'm processing,
- 00:22:53I've got my key points. It goes fast on
- 00:22:55a GPU because it's able to like look at
- 00:22:57all the different images all at once,
- 00:22:59right? They don't depend on
- 00:23:00each other when you're extracting
- 00:23:02features. But then you get to the point
- 00:23:03where you need to do your matching. This
- 00:23:06is where it's all CPU-driven because
- 00:23:08it's kind of either a sequential or
- 00:23:10exhaustive, but it's not able to look at
- 00:23:11every image all at once. But there's
- 00:23:14options here where if I go to this
- 00:23:17button here, it's not displaying on my
- 00:23:19screen correctly for some reason. Oh,
- 00:23:20there we go. You can you can do
- 00:23:22exhaustive, sequential, vocab tree,
- 00:23:25spatial. There's these different styles
- 00:23:27you can pick or I want to say styles,
- 00:23:29different algorithms you can pick to
- 00:23:31match these. Yep. My understanding
- 00:23:33always is if you have a random
- 00:23:35collection of images like someone walked
- 00:23:37around and they're not necessarily one
- 00:23:39image is taken and then your next image
- 00:23:42you moved over and took just of the same
- 00:23:44part of the scene. But I don't know,
- 00:23:46maybe you're just walking around taking
- 00:23:47pictures in all which directions.
- 00:23:49Exhaustive is what you want to use
- 00:23:50because it's going to you can explain
- 00:23:52this but it's going to like kind of try
- 00:23:54to get every image to match to every
- 00:23:56image versus sequential where you're
- 00:23:58saying no no no each image was taken in
- 00:24:01sequence. So I see the fountain from one
- 00:24:03spot I moved a few feet took another
- 00:24:05photo of it. They should be sequentially
- 00:24:06somewhat matching to each other. Does
- 00:24:09that sound correct? Is that the right
- 00:24:11assumption? You you're exactly right.
- 00:24:13You're exactly right. So yeah, once you
- 00:24:15once you've extracted the key points
- 00:24:16from a single image, now you want to
- 00:24:18figure out well which pairs of images
- 00:24:20you know are related to each other. So
- 00:24:22the simplest, most naive way is to
- 00:24:24say well let me match every single image
- 00:24:25to every single other one. Let me look
- 00:24:27at all order n squared every single
- 00:24:30combination of pairs of images that I
- 00:24:32can imagine. And so that's what
- 00:24:34exhaustive matching is doing. So
- 00:24:35exhaustive matching like you said it's
- 00:24:37great when you have sort of an unsorted
- 00:24:40random collection of images and
- 00:24:41especially it works well if you have you
- 00:24:43know the order of a few hundred images
- 00:24:45um you know because because it is doing
- 00:24:48this you know every image to every other
- 00:24:50image that quickly gets expensive in
- 00:24:52terms of time like that's going to take
- 00:24:53a lot of time to compute if you try to
- 00:24:55do this on thousands of images you can
- 00:24:57still do it you just have to wait a long
- 00:24:59time but yeah it's it's great because
- 00:25:00it's going to try to discover every
- 00:25:02single pair of matching images that it
- 00:25:05can. Mhm. And so that's where the
- 00:25:06sequential is nice if you have something
- 00:25:08like you said there in the fountain
- 00:25:10sequence where you know hey these are
- 00:25:12you know frames from a video or my
- 00:25:14images, or maybe I was taking photos but
- 00:25:16I'm taking them in order, like, oh, I
- 00:25:18started here took a photo took a few
- 00:25:20steps took another photo took a few more
- 00:25:22steps took another photo and so there is
- 00:25:23some sort of sequential information to
- 00:25:26those photos you know that images taken
- 00:25:29near each other in that list show
- 00:25:31similar content and that's what
- 00:25:33sequential does: it'll leverage that
- 00:25:35information to help the matching be
- 00:25:36more efficient. And then I don't really
- 00:25:39understand vocab tree. I do know that if
- 00:25:41you want to do an exhaustive-style
- 00:25:43match, not sequential, but you have
- 00:25:45let's say 800 images, I've always heard
- 00:25:47use a vocab tree. Yeah. Yeah, that
- 00:25:50that's exactly right. So the vocab tree,
- 00:25:53you might hear it called a vocabulary
- 00:25:55tree or image-retrieval-style matching.
- 00:25:57Yeah. What it's doing behind the scenes
- 00:25:59is it uses an image lookup data
- 00:26:04structure. So it takes all the images,
- 00:26:06comes up with a really compact
- 00:26:09summarization of the kinds of things
- 00:26:11that are in each image and then provides
- 00:26:13a way that I can say, hey, for this
- 00:26:15given image, what other images in my
- 00:26:18data set are likely to have the same
- 00:26:21kinds of things in them. You know, it's
- 00:26:23not a guarantee, but it just says, you
- 00:26:25know, if I have one image and I've
- 00:26:27got 10,000 other images I can match to,
- 00:26:29I can ask it, well, hey, I don't want to
- 00:26:31look at all 10,000. Can you at least
- 00:26:33give me a sorted list of the ones that
- 00:26:35are most likely to match? And so that's
- 00:26:37what the vocab tree option does for you
- 00:26:38is it returns that ranked list and then
- 00:26:41so instead of matching all 10,000, I can
- 00:26:43choose to match the best 50 or the best
- 00:26:45100 or whatever my threshold is. Speed up.
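As a sketch of that retrieval idea (the data and the tiny 8-word vocabulary here are invented; a real vocabulary tree quantizes feature descriptors into a large learned tree of visual words), each image gets a compact summary vector, and a query returns a ranked list of likely matches instead of the whole database:

```python
import numpy as np

# Toy stand-in for vocabulary-tree retrieval: each image is summarized by a
# histogram over 8 made-up "visual words".
rng = np.random.default_rng(1)
n_images, n_words = 10_000, 8
db = rng.random((n_images, n_words))
db[1] = db[0] * 2.0  # image 1 shows the same content as image 0

def rank_candidates(query_idx, db, top_k):
    """Return the top_k images most similar to db[query_idx] by cosine
    similarity, instead of matching against the whole database."""
    q = db[query_idx]
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q))
    sims[query_idx] = -np.inf  # never match an image to itself
    return np.argsort(-sims)[:top_k]

# Match image 0 against its best 50 candidates rather than all 9,999 others.
candidates = rank_candidates(0, db, top_k=50)
```

Only those top-ranked candidates then go through full feature matching and geometric verification.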
- 00:26:48Yep. It's more efficient. Yeah. Um, once
- 00:26:51you get beyond 300 to 400 images,
- 00:26:55exhaustive should not be your option.
- 00:26:57You should go to vocab tree unless
- 00:26:59they're all sequentially taken. And then
- 00:27:01always use sequential. Well, not always,
- 00:27:02but that's probably your default.
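The cost difference between the two modes discussed above can be sketched in a few lines (the `overlap` window size is a made-up parameter for illustration; COLMAP's sequential matcher has its own settings):

```python
from itertools import combinations

def exhaustive_pairs(n):
    """Every image against every other image: n*(n-1)/2 pairs, O(n^2)."""
    return list(combinations(range(n), 2))

def sequential_pairs(n, overlap=5):
    """Each image against only its next `overlap` neighbors in capture order."""
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(i + 1 + overlap, n))]

# For 800 images the difference is dramatic:
n_exh = len(exhaustive_pairs(800))   # 319600 candidate pairs
n_seq = len(sequential_pairs(800))   # 3985 candidate pairs
```

That quadratic growth is why exhaustive matching on thousands of images takes so long, and why ordered captures should use sequential.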
- 00:27:04So, if I'm taking a video and then
- 00:27:06extracting images, sequential is always
- 00:27:09going to work. Well, always going to be
- 00:27:10your first option if you want to be as
- 00:27:12fast as possible. And then
- 00:27:14in here, I know you
- 00:27:16can pick loop detection. So, it's
- 00:27:18trying to we've talked about that
- 00:27:19before, right? It's trying to detect if
- 00:27:21you have come back to an area. Correct,
- 00:27:23and that will do it using the
- 00:27:26vocab tree option. So, under
- 00:27:28the sequential tab, if
- 00:27:31I do loop detection and then specify a
- 00:27:33vocab tree path there at the bottom that
- 00:27:37will enable it to say oh as I'm
- 00:27:39processing through all those video
- 00:27:40frames you know every 10th frame or
- 00:27:42every 50th frame or every 100th frame
- 00:27:43whatever you set it to you can have it
- 00:27:46go and then do a vocabulary tree
- 00:27:48retrieval, do that image retrieval step,
- 00:27:50to try to discover loop closures
- 00:27:53within some of that. Okay, so we
- 00:27:56have these options. And
- 00:27:58then there's spatial and transitive; we
- 00:28:00haven't talked about those. Does spatial
- 00:28:02have to do with GPS? Exactly right. So it
- 00:28:04just says, you know, for each image,
- 00:28:05assuming the images have embedded
- 00:28:08geotags, so GPS data embedded in the
- 00:28:10EXIF, it will say, for each image, just
- 00:28:13find other images with similar GPS and
- 00:28:15match to those. Yes. I love that. A lot of
- 00:28:18people here listening probably are
- 00:28:20taking drone images and spatial is the
- 00:28:24one I always use. That's a great option
- 00:28:26because a lot of times that drone is
- 00:28:27looking straight down or you know it's
- 00:28:29not looking at completely random
- 00:28:31directions but there is some order and
- 00:28:33structure to that drone data and so that
- 00:28:36and in fact a lot of the drones that
- 00:28:37people are using nowadays have a really
- 00:28:39good GPS on it. I'm thinking of the
- 00:28:42enterprise versions of, like, a DJI drone
- 00:28:45are getting really good GPS. Even
- 00:28:47without an RTK attachment, it's not
- 00:28:50going to throw a bunch
- 00:28:52of error in there. And then what's
- 00:28:53transitive? That's the one I don't think
- 00:28:55I've ever touched. I don't even know
- 00:28:56what that means. Yeah, that just that's
- 00:28:58a way to densify a set of existing
- 00:29:01matches. So suppose you had gone and run
- 00:29:04one of the existing modes.
- 00:29:06Okay, maybe not exhaustive, but like if
- 00:29:08you had run sequential or your
- 00:29:11spatial or your vocab tree, but then
- 00:29:14you wanted to go back and create a
- 00:29:16more complete set of connections between
- 00:29:18images. What transitive will do is it'll
- 00:29:20look at your database and it'll say,
- 00:29:22"Hey, if image A matched to B and image
- 00:29:26B matched to image C, but I didn't try
- 00:29:29to match image A directly to C, let me
- 00:29:31go ahead and do that now." And so it
- 00:29:33goes back and finds these transitive
- 00:29:35links between images and attempts to do
- 00:29:38that matching. And so what that does
- 00:29:39that just creates a stronger set of
- 00:29:40connections between images which will
- 00:29:43help COLMAP out during the
- 00:29:44reconstruction phase. Okay. So that I
- 00:29:46feel like this gives me a good idea then
- 00:29:48of, or gives the
- 00:29:49listener/viewer, an idea. There are
- 00:29:52different options. Pick the one that
- 00:29:54makes sense for the data set you have.
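The transitive option described a moment ago can be sketched in a few lines (a toy illustration of the idea only, not COLMAP's actual database logic; the names are invented):

```python
def transitive_candidates(matched_pairs):
    """If A matched B and B matched C but A-C was never tried, propose A-C.
    A toy sketch of the idea behind transitive matching."""
    matched = {frozenset(p) for p in matched_pairs}
    neighbors = {}
    for a, b in matched_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    proposals = set()
    for ns in neighbors.values():
        for a in ns:
            for c in ns:
                if a < c and frozenset((a, c)) not in matched:
                    proposals.add((a, c))
    return proposals

# A-B and B-C are already matched, so A-C is proposed as a new pair to try.
new_pairs = transitive_candidates([("A", "B"), ("B", "C")])
```

Running full matching on those proposed pairs is what densifies the connections between images.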
- 00:29:56You might get the best results out of
- 00:29:59exhaustive as far as error, but you might
- 00:30:01be waiting a day. Heard people say, "I
- 00:30:04set this and now it's telling me it'll
- 00:30:06be ready in 28 hours." Well, probably
- 00:30:08not the right mode. You probably should
- 00:30:09have used a vocab tree. But, you know, I always say
- 00:30:12find the right one. Start with
- 00:30:14sequential. If you have sequential
- 00:30:15images, at least you'll probably get a
- 00:30:17good result there. I also want, just
- 00:30:20to mention, back in the
- 00:30:22diagram under the correspondence search,
- 00:30:24you know, they do break it down into
- 00:30:26the feature extraction, feature matching,
- 00:30:28and then geometric
- 00:30:29verification. For that geometric verification,
- 00:30:32those options show up on
- 00:30:35those matching settings screens that we
- 00:30:36just saw for each of those tabs at the
- 00:30:39bottom there was the general settings or
- 00:30:41general options and a lot of those
- 00:30:43general options are related to geometric
- 00:30:46verification saying when I'm matching
- 00:30:49these points and I want to then verify,
- 00:30:51you know, what sort of pixel error do I
- 00:30:53expect or what is the minimum number of
- 00:30:55inliers or an inlier ratio. And so
- 00:30:58those inliers are the number of
- 00:31:00geometrically verified matches between a
- 00:31:02pair of images. And so that's that's
- 00:31:04where geometric verification kind of
- 00:31:06comes into play within this COLMAP
- 00:31:08workflow. Okay. So just move this along.
- 00:31:11Then I do want to point out I'm going to
- 00:31:13show COLMAP one more time. At this
- 00:31:15point, you've run both your feature
- 00:31:17extraction and feature matching. You
- 00:31:19will still see nothing on your screen.
- 00:31:21Well, you will see logs, but you will
- 00:31:22not see these camera poses, which I
- 00:31:24have. So, I have a point I have this
- 00:31:26sparse point cloud. I have these red
- 00:31:28camera positions around it, and none of
- 00:31:31this shows up because at this point, we
- 00:31:33haven't we haven't created a point
- 00:31:35cloud. We haven't projected anything
- 00:31:37yet. So, we're moving from
- 00:31:40correspondence search to, if I bring up
- 00:31:42that diagram one more time, we're moving
- 00:31:43on to incremental reconstruction, and
- 00:31:45that's where we start to see fun things
- 00:31:47happening on the COLMAP GUI screen.
- 00:31:50If you're running the GUI, you'll
- 00:31:51start to see camera poses show up. So,
- 00:31:54the first step is initialization. What
- 00:31:56is that? So, is that just starting?
- 00:31:59Yeah, that's what it is. I mean, it's
- 00:32:00it's the starting process for this
- 00:32:03incremental reconstruction. So
- 00:32:06incremental reconstruction is just one
- 00:32:08style to attempt to do 3D
- 00:32:10reconstruction. And so the the core idea
- 00:32:13here is that you know like you said we
- 00:32:16don't have any 3D information yet. So
- 00:32:18we're going to start with the minimum
- 00:32:19amount that we need which is a pair of
- 00:32:21images. So let's start with a pair of
- 00:32:22images and then figure out what is their
- 00:32:253D relationship you know between those
- 00:32:27images as well as what 3D points did
- 00:32:30they see in the scene. And so we're
- 00:32:32going to create this two-view
- 00:32:33reconstruction: take that pair of
- 00:32:34images, triangulate an initial set of 3D
- 00:32:36points, and then we use that as the
- 00:32:39initialization for the rest of the
- 00:32:41reconstruction. And so everything after
- 00:32:43that is going to figure out, well, based
- 00:32:44on these initial two images and some
- 00:32:46points, how can I add a third image to
- 00:32:49that? And how does it relate? And now
- 00:32:50that I have these three, how can I add a
- 00:32:52fourth and then a fifth and a sixth? And
- 00:32:54so you just keep adding images one at a
- 00:32:56time to grow a larger and larger
- 00:32:59reconstruction. But initialization is
- 00:33:01just what is that initial pair? Which
- 00:33:04two images am I going to start with to
- 00:33:07build this entire reconstruction?
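That two-view initialization, triangulating an initial set of 3D points from the first image pair, can be sketched with standard linear (DLT) triangulation (my own illustration, not COLMAP's code; the poses and the scene point below are invented):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: observed (u, v) coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A: the homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Initial pair: first camera at the origin, second shifted along x (baseline).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0])       # a scene point both cameras saw
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)       # recovers X_true from the 2D match
```

Note that if the two cameras were at the same position (zero baseline), the system would be degenerate, which is exactly why initialization wants a pair with real motion between them.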
- 00:33:09Okay. And then it kind of goes
- 00:33:11into a circle. So if you look at this, I
- 00:33:13say circle. The diagram on the screen
- 00:33:15shows image registration, triangulation,
- 00:33:18bundle adjustment, outlier filtering,
- 00:33:20and then if you follow the lines, you
- 00:33:22notice you're really doing a loop. Yep.
- 00:33:24So it's looping through that process.
- 00:33:26And then also this dashed line showing
- 00:33:29reconstruction. So it's kind of probably
- 00:33:30looping through that and adding to the
- 00:33:33reconstruction while it's going or Yep.
- 00:33:36Okay. Exactly right. Exactly right. So
- 00:33:38it's that initialization that picks
- 00:33:41the first pair of images. But
- 00:33:43once I have my pair of images now I'm
- 00:33:46going to enter in this loop that starts
- 00:33:49with image registration. So image
- 00:33:50registration is a fancy name for
- 00:33:53how can I add a new
- 00:33:55image to my existing reconstruction. And
- 00:33:58so what it's going to look at is based
- 00:34:01on the 3D points that have already been
- 00:34:03triangulated. It's going to ask what's
- 00:34:06the best next image in my data set that
- 00:34:10also saw those points. And
- 00:34:13once I find that image, you know, via
- 00:34:15the set of feature matches. So we
- 00:34:17say, you know, if I've matched images
- 00:34:19one and two and triangulated that, and
- 00:34:21if image two matched to image three,
- 00:34:23well then image three is seeing the same
- 00:34:25points in the scene. So let me add image
- 00:34:27three. And so there it's a 2D-to-3D
- 00:34:30registration process, a 2D-3D pose
- 00:34:32estimation process where I take the 2D
- 00:34:34points in that third image and I want to
- 00:34:37align those 2D points with the 3D points
- 00:34:40that have been triangulated. So you
- 00:34:41might hear that as image registration or
- 00:34:44perspective-n-point problem, pose
- 00:34:47estimation. There's a few different
- 00:34:48words for what this process is, but
- 00:34:50you're adding a new image to the
- 00:34:52reconstruction. And so that's the image
- 00:34:54registration step. I do know when I ran
- 00:34:56this um I can always take a video and
- 00:34:59kind of project onto this in post. But
- 00:35:02when it's creating this reconstruction,
- 00:35:04instead of taking image one and then
- 00:35:08image two and then image three and kind
- 00:35:11of building off that, I'll notice it'll
- 00:35:13pick, if you're
- 00:35:15watching this on video, you'll notice I
- 00:35:17took two loops and some of the images
- 00:35:19are like right above each other almost
- 00:35:20where I held the phone at like above my
- 00:35:22head and then I held it down at chest
- 00:35:23level. So I have two loops and there's a
- 00:35:25lot of common key points, common
- 00:35:27features. So, as it's building this up,
- 00:35:30it started at this kind of where I
- 00:35:31started walking around this this
- 00:35:33fountain, but it's using images from
- 00:35:35further along in the video extraction or
- 00:35:38sorry, the images I had. So, it used like
- 00:35:40image one and image 180 because those
- 00:35:45are next to each other and had a lot of
- 00:35:47strong feature matches. So, they're not
- 00:35:48necessarily using images in sequence of
- 00:35:50how you took them. It's ones that had
- 00:35:52strong correlation.
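The 2D-3D registration step being described can be illustrated by how a candidate pose is scored against the already-triangulated points (COLMAP actually solves a perspective-n-point problem inside RANSAC; the poses and points here are invented, and coordinates are normalized for simplicity):

```python
import numpy as np

def reprojection_errors(R, t, X, x_obs):
    """Distance between where pose (R, t) projects each 3D point and where
    the new image actually observed it (normalized image coordinates)."""
    Xc = (R @ X.T).T + t
    x_proj = Xc[:, :2] / Xc[:, 2:]
    return np.linalg.norm(x_proj - x_obs, axis=1)

rng = np.random.default_rng(2)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], (30, 3))  # triangulated points

# Ground-truth pose of the new image, and its 2D observations of those points.
R_true, t_true = np.eye(3), np.array([0.2, 0.0, 0.0])
obs = (R_true @ X.T).T + t_true
x_obs = obs[:, :2] / obs[:, 2:]

errs_good = reprojection_errors(R_true, t_true, X, x_obs)   # essentially zero
errs_bad = reprojection_errors(np.eye(3), np.array([0.5, 0.3, 0.0]), X, x_obs)
```

A pose solver searches for the (R, t) that makes these errors small over as many 2D-3D correspondences as possible; that winning pose is the registered camera.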
- 00:35:54That's a great point. That's a great
- 00:35:56point. Yeah, it isn't just going to
- 00:35:57go, you know, 1 2 3 4 5 6, you know,
- 00:36:00it's not going to do them in order, you
- 00:36:01know, it's going to start that pair of
- 00:36:03images. It's going to look through all
- 00:36:04of the images in your collection and
- 00:36:07find the pair. And it might not be the
- 00:36:08consecutive pair, but find the pair of
- 00:36:10images, you know, that maximizes some
- 00:36:12criteria. You know, it's a pair of
- 00:36:14images that has strong connectivity. So,
- 00:36:16there were a lot of feature matches, but
- 00:36:18I also want to make sure that that pair
- 00:36:19of images has, you know, differences in
- 00:36:22viewpoint. I don't want two images that
- 00:36:24were taken at the exact same position in
- 00:36:26space because that gives me no 3D
- 00:36:29information. I need, you know, we talked
- 00:36:31about this in the last episode, this
- 00:36:32concept of a baseline. I need some sort
- 00:36:34of translation. I need some motion
- 00:36:36between two images or maybe it was in
- 00:36:38our depth map episode, you
- 00:36:41know, we talked about this, you know, in
- 00:36:43that we need motion between images in
- 00:36:45order to estimate depth. So the
- 00:36:47initialization could look for the same
- 00:36:48thing. It wants lots of matches
- 00:36:50between the images, but it also wants a
- 00:36:52strong amount of motion between them.
- 00:36:54So, it's going to pick whichever pair of
- 00:36:56images maximizes that criteria, and
- 00:36:59once it has that, then it'll start
- 00:37:01adding other images that are strongly
- 00:37:04connected to those initial ones. And
- 00:37:05yeah, it won't necessarily do it in
- 00:37:07order that you capture those images. It
- 00:37:09can be in the order in which those
- 00:37:10connections are strongest. And I was
- 00:37:13mostly seeing, like,
- 00:37:16the first photo and then somewhere
- 00:37:17further along where I came and did a
- 00:37:20loop. I saw those two photos start
- 00:37:22together because I think, as we were
- 00:37:24talking about, the baseline
- 00:37:26was better. There was more parallax
- 00:37:28because these are pretty closely
- 00:37:30spaced images I took from picture to
- 00:37:32picture. So not a lot has changed versus
- 00:37:34the next loop I'm looking at the
- 00:37:36exact same part of the fountain but I
- 00:37:38have a different elevation and angle. So
- 00:37:40there's a lot of parallax movement
- 00:37:42between those images. So it was
- 00:37:45matching those better as opposed
- 00:37:48to image one to image two. It's more of
- 00:37:50image one to image 180 because that
- 00:37:52baseline was probably better. The fun
- 00:37:54thing is, when you run this in
- 00:37:56the GUI, this COLMAP, you get to
- 00:37:57watch those build and you get to see the
- 00:37:59point cloud just start to generate in
- 00:38:03front of you and you get an
- 00:38:04understanding then of what it's doing in
- 00:38:06these logs that are looping through this
- 00:38:08process over and over. And you can kind
- 00:38:10of see it just iteratively add to the
- 00:38:12scene and build and refine. When it's
- 00:38:14doing this incremental reconstruction,
- 00:38:17is it refining the camera poses as it
- 00:38:19goes or is it just saying, "Here's the
- 00:38:21camera poses. That's where they are."
- 00:38:24No, there's there's refinement. There's
- 00:38:26refinement. And a lot of times that
- 00:38:27refinement is is called bundle
- 00:38:29adjustment. That's that's a key word
- 00:38:31that's used commonly in the literature.
- 00:38:32I remember the first time I heard the
- 00:38:33word bundle adjustment. I was a first
- 00:38:35year grad student and I had no idea what
- 00:38:38the person was talking about. I was
- 00:38:39like, "What? A bundle of sticks? A
- 00:38:41bundle of what? A straw? What is going
- 00:38:43on?" Um, but no, bundle adjustment.
- 00:38:45So, it's the idea of refining the 3D
- 00:38:49points as well as the camera positions.
- 00:38:51And so you end up with just a bundle of
- 00:38:53constraints, you know, a bunch of
- 00:38:54constraints saying, you know, these 2D
- 00:38:56points in these images all triangulate
- 00:38:58and all saw the same 3D point in the
- 00:39:00scene, but I've got a bunch of images
- 00:39:01and I've got a bunch of points. How can
- 00:39:04I optimize the alignment of all of this
- 00:39:07data? And that's what bundle adjustment
- 00:39:09is. So yeah, so as COLMAP is running,
- 00:39:13it's doing that image registration
- 00:39:15process. It'll add a new image. It then
- 00:39:17runs triangulation which creates new 3D
- 00:39:20points based on that new image and other
- 00:39:22images that are already there but then
- 00:39:24it'll do bundle adjustment which will
- 00:39:25say how can I refine that and there's
- 00:39:28two styles of bundle adjustment that I
- 00:39:30believe COLMAP uses: one of them is
- 00:39:32local bundle adjustment the other is
- 00:39:33global. So a lot of times what you will
- 00:39:35see is, you know, suppose we had
- 00:39:38already reconstructed a thousand images
- 00:39:40and we're adding the thousand-and-first.
- 00:39:42Um, when I add that thousand-and-first,
- 00:39:45you know trying to do bundle adjustment
- 00:39:46using all thousand images that takes a
- 00:39:48long time. Um, and so we
- 00:39:52recognize that, well, that
- 00:39:54thousand-and-first, that next
- 00:39:56image that I'm adding, you know, well,
- 00:39:57it's off in the corner of the
- 00:39:58reconstruction, you know, it's far away
- 00:40:00from the other side of the
- 00:40:01reconstruction. You know, these these
- 00:40:02things aren't really related to each
- 00:40:04other. So, I can run a local bundle
- 00:40:06adjustment. Let me just optimize only
- 00:40:08those cameras and points that are near
- 00:40:11that new image that I just added or
- 00:40:13those new points that I've triangulated.
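The quantity that both local and global bundle adjustment drive down can be sketched directly (a toy objective assuming calibrated cameras and normalized coordinates; COLMAP minimizes this kind of objective with a sparse nonlinear least-squares solver, not the naive loop below):

```python
import numpy as np

def ba_objective(poses, points, observations):
    """Sum of squared reprojection errors: the quantity bundle adjustment
    minimizes by jointly adjusting camera poses and 3D points.
    observations: (camera_index, point_index, observed_xy) triples."""
    total = 0.0
    for cam_i, pt_i, xy in observations:
        R, t = poses[cam_i]
        Xc = R @ points[pt_i] + t
        proj = Xc[:2] / Xc[2]
        total += float(np.sum((proj - xy) ** 2))
    return total

# Two cameras and three points with perfect, consistent observations.
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([1.0, 0.0, 0.0]))]
points = [np.array([0.0, 0.0, 5.0]),
          np.array([0.5, 0.2, 6.0]),
          np.array([-0.3, 0.1, 4.0])]
obs = []
for ci, (R, t) in enumerate(poses):
    for pi, X in enumerate(points):
        Xc = R @ X + t
        obs.append((ci, pi, Xc[:2] / Xc[2]))

perfect = ba_objective(poses, points, obs)      # 0: everything is consistent

# Nudge one camera pose: the objective grows, and bundle adjustment's job is
# to move poses (and points) until it is pushed back toward zero.
poses_off = [poses[0], (np.eye(3), np.array([1.3, 0.0, 0.0]))]
perturbed = ba_objective(poses_off, points, obs)
```

Local bundle adjustment restricts this sum to the cameras and points near the newly added image; global bundle adjustment includes all of them.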
- 00:40:15And so, that's a way to sort of do this
- 00:40:16local refinement. And I can do that
- 00:40:18every single time I add a new image. And
- 00:40:21then periodically, COLMAP will run a global
- 00:40:24bundle adjustment. So there's some
- 00:40:25settings there. I think every, you know,
- 00:40:27once the reconstruction has increased in
- 00:40:29size by 10% or you've added every, you
- 00:40:31know, 500 images or something, there's
- 00:40:33certain criteria, especially at the end
- 00:40:35of the reconstruction, COLMAP will run a
- 00:40:37global bundle adjustment which says,
- 00:40:39let's optimize everything. Let's
- 00:40:41optimize the points. Let's optimize the
- 00:40:44camera poses. And something we haven't
- 00:40:46mentioned is it will also be optimizing
- 00:40:48the camera parameters. So back when we
- 00:40:51picked that camera model and we said,
- 00:40:52"Oh, you know, we're going to use a
- 00:40:53camera model that has a focal length
- 00:40:55term and a principal point CX and C Y or
- 00:40:57maybe has some radial distortion terms."
- 00:40:59During bundle adjustment, COLMAP will
- 00:41:01also be optimizing those parameters as
- 00:41:04well to figure out well what is the
- 00:41:05field of view of my camera that's the
- 00:41:07focal length or how much lens distortion
- 00:41:09was there in order to achieve that
- 00:41:12alignment. Would it run those? Cuz we
- 00:41:15didn't cover this earlier on, but let's
- 00:41:18say you do have a camera model
- 00:41:21uh calibration file. So, you're saying I
- 00:41:23know this. I think DJI, in
- 00:41:27their enterprise-level drones, will
- 00:41:28give you this information on their
- 00:41:31lenses cuz they've been calibrated and
- 00:41:33it's in the EXIF data. Will that
- 00:41:35change? Does it do like a refinement on
- 00:41:37top of that or does it just say no, no,
- 00:41:39no, you give us that, we won't change
- 00:41:40that. That's an option. So I
- 00:41:43think either under the
- 00:41:44reconstruction options or under the
- 00:41:46bundle adjustment options there are ways
- 00:41:47to say hey do I want to refine my focal
- 00:41:50length, do I want to refine, you know, my
- 00:41:52distortion terms. Um so you could you
- 00:41:55know enable or disable that setting. To
- 00:41:57that point I do believe you know that
- 00:41:59COLMAP will parse the EXIF data in those
- 00:42:02images and if it sees that yeah there is
- 00:42:03a focal length cuz a lot of times an
- 00:42:05image will, you know, contain that, oh,
- 00:42:07this was taken with a 10 mm
- 00:42:09lens or a 24 mm lens you know and so
- 00:42:12COLMAP can parse that data to take an
- 00:42:14initial guess at what it thinks that
- 00:42:16focal length is you know what's the
- 00:42:17field of view of the camera and can use
- 00:42:18that as initialization. But a lot of
- 00:42:21times there is benefit to refine that um
- 00:42:23because it may get you
- 00:42:25close, but it might not be close enough
- 00:42:28to get a really sharp
- 00:42:30reconstruction. So okay so I got a lot
- 00:42:32more appreciation for what's happening
- 00:42:34here. I tell people run this on their
- 00:42:36computer. You don't need the highest
- 00:42:38spec computer to run a small data set
- 00:42:40and learn how this works. I ran this on
- 00:42:42my older computer which doesn't have you
- 00:42:45know 24 cores or anything and it still
- 00:42:47ran fairly quickly. I'd say there's
- 00:42:49there's some things you gave me some
- 00:42:51notes. I think we covered largely most
- 00:42:53of it. But then from here, you can do
- 00:42:55things. So, I've ran this through. You
- 00:42:58can hit automatic reconstruction. It'll
- 00:43:00create all this, but then you can hit
- 00:43:01bundle adjustment, which is that global
- 00:43:03one at the end. And then you can build a
- 00:43:05dense reconstruction, which we're not
- 00:43:06really going to cover on this episode.
- 00:43:08This is just kind of like here's how we
- 00:43:10got that workflow I showed to get
- 00:43:12the camera poses, the sparse point
- 00:43:14cloud, and then from there, you can use
- 00:43:15it for more downstream tasks, right? So
- 00:43:17I could use this for again doing a dense
- 00:43:203D reconstruction where you're going to
- 00:43:21get millions of points on this
- 00:43:23scene or I can use this as the basis for
- 00:43:27initializing 3D Gaussian splatting.
- 00:43:29There's just different things you can
- 00:43:31use once you've got camera positions and
- 00:43:34a sparse point cloud. I'm
- 00:43:36showing also on my screen I didn't talk
- 00:43:38about you have these kind of magenta
- 00:43:40lines. This is showing which of your
- 00:43:42images matched. If I
- 00:43:44double-clicked on one, it'll show
- 00:43:47that information about the key
- 00:43:49points and which ones matched to it. But
- 00:43:51you can just click around and
- 00:43:52learn things. Double-click on different
- 00:43:54parts of the scene. It'll show you the
- 00:43:57point and which different cameras
- 00:43:59made up that point. And it's a good tool
- 00:44:02to kind of learn how this works because
- 00:44:03it's very visual on the screen. Lots of
- 00:44:06data, lots of options. You can even
- 00:44:09create animations in this if you really
- 00:44:11want to show off what you learned. There
- 00:44:12is one thing we didn't really talk
- 00:44:14about. Well, there's a couple things.
- 00:44:15So, incremental reconstruction. Everyone
- 00:44:17always complains. I got the newest GPU.
- 00:44:19This should be really fast. Why is this
- 00:44:22running so slow? My GPU is not even
- 00:44:23being used and it says it's taking 5
- 00:44:26hours to run my thousand image data set.
- 00:44:29Why is that? Why can't we use a GPU for
- 00:44:30this incremental reconstruction? Or I
- 00:44:32know we can, but why can't we in COLMAP,
- 00:44:34the way it's configured? Yeah. Yeah.
- 00:44:36Because COLMAP Yeah. A lot of these
- 00:44:38algorithms are not easy to parallelize
- 00:44:41on a GPU. So a GPU works well when
- 00:44:44you're doing the exact same operation on
- 00:44:46millions of things, you know, cuz that's
- 00:44:48what a GPU does. Its job is to draw
- 00:44:51pixels to a screen, you know, on your on
- 00:44:53your monitor on your desktop. And so
- 00:44:55you've got millions of pixels on your
- 00:44:56screen. And so that GPU is processing a
- 00:44:58million pixels at once and figures out
- 00:45:00what to draw. And so for tasks like
- 00:45:03feature extraction, where, hey, I've got,
- 00:45:05again, millions of pixels and I want to
- 00:45:07figure out which ones have features in
- 00:45:08them, a GPU is great. Or feature matching:
- 00:45:12I've got tens of thousands of features
- 00:45:13in one image, tens of thousands in the
- 00:45:15other. I want to figure out which
- 00:45:16features match with each other. Then
- 00:45:18again, that's great for a GPU. But for
- 00:45:20incremental reconstruction, it's like
- 00:45:23I'm operating on one image at a time and
- 00:45:25I have to just solve a math equation and
- 00:45:28do some, you know, linear algebra to
- 00:45:30figure out what's the 3D position or
- 00:45:32pose of that image. That's not a very
- 00:45:34parallelizable task. And so it's not
- 00:45:37very easy to uh adapt some of these
- 00:45:40algorithms to the GPU. I will say
- 00:45:42another thing that contributes to it
- 00:45:44is that COLMAP is very flexible. There's a
- 00:45:48lot of algorithms, a lot of switches, a
- 00:45:50lot of different techniques that you can
- 00:45:51use and to implement all of those on the
- 00:45:54GPU would just take a lot of time. It's
- 00:45:56nice having software that's flexible.
- 00:45:58You know, with COLMAP being open source, a
- 00:46:00bunch of people contributing to it, it's
- 00:46:02nice having a flexible platform where
- 00:46:04people can easily dive in, make changes,
- 00:46:08add their own algorithm, plug it in,
- 00:46:10tweak things, and play with it. So
- 00:46:11having that sort of more
- 00:46:13general-purpose CPU-based implementation
- 00:46:16is helpful. But yeah, to get back to
- 00:46:18the core, it really is primarily just
- 00:46:19around the algorithms. A lot of these
- 00:46:21algorithms are not parallelizable or
- 00:46:24not well suited for processing on a GPU.
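To make that contrast concrete, here is a toy sketch (my own illustration, not COLMAP code): matching two images' descriptors is one big distance-matrix computation, the same independent arithmetic repeated over millions of pairs, exactly the workload a GPU is built for, while incremental registration is a loop where each step depends on the model built by the previous steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SIFT-like descriptors: 2000 features per image, 128 dims each.
desc_a = rng.standard_normal((2000, 128)).astype(np.float32)
desc_b = rng.standard_normal((2000, 128)).astype(np.float32)

# Feature matching: every pairwise squared distance at once -- a
# 2000x2000 grid of identical independent operations, ideal for a GPU.
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
d2 = (
    (desc_a**2).sum(1)[:, None]
    + (desc_b**2).sum(1)[None, :]
    - 2.0 * desc_a @ desc_b.T
)
nearest = d2.argmin(axis=1)  # best match in image B for each feature in A

# Incremental reconstruction, by contrast, is a sequential loop: each
# image's pose is solved against the model built from the previous
# images, so the iterations cannot all run at once.
model_size = 2  # start from an initial two-view reconstruction
for image in range(3, 11):  # register images one at a time
    model_size += 1  # solve this image's pose against the current model

print(nearest.shape, model_size)
```

The matrix step maps straight onto GPU kernels; the loop does not, which is why the matching stages use your GPU and the mapping stage mostly does not.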
- 00:46:27That makes sense. Someone once
- 00:46:29explained it to me
- 00:46:31like this: your CPU is a
- 00:46:32really good detective, solving clue by
- 00:46:35clue, one thing at a time, versus a GPU,
- 00:46:38which can just point out all the
- 00:46:39clues at once. Yeah. But you really
- 00:46:41need that hard math equation, and
- 00:46:44you need really fast cores trying
- 00:46:46to solve those things one at a time, and
- 00:46:48it's incremental. So think about it:
- 00:46:50you can't solve all
- 00:46:51of these at once as is. So that's
- 00:46:54something that people just have to keep
- 00:46:55in mind that don't get frustrated. It's
- 00:46:58just how this technology works today.
- 00:47:00And there's GLOMAP. So how does GLOMAP
- 00:47:01make this all of a sudden magically
- 00:47:03fast? Yeah. So GLOMAP is a different
- 00:47:06style for that reconstruction process.
- 00:47:09So GLOMAP, that's global mapper, you
- 00:47:12know. So global reconstruction versus
- 00:47:15incremental reconstruction. So instead
- 00:47:16of here in COLMAP, as we just talked about, it
- 00:47:18uses incremental reconstruction and adds
- 00:47:20one image at a time, whereas
- 00:47:23global reconstruction it tries to figure
- 00:47:26out the 3D poses of all of the images
- 00:47:29all at once. So GLOMAP still has that
- 00:47:32same correspondence search step. So to
- 00:47:34run GLOMAP you still have to extract
- 00:47:36key points, extract features from your
- 00:47:38images. You have to match them and run
- 00:47:39your geometric verification. But once
- 00:47:42you have that web of connectivity
- 00:47:44between your images, you can then run
- 00:47:46global reconstruction techniques. And so
- 00:47:49there's a few different steps there. In
- 00:47:51GLOMAP, they run rotation averaging
- 00:47:54first. So the idea with that is that you
- 00:47:57look at all of the feature matches
- 00:47:59between your pairs of images. For each
- 00:48:02pair, you estimate how much rotation
- 00:48:05occurred between that pair of images,
- 00:48:07you know. So that gives you a
- 00:48:08constraint. But now if I look at all of
- 00:48:10the rotations that I estimated between
- 00:48:12all of the pairs, can I come up with a
- 00:48:15consistent orientation for all of my
- 00:48:17images that satisfies each of those
- 00:48:19pair-wise constraints? So, can I arrange
- 00:48:22the orientations of my images so that
- 00:48:24all of those pair-wise rotations make
- 00:48:26sense? And that's what rotation
- 00:48:28averaging does. So, it's not even
- 00:48:30looking at position. It's just trying to
- 00:48:32rotate all of the images. And once
- 00:48:34they're rotated in 3D space, then it
- 00:48:36does a global positioning step which
- 00:48:39simultaneously solves both the camera
- 00:48:41positions as well as some of the 3D
- 00:48:43points. And so it kind of throws all of
- 00:48:45the cameras into a big soup, a big mess.
- 00:48:47It gives them a bunch of random
- 00:48:49initializations and then defines these
- 00:48:51constraints saying, well, these images
- 00:48:53saw these common points. How can I
- 00:48:56rearrange all of these images so that
- 00:48:59they line up and see those common
- 00:49:01points? So it's similar to bundle
- 00:49:03adjustment, the idea of taking a
- 00:49:05bunch of images that see points and
- 00:49:07refining them, but it uses a different
- 00:49:10formulation, a different set of
- 00:49:11constraints that is better suited to,
- 00:49:15you know, random unknown camera
- 00:49:16positions. And so that's this global
- 00:49:18positioning problem that they solve.
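A toy version of rotation averaging (mine, not GLOMAP's code) makes the idea tangible: shrink each camera's orientation to a single heading angle, treat the pairwise relative rotations as linear constraints, pin camera 0 to fix the gauge, and solve all the orientations at once with least squares. Real rotation averaging works on full 3D rotations, but the "satisfy all pairwise constraints simultaneously" structure is the same.

```python
import numpy as np

# Toy 1D "rotation averaging": cameras have unknown heading angles
# (in degrees); we only observe pairwise relative rotations.
true = np.array([0.0, 10.0, 25.0, 40.0])

# Pairwise constraints: (i, j, measured rotation from camera i to j).
pairs = [(0, 1, 10.0), (1, 2, 15.0), (2, 3, 15.0), (0, 2, 25.0), (1, 3, 30.0)]

# Build a linear system: theta_j - theta_i = measurement.
A = np.zeros((len(pairs) + 1, 4))
b = np.zeros(len(pairs) + 1)
for row, (i, j, rel) in enumerate(pairs):
    A[row, i], A[row, j], b[row] = -1.0, 1.0, rel
A[-1, 0] = 1.0  # gauge constraint: pin camera 0 at 0 degrees

# Solve every camera's orientation simultaneously.
theta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(theta, 6))  # recovers the consistent global headings
```

With noisy real measurements the constraints conflict, and the least-squares solve finds the orientations that satisfy them as well as possible, which is exactly the "averaging" part.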
- 00:49:20So once
- 00:49:22you've run your rotation averaging, your
- 00:49:23global positioning, you get a
- 00:49:24reconstruction that's pretty close. And
- 00:49:26then you can run bundle adjustment, you
- 00:49:29know, an actual high quality refinement
- 00:49:31using bundle adjustment. And then you
- 00:49:33have your 3D reconstruction. So it
- 00:49:35skips a lot of this incremental slow
- 00:49:37process that wasn't parallelizable. The
- 00:49:39rotation averaging and global
- 00:49:41positioning are a little better
- 00:49:42suited to parallelization, and it's more
- 00:49:44efficient because you're not having to
- 00:49:45do this one after the other after the
- 00:49:47other. Yeah. And I have it on my screen
- 00:49:49here, the project page, which shows what
- 00:49:51you were talking about. And this
- 00:49:53last one shows it all happening at
- 00:49:55once, where it all just
- 00:49:58resolves at once. I do want to say
- 00:50:01that, to me,
- 00:50:03there's a low, what's the
- 00:50:06right word, barrier to entry. You're not going
- 00:50:08to be wasting a lot of your time giving
- 00:50:10this a shot to see if it works well
- 00:50:12for your project, because you don't have
- 00:50:14to wait a long time for it to do the
- 00:50:16incremental reconstruction. So, it
- 00:50:17doesn't work well with all scenes as I
- 00:50:19found, but because you know within
- 00:50:22minutes if it's going to work well or
- 00:50:23not, it's worth a shot and you get to
- 00:50:26learn what scenes work well with it.
- 00:50:27You've done some tests as well, Jared.
- 00:50:30You can't get too tied in on
- 00:50:31a bunch of little things. I feel like
- 00:50:33you need more of a global view;
- 00:50:35you know, the example images have a
- 00:50:37lot of features and aren't closely
- 00:50:39tied in on little features in a scene.
- 00:50:43Mhm. Yeah. From my
- 00:50:46experience with GLOMAP and other
- 00:50:48global structure-from-motion, global
- 00:50:49reconstruction techniques, they work
- 00:50:52best when you have a lot of connections
- 00:50:55between your images. Mhm. So it's not
- 00:50:58you just walking through a cave or
- 00:51:00walking down, you know, a city street
- 00:51:01and never returning. It likes a lot
- 00:51:04of loop closures. It likes a lot of
- 00:51:06connectivity, a lot of different vantage
- 00:51:08points and overlap and diverse content.
- 00:51:10And so it takes the strength of those
- 00:51:14diverse and dense connections and very
- 00:51:17quickly figures out how to arrange them
- 00:51:18to produce that final reconstruction.
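That web of connectivity is literally a graph: images are nodes, verified pairwise matches are edges. A hypothetical little sketch (not from COLMAP or GLOMAP) shows the difference between a chain, like walking down a street once, and an orbit with loop closures that has many redundant edges beyond the minimum needed to stay connected, which is what global methods thrive on.

```python
from collections import defaultdict

def components(num_images, matches):
    """Count connected components of a match graph via depth-first search."""
    adj = defaultdict(set)
    for a, b in matches:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), 0
    for node in range(num_images):
        if node in seen:
            continue
        comps += 1
        stack = [node]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(adj[cur] - seen)
    return comps

# A chain (walk down a street, never returning): barely connected.
chain = [(i, i + 1) for i in range(5)]
# An orbit with loop closures: every image also matches two steps ahead,
# and the last image closes the loop back to the first.
orbit = chain + [(i, i + 2) for i in range(4)] + [(5, 0)]

# Redundant edges = edges beyond the spanning-tree minimum (n - 1 = 5).
print(components(6, chain), len(chain) - 5)   # 1 0: connected, no redundancy
print(components(6, orbit), len(orbit) - 5)   # 1 5: connected, 5 extra constraints
```

Both graphs are connected, but only the orbit gives rotation averaging and global positioning the over-determined, mutually checking constraints they need.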
- 00:51:20And that's probably why in my experience
- 00:51:22when I have these broader view
- 00:51:24shots, it works well, because I have a
- 00:51:25lot of connections and a lot of
- 00:51:28unique features. But you get too close in
- 00:51:31on one little object, or, like some
- 00:51:33indoor scenes I've done
- 00:51:35that haven't turned out, you
- 00:51:36have a lot of just blank white walls
- 00:51:38with not a lot of features. So, it's
- 00:51:40just not able to do that. So, all right.
- 00:51:43Well, this is something I had on
- 00:51:44my screen just to kind of show some
- 00:51:47examples. If you're listening, I will
- 00:51:49make sure I link it in the show notes as
- 00:51:50well. GLOMAP and COLMAP, but GLOMAP
- 00:51:53is an interesting one you can look
- 00:51:55at. It drops on top of COLMAP.
- 00:51:59So, even getting it running isn't a
- 00:52:00large lift. And you see Johannes in
- 00:52:03the list of names. So, you can see he's
- 00:52:05still working on these things. I think
- 00:52:07this is interesting because it does make
- 00:52:08things go faster. And if you look at the
- 00:52:10results, they are in the same range
- 00:52:13of accuracy as you get with incremental
- 00:52:15reconstruction using COLMAP. So it's
- 00:52:17not saying well this is fast but it's
- 00:52:18not nearly as good. It's fast and it's
- 00:52:20good when it works, and you
- 00:52:22find out really quick because I've
- 00:52:23noticed that the results either are
- 00:52:25absolutely all over the place or you
- 00:52:27have a really good sparse point cloud
- 00:52:29and so you know if it's good or not. In
- 00:52:31fact, you'll see cameras all over the
- 00:52:33place where everything's kind of like
- 00:52:34this weird-looking cube, and that's
- 00:52:37how you know it didn't work. But you
- 00:52:39will know based off of your output.
- 00:52:40Yeah, I've gotten a few. I say
- 00:52:43Borg cubes, that's what I think they
- 00:52:44look like, but I think I've gotten a few
- 00:52:46cubes as my results.
- 00:52:49All right. Well, I think we covered I
- 00:52:51think we covered this all really well. I
- 00:52:53hope at the end of this people will go
- 00:52:56try COLMAP, or, I mean, even if they
- 00:52:58use other software, it will follow
- 00:53:01relatively the same sort of process.
- 00:53:04Maybe there are
- 00:53:05other ways it's done, I'm sure there are,
- 00:53:06but this is the standard kind of method,
- 00:53:09and most at least follow this sort of
- 00:53:12style. And now there's all this machine
- 00:53:14learning stuff that's different. But as
- 00:53:15far as classical 3D reconstruction from
- 00:53:18imagery, this is a very well-known and
- 00:53:20reused pipeline for a lot of projects.
- 00:53:24Yeah. And like you said,
- 00:53:25just go and try that.
- 00:53:27I can't stress that enough. Just
- 00:53:28try it. You know, if you're either
- 00:53:31just new to computer
- 00:53:33vision and want to understand how 3D
- 00:53:34reconstruction works, or maybe
- 00:53:36you kind of understand it
- 00:53:37but want to get
- 00:53:39better insight into how things work behind
- 00:53:41the scenes. A tool like COLMAP is
- 00:53:43great just to, you know, throw some
- 00:53:44images at it, run a reconstruction, and
- 00:53:46then start poking around. There's a lot
- 00:53:48of neat visualizations that Jonathan
- 00:53:50showed where you can look at a point and
- 00:53:51see which images saw it, or, in an image,
- 00:53:54what it matched to. There's other
- 00:53:56debug visualizations where you can look
- 00:53:57at sort of the match graph or the match
- 00:53:59matrix and see the different
- 00:54:02patterns or ways that images are
- 00:54:02matching to each other. So, it's a
- 00:54:06nice way to get your hands dirty
- 00:54:09and see this process of turning
- 00:54:12pixels to 2D information to final 3D
- 00:54:15results, you know, that mapping
- 00:54:17from 2D to 3D and all the
- 00:54:20information that goes into that. So,
- 00:54:21it's a great way to get in there and get
- 00:54:22an intuition for how this all works
- 00:54:23behind the scenes. Yes, definitely. And
- 00:54:26I would say the most important part when
- 00:54:29you're trying to run this is picking the
- 00:54:31right matching strategy, because that
- 00:54:33can be the difference
- 00:54:34between waiting hours and waiting an hour or
- 00:54:37minutes. So, well, thanks Jared for this
- 00:54:39episode and kind of covering all this
- 00:54:42stuff. I hope this was tangible enough
- 00:54:44for people to go try it and having the
- 00:54:46visuals up. So, if you're listening, go
- 00:54:48find this video on the EveryPoint
- 00:54:51YouTube channel. We have a playlist of
- 00:54:53all of our episodes. I'll make sure. I
- 00:54:56haven't named it yet, but I'm sure
- 00:54:57COLMAP will be in the name.
- 00:55:00I can't remember what episode we're on,
- 00:55:01but it's like 15 or 16. You will see
- 00:55:04that it'll be a
- 00:55:07great way for you to learn this if
- 00:55:08you're getting into this area, because
- 00:55:09I didn't go over these,
- 00:55:11but we have questions I see every day,
- 00:55:13either on my videos or on Reddit or
- 00:55:18Discord. There's these different
- 00:55:19communities that are all using projects
- 00:55:21that require COLMAP to run to start,
- 00:55:24think 3D Gaussian splatting, and it's
- 00:55:26just obvious that this is something that
- 00:55:28people just know they have to use but
- 00:55:30have no idea what's happening. They just
- 00:55:32know they threw a bunch of images at it
- 00:55:34and something came out and then they're
- 00:55:36going to do something else with it. But
- 00:55:38they have no appreciation for the
- 00:55:40sausage making inside COLMAP. If you
- 00:55:43know what each step is, you can get
- 00:55:45better results in my opinion. Just play
- 00:55:47with it. See what works, learn what
- 00:55:49those different options are. If you
- 00:55:51don't know what an option is as well,
- 00:55:52jump on our YouTube channel, ask a
- 00:55:54question. I will be watching and trying
- 00:55:56to respond as intelligently as possible
- 00:55:59on those and and give you a a good
- 00:56:01answer. So Jared, any other parting
- 00:56:03thoughts you want on this? You said
- 00:56:04go give it a try. Any other tips you
- 00:56:07would give people? Take good, sharp
- 00:56:09imagery. And just do it
- 00:56:11yourself. Get out and try, you know,
- 00:56:12take your own photos and see how
- 00:56:14they turn out. Yeah, take your own
- 00:56:16photos. Don't go use the
- 00:56:18open-source data sets, because you know those
- 00:56:20are going to work; those are
- 00:56:22great for testing but not great for
- 00:56:24learning on your own data. So, right, well,
- 00:56:28thank you, and again, if you're
- 00:56:29listening, this will be on all major
- 00:56:31podcast players. Please, if you can,
- 00:56:34subscribe to our channel or to
- 00:56:37one of our podcast episodes. That'll mean
- 00:56:39a lot to us and let us know that we're making the
- 00:56:40right content and that you guys care
- 00:56:42about learning this information.
- 00:56:44And as always, let us know in the
- 00:56:46comments as well on our YouTube channel
- 00:56:47if there is something here that you
- 00:56:49would like us to go deeper in. Maybe we
- 00:56:51can get someone like Johannes on one of
- 00:56:53these episodes to go super deep if you
- 00:56:56want to. Anyways, well, thanks Jared for
- 00:56:58being on this episode and I'll see you
- 00:57:00guys in the next one.
- Colmap
- 3D Reconstruction
- Structure from Motion
- Feature Extraction
- Camera Pose
- Geometric Verification
- Incremental Reconstruction
- Global Reconstruction
- Computer Vision
- Open Source