Understanding 3D Reconstruction with COLMAP
Summary
TL;DR: In this episode of Computer Vision Decoded, the hosts discuss Structure from Motion (SfM) and 3D reconstruction using the COLMAP software. They explain the workflow from feature extraction to camera pose estimation and incremental reconstruction. Jared Heinly, a computer vision expert, elaborates on the importance of camera models, feature matching strategies, and geometric verification. The episode also contrasts incremental and global reconstruction methods, emphasizing the efficiency of COLMAP and the newer GLOMAP software. Listeners are encouraged to experiment with COLMAP to gain practical experience in 3D reconstruction.
Key takeaways
- 📸 COLMAP is an open-source tool for 3D reconstruction.
- 🔍 Feature extraction identifies unique landmarks in images.
- 🔗 Feature matching connects similar features across images.
- 🧮 Geometric verification ensures accurate matches between images.
- 🔄 Incremental reconstruction adds images one at a time.
- 🌍 Global reconstruction estimates poses for all images simultaneously.
- ⚙️ Bundle adjustment refines camera poses and 3D points.
- 🖼️ Good imagery is crucial for successful reconstruction.
- 🛠️ Experimenting with COLMAP helps understand 3D reconstruction better.
- 📚 Tutorials and documentation are available on COLMAP's website.
Timeline
- 00:00:00 - 00:05:00
In this episode, the hosts introduce the topic of structure from motion and 3D reconstruction using COLMAP, a free and open-source software. They aim to demystify the process of 3D reconstruction from imagery, with expert Jared Heinly explaining the workflow involved in obtaining camera poses and creating 3D models.
- 00:05:00 - 00:10:00
Jared shares his background with COLMAP, detailing its origins and development by Johannes Schönberger during his time at UNC Chapel Hill. The discussion highlights the evolution of COLMAP from earlier software focused on aerial photography to a more generalized tool for 3D reconstruction from various image collections.
- 00:10:00 - 00:15:00
The hosts discuss the initial steps in using COLMAP, emphasizing the importance of understanding camera positions and the process of extracting images from a video or taking multiple photos from different angles to create a 3D model.
- 00:15:00 - 00:20:00
Jared explains the significance of feature extraction, where unique landmarks in photographs are identified to establish 2D relationships between images. This step is crucial for later 3D reconstruction, as it allows the software to track points across multiple images.
- 00:20:00 - 00:25:00
The conversation moves to the matching process, where the software identifies correspondences between features in different images. This involves geometric verification to ensure that matches make sense in the context of camera motion and scene geometry.
- 00:25:00 - 00:30:00
The hosts discuss the various matching algorithms available in COLMAP, such as exhaustive, sequential, and vocab tree matching, and how to choose the right one based on the dataset and image collection strategy.
- 00:30:00 - 00:35:00
Incremental reconstruction is introduced, where the software builds a 3D model by adding images one at a time. Jared explains the initialization process and how the software determines which images to use based on feature matches and camera motion.
- 00:35:00 - 00:40:00
The episode covers the iterative loop of image registration, triangulation, and bundle adjustment, which refines both the 3D points and camera poses as new images are added to the reconstruction.
- 00:40:00 - 00:45:00
Jared clarifies the role of bundle adjustment in optimizing the alignment of 3D points and camera positions, and how it can be performed locally or globally depending on the reconstruction size and complexity.
- 00:45:00 - 00:50:00
The hosts briefly touch on GLOMAP, a newer software that offers a global reconstruction approach, allowing for faster processing by estimating camera poses for all images simultaneously, contrasting it with COLMAP's incremental method.
- 00:50:00 - 00:57:02
The episode concludes with encouragement for listeners to experiment with COLMAP, emphasizing the importance of taking sharp images and understanding the reconstruction process to achieve better results.
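The registration, triangulation, and bundle adjustment loop mentioned in the timeline rests on intersecting viewing rays from multiple cameras. As an illustrative sketch only (COLMAP itself triangulates by linear least squares over all observing images, and all names below are ours, not COLMAP's), here is a minimal two-ray midpoint triangulation in plain Python:

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of closest approach between rays p = c1 + t1*d1 and p = c2 + t2*d2.

    c1, c2 are camera centers; d1, d2 are viewing-ray directions (need not be unit).
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b              # ~0 when the rays are (near) parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique triangulation")
    t1 = (b * e - c * d) / denom       # parameter of closest point on ray 1
    t2 = (a * e - b * d) / denom       # parameter of closest point on ray 2
    p1 = [ci + t1 * di for ci, di in zip(c1, d1)]
    p2 = [ci + t2 * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

With noise-free rays that truly intersect, the midpoint coincides with the 3D point both cameras observed; with noisy feature matches, it lands between the two rays.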
Video Q&A
What is COLMAP?
COLMAP is an open-source software for 3D reconstruction from images, allowing users to estimate camera poses and create 3D models.
What is the first step in 3D reconstruction using COLMAP?
The first step is feature extraction, where unique landmarks in the images are identified.
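For readers who want to follow along outside the GUI, the stages covered in this Q&A map onto COLMAP's command-line tools. A minimal sketch (paths are placeholders; see COLMAP's CLI documentation for the full option list):

```shell
# Hypothetical layout: photos in ./images; ./project.db and ./sparse are created below.
colmap feature_extractor \
    --database_path ./project.db \
    --image_path ./images \
    --ImageReader.camera_model SIMPLE_RADIAL \
    --ImageReader.single_camera 1

colmap exhaustive_matcher \
    --database_path ./project.db

mkdir -p ./sparse
colmap mapper \
    --database_path ./project.db \
    --image_path ./images \
    --output_path ./sparse
```

`exhaustive_matcher` can be swapped for `sequential_matcher` or `vocab_tree_matcher` depending on how the images were captured, as discussed in the episode.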
How does COLMAP handle feature matching?
COLMAP uses various algorithms for feature matching, including exhaustive, sequential, and vocabulary tree methods, depending on the nature of the image dataset.
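As a toy illustration of the nearest-neighbour matching idea (COLMAP matches 128-dimensional SIFT descriptors, typically on the GPU; this sketch uses tiny made-up vectors and Lowe's ratio test, with names of our own invention):

```python
def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its two nearest neighbours in desc_b
    and keep the match only if the best is clearly better than the second best."""
    def dist2(u, v):
        # squared Euclidean distance between two descriptor vectors
        return sum((x - y) ** 2 for x, y in zip(u, v))

    matches = []
    for i, da in enumerate(desc_a):
        scored = sorted((dist2(da, db), j) for j, db in enumerate(desc_b))
        if len(scored) >= 2:
            best, second = scored[0], scored[1]
            # squared distances, so compare against ratio**2
            if best[0] < (ratio ** 2) * second[0]:
                matches.append((i, best[1]))
    return matches
```

The ratio test discards ambiguous features (e.g. repeated bricks on a wall) whose nearest and second-nearest neighbours look almost equally similar, which is exactly the failure mode described later in the transcript.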
What is geometric verification in COLMAP?
Geometric verification is a process that ensures the matches between features in different images make sense geometrically, filtering out incorrect matches.
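The filtering idea behind geometric verification can be sketched with RANSAC. COLMAP verifies matches against epipolar geometry (fundamental or essential matrices, or a homography); the toy below uses a plain 2D translation model instead, just to show the hypothesize-and-count-inliers loop. All names are illustrative:

```python
import random

def ransac_translation(pts_a, pts_b, iters=200, tol=1.0, seed=0):
    """Toy geometric verification: find the 2D translation mapping pts_a -> pts_b
    that is consistent with the most correspondences, and return its inlier indices."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        i = rng.randrange(len(pts_a))            # minimal sample: one match
        dx = pts_b[i][0] - pts_a[i][0]           # hypothesized translation
        dy = pts_b[i][1] - pts_a[i][1]
        inliers = [
            j for j, (a, b) in enumerate(zip(pts_a, pts_b))
            if abs(a[0] + dx - b[0]) <= tol and abs(a[1] + dy - b[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

Matches that disagree with the dominant model (the outliers) are dropped, which is what cleans up wrong correspondences like two different treetops matched to each other.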
What is the difference between incremental and global reconstruction?
Incremental reconstruction adds images one at a time, while global reconstruction estimates the 3D poses of all images simultaneously.
Can I use COLMAP on a standard computer?
Yes, COLMAP can run on standard computers, although performance may vary based on hardware specifications.
What is bundle adjustment?
Bundle adjustment is an optimization process that refines the 3D points and camera poses to improve the accuracy of the reconstruction.
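As a toy, single-parameter illustration of that refinement idea (real bundle adjustment jointly optimizes all camera poses, intrinsics, and 3D points, e.g. with the Ceres solver that COLMAP uses; this sketch refines one hypothetical camera offset by gradient descent):

```python
def refine_offset(points_x, observed_u, t0=0.0, lr=0.1, steps=100):
    """Toy 1-parameter 'bundle adjustment': refine a camera offset t so that the
    predicted projections (x - t) best fit the observations, by gradient descent
    on the summed squared reprojection error."""
    t = t0
    for _ in range(steps):
        # d/dt of sum((u - (x - t))**2) is 2 * sum(u - x + t)
        grad = 2 * sum(u - x + t for x, u in zip(points_x, observed_u))
        t -= lr * grad
    return t
```

The real problem is the same shape, just vastly larger: thousands of parameters adjusted simultaneously so that every 3D point reprojects close to where its feature was observed in every image.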
What should I consider when taking images for 3D reconstruction?
Ensure to take sharp images with good features and varied angles to improve the quality of the 3D reconstruction.
Is there a learning curve for using COLMAP?
Yes, while COLMAP is powerful, understanding its various options and workflows may require some time and experimentation.
Where can I find tutorials for COLMAP?
COLMAP's official website provides tutorials and documentation to help users get started with the software.
- 00:00:00Welcome to another episode of computer
- 00:00:02vision decoded. I'm really excited about
- 00:00:04this episode because it's going to solve
- 00:00:06a lot of questions that we get about
- 00:00:10structure from motion and 3D
- 00:00:12reconstruction when it comes to COLMAP
- 00:00:15and just figuring out how to do some of
- 00:00:17the basics of 3D reconstruction from
- 00:00:20imagery. And as always, I have Jared
- 00:00:22Heinly, our in-house computer vision
- 00:00:25expert, to walk us through what happens
- 00:00:28when you run software like COLMAP to
- 00:00:31get camera poses, 3D reconstruction, and
- 00:00:34kind of break down how that all works at
- 00:00:36a tangible level. So when you walk away
- 00:00:38from this episode, you should have a
- 00:00:40better understanding of this black box
- 00:00:43of COLMAP and other 3D reconstruction
- 00:00:46software that follows the same workflow.
- 00:00:48So, as always, Jared, thanks for joining
- 00:00:50me and welcome to the episode. Yeah,
- 00:00:52thank you. Let's just get to what we're
- 00:00:54all here for. Let's let's learn about
- 00:00:56COLMAP. And I don't want to say
- 00:00:59specifically COLMAP, but we're going
- 00:01:00to use it as the basis for this episode
- 00:01:04to have something for someone to follow
- 00:01:06along. And since it's open-source and
- 00:01:09free, they can download COLMAP and do
- 00:01:13this on their own PC without, you know,
- 00:01:16have to pay for some third party
- 00:01:18software that they won't learn as much
- 00:01:20through. So Jared, let's just start off
- 00:01:22with I'm going to share my screen. I
- 00:01:25have some images and we want to turn
- 00:01:27these images into a 3D model or just at
- 00:01:31least know where these cameras are in
- 00:01:33relation to each other. I'm going to be
- 00:01:35doing some screen shares. If you're
- 00:01:37listening to the audio only, I'll do my
- 00:01:39best to talk about what we have on the
- 00:01:42screen. But, uh, if I start out here, I
- 00:01:45have a picture of a well, it was a
- 00:01:48fountain that used to work in front of
- 00:01:50the Oregon State Capitol. I took this
- 00:01:52one sunny day last year. And if I flip
- 00:01:55through the images, I basically walked
- 00:01:58around this fountain and got a bunch of
- 00:02:02good angles. In fact, I believe I used a
- 00:02:04video and extracted a bunch of images
- 00:02:07and at some points there's some sun
- 00:02:09issues, things like that. But it was
- 00:02:12good enough for me to get a 3D model. So
- 00:02:15Jared, what's what's the first step
- 00:02:17someone would take then to turn this
- 00:02:20into a 3D model? Know where the cameras
- 00:02:22are, things like that? Yeah. Yeah. Well,
- 00:02:24you just you hinted it right there at
- 00:02:26the very end. Know where the cameras
- 00:02:27are. And I guess and to try to refine
- 00:02:29some of my language. A lot of times when
- 00:02:30I say camera, sometimes I mean, you
- 00:02:32know, image and camera. I'll use those
- 00:02:34words interchangeably sometimes, you
- 00:02:36know, but you said that you walked
- 00:02:37around with a single camera, you know,
- 00:02:39your phone or a DSLR or whatever it may
- 00:02:42be. And from that video, maybe you
- 00:02:44extract frames, you know, images or you
- 00:02:46took photos yourself. And so you have
- 00:02:49multiple images taken by a single
- 00:02:51physical camera, but you were moving
- 00:02:54around that scene, moving around that
- 00:02:56object. And so that camera was occupying
- 00:02:58different physical 3D points in space
- 00:03:01and then these images were captured from
- 00:03:02those different 3D points and those from
- 00:03:04those different 3D 3D perspectives. So
- 00:03:07you know as humans we just do this
- 00:03:09naturally like as you just flipped
- 00:03:10through those photos there and you know
- 00:03:12uh you know and as you kind of orbited
- 00:03:14around that fountain it's like yeah our
- 00:03:17brains are immediately like oh yeah okay
- 00:03:19I can see that the ground is a little
- 00:03:21bit closer. Here's this foreground
- 00:03:23fountain. I see the trees in the
- 00:03:24background. I see some other structures
- 00:03:26in the background and I'm immediately I
- 00:03:29can see that yep you were moving to the
- 00:03:31left and sort of this clockwise motion
- 00:03:33this thing's you know near that other
- 00:03:35things are far and our brains are
- 00:03:37immediately doing all of that 3D
- 00:03:38reasoning but in order to have software
- 00:03:42do this in order to have a computer
- 00:03:43generate a 3D reconstruction or 3D
- 00:03:46representation of what's in these photos
- 00:03:49it has to figure out it has to do all of
- 00:03:51that math and it doesn't know how to do
- 00:03:52that reasoning by default. It has to figure
- 00:03:54out well where were you standing when
- 00:03:56that photo was taken? Where was the
- 00:03:58camera positioned? How was it angled?
- 00:04:00What was the zoom level uh of of the
- 00:04:02current lens? And so it's doing all has
- 00:04:04to figure out where everything was
- 00:04:07oriented. And that's typically one of
- 00:04:08the first processes is trying to figure
- 00:04:09out how how are things related to each
- 00:04:11other, you know, and once we kind of
- 00:04:12know how they're related, then figure
- 00:04:14out what is the 3D 3D geometry that that
- 00:04:17uh describes describes that
- 00:04:19relationship.
- 00:04:21And so it goes through. So, I I don't
- 00:04:23have it on my screen, but I will pull it
- 00:04:26up in a second, but COLMAP has in
- 00:04:28their tutorial information a good kind
- 00:04:32of diagram. I'll let me bring that up,
- 00:04:35but it it it basically shows the
- 00:04:36workflow that it goes through. So, if I
- 00:04:39go to the actual website for COLMAP
- 00:04:43and you go look at their tutorial, you
- 00:04:45can see that. So, let's just pull that
- 00:04:47up on my screen as well. While he's
- 00:04:50pulling that up. Um, just jump in with a
- 00:04:53little bit of personal history about
- 00:04:54COLMAP. So, I did my PhD back at UNC
- 00:04:59Chapel Hill. So, I was there from 2010
- 00:05:00to 2015. And while I was there, Johannes
- 00:05:03Schönberger, he came to UNC for two
- 00:05:05years to do his masters. And so,
- 00:05:07Johannes, he's the author of COLMAP.
- 00:05:10Um, but at the time when he was there,
- 00:05:11COLMAP didn't exist. Johannes had
- 00:05:14worked on uh previous structural motion
- 00:05:16software and had built he'd worked with
- 00:05:18some uh I believe it was drones so
- 00:05:20aerial photography 3D reconstruction and
- 00:05:23so he had built a pipeline that he had
- 00:05:24called MAVMAP. I I'll probably get this
- 00:05:27wrong but I think like you know mobile
- 00:05:28aerial vehicle MAVMAP like map or
- 00:05:31mobile vehicle mapper and so but he was
- 00:05:34looking to generalize that to move
- 00:05:36beyond just aerial photography and to do
- 00:05:39more general purpose image collections
- 00:05:42and So it was this idea of image
- 00:05:44collections you know where he came up
- 00:05:46with COLMAP, collection mapper, um to
- 00:05:49say I want to take a collection of
- 00:05:51images and generate a 3D reconstruction
- 00:05:53from it. So he was working on that while
- 00:05:54he was at UNC. I may have been one of
- 00:05:57the first people to actually use
- 00:05:59COLMAP um in in my final uh PhD project. I
- 00:06:04had uh processed a 100 million images on
- 00:06:07a single PC and I was doing the you know
- 00:06:10feature matching, extraction, but then I
- 00:06:12needed some way to reconstruct them and
- 00:06:14our lab had some other software that
- 00:06:16could do 3D reconstruction but Johannes
- 00:06:17had just written this first version of
- 00:06:19COLMAP and so I said great let's use
- 00:06:21that and that that was efficient that
- 00:06:24was fast and it did did exactly what we
- 00:06:26needed to do and so that that helped uh
- 00:06:28helped get my paper across the goal line
- 00:06:29there at the very end so nice and since
- 00:06:32then Johannes has gone off you know at
- 00:06:33ETH Zurich and now uh at other companies
- 00:06:36and continued to you know open source
- 00:06:38COLMAP and now it's used all over the
- 00:06:40world and has won him some awards
- 00:06:42for it. So you know interestingly the
- 00:06:45glow map what came out last year and he
- 00:06:48had his fingers in that as well. Yep. So
- 00:06:50it's not over. I still see coal map
- 00:06:52being updated on a semi-regular basis as
- 00:06:56well. So although it came out a few
- 00:07:00several years ago, it's it's not static.
- 00:07:02No. No. because it because it is such a
- 00:07:05an important step the the task that
- 00:07:08COLMAP solves and and similarly GLOMAP
- 00:07:10you know figuring out the 3D pose you
- 00:07:13know pose is position plus orientation
- 00:07:16figuring out the 3D pose of images is a
- 00:07:19key step in so many uh 3D pipelines you
- 00:07:22if you want to understand the world in
- 00:07:243D you got to figure out where these
- 00:07:25images were taken from you know and
- 00:07:27that's the key task that that COLMAP
- 00:07:30uh solves for a lot of people Okay, that
- 00:07:33that makes that makes sense. I had no
- 00:07:34idea also that COLMAP stood for
- 00:07:36collection mapper. I'm I mean it makes
- 00:07:39sense, but I thought maybe it was a long
- 00:07:41acronym. So, um so, okay. Well, I have
- 00:07:45this diagram up then. If you're
- 00:07:47watching, you can see it on the screen,
- 00:07:48but if you're listening, it's basically
- 00:07:51a workflow of how images go from just a
- 00:07:55collection of images to a 3D
- 00:07:58reconstruction. And you got camera
- 00:08:00poses. And I'm going to show this in
- 00:08:01COLMAP on my screen as well. But this
- 00:08:04diagram just shows the different phases
- 00:08:06or rather the steps that you go
- 00:08:08through to get from pictures to 3D. And
- 00:08:11it starts out with feature extraction.
- 00:08:14And if you actually go to the tutorial
- 00:08:16as well. So if I share the just the
- 00:08:17tutorial page, that diagram makes sense.
- 00:08:20But the minute you start diving into it,
- 00:08:22you have a wall of text that most
- 00:08:25people won't make it through very well
- 00:08:28unless they are perhaps a computer
- 00:08:30science major, someone like Jared who
- 00:08:32does this academically or for a job. I
- 00:08:36look at this and I'm like, okay, some of
- 00:08:38this makes sense. A lot of this is
- 00:08:40beyond me. So, we're going to break that
- 00:08:42down. So, yeah. Okay, starting out with
- 00:08:43feature extraction. So, what is that
- 00:08:45step? So, what do we We're taking the
- 00:08:46images and sounds like something's
- 00:08:48happening there with features. Yeah.
- 00:08:50Yeah. Absolutely. So, uh, and just to
- 00:08:52take a step back here too. So, like we
- 00:08:54like you said, this is, you know, sort
- 00:08:55of a workflow, a sequence of steps that
- 00:08:57goes into generating a reconstruction.
- 00:08:59So, you had those images input. There's
- 00:09:00a sort of first block of steps that's
- 00:09:02labeled correspondence search. After
- 00:09:04that, we have incremental reconstruction
- 00:09:06and then finally we end up with a final
- 00:09:08reconstruction. But yeah, so within
- 00:09:09correspondence search, our goal for
- 00:09:11correspondence search is to figure out
- 00:09:14the 2D relationship between that
- 00:09:17collection of images. So, we're not even
- 00:09:18talking about, you know, real 3D yet.
- 00:09:20There might be some hints at 3D uh in
- 00:09:23these steps, but we haven't done any
- 00:09:25reasoning to really understand which
- 00:09:27photos, you know, are where in 3D space.
- 00:09:30Um, so it's just about 2D understanding,
- 00:09:332D matching, 2D correspondence between
- 00:09:35this this collection of images. So, with
- 00:09:38that in mind, first step is feature
- 00:09:41extraction. So the goal there is to
- 00:09:45identify unique landmarks within a
- 00:09:48photograph. And these unique unique
- 00:09:51landmarks, the intent of that is if I
- 00:09:54can identify a unique landmark, you
- 00:09:56know, a 2D point in one photo, hopefully
- 00:09:59I can identify that same point in
- 00:10:00another photo and another photo and
- 00:10:02another photo. And if I can identify and
- 00:10:05follow or you know or track that 2D
- 00:10:07point between multiple images now I can
- 00:10:10use that as a constraint later on when I
- 00:10:13do the 3D reconstruction. I can say, "Hey,
- 00:10:15hey, however these images are
- 00:10:18positioned, that point that they saw,
- 00:10:21that pixel should converge to a common
- 00:10:243D position in space." And so it's
- 00:10:26adding a sort of a viewing constraint
- 00:10:28saying, you know, each image saw a 2D
- 00:10:30point. I don't know the depth of that
- 00:10:31point. So it all it sort of gives me is
- 00:10:33a viewing ray. So along this direction
- 00:10:36out into the scene, I saw this unique
- 00:10:38landmark. Now, I've seen that same
- 00:10:40landmark in many other photos. I want to
- 00:10:42identify that and add that as a
- 00:10:44constraint because that is like most
- 00:10:45likely you know a 3D point. So feature
- 00:10:48extraction is the automatic
- 00:10:50identification of typically
- 00:10:52thousands or tens of thousands
- 00:10:54of these unique landmarks in an image. A
- 00:10:57lot of times there are different flavors
- 00:10:58of feature detection. The one used in
- 00:11:01COLMAP is SIFT, scale-invariant
- 00:11:03feature transform. What it does is it
- 00:11:05looks for I call it a blob style
- 00:11:08detector where it's looking for a patch
- 00:11:11of pixels that has high contrast to its
- 00:11:13its background. So it could be something
- 00:11:15that's you know light colored surrounded
- 00:11:17by dark or vice versa something that's
- 00:11:19dark surrounded by light. You know it's
- 00:11:21going to look at look for these at
- 00:11:23multiple scales. That's why it's scale
- 00:11:25invariant. So multiple resolutions. So
- 00:11:27this could be something that's you know
- 00:11:28very small or something that's larger in
- 00:11:31the image. Mhm. But once it's
- 00:11:34found that sort of high contrast
- 00:11:36landmark, it now will then extract some
- 00:11:40representation of uh the appearance, you
- 00:11:44know, of of the area around that
- 00:11:45landmark. So it'll say, "Hey, I found
- 00:11:47something interesting." So maybe it's
- 00:11:48the um you know, a door knob on a
- 00:11:51door, you know. So it'll say, "Hey, that
- 00:11:53that door knob is a different color
- 00:11:55than the background, the rest of the
- 00:11:57door." And so now I want to describe
- 00:11:59that door knob. And so I'm going to look
- 00:12:01I don't want to look just at the
- 00:12:02door knob itself. I'm going to look
- 00:12:03around it and say here's my door knob
- 00:12:05and then oh there's this wood pattern on
- 00:12:08the door around it. And so it's going to
- 00:12:09come up with a representation for that.
- 00:12:12And so what sift actually does or what
- 00:12:14different feature representations are
- 00:12:16that could be a whole podcast in and of
- 00:12:18itself. But at a conceptual level you
- 00:12:20just think about it. It sort of
- 00:12:23summarizes what that looks like at at a
- 00:12:25rough level. It says, "Okay, I saw
- 00:12:27something dark in the middle and then
- 00:12:28there was this, you know, rough pattern
- 00:12:30around its vicinity." Mhm. Okay. So then
- 00:12:34I'm bringing up COLMAP and this is
- 00:12:36I've unfortunately had already run the
- 00:12:38project because I didn't want us to have
- 00:12:40to sit and watch things go and a lot of
- 00:12:42these things run really fast. So sift is
- 00:12:44fast if you can run it on GPU. I don't
- 00:12:47can't necessarily show what 10 what say
- 00:12:49I think it maxes at 10,000 by default
- 00:12:52but if you have coal map and you kind of
- 00:12:54want to follow along the first thing you
- 00:12:56do is set up a new project and that
- 00:12:59part's pretty easy but then you just go
- 00:13:01to processing and hit feature
- 00:13:03extraction and you get to pick a camera
- 00:13:06model. Why is that important? Why is
- 00:13:07picking a camera model important for
- 00:13:09this? Well, this is important and and
- 00:13:11this this ends up being really important
- 00:13:13later on when we start thinking about
- 00:13:14the geometry of these images and and
- 00:13:17what kind of camera and lens was used
- 00:13:20because these camera models are it is
- 00:13:23defining the geometry of that camera. So
- 00:13:27this right now you have a simple radial
- 00:13:29camera you know selected and so
- 00:13:30underneath of it sort of in grayscale
- 00:13:34are some parameters listed. It says,
- 00:13:35"Oh, simple radial has f, cx, c y, and
- 00:13:39k." Mhm. And so you kind of have to know
- 00:13:41from a computer vision literature that f
- 00:13:44is your focal length. CX and CY, that's
- 00:13:46the principal point. So that's defined,
- 00:13:48well, where is the center uh of my image
- 00:13:50or where is the optical axis of my my
- 00:13:54lens and how is that aligned with the
- 00:13:56image center? So a lot of times I just
- 00:13:57kind of say, hey, hand wavy, it's you
- 00:13:59know, what's the center of my image? And
- 00:14:01then that K is a a single radial
- 00:14:05distortion term. So it's assuming a lot
- 00:14:07of times lenses introduce a little bit
- 00:14:10of curvature effect, you know, curvature
- 00:14:12distortion to them. And so we're going
- 00:14:13to use a single mathematical term, a
- 00:14:16single, you know, polynomial term to
- 00:14:19represent the distortion in that lens.
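The SIMPLE_RADIAL model just described (f, cx, cy, k) can be sketched as a projection function. This is written from the verbal description above as an illustration of what each parameter does, not as COLMAP's exact code:

```python
def project_simple_radial(X, Y, Z, f, cx, cy, k):
    """Project a 3D point in camera coordinates with a SIMPLE_RADIAL-style model:
    one focal length f, principal point (cx, cy), one radial distortion term k."""
    x, y = X / Z, Y / Z                 # normalized image coordinates
    r2 = x * x + y * y                  # squared radius from the optical axis
    scale = 1.0 + k * r2                # the single polynomial distortion term
    return f * x * scale + cx, f * y * scale + cy
```

With k = 0 the model reduces to an ideal pinhole camera; the RADIAL model discussed next adds a second term (k1, k2) to the same polynomial.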
- 00:14:22This might be great. This is great for a
- 00:14:24lot of just, you know, general cameras.
- 00:14:26But if you know that your lens has a
- 00:14:29little bit more distortion, maybe you're
- 00:14:31using, you know, a wide-angle camera, a
- 00:14:33GoPro or a drone that has uh a wider
- 00:14:36field of view and some distortion. If
- 00:14:38you have a really wide angle camera,
- 00:14:40something where you can see a lot of
- 00:14:42distortion, then you might want one of
- 00:14:43these one of these fisheye versions.
- 00:14:45They have simple radial fisheye or the
- 00:14:47normal fisheye. There's even I think at
- 00:14:49the very bottom of the list, there's one
- 00:14:50called FOV. That's one that's really
- 00:14:53great for super wide angle. Mhm. You
- 00:14:56know, a lot of times for a normal camera
- 00:14:58like your iPhone in your pocket or your
- 00:14:59DSLR or your point and shoot or whatever
- 00:15:01it ends up being or your simple radial
- 00:15:04or your radial models um are nice
- 00:15:07because they assume that you've um
- 00:15:12you've got a single focal length. You
- 00:15:14know, your pixels are square. So, I
- 00:15:16don't need more than one f term. You
- 00:15:18want to model your principal point with
- 00:15:20cx, cy. And here the radial model added an
- 00:15:24extra lens distortion. So now instead of
- 00:15:25just K, now we have K1 and K2. So that's
- 00:15:28two radial distortion terms. So we can
- 00:15:30do a little better job of estimating the
- 00:15:33distortion of our lens. Okay. And so
- 00:15:35COLMAP asks for this right away
- 00:15:37because what it's doing is it has that,
- 00:15:39you know, part of that project creation
- 00:15:41process is you create a database. And so
- 00:15:43that's going to be, you know, a
- 00:15:44collection of data stored on disk. And
- 00:15:46so this process of feature extraction uh
- 00:15:49is when COLMAP goes through all of
- 00:15:51your images, extracts features, but then
- 00:15:54also creates those image entries in the
- 00:15:56database. And so it needs to know what
- 00:15:58style of camera is going to be
- 00:16:00associated with that image. Mhm. And and
- 00:16:03we could go and deepen a bunch of
- 00:16:05buttons on here. Yeah. I don't want if
- 00:16:07you just run this in default and simple
- 00:16:09radio and using smartphone or something,
- 00:16:12you'll be okay. But, you know, like here
- 00:16:14is thinking I have all these different
- 00:16:15cameras. There's options where you can
- 00:16:17say use it's always one camera. So, it
- 00:16:20just assumes then everyone's the same
- 00:16:21camera, which is great. Yeah, that's
- 00:16:23good. There's options for masks. I just
- 00:16:27bring up a mask on my screen. This is me
- 00:16:30masked. This is a mask. Not necessarily
- 00:16:33the mask you would um use, but basically
- 00:16:36there's a picture of me. This might be
- 00:16:38the wrong picture. And I've been with
- 00:16:40the mask as a separate file. And then if
- 00:16:42you kind of like combine the two, you
- 00:16:43end up with me masked out. And that's
- 00:16:46like a way to say you want me not to be
- 00:16:48in this result. You can mask out things.
- 00:16:51Specifically, if you want perhaps just
- 00:16:52an object to be reconstructed, you want
- 00:16:54to mask out a background, things like
- 00:16:56that we could go deep into. But there's
- 00:16:59all these options, right, to help get
- 00:17:01the right key points. So, if I go to
- 00:17:03this database, so I ran this already and
- 00:17:05I have this database manager where I can
- 00:17:07kind of jump into things and I pick one
- 00:17:09of these and I'm just going to hit show
- 00:17:11image, it's going to bring up the image
- 00:17:13and I can make this nice and big on my
- 00:17:15screen. What we're seeing now is an
- 00:17:16image of the fountain. I'm on the back
- 00:17:20side of it right now with all these red
- 00:17:22circles which are key points. Not
- 00:17:24necessarily all the features, right?
- 00:17:25It's just some of the ones that I think
- 00:17:27it matched on. Is that wrong or am I on
- 00:17:29the wrong I'm not I'm not entirely sure.
- 00:17:31Yeah. Yeah. In some software packages,
- 00:17:33they may show you all of them or may
- 00:17:35show you just just the ones that have
- 00:17:37been matched. I'm not sure with this
- 00:17:39spec specific viewer right now. So,
- 00:17:41yeah. And I'm not 100% clear either. I
- 00:17:44haven't read the documentation. All I
- 00:17:45know is visualizing. So, this is an idea
- 00:17:47of key points where you'll notice
- 00:17:49there's no key points where you have a
- 00:17:50lot of low contrast, not a lot of visual
- 00:17:53variation. So, I'm on my screen. And
- 00:17:55there's a part where it shows the street
- 00:17:57and there's just not much going on there
- 00:18:00versus there's a lot of points on the
- 00:18:01fountain which has all these ornate
- 00:18:03decorations on it. In the background
- 00:18:05there's trees and buildings that it's
- 00:18:07latching onto. So it makes sense that
- 00:18:09where you have less variation you're
- 00:18:11going to have less features that it's
- 00:18:14it's oh the sky is also another one
- 00:18:16where you
- 00:18:18this nice tree behind this thing it
- 00:18:19caught a lot on. So it doesn't mean it
- 00:18:21matched on those because you might not
- 00:18:23see those. So if I then I'm going to
- 00:18:25close this and then you can look at show
- 00:18:27overlapping images. So you know if I
- 00:18:29click here you can look at the the
- 00:18:30matches. You're going to see then this
- 00:18:32kind of correspondence matches where
- 00:18:34it's finding key points between two
- 00:18:37images and they show these green lines
- 00:18:39basically saying these two images have
- 00:18:41matching features that it it believes
- 00:18:43are the same points. Right. Is that what
- 00:18:45we're seeing? Exactly. Exactly. So that
- 00:18:47this is now sort of moved to the second
- 00:18:50and third bubbles within that
- 00:18:52correspondent search block. So back to
- 00:18:54that correspondent search. The first
- 00:18:55step was the feature extraction which
- 00:18:57was just the identification of these key
- 00:19:00points in each of the images. So it
- 00:19:02wasn't even trying to compare images
- 00:19:03yet. We're just saying for each image
- 00:19:05let me find those key points. And as as
- 00:19:07Jonathan said, by default, if you've got
- 00:19:09a GPU-enabled version of COLMAP and
- 00:19:12you've got a nice GPU in your computer,
- 00:19:14uh it will use the GPU implementation,
- 00:19:15that graphics processor, which makes it
- 00:19:17go a lot faster. So once we've extracted
- 00:19:20those key points and those or or
- 00:19:22features, again, I use those terms
- 00:19:24interchangeably a lot, the key point and
- 00:19:25the feature. Now, we want to match
- 00:19:28images together, and that's to discover
- 00:19:31which images show similar content. And
- 00:19:34so the result of that is going to be the
- 00:19:37set of correspondences, the set of uh
- 00:19:39features saying the features in this
- 00:19:40image matched to the features in this
- 00:19:42image. And those were those green lines
- 00:19:43that Jonathan had shown up uh just prior
- 00:19:46saying that you know not all of the key
- 00:19:48points from one image matched to the
- 00:19:49other. There was some subset but um
- 00:19:52we're trying to discover what those
- 00:19:54matches are. In this diagram we said
- 00:19:57that you know we had feature extraction,
- 00:19:58matching and then geometric
- 00:20:00verification. Matching and geometric
- 00:20:02verification uh a lot of times will go
- 00:20:04hand in hand, you know, so you run matching
- 00:20:06and then you immediately run geometric
- 00:20:08verification after that. So the
- 00:20:10intention there is your matching is just
- 00:20:14trying to figure out which features look
- 00:20:18similar between two images but it's not
- 00:20:20trying to do any sort of 2D or 3D
- 00:20:23reasoning. So, it may think that, oh,
- 00:20:26the the top of the tree in one image
- 00:20:29looks like the top of another tree in
- 00:20:31another image, but they're in completely
- 00:20:32different parts of the image, and it
- 00:20:33doesn't even make sense. Like, it may
- 00:20:35confuse things or especially if you
- 00:20:37have, you know, a building with some
- 00:20:38sort of repetitive pattern on it. You
- 00:20:40know, the same brick repeated over and
- 00:20:42over again, but you have some sort of
- 00:20:43unique windows or unique artwork, you
- 00:20:46know, that appears, you know, on that
- 00:20:47wall. For feature matching, it may end
- 00:20:49up matching incorrect parts of the image
- 00:20:51to each other. So matching does its best
- 00:20:55to try to figure out what matches, but
- 00:20:56it might be wrong. It's geometric
- 00:20:58verification's job to come in and clean
- 00:21:01those up to figure out, well, now that I
- 00:21:03have this initial set of matching key
- 00:21:05points between my two images, which ones
- 00:21:08actually make sense based on our
- 00:21:10knowledge of geometry and how cameras
- 00:21:12move. And so that's where sometimes you
- 00:21:15can leverage, you know, knowing what
- 00:21:16kind of camera model you have can be
- 00:21:18helpful. Knowing if you expect a lot
- 00:21:20of distortion or if it's a fisheye lens,
- 00:21:22that can help. But sometimes um some
- 00:21:25methods don't even try to use that
- 00:21:27information. We'll just look at the 2D-2D
- 00:21:29relationships. Mhm. And so there are
- 00:21:31some key words that you might see: it's,
- 00:21:33you know, estimating a homography,
- 00:21:36like a perspective transform,
- 00:21:38or an essential matrix or a fundamental
- 00:21:40matrix. So each of these sort of
- 00:21:42relationships, each of these matrices is
- 00:21:44a way to describe how a point in one
- 00:21:47image matches to a location in another
- 00:21:50image or a set of locations in another
- 00:21:52image. And and so we're trying to
- 00:21:54estimate, you know, is there a valid
- 00:21:57camera motion that we can imagine to get
- 00:22:01a set of points in one image to move to
- 00:22:03the set of points in the other image.
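To make that check concrete, here is a toy numpy sketch of the epipolar test (an illustration of the idea only, not COLMAP's implementation, which estimates these models robustly inside RANSAC; the camera motion and points below are invented): with relative motion (R, t), the essential matrix E = [t]x R makes correct matches satisfy x2ᵀ E x1 = 0, while wrong matches generally don't.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative motion between two views: no rotation, a sideways step.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
E = skew(t) @ R  # essential matrix for that motion

# Synthetic 3D points in front of the first camera.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))

# Homogeneous normalized image coordinates in each view (calibrated cameras).
x1 = X / X[:, 2:]
X2 = (R @ X.T).T + t
x2 = X2 / X2[:, 2:]

# A geometrically consistent match satisfies x2^T E x1 = 0.
err = np.abs(np.einsum('ij,jk,ik->i', x2, E, x1))
inliers = err < 1e-9          # all true for these perfect matches

# Corrupt the first 10 "matches", as a confused feature matcher might.
x2_bad = x2.copy()
x2_bad[:10, :2] += 0.3
err_bad = np.abs(np.einsum('ij,jk,ik->i', x2_bad, E, x1))
inliers_bad = err_bad < 1e-6  # the corrupted matches now fail the check
```

The verified matches that survive this kind of test are the "inliers" mentioned later in the episode.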
- 00:22:04That's what geometric verification is
- 00:22:06doing. Just figuring out those those 2D
- 00:22:09relationships uh between images. And
- 00:22:12somewhere in my logs, you can see some
- 00:22:14hints of that. So, as this is running, it's
- 00:22:16showing all kinds of text on your screen
- 00:22:18and it's I'm sure some of that when it's
- 00:22:21well, it's showing bundle adjustment on
- 00:22:23my screen right now, but at one point
- 00:22:25it's talking about some of the
- 00:22:27matches and running different algorithms
- 00:22:30in the background to get that. Um, so
- 00:22:32and then if I click on, like, one of
- 00:22:34these points that it created, it
- 00:22:36shows you where you have multiple
- 00:22:37matches on a specific point and things
- 00:22:40you can do to kind of get different
- 00:22:42views and get hints of what we're
- 00:22:44talking about here. But one thing we
- 00:22:46didn't really talk about when you're
- 00:22:47matching these images, too, is that
- 00:22:50there are different options as well. So
- 00:22:52when I go through here, I'm processing,
- 00:22:53I've got my key points. It goes fast on
- 00:22:55a GPU because it's able to like look at
- 00:22:57all the different images all at once,
- 00:22:59right? They don't depend on
- 00:23:00each other when you're extracting
- 00:23:02features. But then you get to the point
- 00:23:03where you need to do your matching. This
- 00:23:06is where it's all CPU-driven because
- 00:23:08it's kind of either a sequential or
- 00:23:10exhaustive, but it's not able to look at
- 00:23:11every image all at once. But there's
- 00:23:14options here where if I go to this
- 00:23:17button here, it's not displaying on my
- 00:23:19screen correctly for some reason. Oh,
- 00:23:20there we go. You can you can do
- 00:23:22exhaustive, sequential, vocab tree,
- 00:23:25spatial. There's these different styles
- 00:23:27you can pick or I want to say styles,
- 00:23:29different algorithms you can pick to
- 00:23:31match these. Yep. My understanding
- 00:23:33always is if you have a random
- 00:23:35collection of images like someone walked
- 00:23:37around and they're not necessarily one
- 00:23:39image is taken and then your next image
- 00:23:42you moved over and took just of the same
- 00:23:44part of the scene. But I don't know,
- 00:23:46maybe you're just walking around taking
- 00:23:47pictures in all which directions.
- 00:23:49Exhaustive is what you want to use
- 00:23:50because it's going to you can explain
- 00:23:52this but it's going to like kind of try
- 00:23:54to get every image to match to every
- 00:23:56image versus sequential where you're
- 00:23:58saying no no no each image was taken in
- 00:24:01sequence. So I see the fountain from one
- 00:24:03spot I moved a few feet took another
- 00:24:05photo of it. They should be sequentially
- 00:24:06somewhat matching to each other. Does
- 00:24:09that sound correct? Is that the right
- 00:24:11assumption? You you're exactly right.
- 00:24:13You're exactly right. So yeah, once you
- 00:24:15once you've extracted the key points
- 00:24:16from a single image, now you want to
- 00:24:18figure out well which pairs of images
- 00:24:20you know are related to each other. So
- 00:24:22the simplest, most naive way is to
- 00:24:24say well let me match every single image
- 00:24:25to every single other one. Let me look
- 00:24:27at all order n squared every single
- 00:24:30combination of pairs of images that I
- 00:24:32can imagine. And so that's what
- 00:24:34exhaustive matching is doing. So
- 00:24:35exhaustive matching like you said it's
- 00:24:37great when you have sort of an unsorted
- 00:24:40random collection of images and
- 00:24:41especially it works well if you have you
- 00:24:43know the order of a few hundred images
- 00:24:45um you know because because it is doing
- 00:24:48this you know every image to every other
- 00:24:50image that quickly gets expensive in
- 00:24:52terms of time like that's going to take
- 00:24:53a lot of time to compute if you try to
- 00:24:55do this on thousands of images you can
- 00:24:57still do it you just have to wait a long
- 00:24:59time but yeah it's it's great because
- 00:25:00it's going to try to discover every
- 00:25:02single pair of matching images that it
- 00:25:05can. Mhm. And so that's where the
- 00:25:06sequential is nice if you have something
- 00:25:08like you said there in the fountain
- 00:25:10sequence where you know hey these are
- 00:25:12you know frames from a video or my
- 00:25:14images, or maybe I was taking photos but
- 00:25:16I'm taking them in order, like, oh, I
- 00:25:18started here took a photo took a few
- 00:25:20steps took another photo took a few more
- 00:25:22steps took another photo and so there is
- 00:25:23some sort of sequential information to
- 00:25:26those photos you know that images taken
- 00:25:29near each other in that list show
- 00:25:31similar content and that's what
- 00:25:33sequential does: it'll leverage that
- 00:25:35information to help the matching be
- 00:25:36more efficient. And then I don't really
- 00:25:39understand vocab tree. I do know that if
- 00:25:41you want to do an exhaustive-style
- 00:25:43match, not sequential, but you have
- 00:25:45let's say 800 images, I've always heard
- 00:25:47use a vocab tree. Yeah. Yeah, that
- 00:25:50that's exactly right. So the vocab tree,
- 00:25:53you might hear it called a vocabulary
- 00:25:55tree or image-retrieval-style matching.
- 00:25:57Yeah. What it's doing behind the scenes
- 00:25:59is it uses an image lookup data
- 00:26:04structure. So it takes all the images,
- 00:26:06comes up with a really compact
- 00:26:09summarization of the kinds of things
- 00:26:11that are in each image and then provides
- 00:26:13a way that I can say, hey, for this
- 00:26:15given image, what other images in my
- 00:26:18data set are likely to have the same
- 00:26:21kinds of things in them. You know, it's
- 00:26:23not a guarantee, but it just says, you
- 00:26:25know, if I have one image and I've
- 00:26:27got 10,000 other images I can match to,
- 00:26:29I can ask it, well, hey, I don't want to
- 00:26:31look at all 10,000. Can you at least
- 00:26:33give me a sorted list of the ones that
- 00:26:35are most likely to match? And so that's
- 00:26:37what the vocab tree option does for you
- 00:26:38is it returns that ranked list and then
- 00:26:41so instead of matching all 10,000, I can
- 00:26:43choose to match the best 50 or the best
- 00:26:45100 or whatever my threshold is. Speed up.
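As a sketch of that retrieval idea (the data and the tiny 8-word vocabulary here are invented; a real vocabulary tree quantizes feature descriptors into a large learned tree of visual words), each image gets a compact summary vector, and a query returns a ranked list of likely matches instead of the whole database:

```python
import numpy as np

# Toy stand-in for vocabulary-tree retrieval: each image is summarized by a
# histogram over 8 made-up "visual words".
rng = np.random.default_rng(1)
n_images, n_words = 10_000, 8
db = rng.random((n_images, n_words))
db[1] = db[0] * 2.0  # image 1 shows the same content as image 0

def rank_candidates(query_idx, db, top_k):
    """Return the top_k images most similar to db[query_idx] by cosine
    similarity, instead of matching against the whole database."""
    q = db[query_idx]
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q))
    sims[query_idx] = -np.inf  # never match an image to itself
    return np.argsort(-sims)[:top_k]

# Match image 0 against its best 50 candidates rather than all 9,999 others.
candidates = rank_candidates(0, db, top_k=50)
```

Only those top-ranked candidates then go through full feature matching and geometric verification.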
- 00:26:48Yep. It's more efficient. Yeah. Um, once
- 00:26:51you get beyond 300 to 400 images,
- 00:26:55exhaustive should not be your option.
- 00:26:57You should go to vocab tree unless
- 00:26:59they're all sequentially taken. And then
- 00:27:01always use sequential. Well, not always,
- 00:27:02but that's probably your default.
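The cost difference between the two modes discussed above can be sketched in a few lines (the `overlap` window size is a made-up parameter for illustration; COLMAP's sequential matcher has its own settings):

```python
from itertools import combinations

def exhaustive_pairs(n):
    """Every image against every other image: n*(n-1)/2 pairs, O(n^2)."""
    return list(combinations(range(n), 2))

def sequential_pairs(n, overlap=5):
    """Each image against only its next `overlap` neighbors in capture order."""
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(i + 1 + overlap, n))]

# For 800 images the difference is dramatic:
n_exh = len(exhaustive_pairs(800))   # 319600 candidate pairs
n_seq = len(sequential_pairs(800))   # 3985 candidate pairs
```

That quadratic growth is why exhaustive matching on thousands of images takes so long, and why ordered captures should use sequential.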
- 00:27:04So, if I'm taking a video and then
- 00:27:06extracting images, sequential is always
- 00:27:09going to work. Well, always going to be
- 00:27:10your first option if you want to be as
- 00:27:12fast as possible. And then
- 00:27:14in here, I know you
- 00:27:16can pick loop detection. So, it's
- 00:27:18trying to we've talked about that
- 00:27:19before, right? It's trying to detect if
- 00:27:21you have come back to an area. Correct,
- 00:27:23and that will do it using the
- 00:27:26vocab tree option. So, under
- 00:27:28the sequential tab, if
- 00:27:31I do loop detection and then specify a
- 00:27:33vocab tree path there at the bottom that
- 00:27:37will enable it to say oh as I'm
- 00:27:39processing through all those video
- 00:27:40frames you know every 10th frame or
- 00:27:42every 50th frame or every 100th frame
- 00:27:43whatever you set it to you can have it
- 00:27:46go and then do a vocabulary tree
- 00:27:48retrieval, do that image retrieval step,
- 00:27:50to try to discover loop closures
- 00:27:53within some of that. Okay, so we
- 00:27:56have these options. And
- 00:27:58then there's spatial and transitive; we
- 00:28:00haven't talked about those. Does spatial
- 00:28:02have to do with GPS? Exactly right. So it
- 00:28:04just says, you know, for each image,
- 00:28:05assuming the images have embedded
- 00:28:08geotags, so GPS data embedded in the
- 00:28:10EXIF, it will say, for each image, just
- 00:28:13find other images with similar GPS and
- 00:28:15match to those. Yes. I love that. A lot of
- 00:28:18people here listening probably are
- 00:28:20taking drone images and spatial is the
- 00:28:24one I always use. That's a great option
- 00:28:26because a lot of times that drone is
- 00:28:27looking straight down or you know it's
- 00:28:29not looking at completely random
- 00:28:31directions but there is some order and
- 00:28:33structure to that drone data and so that
- 00:28:36and in fact a lot of the drones that
- 00:28:37people are using nowadays have a really
- 00:28:39good GPS on it. I'm thinking of the
- 00:28:42enterprise versions of, like, a DJI drone
- 00:28:45are getting really good GPS. Even
- 00:28:47without an RTK attachment, it's not
- 00:28:50going to throw a bunch
- 00:28:52of error in there. And then what's
- 00:28:53transitive? That's the one I don't think
- 00:28:55I've ever touched. I don't even know
- 00:28:56what that means. Yeah, that just that's
- 00:28:58a way to densify a set of existing
- 00:29:01matches. So suppose you had gone and run
- 00:29:04one of the existing modes.
- 00:29:06Okay, maybe not exhaustive, but like if
- 00:29:08you had run sequential or your
- 00:29:11spatial or your vocab tree, but then
- 00:29:14you wanted to go back and create a
- 00:29:16more complete set of connections between
- 00:29:18images. What transitive will do is it'll
- 00:29:20look at your database and it'll say,
- 00:29:22"Hey, if image A matched to B and image
- 00:29:26B matched to image C, but I didn't try
- 00:29:29to match image A directly to C, let me
- 00:29:31go ahead and do that now." And so it
- 00:29:33goes back and finds these transitive
- 00:29:35links between images and attempts to do
- 00:29:38that matching. And so what that does
- 00:29:39that just creates a stronger set of
- 00:29:40connections between images which will
- 00:29:43help COLMAP out during the
- 00:29:44reconstruction phase. Okay. So that I
- 00:29:46feel like this gives me a good idea then
- 00:29:48of, or gives the
- 00:29:49listener/viewer, an idea. There are
- 00:29:52different options. Pick the one that
- 00:29:54makes sense for the data set you have.
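The transitive option described a moment ago can be sketched in a few lines (a toy illustration of the idea only, not COLMAP's actual database logic; the names are invented):

```python
def transitive_candidates(matched_pairs):
    """If A matched B and B matched C but A-C was never tried, propose A-C.
    A toy sketch of the idea behind transitive matching."""
    matched = {frozenset(p) for p in matched_pairs}
    neighbors = {}
    for a, b in matched_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    proposals = set()
    for ns in neighbors.values():
        for a in ns:
            for c in ns:
                if a < c and frozenset((a, c)) not in matched:
                    proposals.add((a, c))
    return proposals

# A-B and B-C are already matched, so A-C is proposed as a new pair to try.
new_pairs = transitive_candidates([("A", "B"), ("B", "C")])
```

Running full matching on those proposed pairs is what densifies the connections between images.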
- 00:29:56You might get the best results out of
- 00:29:59exhaustive as far as error, but you might
- 00:30:01be waiting a day. Heard people say, "I
- 00:30:04set this and now it's telling me it'll
- 00:30:06be ready in 28 hours." Well, probably
- 00:30:08not the right mode. You probably should
- 00:30:09have used a vocab tree. But, you know, I always say
- 00:30:12find the right one. Start with
- 00:30:14sequential. If you have sequential
- 00:30:15images, at least you'll probably get a
- 00:30:17good result there. I also want, just
- 00:30:20to mention, back in the
- 00:30:22diagram under the correspondence search,
- 00:30:24you know, they do break it down into
- 00:30:26the feature extraction, feature matching,
- 00:30:28and then geometric
- 00:30:29verification. For that geometric verification,
- 00:30:32those options show up on
- 00:30:35those matching settings screens that we
- 00:30:36just saw for each of those tabs at the
- 00:30:39bottom there was the general settings or
- 00:30:41general options and a lot of those
- 00:30:43general options are related to geometric
- 00:30:46verification saying when I'm matching
- 00:30:49these points and I want to then verify,
- 00:30:51you know, what sort of pixel error do I
- 00:30:53expect or what is the minimum number of
- 00:30:55inliers or an inlier ratio. And so
- 00:30:58those inliers are the number of
- 00:31:00geometrically verified matches between a
- 00:31:02pair of images. And so that's that's
- 00:31:04where geometric verification kind of
- 00:31:06comes into play within this COLMAP
- 00:31:08workflow. Okay. So just move this along.
- 00:31:11Then I do want to point out I'm going to
- 00:31:13show COLMAP one more time. At this
- 00:31:15point, you've run both your feature
- 00:31:17extraction and feature matching. You
- 00:31:19will still see nothing on your screen.
- 00:31:21Well, you will see logs, but you will
- 00:31:22not see these camera poses, which I
- 00:31:24have. So, I have a point I have this
- 00:31:26sparse point cloud. I have these red
- 00:31:28camera positions around it, and none of
- 00:31:31this shows up because at this point, we
- 00:31:33haven't we haven't created a point
- 00:31:35cloud. We haven't projected anything
- 00:31:37yet. So, we're moving from
- 00:31:40correspondence search to, if I bring up
- 00:31:42that diagram one more time, we're moving
- 00:31:43on to incremental reconstruction, and
- 00:31:45that's where we start to see fun things
- 00:31:47happening on the COLMAP GUI screen.
- 00:31:50If you're running the GUI, you'll
- 00:31:51start to see camera poses show up. So,
- 00:31:54the first step is initialization. What
- 00:31:56is that? So, is that just starting?
- 00:31:59Yeah, that's what it is. I mean, it's
- 00:32:00it's the starting process for this
- 00:32:03incremental reconstruction. So
- 00:32:06incremental reconstruction is just one
- 00:32:08style to attempt to do 3D
- 00:32:10reconstruction. And so the the core idea
- 00:32:13here is that you know like you said we
- 00:32:16don't have any 3D information yet. So
- 00:32:18we're going to start with the minimum
- 00:32:19amount that we need which is a pair of
- 00:32:21images. So let's start with a pair of
- 00:32:22images and then figure out what is their
- 00:32:253D relationship you know between those
- 00:32:27images as well as what 3D points did
- 00:32:30they see in the scene. And so we're
- 00:32:32going to create this two-view
- 00:32:33reconstruction: take that pair of
- 00:32:34images, triangulate an initial set of 3D
- 00:32:36points, and then we use that as the
- 00:32:39initialization for the rest of the
- 00:32:41reconstruction. And so everything after
- 00:32:43that is going to figure out, well, based
- 00:32:44on these initial two images and some
- 00:32:46points, how can I add a third image to
- 00:32:49that? And how does it relate? And now
- 00:32:50that I have these three, how can I add a
- 00:32:52fourth and then a fifth and a sixth? And
- 00:32:54so you just keep adding images one at a
- 00:32:56time to grow a larger and larger
- 00:32:59reconstruction. But initialization is
- 00:33:01just what is that initial pair? Which
- 00:33:04two images am I going to start with to
- 00:33:07build this entire reconstruction?
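That two-view initialization, triangulating an initial set of 3D points from the first image pair, can be sketched with standard linear (DLT) triangulation (my own illustration, not COLMAP's code; the poses and the scene point below are invented):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: observed (u, v) coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A: the homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Initial pair: first camera at the origin, second shifted along x (baseline).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, -0.2, 4.0])       # a scene point both cameras saw
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate(P1, P2, x1, x2)       # recovers X_true from the 2D match
```

Note that if the two cameras were at the same position (zero baseline), the system would be degenerate, which is exactly why initialization wants a pair with real motion between them.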
- 00:33:09Okay. And then it kind of goes
- 00:33:11into a circle. So if you look at this, I
- 00:33:13say circle. The diagram on the screen
- 00:33:15shows image registration, triangulation,
- 00:33:18bundle adjustment, outlier filtering,
- 00:33:20and then if you follow the lines, you
- 00:33:22notice you're really doing a loop. Yep.
- 00:33:24So it's looping through that process.
- 00:33:26And then also this dashed line showing
- 00:33:29reconstruction. So it's kind of probably
- 00:33:30looping through that and adding to the
- 00:33:33reconstruction while it's going or Yep.
- 00:33:36Okay. Exactly right. Exactly right. So
- 00:33:38it's that initialization that picks
- 00:33:41the first pair of images. But
- 00:33:43once I have my pair of images now I'm
- 00:33:46going to enter in this loop that starts
- 00:33:49with image registration. So image
- 00:33:50registration is a fancy name for
- 00:33:53how can I add a new
- 00:33:55image to my existing reconstruction. And
- 00:33:58so what it's going to look at is based
- 00:34:01on the 3D points that have already been
- 00:34:03triangulated. It's going to ask what's
- 00:34:06the best next image in my data set that
- 00:34:10also saw those points. And
- 00:34:13once I find that image, you know, via
- 00:34:15the set of feature matches. So we
- 00:34:17say, you know, if I've matched images
- 00:34:19one and two and triangulated that, and
- 00:34:21if image two matched to image three,
- 00:34:23well then image three is seeing the same
- 00:34:25points in the scene. So let me add image
- 00:34:27three. And so there it's a 2D-to-3D
- 00:34:30registration process, a 2D-3D pose
- 00:34:32estimation process where I take the 2D
- 00:34:34points in that third image and I want to
- 00:34:37align those 2D points with the 3D points
- 00:34:40that have been triangulated. So you
- 00:34:41might hear that as image registration or
- 00:34:44perspective-n-point problem, pose
- 00:34:47estimation. There's a few different
- 00:34:48words for what this process is, but
- 00:34:50you're adding a new image to the
- 00:34:52reconstruction. And so that's the image
- 00:34:54registration step. I do know when I ran
- 00:34:56this um I can always take a video and
- 00:34:59kind of project onto this in post. But
- 00:35:02when it's creating this reconstruction,
- 00:35:04instead of taking image one and then
- 00:35:08image two and then image three and kind
- 00:35:11of building off that, I'll notice it'll
- 00:35:13pick, if you're
- 00:35:15watching this on video, you'll notice I
- 00:35:17took two loops and some of the images
- 00:35:19are like right above each other almost
- 00:35:20where I held the phone at like above my
- 00:35:22head and then I held it down at chest
- 00:35:23level. So I have two loops and there's a
- 00:35:25lot of common key points, common
- 00:35:27features. So, as it's building this up,
- 00:35:30it started at this kind of where I
- 00:35:31started walking around this this
- 00:35:33fountain, but it's using images from
- 00:35:35further along in the video extraction or
- 00:35:38sorry, the images I had. So, it used like
- 00:35:40image one and image 180 because those
- 00:35:45are next to each other and had a lot of
- 00:35:47strong feature matches. So, they're not
- 00:35:48necessarily using images in sequence of
- 00:35:50how you took them. It's ones that had
- 00:35:52strong correlation.
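The 2D-3D registration step being described can be illustrated by how a candidate pose is scored against the already-triangulated points (COLMAP actually solves a perspective-n-point problem inside RANSAC; the poses and points here are invented, and coordinates are normalized for simplicity):

```python
import numpy as np

def reprojection_errors(R, t, X, x_obs):
    """Distance between where pose (R, t) projects each 3D point and where
    the new image actually observed it (normalized image coordinates)."""
    Xc = (R @ X.T).T + t
    x_proj = Xc[:, :2] / Xc[:, 2:]
    return np.linalg.norm(x_proj - x_obs, axis=1)

rng = np.random.default_rng(2)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], (30, 3))  # triangulated points

# Ground-truth pose of the new image, and its 2D observations of those points.
R_true, t_true = np.eye(3), np.array([0.2, 0.0, 0.0])
obs = (R_true @ X.T).T + t_true
x_obs = obs[:, :2] / obs[:, 2:]

errs_good = reprojection_errors(R_true, t_true, X, x_obs)   # essentially zero
errs_bad = reprojection_errors(np.eye(3), np.array([0.5, 0.3, 0.0]), X, x_obs)
```

A pose solver searches for the (R, t) that makes these errors small over as many 2D-3D correspondences as possible; that winning pose is the registered camera.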
- 00:35:54That's a great point. That's a great
- 00:35:56point. Yeah, it isn't just going to
- 00:35:57go, you know, 1 2 3 4 5 6, you know,
- 00:36:00it's not going to do them in order, you
- 00:36:01know, it's going to start that pair of
- 00:36:03images. It's going to look through all
- 00:36:04of the images in your collection and
- 00:36:07find the pair. And it might not be the
- 00:36:08consecutive pair, but find the pair of
- 00:36:10images, you know, that maximizes some
- 00:36:12criteria. You know, it's a pair of
- 00:36:14images that has strong connectivity. So,
- 00:36:16there were a lot of feature matches, but
- 00:36:18I also want to make sure that that pair
- 00:36:19of images has, you know, differences in
- 00:36:22viewpoint. I don't want two images that
- 00:36:24were taken at the exact same position in
- 00:36:26space because that gives me no 3D
- 00:36:29information. I need, you know, we talked
- 00:36:31about this in the last episode, this
- 00:36:32concept of a baseline. I need some sort
- 00:36:34of translation. I need some motion
- 00:36:36between two images or maybe it was in
- 00:36:38our depth map episode, you
- 00:36:41know, we talked about this, you know, in
- 00:36:43that we need motion between images in
- 00:36:45order to estimate depth. So the
- 00:36:47initialization could look for the same
- 00:36:48thing. It wants lots of matches
- 00:36:50between the images, but it also wants a
- 00:36:52strong amount of motion between them.
- 00:36:54So, it's going to pick whichever pair of
- 00:36:56images maximizes that criteria, and
- 00:36:59once it has that, then it'll start
- 00:37:01adding other images that are strongly
- 00:37:04connected to those initial ones. And
- 00:37:05yeah, it won't necessarily do it in
- 00:37:07order that you capture those images. It
- 00:37:09can be in the order in which those
- 00:37:10connections are strongest. And I was
- 00:37:13mostly seeing, like,
- 00:37:16the first photo and then somewhere
- 00:37:17further along where I came and did a
- 00:37:20loop. I saw those two photos start
- 00:37:22together because I think, as we were
- 00:37:24talking about, the baseline
- 00:37:26was better. There was more parallax
- 00:37:28because these are pretty closely
- 00:37:30spaced images I took from picture to
- 00:37:32picture. So not a lot has changed versus
- 00:37:34the next loop I'm looking at the
- 00:37:36exact same part of the fountain but I
- 00:37:38have a different elevation and angle. So
- 00:37:40there's a lot of parallax movement
- 00:37:42between those images. So it was
- 00:37:45matching those better as opposed
- 00:37:48to image one to image two. It's more of
- 00:37:50image one to image 180 because that
- 00:37:52baseline was probably better. The fun
- 00:37:54thing is, when you run this in
- 00:37:56the GUI, this COLMAP, you get to
- 00:37:57watch those build and you get to see the
- 00:37:59point cloud just start to generate in
- 00:38:03front of you and you get an
- 00:38:04understanding then of what it's doing in
- 00:38:06these logs that are looping through this
- 00:38:08process over and over. And you can kind
- 00:38:10of see it just iteratively add to the
- 00:38:12scene and build and refine. When it's
- 00:38:14doing this incremental reconstruction,
- 00:38:17is it refining the camera poses as it
- 00:38:19goes or is it just saying, "Here's the
- 00:38:21camera poses. That's where they are."
- 00:38:24No, there's there's refinement. There's
- 00:38:26refinement. And a lot of times that
- 00:38:27refinement is is called bundle
- 00:38:29adjustment. That's that's a key word
- 00:38:31that's used commonly in the literature.
- 00:38:32I remember the first time I heard the
- 00:38:33word bundle adjustment. I was a first
- 00:38:35year grad student and I had no idea what
- 00:38:38the person was talking about. I was
- 00:38:39like, "What? A bundle of sticks? A
- 00:38:41bundle of what? A straw? What is going
- 00:38:43on?" Um, but no, bundle adjustment.
- 00:38:45So, it's the idea of refining the 3D
- 00:38:49points as well as the camera positions.
- 00:38:51And so you end up with just a bundle of
- 00:38:53constraints, you know, a bunch of
- 00:38:54constraints saying, you know, these 2D
- 00:38:56points in these images all triangulate
- 00:38:58and all saw the same 3D point in the
- 00:39:00scene, but I've got a bunch of images
- 00:39:01and I've got a bunch of points. How can
- 00:39:04I optimize the alignment of all of this
- 00:39:07data? And that's what bundle adjustment
- 00:39:09is. So yeah, so as COLMAP is running,
- 00:39:13it's doing that image registration
- 00:39:15process. It'll add a new image. It then
- 00:39:17runs triangulation which creates new 3D
- 00:39:20points based on that new image and other
- 00:39:22images that are already there but then
- 00:39:24it'll do bundle adjustment which will
- 00:39:25say how can I refine that and there's
- 00:39:28two styles of bundle adjustment that I
- 00:39:30believe COLMAP uses: one of them is
- 00:39:32local bundle adjustment the other is
- 00:39:33global. So a lot of times what you will
- 00:39:35see is, you know, suppose we had
- 00:39:38already reconstructed a thousand images
- 00:39:40and we're adding the thousand-and-first.
- 00:39:42Um, when I add that thousand-and-first,
- 00:39:45you know trying to do bundle adjustment
- 00:39:46using all thousand images that takes a
- 00:39:48long time. Um, and so we
- 00:39:52recognize that, well, that
- 00:39:54thousand-and-first, that next
- 00:39:56image that I'm adding, you know, well,
- 00:39:57it's off in the corner of the
- 00:39:58reconstruction, you know, it's far away
- 00:40:00from the other side of the
- 00:40:01reconstruction. You know, these these
- 00:40:02things aren't really related to each
- 00:40:04other. So, I can run a local bundle
- 00:40:06adjustment. Let me just optimize only
- 00:40:08those cameras and points that are near
- 00:40:11that new image that I just added or
- 00:40:13those new points that I've triangulated.
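The quantity that both local and global bundle adjustment drive down can be sketched directly (a toy objective assuming calibrated cameras and normalized coordinates; COLMAP minimizes this kind of objective with a sparse nonlinear least-squares solver, not the naive loop below):

```python
import numpy as np

def ba_objective(poses, points, observations):
    """Sum of squared reprojection errors: the quantity bundle adjustment
    minimizes by jointly adjusting camera poses and 3D points.
    observations: (camera_index, point_index, observed_xy) triples."""
    total = 0.0
    for cam_i, pt_i, xy in observations:
        R, t = poses[cam_i]
        Xc = R @ points[pt_i] + t
        proj = Xc[:2] / Xc[2]
        total += float(np.sum((proj - xy) ** 2))
    return total

# Two cameras and three points with perfect, consistent observations.
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([1.0, 0.0, 0.0]))]
points = [np.array([0.0, 0.0, 5.0]),
          np.array([0.5, 0.2, 6.0]),
          np.array([-0.3, 0.1, 4.0])]
obs = []
for ci, (R, t) in enumerate(poses):
    for pi, X in enumerate(points):
        Xc = R @ X + t
        obs.append((ci, pi, Xc[:2] / Xc[2]))

perfect = ba_objective(poses, points, obs)      # 0: everything is consistent

# Nudge one camera pose: the objective grows, and bundle adjustment's job is
# to move poses (and points) until it is pushed back toward zero.
poses_off = [poses[0], (np.eye(3), np.array([1.3, 0.0, 0.0]))]
perturbed = ba_objective(poses_off, points, obs)
```

Local bundle adjustment restricts this sum to the cameras and points near the newly added image; global bundle adjustment includes all of them.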
- 00:40:15And so, that's a way to sort of do this
- 00:40:16local refinement. And I can do that
- 00:40:18every single time I add a new image. And
- 00:40:21then periodically, COLMAP will run a global
- 00:40:24bundle adjustment. So there's some
- 00:40:25settings there. I think every, you know,
- 00:40:27once the reconstruction has increased in
- 00:40:29size by 10% or you've added every, you
- 00:40:31know, 500 images or something, there's
- 00:40:33certain criteria, especially at the end
- 00:40:35of the reconstruction, COLMAP will run a
- 00:40:37global bundle adjustment which says,
- 00:40:39let's optimize everything. Let's
- 00:40:41optimize the points. Let's optimize the
- 00:40:44camera poses. And something we haven't
- 00:40:46mentioned is it will also be optimizing
- 00:40:48the camera parameters. So back when we
- 00:40:51picked that camera model and we said,
- 00:40:52"Oh, you know, we're going to use a
- 00:40:53camera model that has a focal length
- 00:40:55term and a principal point CX and C Y or
- 00:40:57maybe has some radial distortion terms."
- 00:40:59During bundle adjustment, COLMAP will
- 00:41:01also be optimizing those parameters as
- 00:41:04well to figure out well what is the
- 00:41:05field of view of my camera that's the
- 00:41:07focal length or how much lens distortion
- 00:41:09was there in order to achieve that
- 00:41:12alignment. Would it run those? Cuz we
- 00:41:15didn't cover this earlier on, but let's
- 00:41:18say you do have a camera model
- 00:41:21uh calibration file. So, you're saying I
- 00:41:23know this. I think DJI, in
- 00:41:27their enterprise-level drones, will
- 00:41:28give you this information on their
- 00:41:31lenses cuz they've been calibrated and
- 00:41:33it's in the EXIF data. Will that
- 00:41:35change? Does it do like a refinement on
- 00:41:37top of that or does it just say no, no,
- 00:41:39no, you give us that, we won't change
- 00:41:40that. That's an option. So I
- 00:41:43think either under the
- 00:41:44reconstruction options or under the
- 00:41:46bundle adjustment options there are ways
- 00:41:47to say hey do I want to refine my focal
- 00:41:50length, do I want to refine, you know, my
- 00:41:52distortion terms. Um so you could you
- 00:41:55know enable or disable that setting. To
- 00:41:57that point I do believe you know that
- 00:41:59COLMAP will parse the EXIF data in those
- 00:42:02images and if it sees that yeah there is
- 00:42:03a focal length cuz a lot of times an
- 00:42:05image will, you know, contain that, oh,
- 00:42:07this was taken with a 10 mm
- 00:42:09lens or a 24 mm lens you know and so
- 00:42:12COLMAP can parse that data to take an
- 00:42:14initial guess at what it thinks that
- 00:42:16focal length is you know what's the
- 00:42:17field of view of the camera and can use
- 00:42:18that as initialization. But a lot of
- 00:42:21times there is benefit to refine that um
- 00:42:23because it may get you
- 00:42:25close, but it might not be close enough
- 00:42:28to get a really sharp
- 00:42:30reconstruction. So okay so I got a lot
- 00:42:32more appreciation for what's happening
- 00:42:34here. I tell people run this on their
- 00:42:36computer. You don't need the highest
- 00:42:38spec computer to run a small data set
- 00:42:40and learn how this works. I ran this on
- 00:42:42my older computer which doesn't have you
- 00:42:45know 24 cores or anything and it still
- 00:42:47ran fairly quickly. I'd say there's
- 00:42:49there's some things you gave me some
- 00:42:51notes. I think we covered largely most
- 00:42:53of it. But then from here, you can do
- 00:42:55things. So, I've ran this through. You
- 00:42:58can hit automatic reconstruction. It'll
- 00:43:00create all this, but then you can hit
- 00:43:01bundle adjustment, which is that global
- 00:43:03one at the end. And then you can build a
- 00:43:05dense reconstruction, which we're not
- 00:43:06really going to cover on this episode.
- 00:43:08This is just kind of like here's how we
- 00:43:10got that workflow I showed to get
- 00:43:12the camera poses, the sparse point
- 00:43:14cloud, and then from there, you can use
- 00:43:15it for more downstream tasks, right? So
- 00:43:17I could use this for again doing a dense
- 00:43:203D reconstruction where you're going to
- 00:43:21get millions of points on this
- 00:43:23scene or I can use this as the basis for
- 00:43:27initializing 3D Gaussian splatting.
- 00:43:29There's just different things you can
- 00:43:31use once you've got camera positions and
- 00:43:34a sparse point cloud. I'm
- 00:43:36showing also on my screen I didn't talk
- 00:43:38about you have these kind of magenta
- 00:43:40lines. This is showing which of your
- 00:43:42images matched. If I
- 00:43:44double-clicked on one, it'll show
- 00:43:47that information about the key
- 00:43:49points and which ones matched to it. But
- 00:43:51you can just click around and
- 00:43:52learn things. Double-click on different
- 00:43:54parts of the scene. It'll show you the
- 00:43:57point and which different cameras
- 00:43:59made up that point. And it's a good tool
- 00:44:02to kind of learn how this works because
- 00:44:03it's very visual on the screen. Lots of
- 00:44:06data, lots of options. You can even
- 00:44:09create animations in this if you really
- 00:44:11want to show off what you learned. There
- 00:44:12is one thing we didn't really talk
- 00:44:14about. Well, there's a couple things.
- 00:44:15So, incremental reconstruction. Everyone
- 00:44:17always complains. I got the newest GPU.
- 00:44:19This should be really fast. Why is this
- 00:44:22running so slow? My GPU is not even
- 00:44:23being used and it says it's taking 5
- 00:44:26hours to run my thousand image data set.
- 00:44:29Why is that? Why can't we use a GPU for
- 00:44:30this incremental reconstruction? Or I
- 00:44:32know we can, but why can't we in COLMAP,
- 00:44:34the way it's configured? Yeah. Yeah.
- 00:44:36Because COLMAP Yeah. A lot of these
- 00:44:38algorithms are not easy to parallelize
- 00:44:41on a GPU. So a GPU works well when
- 00:44:44you're doing the exact same operation on
- 00:44:46millions of things, you know, cuz that's
- 00:44:48what a GPU does. Its job is to draw
- 00:44:51pixels to a screen, you know, on your on
- 00:44:53your monitor on your desktop. And so
- 00:44:55you've got millions of pixels on your
- 00:44:56screen. And so that GPU is processing a
- 00:44:58million pixels at once and figures out
- 00:45:00what to draw. And so for tasks like
- 00:45:03feature extraction, where, hey, I've got,
- 00:45:05again, millions of pixels and I want to
- 00:45:07figure out which ones have features in
- 00:45:08them, a GPU is great. Or feature matching:
- 00:45:12I've got tens of thousands of features
- 00:45:13in one image, tens of thousands in the
- 00:45:15other. I want to figure out which
- 00:45:16features match with each other. Then
- 00:45:18again, that's great for a GPU. But for
- 00:45:20incremental reconstruction, it's like
- 00:45:23I'm operating on one image at a time and
- 00:45:25I have to just solve a math equation and
- 00:45:28do some, you know, linear algebra to
- 00:45:30figure out what's the 3D position or
- 00:45:32pose of that image. That's not a very
- 00:45:34parallelizable task. And so it's not
- 00:45:37very easy to uh adapt some of these
- 00:45:40algorithms to the GPU. I will say
- 00:45:42another thing that contributes to it
- 00:45:44is that COLMAP is very flexible. There's a
- 00:45:48lot of algorithms, a lot of switches, a
- 00:45:50lot of different techniques that you can
- 00:45:51use and to implement all of those on the
- 00:45:54GPU would just take a lot of time. It's
- 00:45:56nice having software that's flexible.
- 00:45:58You know, with COLMAP being open source, a
- 00:46:00bunch of people contributing to it, it's
- 00:46:02nice having a flexible platform where
- 00:46:04people can easily dive in, make changes,
- 00:46:08add their own algorithm, plug it in,
- 00:46:10tweak things, and play with it. So
- 00:46:11having that sort of more
- 00:46:13general-purpose CPU-based implementation
- 00:46:16is helpful. But yeah, to get back to
- 00:46:18the core, it really is primarily just
- 00:46:19around the algorithms. A lot of these
- 00:46:21algorithms are not parallelizable or
- 00:46:24not well suited for processing on a GPU.
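To make that contrast concrete, here is a toy sketch (my own illustration, not COLMAP code): matching two images' descriptors is one big distance-matrix computation, the same independent arithmetic repeated over millions of pairs, exactly the workload a GPU is built for, while incremental registration is a loop where each step depends on the model built by the previous steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SIFT-like descriptors: 2000 features per image, 128 dims each.
desc_a = rng.standard_normal((2000, 128)).astype(np.float32)
desc_b = rng.standard_normal((2000, 128)).astype(np.float32)

# Feature matching: every pairwise squared distance at once -- a
# 2000x2000 grid of identical independent operations, ideal for a GPU.
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
d2 = (
    (desc_a**2).sum(1)[:, None]
    + (desc_b**2).sum(1)[None, :]
    - 2.0 * desc_a @ desc_b.T
)
nearest = d2.argmin(axis=1)  # best match in image B for each feature in A

# Incremental reconstruction, by contrast, is a sequential loop: each
# image's pose is solved against the model built from the previous
# images, so the iterations cannot all run at once.
model_size = 2  # start from an initial two-view reconstruction
for image in range(3, 11):  # register images one at a time
    model_size += 1  # solve this image's pose against the current model

print(nearest.shape, model_size)
```

The matrix step maps straight onto GPU kernels; the loop does not, which is why the matching stages use your GPU and the mapping stage mostly does not.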
- 00:46:27That makes sense. Someone once
- 00:46:29explained it to me
- 00:46:31like this: your CPU is a
- 00:46:32really good detective, solving clue by
- 00:46:35clue, one thing at a time, versus a GPU,
- 00:46:38which can just point out all the
- 00:46:39clues at once. Yeah. But you really
- 00:46:41need that hard math equation, and
- 00:46:44you need really fast cores trying
- 00:46:46to solve those things one at a time, and
- 00:46:48it's incremental. So think about it:
- 00:46:50you can't solve all
- 00:46:51of these at once as is. So that's
- 00:46:54something that people just have to keep
- 00:46:55in mind that don't get frustrated. It's
- 00:46:58just how this technology works today.
- 00:47:00And there's GLOMAP. So how does GLOMAP
- 00:47:01make this all of a sudden magically
- 00:47:03fast? Yeah. So GLOMAP is a different
- 00:47:06style for that reconstruction process.
- 00:47:09So GLOMAP, that's global mapper, you
- 00:47:12know. So global reconstruction versus
- 00:47:15incremental reconstruction. So instead
- 00:47:16of here in COLMAP, as we just talked about, it
- 00:47:18uses incremental reconstruction and adds
- 00:47:20one image at a time, whereas
- 00:47:23global reconstruction it tries to figure
- 00:47:26out the 3D poses of all of the images
- 00:47:29all at once. So GLOMAP still has that
- 00:47:32same correspondence search step. So to
- 00:47:34run GLOMAP you still have to extract
- 00:47:36key points, extract features from your
- 00:47:38images. You have to match them and run
- 00:47:39your geometric verification. But once
- 00:47:42you have that web of connectivity
- 00:47:44between your images, you can then run
- 00:47:46global reconstruction techniques. And so
- 00:47:49there's a few different steps there. In
- 00:47:51GLOMAP, they run rotation averaging
- 00:47:54first. So the idea with that is that you
- 00:47:57look at all of the feature matches
- 00:47:59between your pairs of images. For each
- 00:48:02pair, you estimate how much rotation
- 00:48:05occurred between that pair of images,
- 00:48:07you know. So that gives you a
- 00:48:08constraint. But now if I look at all of
- 00:48:10the rotations that I estimated between
- 00:48:12all of the pairs, can I come up with a
- 00:48:15consistent orientation for all of my
- 00:48:17images that satisfies each of those
- 00:48:19pair-wise constraints? So, can I arrange
- 00:48:22the orientations of my images so that
- 00:48:24all of those pair-wise rotations make
- 00:48:26sense? And that's what rotation
- 00:48:28averaging does. So, it's not even
- 00:48:30looking at position. It's just trying to
- 00:48:32rotate all of the images. And once
- 00:48:34they're rotated in 3D space, then it
- 00:48:36does a global positioning step which
- 00:48:39simultaneously solves both the camera
- 00:48:41positions as well as some of the 3D
- 00:48:43points. And so it kind of throws all of
- 00:48:45the cameras into a big soup, a big mess.
- 00:48:47It gives them a bunch of random
- 00:48:49initializations and then defines these
- 00:48:51constraints saying, well, these images
- 00:48:53saw these common points. How can I
- 00:48:56rearrange all of these images so that
- 00:48:59they line up and see those common
- 00:49:01points? So it's similar to bundle
- 00:49:03adjustment, the idea of taking a
- 00:49:05bunch of images that see points and
- 00:49:07refining them, but it uses a different
- 00:49:10formulation, a different set of
- 00:49:11constraints that is better suited to,
- 00:49:15you know, random unknown camera
- 00:49:16positions. And so that's this global
- 00:49:18positioning problem that they solve.
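A toy version of rotation averaging (mine, not GLOMAP's code) makes the idea tangible: shrink each camera's orientation to a single heading angle, treat the pairwise relative rotations as linear constraints, pin camera 0 to fix the gauge, and solve all the orientations at once with least squares. Real rotation averaging works on full 3D rotations, but the "satisfy all pairwise constraints simultaneously" structure is the same.

```python
import numpy as np

# Toy 1D "rotation averaging": cameras have unknown heading angles
# (in degrees); we only observe pairwise relative rotations.
true = np.array([0.0, 10.0, 25.0, 40.0])

# Pairwise constraints: (i, j, measured rotation from camera i to j).
pairs = [(0, 1, 10.0), (1, 2, 15.0), (2, 3, 15.0), (0, 2, 25.0), (1, 3, 30.0)]

# Build a linear system: theta_j - theta_i = measurement.
A = np.zeros((len(pairs) + 1, 4))
b = np.zeros(len(pairs) + 1)
for row, (i, j, rel) in enumerate(pairs):
    A[row, i], A[row, j], b[row] = -1.0, 1.0, rel
A[-1, 0] = 1.0  # gauge constraint: pin camera 0 at 0 degrees

# Solve every camera's orientation simultaneously.
theta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(theta, 6))  # recovers the consistent global headings
```

With noisy real measurements the constraints conflict, and the least-squares solve finds the orientations that satisfy them as well as possible, which is exactly the "averaging" part.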
- 00:49:20So once
- 00:49:22you've run your rotation averaging, your
- 00:49:23global positioning, you get a
- 00:49:24reconstruction that's pretty close. And
- 00:49:26then you can run bundle adjustment, you
- 00:49:29know, an actual high quality refinement
- 00:49:31using bundle adjustment. And then you
- 00:49:33have your 3D reconstruction. So it
- 00:49:35skips a lot of this incremental slow
- 00:49:37process that wasn't parallelizable. The
- 00:49:39rotation averaging and global
- 00:49:41positioning are a little better
- 00:49:42suited to parallelization, and it's more
- 00:49:44efficient because you're not having to
- 00:49:45do this one after the other after the
- 00:49:47other. Yeah. And I have it on my screen
- 00:49:49here, the project page, which shows what
- 00:49:51you were talking about. And this
- 00:49:53last one shows it all happening at
- 00:49:55once, where it all just
- 00:49:58resolves at once. I do want to say
- 00:50:01that, to me,
- 00:50:03there's a low, what's the
- 00:50:06right word, barrier to entry. You're not going
- 00:50:08to be wasting a lot of your time giving
- 00:50:10this a shot to see if it works well
- 00:50:12for your project, because you don't have
- 00:50:14to wait a long time for it to do the
- 00:50:16incremental reconstruction. So, it
- 00:50:17doesn't work well with all scenes as I
- 00:50:19found, but because you know within
- 00:50:22minutes if it's going to work well or
- 00:50:23not, it's worth a shot and you get to
- 00:50:26learn what scenes work well with it.
- 00:50:27You've done some tests as well, Jared.
- 00:50:30You can't get too tied in on
- 00:50:31a bunch of little things. I feel like
- 00:50:33you need more of a global view;
- 00:50:35you know, the example images have a
- 00:50:37lot of features and aren't closely
- 00:50:39tied in on little features in a scene.
- 00:50:43Mhm. Yeah. From my
- 00:50:46experience with GLOMAP and other
- 00:50:48global structure-from-motion, global
- 00:50:49reconstruction techniques, they work
- 00:50:52best when you have a lot of connections
- 00:50:55between your images. Mhm. So it's not
- 00:50:58you just walking through a cave or
- 00:51:00walking down, you know, a city street
- 00:51:01and never returning. It likes a lot
- 00:51:04of loop closures. It likes a lot of
- 00:51:06connectivity, a lot of different vantage
- 00:51:08points and overlap and diverse content.
- 00:51:10And so it takes the strength of those
- 00:51:14diverse and dense connections and very
- 00:51:17quickly figures out how to arrange them
- 00:51:18to produce that final reconstruction.
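That web of connectivity is literally a graph: images are nodes, verified pairwise matches are edges. A hypothetical little sketch (not from COLMAP or GLOMAP) shows the difference between a chain, like walking down a street once, and an orbit with loop closures that has many redundant edges beyond the minimum needed to stay connected, which is what global methods thrive on.

```python
from collections import defaultdict

def components(num_images, matches):
    """Count connected components of a match graph via depth-first search."""
    adj = defaultdict(set)
    for a, b in matches:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), 0
    for node in range(num_images):
        if node in seen:
            continue
        comps += 1
        stack = [node]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(adj[cur] - seen)
    return comps

# A chain (walk down a street, never returning): barely connected.
chain = [(i, i + 1) for i in range(5)]
# An orbit with loop closures: every image also matches two steps ahead,
# and the last image closes the loop back to the first.
orbit = chain + [(i, i + 2) for i in range(4)] + [(5, 0)]

# Redundant edges = edges beyond the spanning-tree minimum (n - 1 = 5).
print(components(6, chain), len(chain) - 5)   # 1 0: connected, no redundancy
print(components(6, orbit), len(orbit) - 5)   # 1 5: connected, 5 extra constraints
```

Both graphs are connected, but only the orbit gives rotation averaging and global positioning the over-determined, mutually checking constraints they need.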
- 00:51:20And that's probably why in my experience
- 00:51:22when I have these broader view
- 00:51:24shots, it works well, because I have a
- 00:51:25lot of connections and a lot of
- 00:51:28unique features. But you get too close in
- 00:51:31on one little object, or, like some
- 00:51:33indoor scenes I've done
- 00:51:35that haven't turned out, you
- 00:51:36have a lot of just blank white walls
- 00:51:38with not a lot of features. So, it's
- 00:51:40just not able to do that. So, all right.
- 00:51:43Well, this is something I had on
- 00:51:44my screen just to kind of show some
- 00:51:47examples. If you're listening, I will
- 00:51:49make sure I link it in the show notes as
- 00:51:50well. GLOMAP and COLMAP, but GLOMAP
- 00:51:53is an interesting one you can look
- 00:51:55at. It drops on top of COLMAP.
- 00:51:59So, even getting it running isn't a
- 00:52:00large lift. And you see Johannes in
- 00:52:03the list of names. So, you can see he's
- 00:52:05still working on these things. I think
- 00:52:07this is interesting because it does make
- 00:52:08things go faster. And if you look at the
- 00:52:10results, they are in the same range
- 00:52:13of accuracy as you get with incremental
- 00:52:15reconstruction using COLMAP. So it's
- 00:52:17not saying well this is fast but it's
- 00:52:18not nearly as good. It's fast and it's
- 00:52:20good when it works, and you
- 00:52:22find out really quick because I've
- 00:52:23noticed that the results either are
- 00:52:25absolutely all over the place or you
- 00:52:27have a really good sparse point cloud
- 00:52:29and so you know if it's good or not. In
- 00:52:31fact, you'll see cameras all over the
- 00:52:33place where everything's kind of like
- 00:52:34this weird-looking cube, and that's
- 00:52:37how you know it didn't work. But you
- 00:52:39will know based off of your output.
- 00:52:40Yeah, I've gotten a few. I say
- 00:52:43Borg cubes, that's what I think they
- 00:52:44look like, but I think I've gotten a few
- 00:52:46cubes as my results.
- 00:52:49All right. Well, I think we covered I
- 00:52:51think we covered this all really well. I
- 00:52:53hope at the end of this people will go
- 00:52:56try COLMAP, or, I mean, even if they
- 00:52:58use other software, it will follow
- 00:53:01relatively the same sort of process.
- 00:53:04Maybe there are
- 00:53:05other ways it's done, I'm sure there are,
- 00:53:06but this is the standard kind of method,
- 00:53:09and most at least follow this sort of
- 00:53:12style. And now there's all this machine
- 00:53:14learning stuff that's different. But as
- 00:53:15far as classical 3D reconstruction from
- 00:53:18imagery, this is a very well-known and
- 00:53:20reused pipeline for a lot of projects.
- 00:53:24Yeah. And like you said,
- 00:53:25just go and try that.
- 00:53:27I can't stress that enough. Just
- 00:53:28try it. You know, if you're either
- 00:53:31just new to computer
- 00:53:33vision and want to understand how 3D
- 00:53:34reconstruction works, or maybe
- 00:53:36you kind of understand it
- 00:53:37but want to get
- 00:53:39better insight into how things work behind
- 00:53:41the scenes. A tool like COLMAP is
- 00:53:43great just to, you know, throw some
- 00:53:44images at it, run a reconstruction, and
- 00:53:46then start poking around. There's a lot
- 00:53:48of neat visualizations that Jonathan
- 00:53:50showed where you can look at a point and
- 00:53:51see which images saw it, or, in an image,
- 00:53:54what it matched to. There's other
- 00:53:56debug visualizations where you can look
- 00:53:57at sort of the match graph or the match
- 00:53:59matrix and see the different
- 00:54:02patterns or ways that images are
- 00:54:02matching to each other. So, it's a
- 00:54:06nice way to get your hands dirty
- 00:54:09and see this process of turning
- 00:54:12pixels to 2D information to final 3D
- 00:54:15results, you know, that mapping
- 00:54:17from 2D to 3D and all the
- 00:54:20information that goes into that. So,
- 00:54:21it's a great way to get in there and get
- 00:54:22an intuition for how this all works
- 00:54:23behind the scenes. Yes, definitely. And
- 00:54:26I would say the most important part when
- 00:54:29you're trying to run this is picking the
- 00:54:31right matching strategy, because that
- 00:54:33can be the difference
- 00:54:34between waiting hours and waiting an hour or
- 00:54:37minutes. So, well, thanks Jared for this
- 00:54:39episode and kind of covering all this
- 00:54:42stuff. I hope this was tangible enough
- 00:54:44for people to go try it and having the
- 00:54:46visuals up. So, if you're listening, go
- 00:54:48find this video on the EveryPoint
- 00:54:51YouTube channel. We have a playlist of
- 00:54:53all of our episodes. I'll make sure. I
- 00:54:56haven't named it yet, but I'm sure
- 00:54:57COLMAP will be in the name.
- 00:55:00I can't remember what episode we're on,
- 00:55:01but it's like 15 or 16. You will see
- 00:55:04that it'll be a
- 00:55:07great way for you to learn this if
- 00:55:08you're getting into this area, because
- 00:55:09I didn't go over these,
- 00:55:11but we have questions I see every day,
- 00:55:13either on my videos or on Reddit or
- 00:55:18Discord. There's these different
- 00:55:19communities that are all using projects
- 00:55:21that require COLMAP to run to start,
- 00:55:24think 3D Gaussian splatting, and it's
- 00:55:26just obvious that this is something that
- 00:55:28people just know they have to use but
- 00:55:30have no idea what's happening. They just
- 00:55:32know they threw a bunch of images at it
- 00:55:34and something came out and then they're
- 00:55:36going to do something else with it. But
- 00:55:38they have no appreciation for the
- 00:55:40sausage making inside COLMAP. If you
- 00:55:43know what each step is, you can get
- 00:55:45better results in my opinion. Just play
- 00:55:47with it. See what works, learn what
- 00:55:49those different options are. If you
- 00:55:51don't know what an option is as well,
- 00:55:52jump on our YouTube channel, ask a
- 00:55:54question. I will be watching and trying
- 00:55:56to respond as intelligently as possible
- 00:55:59on those and and give you a a good
- 00:56:01answer. So Jared, any other parting
- 00:56:03thoughts you want on this? You said
- 00:56:04go give it a try. Any other tips you
- 00:56:07would give people? Take good, sharp
- 00:56:09imagery. And just do it
- 00:56:11yourself. Get out and try, you know,
- 00:56:12take your own photos and see how
- 00:56:14they turn out. Yeah, take your own
- 00:56:16photos. Don't go use the
- 00:56:18open-source data sets, because you know those
- 00:56:20are going to work; those are
- 00:56:22great for testing but not great for
- 00:56:24learning on your own data. So, right, well,
- 00:56:28thank you, and again, if you're
- 00:56:29listening, this will be on all major
- 00:56:31podcast players. Please, if you can,
- 00:56:34subscribe to our channel or to
- 00:56:37one of our podcast episodes. That'll mean
- 00:56:39a lot to us and let us know that we're making the
- 00:56:40right content and that you guys care
- 00:56:42about learning this information.
- 00:56:44And as always, let us know in the
- 00:56:46comments as well on our YouTube channel
- 00:56:47if there is something here that you
- 00:56:49would like us to go deeper in. Maybe we
- 00:56:51can get someone like Johannes on one of
- 00:56:53these episodes to go super deep if you
- 00:56:56want to. Anyways, well, thanks Jared for
- 00:56:58being on this episode and I'll see you
- 00:57:00guys in the next one.
- Colmap
- 3D Reconstruction
- Structure from Motion
- Feature Extraction
- Camera Pose
- Geometric Verification
- Incremental Reconstruction
- Global Reconstruction
- Computer Vision
- Open Source