Most of what I've done so far regarding my thesis research has been reading and playing with other people's code. I started off playing with matching code written by Rob Hess of Oregon State University. He seems to have created his own implementation of SIFT (rather than using the inventor's freely available binary file) and used the keypoint descriptors to find matches between two images. You can also have the program try to find an appropriate transformation between the two images using RANSAC methods. I started looking at this implementation because it should be easy to use as a base for working with SURF descriptors instead, as well as for writing several matching strategies to test.
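For reference, the gist of that detect-describe-match-RANSAC pipeline looks roughly like this in OpenCV's Python bindings (this is a sketch of the same idea, not Rob Hess's code; the file names are placeholders and a reasonably recent OpenCV build is assumed):

```python
# Sketch of the detect / describe / match / RANSAC pipeline with OpenCV.
# Not Rob Hess's implementation; image paths are placeholders.
import cv2
import numpy as np

img1 = cv2.imread("camera_shot.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("panorama_face.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Nearest-neighbour matching with Lowe's ratio test to weed out weak matches
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]

# Try to find a transformation (a homography here) between the views via RANSAC
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
if len(good) >= 4:
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    print(f"{len(good)} tentative matches, {inliers} RANSAC inliers")
else:
    print("not enough matches to estimate a transformation")
```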
I was rather confused when two images that seemed very similar were not matching the way I would have expected (and no transformation was found between them). Both images were of the same building, with a change in viewpoint that isn't particularly significant.
This image was taken with my digital camera a couple of weeks ago:
This is the portion of the panoramic image that contains the same building. I don't know when it was taken. The panoramic images in this case are stored as cubes laid out in the plane. This is one face of that cube.
What do you notice that's different about these images? The illumination isn't quite the same, but that is not a factor when using the more sophisticated keypoint detectors and descriptors. Look closely at the windows of the building. They appear distorted in one image compared to the other.
That's apparently because the focal lengths of my digital camera (6mm) and the camera used to capture the panoramic images (~2mm) are very different, causing a perspective distortion. Or so my profs surmised when we last met. This would mean that one of the images would have to be modified to undo the distortion. It also means that I will be using only images from my digital camera to test the matching code I'm working on (at least for now).
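A quick back-of-the-envelope comparison shows how big the difference between those two focal lengths is in terms of field of view (the sensor width below is a guess on my part, not a measured value for either camera):

```python
# Back-of-the-envelope field-of-view comparison for the two focal lengths.
# The sensor width is an assumed placeholder, not a measured value.
import math

def horizontal_fov_deg(focal_mm, sensor_width_mm=5.76):
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))

print(round(horizontal_fov_deg(6.0)))  # my digital camera (6mm): ~51 degrees
print(round(horizontal_fov_deg(2.0)))  # panoramic camera (~2mm): ~110 degrees
```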
12 comments:
Is there any meta data in the image file to describe the focal length? You could automate the un-distortion phase that way...
Yup, a lot of devices do, but not all. Using that info was the plan, though I'm not sure what to do for those that don't embed it. Note this table for example.
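For the cameras that do embed it, something along these lines would pull the focal length out of the EXIF data (using Pillow; the tag IDs come from the EXIF spec, the file name is a placeholder, and a reasonably recent Pillow is assumed):

```python
# Sketch of reading the focal length from a JPEG's EXIF metadata with Pillow.
# Plenty of cameras write this tag, but it may simply be missing.
from PIL import Image

EXIF_IFD = 0x8769       # pointer to the Exif sub-IFD
FOCAL_LENGTH = 0x920A   # FocalLength tag (value is in millimetres)

def focal_length_mm(path):
    exif = Image.open(path).getexif()
    value = exif.get_ifd(EXIF_IFD).get(FOCAL_LENGTH)
    return float(value) if value is not None else None

print(focal_length_mm("camera_shot.jpg"))
```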
Any body of research focused on automagically extracting the focal length from an arbitrary image?
Haven't really looked yet, to be honest. I vaguely remember from the computer vision class that there are ways of recovering the camera calibration, but it may require, say, taking a picture of a known calibration pattern. I'll obviously have to look into this, but I'm pushing it aside for now to concentrate on the first steps.
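(Mostly as a note to myself: the calibration-pattern route looks something like this in OpenCV. The board size and file names are placeholders, and a real calibration needs many views of the pattern from different angles.)

```python
# Rough sketch of calibrating from pictures of a known chessboard pattern.
# Pattern size and file names are placeholders.
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row/column of the chessboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for name in glob.glob("calib_*.jpg"):
    gray = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# K holds the focal length (in pixels) and principal point; dist holds the
# lens-distortion coefficients, which cv2.undistort can apply to other shots.
fixed = cv2.undistort(cv2.imread("camera_shot.jpg"), K, dist)
```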
There are several packages that will help you do ortho-rectification. The hugin download site has a link to their lens database, which may help. Hugin itself allows you to use control points to try curve-fitting the undistortion coefficients. You would want to save those coefficients because you will need them to compute the camera location.
You can actually apply a distortion by mapping the image as a texture onto a curved surface in OpenGL. You could try a simple sphere at first and then use NURBS to match the curve fitting.
I have been looking for an alter-to-match-histogram algorithm for a couple of years now with no success. That should be irrelevant to your needs, though, as you will likely process with edge detection for feature extraction. You may consider something more like Hausdorff distance to register your images.
You may also want to play with Sketchup Photomatch with multiple photos in order to explore the relevant issues. For example, don't crop photos before processing them, because it will confuse the focal point analysis.
I'd have thought the biggest difference is that, regardless of focal length, the second looks to have been taken from much closer to the building. From having played with hugin/panotools a fair bit, I know that a smallish change in position can throw the detection off, so I'd expect something as large as that to break it.
@arbaboon: Thanks for the ideas, great wealth of knowledge you have there!
@john: I suppose it's possible, though I have been able to successfully match images from different distances provided they were taken with my own camera. I'm currently setting up matching with SURF descriptors instead of SIFT and will have more opportunity to play with that later on.
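As for the Hausdorff-distance idea above, I imagine it would look roughly like this with OpenCV edge maps and SciPy (the Canny thresholds and file names here are just placeholder guesses):

```python
# Rough illustration of comparing images via edge maps + Hausdorff distance,
# as suggested in the comment above. Thresholds and paths are placeholders.
import cv2
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def edge_points(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 100, 200)
    return np.column_stack(np.nonzero(edges))  # (row, col) of each edge pixel

a = edge_points("camera_shot.jpg")
b = edge_points("panorama_face.jpg")

# The symmetric Hausdorff distance is the max of the two directed distances
d = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print("Hausdorff distance between edge maps:", d)
```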
You might take a look at Hugin. It's an open source project that does a lot of this panoramic work, including enblend and panotools, which do the actual blending between pictures.
There's also work going into building an open source lens database. As far as I know, today it is still being discussed exclusively on the create mailing list, but I imagine it'll get its own as it grows.
Hugin is definitely the popular choice of the day, it seems! I will definitely be looking into that.
Did some experimenting this weekend, and autopano-sift actually did work pretty darn well at picking up points for pretty varying views taken on the same day with the same camera, but panotools couldn't find a transform to map them without distorting it all crazily.
The shots I was playing with were all taken at pretty much the same time, so lighting etc. was consistent; I'd guess that's the other obvious difference to consider.
Interesting... I haven't downloaded that stuff to try out yet, but am definitely looking forward to it. Thanks for letting me know about your experiments! It's times like these that make me love the Internet more and more :D
This is something that may be of interest if you're into this stuff, although you may have already seen it:
http://labs.live.com/photosynth/default.html