Thursday, November 6, 2008

Thesis Update: Geometry Between Cubes and Photos

This post is for anyone wondering how my thesis has been going. It's a bit more technical than some of my other stuff. I am presenting some of the more important results with explanations of the methodology on my personal website.

First off, I'd like to share a description of my research that I used when applying for a PhD scholarship to ensure we're all on the same page (because the exact research goals and purpose change often in my mind):
Spherical panoramas have been used in such high-profile applications as Google's Street View to allow users to naturally explore real-world images from the comfort of their own homes. In the case of Google, crude street information is augmented onto panoramic images to aid navigation both while viewing the panorama and, in theory, while driving or walking on location. It may be useful to augment a photograph (taken with a cell phone for example) of an intersection or tourist attraction while a person is actually standing there, but an exact camera location would be required to do this.

One way to obtain this camera location is to compare the photograph with nearby spherical panoramas (which can be found using a rough GPS location estimate). If the panoramas have been captured and saved with positional information, then the scene geometry between the user's camera and the panoramas will help recover information about that camera's position, thereby allowing for an accurate augmentation.

Previous work [1] has established a method to recover position information between two panoramas, and the theory established there may be applicable to this case of comparing a photograph with a panorama. This will be verified during the course of the research, but the thesis will mainly investigate the best way to efficiently obtain a large number of match correspondences between the photograph and the panorama, as this is the first step to finding the mathematical structures that describe the geometry. The format of the panoramas in [1] is that of a cube. Because of the 'seams' along adjacent faces, some feature points may not find correspondences to the same features visible in the planar photograph. As such, this format will be compared with a cylindrical representation of the panoramas, which has no seams but must deal with curvature issues, to see if more correspondences might be found.

In addition, it must be determined what information should be stored on a central server along with the set of pre-captured panoramas. As much work as possible should be pre-computed to ensure the user's photograph is sent back with an augmentation as soon as possible.

While road information such as that augmented onto images in Street View may not need to be highly accurate, there are many other applications that would require more precision. For example, virtual objects or textual information could be added to the photograph before it is sent back to a tourist learning about a historically significant area. In a case like this, a natural augmentation obtained with an accurate camera location is all but essential.

[1] Kangni, F. and Laganiere, R. (2007) Orientation and Pose recovery from Spherical Panoramas. ICCV

Basically, I'm trying to figure out the best way to find matches and/or an essential matrix between a cubic panorama and a photograph.

So far, I have taken the theory from [1], which explained how to find an essential matrix between two cubic panoramas, and modified it to work with a cubic panorama and a photograph. The trick here was that the photograph would not be calibrated (i.e. we don't know the camera's properties like its focal length). Usually, this would mean that we'd want to find a fundamental matrix instead. However, this would force us to abandon the advantageous ability to consider all faces of the cubic panorama at the same time (by using the normalized 3D coordinates of points on the face images). We would have to match each face individually, and once we found the face with the most matches, find a fundamental matrix between it and the photo. It would seem that using points on more than one face would help us get more matches and a more accurate result.
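To make the "normalized 3D coordinates" idea concrete, here is a minimal Python sketch of how a pixel on a cube face could be turned into a unit direction from the cube centre. The face names, the axis convention (x right, y down, z forward) and the function name face_pixel_to_ray are assumptions made for illustration; the layout used in [1] may well differ.

```python
import numpy as np

# Hypothetical face layout: every face is a square image seen from the cube
# centre with a 90-degree field of view. Each entry holds the face's
# (forward, right, down) directions in a shared x-right, y-down, z-forward frame.
FACE_AXES = {
    "front": (np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])),
    "right": (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, -1.0]), np.array([0.0, 1.0, 0.0])),
    "back": (np.array([0.0, 0.0, -1.0]), np.array([-1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])),
    "left": (np.array([-1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])),
    "up": (np.array([0.0, -1.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])),
    "down": (np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, -1.0])),
}


def face_pixel_to_ray(face, u, v, size):
    """Return the unit 3D direction for pixel (u, v) on a cube face of side `size`."""
    forward, right, down = FACE_AXES[face]
    # Map pixel centres to the range [-1, 1] across the face.
    a = 2.0 * (u + 0.5) / size - 1.0
    b = 2.0 * (v + 0.5) / size - 1.0
    ray = forward + a * right + b * down
    return ray / np.linalg.norm(ray)
```

Under this convention, a feature on the right edge of the front face and the same feature on the left edge of the right face map to the same ray, which is why the seams stop mattering once points from all six faces are expressed in 3D.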

Instead, I looked into the possibility of using calibrated points from the cubic panorama and uncalibrated image points. The resulting matrix to find would be a cross between the calibrated essential matrix and the uncalibrated fundamental matrix. The basic idea is informally presented here, and I call it a "pseudo-essential matrix."
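One plausible way to write such a mixed constraint (an assumption for illustration, not necessarily the exact formulation in the linked write-up) is r^T G u = 0, where r is a unit ray from the panorama, u = (x, y, 1) is a raw pixel in the photo, and G = E K^-1 folds the unknown camera matrix K into the essential matrix E. The sketch below estimates G linearly from eight or more matches, in the style of the eight-point algorithm. The function name, the Hartley-style pixel normalization and the rank-2 projection are choices made for this example.

```python
import numpy as np


def estimate_pseudo_essential(rays, pixels):
    """Linear estimate of a 3x3 matrix G with rays[i] @ G @ [x_i, y_i, 1] ~ 0.

    rays:   (n, 3) unit directions from the panorama (calibrated side).
    pixels: (n, 2) raw pixel coordinates in the photo (uncalibrated side).
    """
    rays = np.asarray(rays, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    if len(rays) != len(pixels) or len(rays) < 8:
        raise ValueError("need at least 8 matched ray/pixel pairs")

    # Normalize pixels (zero mean, average distance sqrt(2)) for conditioning.
    mean = pixels.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pixels - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pixels, np.ones(len(pixels))]) @ T.T

    # Each match contributes one row: the 9 products r_i * u_j (row-major G).
    A = np.einsum("ni,nj->nij", rays, homog).reshape(len(rays), 9)
    _, _, vt = np.linalg.svd(A)
    G = vt[-1].reshape(3, 3)

    # E has rank 2 and K is invertible, so G should have rank 2 as well.
    U, s, Vt = np.linalg.svd(G)
    G = U @ np.diag([s[0], s[1], 0.0]) @ Vt

    # Undo the pixel normalization: r^T G_n (T u) = r^T (G_n T) u.
    G = G @ T
    return G / np.linalg.norm(G)
```

The rays here can come from any face of the cube, so all six faces feed one estimate, and no focal length is needed for the photo. In practice this linear step would sit inside a robust estimator, since automatic matches contain outliers.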

By hand picking some matches between some panoramas and photos, I was able to ensure that the pseudo-essential matrix idea was sound. Some initial results showing this are available here. The only major issue seen here was the instability around the epipoles.
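For reference, the epipoles can be read off a rank-2 matrix as its null vectors. The sketch below assumes a 3x3 matrix G satisfying r^T G u = 0 for panorama rays r and homogeneous photo pixels u; that convention and the function name are assumptions for illustration. Every epipolar line passes through the epipole and G maps the epipole itself to (nearly) zero, which is one way to see why results get unstable there.

```python
import numpy as np


def epipoles(G):
    """Return (panorama epipole as a unit ray, photo epipole as a homogeneous pixel).

    Assumes G has rank 2 and satisfies ray @ G @ pixel ~ 0 for correct matches.
    """
    U, _, Vt = np.linalg.svd(G)
    pano_epipole = U[:, -1]   # left null vector:  pano_epipole @ G ~ 0
    photo_epipole = Vt[-1]    # right null vector: G @ photo_epipole ~ 0
    return pano_epipole, photo_epipole
```

The photo epipole is left in homogeneous form because its third component can be close to zero, meaning the epipole lies at or near infinity; divide by that component only when it is safely non-zero. Both vectors are defined up to sign.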

I am currently working on improving the ability to find a pseudo-essential matrix automatically. The early progress can be seen here. Many of the matches found after the nearest-neighbour thresholding appear to be correct, but a pseudo-essential matrix is not found. I need to check whether the quality measures are too strict, and perhaps evaluate how my RANSAC algorithm is working.
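As an illustration of the nearest-neighbour thresholding step, here is a minimal sketch assuming it is a Lowe-style ratio test on feature descriptors; the exact test and threshold used in the thesis are not stated here, so the function name and the 0.8 default are assumptions. A match is kept only when the best candidate is clearly closer than the second best.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match (i, j) is kept only if the nearest distance is less than
    `ratio` times the second-nearest distance.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    if len(desc_b) < 2:
        raise ValueError("desc_b needs at least two descriptors for a ratio test")

    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in np.flatnonzero(keep)]
```

Raising `ratio` keeps more matches at the cost of more outliers for RANSAC to reject; lowering it does the opposite. That trade-off is one thing to look at when good-looking matches survive the threshold but no model is found.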

By the time I finish my research, I don't think the concept will be ready for use on consumer mobile devices. But it would be really cool to see it used as a starting point for the next great mobile app! With the resources available at places like Google, I'm convinced it's doable.

