In the video below (originally seen here), John Gonzalez, a director for the NFL, says that "the greatest thing that's happened to football coverage is the yellow line. I really love watching it and knowing - and not guessing, but knowing - where they have to go to get a first down."
This video describes the technology behind the magic yellow line. The most important part of the technique is the ability to determine the orientation of the field relative to the camera. Knowing the perspective of the field allows the line to be placed directly on the surface at the desired position. (Then, of course, it's important to show the line underneath the players, but that's not the part I want to focus on here.)
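To make the perspective idea concrete, here's a minimal sketch of how a line gets "painted" on the field once the camera's view is known. If you have a 3x3 homography mapping field coordinates to image pixels, drawing the yellow line reduces to projecting its two endpoints and connecting them. The matrix `H` and field dimensions below are purely illustrative, not values from any real broadcast system:

```python
import numpy as np

# A hypothetical homography mapping field coordinates (in yards) to
# image pixel coordinates. A real system would derive this from the
# camera's pose, zoom, and position in the stadium.
H = np.array([
    [12.0, 2.0, 300.0],
    [1.0, 8.0, 400.0],
    [0.0, 0.01, 1.0],
])

def project(H, x, y):
    """Map a field point (x, y) to image pixels via the homography."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Project the endpoints of a hypothetical first-down line at the
# 40-yard mark; drawing a segment between them renders the line in
# the field's perspective (an NFL field is about 53.3 yards wide).
left = project(H, 40.0, 0.0)
right = project(H, 40.0, 53.3)
```

The perspective divide (`p[0] / p[2]`) is what makes the line converge naturally toward the horizon like the real field markings do.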
The cameras used by the NFL are loaded with various sensors that help determine their pose relative to the field. For example, the tilt of the camera can be measured with basic accelerometers. Other parameters, like the camera's current zoom level and its fixed position in the stadium, are factored in as well. Once all this data is sent for processing, the system can work out exactly where the field appears in the captured video, and therefore where to augment it with the yellow line.
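As a toy example of the sensor side: when the camera isn't moving, an accelerometer measures only gravity, so the angle between the sensor axes and the gravity vector gives the camera's tilt. This is a simplified sketch of that one measurement, assuming a static camera; a real system would fuse several sensors:

```python
import math

def camera_tilt_deg(ax, ay, az):
    """Estimate tilt (pitch) in degrees from a static accelerometer
    reading (m/s^2). At rest the sensor sees only gravity, so the
    direction of the measured vector encodes the tilt."""
    return math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))

# Camera held level: gravity lies entirely on the z axis.
print(camera_tilt_deg(0.0, 0.0, 9.81))   # 0.0
```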
In a sense, this is similar to what I'm trying to achieve with my thesis. In my case, the "football field" is some scene in an urban outdoor area. Let's say it's the National Art Gallery here in Ottawa. The video camera becomes any old camera on a mobile device that a tourist is carrying. Instead of painting yellow lines onto the field, I want to be able to augment arbitrary models onto a photograph of the gallery. Maybe I want to add little baby spiders all around the famous Maman sculpture out front.
Unlike the football cameras, though, I don't have the luxury of knowing much about the position of the mobile device. Sure, GPS can get me within a few meters of my 2D position on the ground, but that's not good enough for making a natural augmentation. I need a more accurate position and additional information about how high above the ground the camera is held.
So, instead of relying on sensors and a priori knowledge, I am trying to match the photos against panoramas whose positions were recorded precisely when they were captured. That way, once I know the transformation between the two, I can project models set up for the panorama into the photo and send the result back to the tourist, just for fun. (I think there are lots of cool applications of being able to augment photos from mobile phones, but I'll save that for another day.)
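A rough sketch of what that matching step could look like: given a handful of feature correspondences between the panorama and the photo, a homography can be estimated with the standard direct linear transform (DLT) and then used to carry a model's anchor point from panorama coordinates into the photo. This assumes the matched scene region is roughly planar; the point coordinates below are made up for illustration, and a real pipeline would use many matches plus something like RANSAC to reject bad ones:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst via DLT.

    src, dst: (N, 2) arrays of matched points, N >= 4, e.g. feature
    matches between a panorama and a tourist's photo."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the right singular
    # vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Four hypothetical matches: panorama pixels -> photo pixels.
pano = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], float)
photo = np.array([[10, 20], [110, 25], [105, 130], [5, 125]], float)
H = estimate_homography(pano, photo)

# Project a model anchor defined in panorama coordinates into the
# photo (homogeneous coordinates, then perspective divide).
anchor = H @ np.array([50.0, 50.0, 1.0])
anchor = anchor[:2] / anchor[2]
```

Once the anchor lands in the right place in the photo, rendering the model there is the same trick as the yellow line: draw in the coordinates the transformation gives you.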