In the video below (originally seen here), John Gonzalez, a director for the NFL, says that "the greatest thing that's happened to football coverage is the yellow line. I really love watching it and knowing - and not guessing, but knowing - where they have to go to get a first down."
This video describes the technology behind the magic yellow line. The most important part of the technique is the ability to determine the orientation of the field relative to the camera. Knowing the perspective of the field allows the line to be placed directly on the surface at the desired position. (Then, of course, it's important to show the line underneath the players, but that's not the part I want to focus on here.)
The cameras used by the NFL are loaded with various sensors that help determine their pose relative to the field. For example, the tilt of the camera can be determined using basic accelerometers. Properties of the camera itself, like its current zoom level and its physical location in the stadium, are also used. Once all this data is sent for processing, the system can determine exactly where the field is in the captured video, and therefore where to augment it with the yellow line.
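Just to make the idea concrete, here's a rough sketch of that projection step in OpenCV. All of the pose and zoom numbers below are invented for illustration, and the broadcast system's actual math is surely more involved, but it shows how a known camera pose lets you map a field position into pixels.

```python
# A rough, illustrative sketch: if you know a camera's pose and intrinsics, you can
# project a known field position (the first-down line) into the image. All numbers
# here are made up.
import cv2
import numpy as np

# Intrinsics: focal length (set by the current zoom) and principal point.
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])

# Extrinsics: camera tilted down 20 degrees, positioned high above the sideline.
rvec = np.array([np.radians(20.0), 0.0, 0.0])
tvec = np.array([0.0, 20.0, 40.0])

# Endpoints of the first-down line on the field plane (world coordinates, in metres).
line_world = np.float32([[-24.0, 0.0, 30.0],
                         [24.0, 0.0, 30.0]])

# Project the 3D endpoints into pixel coordinates; drawing between them
# (underneath the players) gives the yellow line.
pixels, _ = cv2.projectPoints(line_world, rvec, tvec, K, None)
print(pixels.reshape(-1, 2))
```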
In a sense, this is similar to what I'm trying to achieve with my thesis. In my case, the "football field" is some scene in an urban outdoor area. Let's say it's the National Art Gallery here in Ottawa. The video camera becomes any old camera on a mobile device that a tourist is carrying. Instead of painting yellow lines onto the field, I want to be able to augment arbitrary models onto a photograph of the gallery. Maybe I want to add little baby spiders all around the famous Maman sculpture out front.
Unlike the football cameras, though, I don't have the luxury of knowing much about the position of the mobile device. Sure, GPS can get me within a few meters of my 2D position on the ground, but that's not good enough for making a natural augmentation. I need a more accurate position and additional information about how high above the ground the camera is held.
So, instead of using sensors and a priori knowledge, I am trying to match the photos with panoramas that were created with this information. That way, when I know the transformation between the two, I can then project models set up for the panorama into the photo and send the result back to the tourist, just for fun. (I think there are lots of cool applications of being able to augment photos from mobile phones, but I'll save that for another day.)
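To give a flavour of what that matching looks like, here's a very rough sketch in OpenCV. The filenames are placeholders, and I've stood in a simple planar homography for the real panorama-to-photo transformation, so treat this as the shape of the idea rather than the actual pipeline.

```python
# A minimal sketch of the photo-to-panorama matching idea, with placeholder
# filenames and a planar homography as a stand-in for the full transformation.
import cv2
import numpy as np

pano = cv2.imread("gallery_pano_view.jpg", cv2.IMREAD_GRAYSCALE)
photo = cv2.imread("tourist_photo.jpg", cv2.IMREAD_GRAYSCALE)

# Detect and describe keypoints in both images.
sift = cv2.SIFT_create()
kp_pano, des_pano = sift.detectAndCompute(pano, None)
kp_photo, des_photo = sift.detectAndCompute(photo, None)

# Match descriptors and keep only distinctive matches (Lowe's ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_pano, des_photo, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate the transformation from the panorama view to the photo.
src = np.float32([kp_pano[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_photo[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project a point placed in the panorama (say, the anchor of a model) into the photo.
anchor_in_pano = np.float32([[[420.0, 310.0]]])
anchor_in_photo = cv2.perspectiveTransform(anchor_in_pano, H)
print(anchor_in_photo)
```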
Hi Gail. That's amazing! There'd be a ton of applications for your work. I think that would be a cool way for people to surf through information in a contextual kinda way, if your technique could be used to "recognize" the scene. Like, if you were looking at the National Art Gallery from a certain angle, maybe a little history database could tell you if anything historically significant happened from your vantage point. Or... heh, maybe you'd put the system in some magic glasses and you could choose your theme for the day. "Today, I want everyone I see to be wearing clown noses!"
Definitely! The panoramas themselves are actually part of a larger project called NAVIRE, based at the University of Ottawa (our computer science programs are joint). A lot of these ideas should fit into this project, so as I learn more about future directions, I'll try to post information here. Love your enthusiasm! :)
Wahoo, me again! Gail, I thought of you when I stumbled across this today. "New Insight Into How Bees See Could Improve Artificial Intelligence Systems". http://www.sciencedaily.com/releases/2009/01/090123101211.htm I thought of you because the 2 faces viewed from different angles reminded me of the panoramas you're working with. I guess it's kinda different, but, it follows the theme of "extrapolating a 3D model from a 2D picture" which I have attributed to your work in my own mind. Or am I way off? lol
Anyways, take care!
Frozo (Steph)
Cool article! It's amazing that they somehow know how bees are doing facial recognition, but the fact that a relatively simple neural system could recognize faces from different viewing angles is promising.
In my case, I'm not using artificial intelligence techniques, nor am I solving a recognition problem. Instead, I'm trying to match keypoints between the two images to get relative camera positions (so we can assume that we already know we are looking at images of the same thing). I guess in some ways this is easier, and some ways harder. I am wondering to myself now, though, whether I should take another look into various AI strategies to see if any of them might be of use...
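For the curious, here's a rough sketch of what that keypoint-matching step might look like in OpenCV. The camera intrinsics and filenames are placeholders (in practice the intrinsics would come from calibrating the device's camera), and my real setup has more to it, but this is the general "matches to relative pose" idea.

```python
# A hedged sketch of the "keypoint matches -> relative camera pose" step, using
# OpenCV. The intrinsic matrix K and the filenames are made-up placeholders.
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

img1 = cv2.imread("panorama_face.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("mobile_photo.jpg", cv2.IMREAD_GRAYSCALE)

# Keypoint detection, description, and matching (ORB + Hamming distance here).
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the essential matrix with RANSAC, then recover the relative rotation
# and (up-to-scale) translation between the two cameras.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("Relative rotation:\n", R)
print("Relative translation (direction only):\n", t.ravel())
```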