Talk at USC's IRIS (2004): "Temporal Reasoning from Video to Temporal Synthesis of Video"
- Irfan Essa (2004), “Temporal Reasoning from Video to Temporal Synthesis of Video,” talk at USC’s IRIS-Vision Seminars (Fall 2004).
Abstract
In this talk, I will present some ongoing work on extracting spatio-temporal cues from video, both for synthesizing novel video sequences and for recognizing complex activities. I will start with some of our earlier work on Video Textures, in which repeating structure is extracted from a clip and used to generate arbitrarily extended video sequences (a minimal sketch of this mechanism follows the abstract). I will then describe extensions of this approach that allow controlled animation of video sprites; we have developed learning and optimization techniques that enable video-based animation of photo-realistic characters. Next, I will describe our new approach to image and video synthesis, which builds on optimal patch-based copying of samples, and show how it allows iterative refinement and extends to synthesizing both images and video from very limited samples.

In the second part of my talk, I will describe how a similar analysis of video can be used to recognize what a person is doing in a scene. Analyzing video for recognition requires more contextual information about the environment, and I will show how we leverage the context shared between actions and objects to recognize what is happening in complex environments. I will also show that by adding some form of grammar (we use stochastic context-free grammars), we can recognize very complex, multi-tasked activities.
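To make the Video Textures mechanism mentioned above concrete, here is a minimal sketch in Python. It is not the original system's code: it assumes grayscale frames stored in a NumPy array, and the function names (`frame_distances`, `synthesize_loop`) and the heuristic choice of `sigma` are illustrative. The core idea it captures, from the published Video Textures work, is to compute pairwise frame distances and then stochastically jump to frames that resemble the natural successor of the current frame, so a short clip can be played indefinitely without visible seams.

```python
import numpy as np

def frame_distances(frames):
    """Pairwise L2 distances between frames.

    frames: array of shape (n, h, w), e.g. grayscale video frames.
    Returns an (n, n) matrix D with D[i, j] = ||frame_i - frame_j||.
    """
    flat = frames.reshape(len(frames), -1).astype(np.float64)
    sq = (flat ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
    return np.sqrt(np.maximum(d2, 0.0))

def synthesize_loop(frames, length, sigma=None, rng=None):
    """Generate an arbitrarily long sequence of frame indices.

    After showing frame i, jump to frame j with probability
    proportional to exp(-D[i + 1, j] / sigma): jumps land on frames
    that look like the natural successor of frame i, so playback
    usually continues forward but can loop back at good matches.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    D = frame_distances(frames)
    if sigma is None:
        sigma = 0.1 * D.mean()          # heuristic scale; tune per clip
    n = len(frames)
    idx, out = 0, []
    for _ in range(length):
        out.append(idx)
        nxt = min(idx + 1, n - 1)       # the "natural" next frame
        probs = np.exp(-D[nxt] / sigma) # similarity to every candidate
        probs /= probs.sum()
        idx = rng.choice(n, p=probs)    # stochastic jump
    return out
```

The published system refines this in ways the sketch omits for brevity: distances are filtered over small temporal windows so that jumps preserve motion dynamics, and transitions are pruned to avoid dead ends in the transition graph.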
If time permits, I will also describe (very briefly) the Aware Home project at Georgia Tech, which is a primary area of ongoing and future research for my group and me. Further information on my work with video is available on my webpage at http://www.cc.gatech.edu/~irfan