Nebojsa Jojic
Understanding multimedia using generative models
Most research on understanding natural signals is based on some sort of model of the world. These models have typically been highly specific about one aspect of the world, for instance the appearance of a human face, the motion type of a layer, or the spectral characteristics of speech, while the other, "non-interesting" parts of the scene are either ignored or left to a separate integration module. The limited flow of information and limited adaptivity of such systems make them very brittle in realistic applications. To build more robust understanding algorithms, models need to capture various aspects of the data at the same time and remain fairly simple, yet still adapt to the data.
Generative models, as defined by the machine learning community, are flexible models that describe the data of interest through a feasible generation process, starting only from a minimal number of parameters and using sampling from appropriate probability distributions to introduce variability. While the generative process itself is rarely used directly, the descriptive power of the model is used for inference, classification, and data manipulation.
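As a rough illustration of this idea (not the models presented in the talk), the sketch below assumes a hypothetical two-class model in which each class has a prior probability and a one-dimensional Gaussian. The class names, parameter values, and function names are all invented for the example. The `sample` function runs the generative process, while `posterior` and `classify` use the same model description for inference through Bayes' rule.

```python
import math
import random

# Hypothetical model parameters, chosen only for illustration.
PRIORS = {"speech": 0.5, "noise": 0.5}
MEANS = {"speech": 2.0, "noise": -1.0}
STDS = {"speech": 1.0, "noise": 1.0}


def sample(rng):
    """Run the generative process: pick a class, then draw an observation."""
    labels = list(PRIORS)
    label = rng.choices(labels, weights=[PRIORS[k] for k in labels])[0]
    return label, rng.gauss(MEANS[label], STDS[label])


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x):
    """Invert the model with Bayes' rule to get p(class | x)."""
    joint = {k: PRIORS[k] * gaussian_pdf(x, MEANS[k], STDS[k]) for k in PRIORS}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


def classify(x):
    """Return the class with the highest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)
```

Calling `sample(random.Random(0))` draws one labelled observation, and `classify(x)` assigns an observation to the more probable class. The sampler is never needed for classification here, which mirrors the point that the generative process itself is rarely used directly.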
In this talk, I will give an overview of the generative approach to multimedia understanding and report some of our recent results on audio-visual tracking; multimedia clustering, search, and retrieval; and video editing tasks such as object extraction, illumination correction, and stabilization. This is joint work with Brendan Frey and Hagai Attias.