Learning Active Basis Model for Object Detection and Recognition

Yingnian Wu, Zhangzhang Si, Haifeng Gong, Charles Fleming and Song-Chun Zhu

We propose an active basis model, a shared sketch algorithm, and a computational architecture of sum-max maps for representing, learning, and recognizing deformable templates. In our generative model, a deformable template is in the form of an active basis, which consists of a small number of Gabor wavelet elements at selected locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate the observed image. The active basis model, in particular the locations and orientations of the basis elements, can be learned from training images by the shared sketch algorithm ... [project page]
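The sum-max idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes filter responses are already available as a nested list `s1[orientation][y][x]`; the names `max_map`, `template_score`, `max_shift`, and `max_rot` are hypothetical, and the square shift window and the plain weighted sum are simplifying assumptions.

```python
def max_map(s1, max_shift=1, max_rot=1):
    """Local max of responses over small shifts and neighbouring orientations.

    s1[o][y][x] holds a filter response (assumed precomputed). Taking the max
    lets each template element "perturb" its location and orientation.
    """
    n_orient, height, width = len(s1), len(s1[0]), len(s1[0][0])
    m1 = [[[0.0] * width for _ in range(height)] for _ in range(n_orient)]
    for o in range(n_orient):
        for y in range(height):
            for x in range(width):
                best = float("-inf")
                for d_o in range(-max_rot, max_rot + 1):
                    oo = (o + d_o) % n_orient  # orientations wrap around
                    for dy in range(-max_shift, max_shift + 1):
                        for dx in range(-max_shift, max_shift + 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < height and 0 <= xx < width:
                                best = max(best, s1[oo][yy][xx])
                m1[o][y][x] = best
    return m1


def template_score(m1, elements, weights):
    """Weighted sum of max-pooled responses at the template's elements.

    elements is a list of (orientation, y, x) triples; weights are assumed
    per-element coefficients.
    """
    return sum(w * m1[o][y][x] for (o, y, x), w in zip(elements, weights))


# Toy input: two orientations on a 3x3 grid with a single strong response.
s1 = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
s1[0][1][1] = 5.0
m1 = max_map(s1)
score = template_score(m1, [(0, 0, 0), (1, 2, 2)], [1.0, 2.0])
```

In this toy case every element of the template can reach the single strong response through a one-pixel shift or a one-step rotation, so both elements contribute to the score.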

From Image Parsing to Painterly Rendering

Mingtian Zhao, Kun Zeng, Caiming Xiong and Song-Chun Zhu

We present a semantics-driven approach for stroke-based painterly rendering, based on recent image parsing techniques [Tu et al. 2005; Tu and Zhu 2006] in computer vision. Image parsing integrates segmentation for regions, sketching for curves, and recognition for object categories. In an interactive manner, we decompose an input image into a hierarchy of its constituent components in a parse tree representation with occlusion relations among the nodes in the tree. To paint the image, we build a brush dictionary containing a large set (760) of brush examples of four shape/appearance categories, which are collected from professional artists, then we select ... [project page]

Learning Animated Basis Model for Action Detection & Recognition

Benjamin Z. Yao and Song-Chun Zhu

We present an animated basis model that is learnable from cluttered real-world videos. In our generative model, an action template is a sequence of image templates each of which consists of a set of shape and motion primitives (Gabor bases and optical-flow patches) at selected orientations and locations. These primitives are allowed to slightly perturb their locations and orientations to account for spatial deformations. We use a semi-supervised learning procedure to learn from weakly labeled videos with cluttered background ... [project page]

Layered Graph Matching with Composite Cluster Sampling

Liang Lin, Xiaobai Liu and Song-Chun Zhu

We study a framework of layered graph matching for integrating graph partition and matching with graph editing. The objective is to find an unknown number of corresponding graph structures in two images. We extract discriminative local primitives from both images and construct a candidacy graph whose vertices are match candidates (each a pair of primitives, one from each image) and whose edges are either negative, for mutual exclusion, or positive, for mutual consistency. Then we pose layered graph matching as a multi-coloring problem on the candidacy graph. We adapt a composite cluster sampling algorithm to work with both positive and negative edges.
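A minimal sketch of how such a candidacy graph might be assembled follows. It is an illustration under stated assumptions, not the authors' method: primitives are reduced to 2D points, two candidates are linked by a negative edge when they share a primitive, and by a positive edge when their displacements agree within a hypothetical tolerance `tol`. The function name `build_candidacy_graph` is also hypothetical.

```python
from itertools import combinations


def build_candidacy_graph(pts_a, pts_b, candidates, tol=1.0):
    """Return (positive_edges, negative_edges) over candidate indices.

    candidates[k] = (i, j) pairs primitive i in image A with primitive j in
    image B. The consistency test (similar displacement) is an assumption.
    """
    positive, negative = [], []
    for u, v in combinations(range(len(candidates)), 2):
        (i1, j1), (i2, j2) = candidates[u], candidates[v]
        if i1 == i2 or j1 == j2:
            # Two matches that reuse a primitive cannot both hold.
            negative.append((u, v))
            continue
        d1 = (pts_b[j1][0] - pts_a[i1][0], pts_b[j1][1] - pts_a[i1][1])
        d2 = (pts_b[j2][0] - pts_a[i2][0], pts_b[j2][1] - pts_a[i2][1])
        if abs(d1[0] - d2[0]) <= tol and abs(d1[1] - d2[1]) <= tol:
            # Matches that move their primitives the same way support each other.
            positive.append((u, v))
    return positive, negative


pts_a = [(0, 0), (10, 0)]
pts_b = [(2, 1), (12, 1)]
candidates = [(0, 0), (1, 1), (0, 1)]
positive, negative = build_candidacy_graph(pts_a, pts_b, candidates)
```

Here candidates 0 and 1 share the same displacement and receive a positive edge, while candidate 2 conflicts with both because it reuses a primitive from each.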

A Hierarchical and Contextual Model for Aerial Image Parsing

Jacob Porway, Qiongchen Wang and Song-Chun Zhu

We present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g., relative position and relative scale). Our hierarchy is a non-recursive grammar for objects in aerial images, composed of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference.
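To see why a few rules can cover many scenes, the following sketch counts the distinct configurations a small non-recursive grammar can derive. The rule set and the function name `count_configurations` are hypothetical and chosen only for illustration; they are not the grammar learned in this work.

```python
def count_configurations(symbol, rules):
    """Count distinct derivations of symbol in a non-recursive grammar.

    rules maps a symbol to a list of alternative productions, each a list of
    child symbols. Symbols without rules are treated as terminals.
    """
    if symbol not in rules:
        return 1
    total = 0
    for production in rules[symbol]:
        ways = 1
        for child in production:
            # Children are expanded independently, so counts multiply.
            ways *= count_configurations(child, rules)
        total += ways  # alternatives add
    return total


# Hypothetical toy rules: three symbols, six productions.
rules = {
    "scene": [["parking_lot", "road"], ["roofs", "road"]],
    "parking_lot": [["car_row"], ["car_row", "car_row"]],
    "car_row": [["car", "car"], ["car", "car", "car"]],
}
n_scenes = count_configurations("scene", rules)
```

Counts multiply across the children of a production and add across alternatives, so the number of derivable scenes grows quickly as layers are added.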

Learning Mixed Image Templates for Object Recognition

Zhangzhang Si, Haifeng Gong, Ying Nian Wu and Song-Chun Zhu

This project explores both local structure and local texture features in learning image templates, which are maximum likelihood representations of observed images from specified object categories. We also investigate the relative importance of sketches and textures for different object categories. Local sketches and local textures in the object templates account for shapes and appearances respectively. Both are extracted from the maps of Gabor filter responses. The local sketches are captured by the local maxima of Gabor responses, where the local maximum pooling accounts for shape deformations in objects. The local textures are captured by the local averages of Gabor filter responses, where the local average pooling extracts texture information for appearances. The selection of local sketch variables and local texture variables can be accomplished by a projection pursuit type of learning process, where both types of variables can be compared and merged within a common framework. The learning process returns a generative model for image intensities from a relatively small number of training images. The recognition or classification by template matching can then be based on log-likelihood ratio scores. We apply the learning method to a variety of object and texture categories.
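The two kinds of pooling and the scoring step can be sketched as follows. This is a simplified illustration on a 1D list of responses, not the project's code. The function names are hypothetical, and the score assumes each selected feature follows an exponential-family model whose log-likelihood ratio against the background is `lambda * r - log_z`; that form is an assumption made for this example.

```python
def local_max(responses, center, radius):
    """Max pooling: tolerant to small shifts, used here for sketch features."""
    lo, hi = max(0, center - radius), min(len(responses), center + radius + 1)
    return max(responses[lo:hi])


def local_mean(responses, center, radius):
    """Average pooling: summarizes a neighbourhood, used here for texture."""
    lo, hi = max(0, center - radius), min(len(responses), center + radius + 1)
    window = responses[lo:hi]
    return sum(window) / len(window)


def log_likelihood_ratio(features, lambdas, log_zs):
    """Template matching score under the assumed exponential-family form."""
    return sum(lam * r - log_z
               for r, lam, log_z in zip(features, lambdas, log_zs))


responses = [0.0, 1.0, 4.0, 1.0, 0.0]
sketch_feature = local_max(responses, center=1, radius=1)    # edge found nearby
texture_feature = local_mean(responses, center=2, radius=1)  # neighbourhood mean
score = log_likelihood_ratio([sketch_feature, texture_feature],
                             lambdas=[0.5, 1.0], log_zs=[1.0, 0.5])
```

Because both feature types feed the same additive score, sketch and texture variables can be ranked against each other during selection, which is the sense in which they share a common framework.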

© 2002-2009 UCLA CIVS