Google Nets Major Video Object Recognition Patent

August 29, 2012, by Mandour & Associates, APC

Google Inc. was issued a patent on Tuesday covering technology for large-scale automatic recognition of objects in videos without any need for a user’s assistance, pointing the way to possible new online video applications on YouTube or elsewhere.

U.S. Patent Number 8,254,699, titled “Automatic large scale video object recognition,” involves an object recognition system that performs a number of rounds of dimensionality reduction and consistency learning on visual content items such as videos and still images. This results in a set of “feature vectors” that accurately predict the presence of a visual object represented by a given name within a visual content item.

The feature vectors are stored in association with the object name which they represent and with an indication of the number of rounds of dimensionality reduction and consistency learning that produced them. Consistency learning involves comparing a feature vector to other feature vectors, such as those for the same object name, and those for different object names, and calculating a score based on the comparisons.
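The comparison step described above can be pictured with a small sketch. The article does not give the patent's scoring formula, so the code below is only an illustration under stated assumptions: cosine similarity stands in for the comparison, the score is the mean similarity to same-name vectors minus the mean similarity to different-name vectors, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(values):
    """Arithmetic mean of an iterable, or 0.0 when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def consistency_score(vector, same_name, other_names):
    """Assumed scoring rule, not the patent's: reward agreement with vectors
    for the same object name and penalize agreement with vectors for
    different object names."""
    agreement = mean(cosine(vector, v) for v in same_name)
    confusion = mean(cosine(vector, v) for v in other_names)
    return agreement - confusion


# Toy two-dimensional feature vectors, invented for illustration.
dog_vectors = [[1.0, 0.0], [0.9, 0.1]]
cat_vectors = [[0.0, 1.0]]

dog_like_score = consistency_score([1.0, 0.0], dog_vectors, cat_vectors)
cat_like_score = consistency_score([0.0, 1.0], dog_vectors, cat_vectors)
```

In this toy setup, a vector that resembles the other "dog" vectors scores positively against the name "dog," while one that resembles the "cat" vector scores negatively, which is the kind of signal a system could use to decide which feature vectors to keep.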

The feature vectors and the stored round-count indication can be used for various purposes, such as quickly identifying visual content items that contain a visual representation of the object with a given name, according to the patent description.
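One way to picture that kind of lookup is sketched below. It is a hedged illustration rather than the patent's method: the `StoredFeatures` record, the `find_items` helper, the cosine-similarity match, and the 0.9 threshold are all assumptions made for the example, as are the toy vectors.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredFeatures:
    """Hypothetical record: feature vectors kept with the object name they
    represent and the number of learning rounds that produced them."""
    object_name: str
    rounds: int
    vectors: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_items(entry, items, threshold=0.9):
    """Return the ids of content items whose feature vector is close to any
    stored vector for the object name. The threshold is an assumed value."""
    return [
        item_id
        for item_id, vec in items.items()
        if any(cosine(vec, stored) >= threshold for stored in entry.vectors)
    ]


# Toy index and content items, invented for illustration.
index = {"dog": StoredFeatures("dog", rounds=3, vectors=[[1.0, 0.0]])}
videos = {"video_a": [0.95, 0.05], "video_b": [0.1, 0.9]}

matches = find_items(index["dog"], videos)
```

Here only `video_a` has a feature vector close enough to the stored "dog" vector to be returned as a match.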

“Currently, automated recognition within a digital video of images of real-world objects of interest to a user, such as people, animals, automobiles, consumer products, buildings, and the like, is a difficult problem,” the patent description says. “The use of unsupervised learning techniques, in which the explicit input of human operators is not required to learn to recognize objects, has not yet been achieved for large-scale image recognition systems.”

“Conventional systems rely on direct human input to provide object exemplars explicitly labeled as representing the object, such as a set of images known to include, for example, dogs, based on prior human examination,” the patent description says. “However, such human input is expensive, time-consuming, and cannot scale up to handle very large data sets comprising hundreds of thousands of objects and millions of images.”

This is particularly a problem in the context of video hosting systems, such as Google Video or YouTube, in which users submit millions of videos, each containing numerous distinct visual objects over the length of the video, according to the patent.

Inventors Ming Zhao of San Jose, California and Jay Yagnik of Mountain View, California filed the application for the patent in February 2009.