Google Can Now Describe Your Cat Photos

By Rolfe Winkler

Google’s trained computers recognized that this is a photo of “two pizzas sitting on top of a stove top oven” *Google*

Google’s computers learned to recognize cats in photos. Now, they’re learning to describe cats playing with a ball of string.

Computer scientists in the search giant’s research division, and a separate team working at Stanford University, independently developed artificial-intelligence software that can decipher the action in a photo, and write a caption to describe it. That’s a big advance over previous software that was mostly limited to recognizing objects.

In a blog post, Google described how it is using advanced “machine-learning” techniques that mimic the human brain to recognize a photo of “a person riding a motorcycle on a dirt road,” or “a herd of elephants walking across a dry grass field.” The new software can “capture the whole scene and generate corresponding natural-looking text,” says Yoshua Bengio, a professor of computer science at the University of Montreal and a leading expert in the field. That defies predictions that software would be limited to recognizing objects, he said.

The new technology could lead to big improvements in the accuracy of Google’s image-search results, which today often rely on text found near a photo on a web page. One day it might help people search vast libraries of untagged photos or videos stored on smartphones, says David Bader, a professor of computer science at Georgia Tech. A startup called Viblio is using similar research out of Simon Fraser University to automatically categorize videos.

In 2012, a Google/Stanford team famously taught a computer to learn how to recognize cats. The computer was shown millions of images from YouTube videos, and used then-state-of-the-art machine-learning algorithms to teach itself to spot felines.

Similar advances are helping improve other Google services. Earlier this year, for instance, Google researchers disclosed how the company’s computers had learned to read house numbers from images captured by its Street View cars, making it quicker and easier to locate buildings in Google Maps.

Google is making big bets on artificial-intelligence technology. Earlier this year, it paid hundreds of millions of dollars to acquire DeepMind Technologies, a London-based startup that employs many specialists in advanced machine learning. Before that, it bought DNNResearch, a small company started at the University of Toronto, in order to hire a top academic in machine learning, Geoffrey Hinton.

Artificial-intelligence research also helps speech-recognition software, used by smartphone assistants like Apple’s Siri or Google voice search.

Others also are investing in the field. Facebook scooped up a top artificial-intelligence academic late last year. Meanwhile, Chinese search engine Baidu has said it will invest $300 million in an artificial-intelligence lab in Silicon Valley. To lead the lab, Baidu hired the head of Stanford’s artificial-intelligence lab, Andrew Ng, who helped build the computer that taught itself to recognize cats from YouTube videos.

David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.