How to Classify Images with TensorFlow

Original source (googleresearch.blogspot.com)

Tags: tensorflow googleresearch.blogspot.com

Clipped on: 2016-03-23

Google Research Blog

The latest news from Research at Google

Monday, December 07, 2015

Posted by Pete Warden, Software Engineer

Prior to joining Google, I spent a lot of time trying to get computers to recognize objects in images. At Jetpac my colleagues and I built mustache detectors to recognize bars full of hipsters, blue sky detectors to find pubs with beer gardens, and dog detectors to spot canine-friendly cafes. At first, we used the traditional computer vision approaches that I'd used my whole career, writing a big ball of custom logic to laboriously recognize one object at a time. For example, to spot sky I'd first run a color detection filter over the whole image looking for shades of blue, and then look at the upper third. If it was mostly blue, and the lower portion of the image wasn't, then I'd classify that as probably a photo of the outdoors.
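The sky heuristic described above can be sketched in a few lines. This is only an illustration of the idea, not the original Jetpac code: the function names, the RGB thresholds in `is_blue`, and the 50% cutoff are all assumptions chosen for the example.

```python
def is_blue(pixel):
    """Crude 'shade of blue' test on an (r, g, b) tuple with 0-255 channels."""
    r, g, b = pixel
    # Assumed thresholds: blue is fairly bright and clearly dominates red and green.
    return b > 120 and b > r + 30 and b > g + 30


def blue_fraction(rows):
    """Fraction of pixels in a list of pixel rows that pass is_blue."""
    pixels = [p for row in rows for p in row]
    if not pixels:
        return 0.0
    return sum(is_blue(p) for p in pixels) / len(pixels)


def looks_like_outdoors(image, threshold=0.5):
    """image is a list of rows; each row is a list of (r, g, b) tuples.

    Returns True when the upper third is mostly blue and the rest is not.
    """
    split = len(image) // 3
    upper, lower = image[:split], image[split:]
    return blue_fraction(upper) > threshold and blue_fraction(lower) <= threshold


# Hypothetical 6x4 test image: two rows of sky above four rows of grass.
SKY, GRASS = (90, 150, 230), (40, 160, 50)
landscape = [[SKY] * 4] * 2 + [[GRASS] * 4] * 4
print(looks_like_outdoors(landscape))
```

Every threshold here would need hand-tuning per detector, which is exactly the kind of one-off work the post describes.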

I'd been an engineer working on vision problems since the late '90s, and the sad truth was that unless you had a research team and plenty of time behind you, this sort of hand-tailored hack was the only way to get usable results. As you can imagine, the results were far from perfect, and each detector I wrote was a custom job that didn't help me with the next thing I needed to recognize. This probably seems laughable to anybody who didn't work in computer vision in the recent past! It's such a primitive way of solving the problem that it sounds like it should have been superseded long ago.

That's why I was so excited when I started to play around with deep learning. As I tried out the latest approaches using convolutional neural networks, it became clear that they were producing far better results than my hand-tuned code on similar problems. Not only that, the process of training a detector for a new class of object was much easier. I didn't have to think about what features to detect; I'd just supply a network with new training examples and it would take it from there.

Those experiences converted me into a deep learning enthusiast, and so when Jetpac was acquired and I had the chance to join Google and work with many of the stars of the field, I couldn't resist. What impressed me more than anything was the team's willingness to share their knowledge with the rest of the world.

I'm especially happy that we've just managed to release TensorFlow, our internal machine learning framework, because it gives me a chance to show practical, usable examples of why I'm so convinced deep learning is an essential tool for anybody working with images, speech, or text in ML.

Given my background, my favorite first example is using a deep network to spot objects in an image. One of the early showcases for the new approach to neural networks was an annual competition to recognize 1,000 different classes of objects from the ImageNet data set, and TensorFlow includes a pre-trained network for that task. If you look inside the examples folder in the source code, you'll see “label_image”, which is a small C++ application for using that network.

The README has the instructions for building TensorFlow on your machine, downloading the binary files defining the network, and compiling the sample code. Once it's all built, just run it with no arguments, and you should see a list of results showing "Military Uniform" at the top. This is running on the default image of Admiral Grace Hopper, and correctly spots her attire.

Image via Wikipedia

After that, try pointing it at your own images using the “--image” command line flag, and you should see a set of labels for each. If you want to know more about what's going on under the hood, the C++ section of the TensorFlow Inception tutorial goes into a lot more detail.

The only things it will spot are those that are in the original 1,000 ImageNet classes, and it will always try to find something, which can lead to some funny results. There are no people categories, so on portraits you'll often see objects that are associated with people like seat belts or oxygen masks, or in Lincoln’s case, a bow tie!
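One way to see why a classifier of this kind always finds something: assuming its final layer is a softmax over a fixed label set, the outputs are probabilities that sum to one, so some label always ranks first even when none fits well. The sketch below illustrates that with made-up scores and a four-label vocabulary; the helper names and numbers are hypothetical and are not taken from label_image.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for a portrait: nothing matches strongly,
# yet the ranking still produces a winner.
labels = ["bow tie", "seat belt", "oxygen mask", "nematode"]
scores = [0.2, 0.1, 0.0, -0.1]
for label, prob in top_k(scores, labels, k=3):
    print(f"{label}: {prob:.3f}")
```

There is no "none of the above" output in this setup, so an application that needs one would have to add its own confidence threshold.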

Image via U.S. History Images

If the image is poorly lit, then “nematode” is usually the top pick, since most training photos of those are taken in very dim surroundings. It's also not perfect in its identification, with a top-five error rate of 5.6%, meaning that for 5.6% of images the right label is not among its top five results. However, that’s not all that bad considering Stanford’s Andrej Karpathy found that even someone who was trained at the job could only achieve a slightly better 5.1% error doing the same task manually. We can do even better if we combine the outputs of four trained models into an "ensemble", with an error rate of just 3.5%.
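To make the two metrics above concrete, here is a small sketch of how a top-k error rate can be computed and how several models' outputs can be combined. The post does not say how the four models were combined; averaging their per-class scores is an assumption made for this example, and the function names and toy numbers are hypothetical.

```python
def top_k_error(predictions, true_labels, k=5):
    """Fraction of images whose true class is missing from the k best guesses.

    predictions: one list of per-class scores per image.
    true_labels: the correct class index for each image.
    """
    misses = 0
    for scores, truth in zip(predictions, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


def ensemble(per_model_predictions):
    """Average per-class scores across models (assumed combination rule)."""
    n_models = len(per_model_predictions)
    averaged = []
    for image_scores in zip(*per_model_predictions):
        averaged.append([sum(col) / n_models for col in zip(*image_scores)])
    return averaged


# Toy data: two models, two images, three classes. Each model gets a
# different image wrong, so k=1 is used to keep the example small.
truth = [0, 1]
model_a = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]  # wrong on the second image
model_b = [[0.4, 0.5, 0.1], [0.1, 0.8, 0.1]]  # wrong on the first image
combined = ensemble([model_a, model_b])
print(top_k_error(model_a, truth, k=1))
print(top_k_error(model_b, truth, k=1))
print(top_k_error(combined, truth, k=1))
```

In this toy case the models make different mistakes, so averaging cancels them out; real ensembles benefit to the extent that their errors are similarly uncorrelated.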

It's unlikely that the set of labels it produces is exactly what you need for your application, so the next step would be to train your own network. That is a much bigger task than running a pre-trained one like this, but one of the things I like about TensorFlow is that it spans the whole lifecycle of a machine learning model, from experimentation, to training, and into production, as this example shows. To get started training, I'd recommend looking at this simple tutorial on recognizing hand-drawn digits from the MNIST data set.

I hope that sharing this framework will help developers build amazing user experiences we’d never even think of. We’ve been having a massive amount of fun with TensorFlow, and I can’t wait to see what interesting image tools you build using it!

75 comments

Research at Google via Google+

3 months ago - Shared publicly

Check out the Google Research blog, linked below, where Software Engineer +Pete Warden explains how you can use a TensorFlow pre-trained deep network to identify objects in an image. And if spotting objects like “bow tie”, “military uniform” and “nematode” does not suit your application, you can use TensorFlow to train your very own network.

Jon Van Oast

3 months ago

thanks for the post +Pete Warden and team! appreciate the clear overview. on the top of my (infinite) todo list is to apply tensorflow to the wildlife detection/identification we are doing at wildme.org -- any help and hints we can get are appreciated! (not our day jobs and it is a 501c3 right?) ... hope to follow these leads very soon.

Mike2020able

2 months ago

https://www.youtube.com/watch?v=aI-_lWjfejc&index=14&list=FLqYdSe9tJbwRXRAKDIOfEVw

Vincent Vanhoucke

3 months ago - Deep Learning (Discussion)

We've just posted our latest pretrained ImageNet classifier on the TensorFlow web site. Enjoy!

Vincent Vanhoucke

3 months ago

+Vahid Bastani 1) not directly, but there are several published approaches to deriving segmentation from a recognition network. 2) To add a new class, you might get away with taking the features produced just before the classifier layer and train your own classifier for that class, but usually nothing beats training end-to-end with all the data combined.
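The second suggestion in the reply above (reuse the features from just before the classifier layer and train a separate classifier on them) can be sketched as follows. This is a toy illustration only: the two-dimensional feature vectors stand in for the much larger activations a real network would produce, logistic regression is one assumed choice of classifier, and all names and numbers are hypothetical.

```python
import math


def train_logistic(features, labels, lr=0.5, epochs=200):
    """Fit a binary logistic-regression classifier with plain SGD.

    features: fixed feature vectors, assumed to be precomputed by the network.
    labels: 1 if the image shows the new class, else 0.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return True when the classifier assigns probability above 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


# Hypothetical precomputed features for four training images.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
train_y = [1, 1, 0, 0]
weights, bias = train_logistic(train_x, train_y)
print(predict(weights, bias, [0.85, 0.15]))
print(predict(weights, bias, [0.15, 0.85]))
```

Only the small classifier is trained here; the network that produced the features stays frozen, which is why, as the reply notes, end-to-end training on all the data usually does better.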

Vahid Bastani

3 months ago

+Vincent Vanhoucke Thanks!

Vincent Vanhoucke via Google+

3 months ago - Shared publicly

We've just posted our latest pretrained ImageNet classifier on the TensorFlow web site. Enjoy!

Vincent Vanhoucke

3 months ago

+Valentin Pletzer the API has a different vocabulary from ImageNet.

Valentin Pletzer

3 months ago

Thanks for the clarification

Ankit Pati

3 months ago - The Great Operating System Debate (+1 Linux)

Notice the output fragments below each of the images. Now, anyone who has used Ubuntu even once will recognise the colours and the typography used. Yes, they are the defaults of the Ubuntu terminal.

I wonder why Google is using Linux, instead of an ostensibly storm-inducing OS?
