TensorFlow Object Detection (TFOD) API Setup

Mar 25, 2021


In the computer vision field, the three most common tasks we perform are image classification, object detection and image segmentation, and people often confuse these terms. Let’s start by understanding what image classification, object detection and image segmentation are.

Image Classification: Image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. “Contextual” means the approach focuses on the relationships between nearby pixels, which is also called the neighbourhood.

[Figure: Image Classification]

You will have instantly recognized it: it’s a dog (or a cat). Take a step back and analyse how you came to this conclusion. You were shown an image and you identified the class it belongs to (a dog in this instance, or a cat). That, in a nutshell, is what Image Classification is all about.

Object Detection: Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection.

As you saw, there’s only one object here: a dog. We can easily use an image classification model to predict that there’s a dog in the given image, but classification alone does not tell us where the dog is. That’s where Image Localization comes into the picture: it identifies the location of a single object in the given image. If multiple objects are present, for example both a cat and a dog in a single image, we rely on Object Detection instead, which predicts the location along with the class of each object.
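To make the difference concrete, here is a minimal Python sketch of what each task returns. The Detection type and the classify/detect helpers are hypothetical and exist only for illustration; the (ymin, xmin, ymax, xmax) box order with coordinates normalized to [0, 1] mirrors the convention TFOD models use for their detection boxes, and the 0.5 score threshold is an arbitrary assumption.

```python
from typing import Dict, List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical detection result: one label, one box and one score per object."""
    label: str
    box: Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized to [0, 1]
    score: float


def classify(class_scores: Dict[str, float]) -> str:
    """Image classification: a single label for the whole image (the top-scoring class)."""
    return max(class_scores, key=class_scores.get)


def detect(candidates: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Object detection: every object above the threshold, each with its own label and box."""
    return [d for d in candidates if d.score >= threshold]


print(classify({"dog": 0.7, "cat": 0.3}))  # dog
found = detect([
    Detection("dog", (0.10, 0.05, 0.90, 0.50), 0.92),
    Detection("cat", (0.20, 0.55, 0.85, 0.95), 0.88),
    Detection("cat", (0.00, 0.00, 0.10, 0.10), 0.20),  # low-confidence candidate, dropped
])
print([d.label for d in found])  # ['dog', 'cat']
```

Image Localization sits in between: it returns a label and a box, but for a single object only.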

Image segmentation: We can divide, or partition, the image into various parts called segments. It’s not a great idea to process the entire image at once, as some regions of the image will not contain any useful information. By dividing the image into segments, we can use only the important segments for processing. That, in a nutshell, is how Image Segmentation works. An image, as you may already know, is a collection of pixels; image segmentation groups together the pixels that have similar attributes.

[Figure: Object Detection]

By applying Object Detection models, we can only build a bounding box around each object in the image. That box tells us nothing about the object’s shape, because a bounding box is always a rectangle. Image Segmentation models, on the other hand, create a pixel-wise mask for each object in the image, which gives us a far more granular understanding of the object(s) in the image.
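A small pure-Python sketch shows why a mask is more granular than a box: it derives the tight bounding box from a binary mask of a triangular object and compares the two areas. The mask_to_box helper and the toy mask are made up for illustration; real segmentation models produce far larger masks.

```python
from typing import List, Tuple


def mask_to_box(mask: List[List[int]]) -> Tuple[int, int, int, int]:
    """Return the tight bounding box (ymin, xmin, ymax, xmax), inclusive, of the 1-pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


# A triangular object: the mask records its exact shape ...
mask = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
ymin, xmin, ymax, xmax = mask_to_box(mask)
box_area = (ymax - ymin + 1) * (xmax - xmin + 1)
mask_area = sum(map(sum, mask))
# ... while the box covers 16 pixels, only 10 of which belong to the object.
print((ymin, xmin, ymax, xmax), box_area, mask_area)  # (0, 0, 3, 3) 16 10
```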

I hope you now have a clear understanding of Image Classification, Image Localization, Object Detection and Image Segmentation. Now let’s move on to the TFOD API.

What is an API? Why do we need an API?

API stands for Application Programming Interface. An API provides developers with a set of common operations so that they don’t have to write code from scratch.

TensorFlow Object Detection API:

The TensorFlow Object Detection API is a framework for creating deep learning networks that solve object detection problems.

The framework already includes pretrained models, which the authors refer to as the Model Zoo.

The Model Zoo includes a collection of models pretrained on the COCO dataset and the Open Images Dataset. These models can be used directly for inference if we are only interested in the categories covered by these datasets.

How to set up the TFOD framework?

Below is the step-by-step process to follow on your local system to get object detection running easily with the help of TFOD.

STEP-1 Download the following content

  • Download the model: download the faster_rcnn_inception_v2_coco_2018_01_28 model, or any other model of your choice, from the TensorFlow 1 Detection Model Zoo.
  • Download the utils file: download the dataset & utils.
  • Download the labelImg tool: download the labelImg tool for labelling images.

Before extraction, you should have the following compressed files in a single folder.

STEP-2 Extract all the above zip files into a TFOD folder and remove the compressed files-

After extracting all the zip files, you should have the following folders -

STEP-3 Create a virtual environment using conda-


  • for a specific Python version: conda create -n your_env_name python=3.6
  • to activate the environment: conda activate your_env_name

STEP-4 Install the following packages in your new environment-

for GPU

pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow-gpu==1.14.0

for CPU only

pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow==1.14.0

STEP-5 Install protobuf using conda package manager-

conda install -c anaconda protobuf

STEP-6 Compile the protobuf files to .py files-

We convert the protobuf (.proto) files into Python files because Python cannot use .proto definitions directly. The TFOD API defines most of its configuration formats (pipeline, model, training, input reader and label map settings) in .proto files, so we compile them into Python modules with protoc.

Open a command prompt and cd to the research folder.

Now, in the research folder, run the following command:

  • For Linux

protoc object_detection/protos/*.proto --python_out=.

  • For Windows

protoc object_detection/protos/*.proto --python_out=.

A Python file (named *_pb2.py) will be created for each .proto file.
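With some protoc versions, the *.proto wildcard may not be expanded in the Windows command prompt, and protoc then reports that it cannot find the file. One workaround is to compile each file separately. The sketch below does that from Python; it assumes protoc is on your PATH and that you run it from the research folder, and the helper names protoc_commands and compile_protos are hypothetical.

```python
import glob
import os
import subprocess
from typing import List


def protoc_commands(proto_dir: str = "object_detection/protos") -> List[List[str]]:
    """Build one protoc invocation per .proto file, in sorted order."""
    protos = sorted(glob.glob(os.path.join(proto_dir, "*.proto")))
    # Use forward slashes so the paths look the same on Windows and Linux.
    return [["protoc", p.replace(os.sep, "/"), "--python_out=."] for p in protos]


def compile_protos(proto_dir: str = "object_detection/protos") -> None:
    """Run protoc once per file; raises CalledProcessError if any file fails to compile."""
    for cmd in protoc_commands(proto_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    compile_protos()  # run this from the research folder
```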

STEP-7 Install the object detection package using setup.py-

Install the package using the setup.py file available in your research folder. To do this, open your Anaconda prompt, change your directory to research and run the command below:

python setup.py install

STEP-8 Verify your object detection installation-

To verify your installation, run the object_detection_tutorial.ipynb notebook, which resides in your models/research/object_detection folder.
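Before opening the notebook, a quick pre-check can catch two setup slips: .proto files that were never compiled in STEP-6, and an object_detection package that cannot be imported. This is a hypothetical helper script, not part of TFOD, and it assumes the research folder layout used in this guide.

```python
import glob
import importlib.util
import os
from typing import List


def missing_pb2(proto_dir: str = os.path.join("object_detection", "protos")) -> List[str]:
    """Names of .proto files in proto_dir that have no matching *_pb2.py module yet."""
    missing = []
    for proto in sorted(glob.glob(os.path.join(proto_dir, "*.proto"))):
        if not os.path.exists(os.path.splitext(proto)[0] + "_pb2.py"):
            missing.append(os.path.basename(proto))
    return missing


def package_installed(name: str = "object_detection") -> bool:
    """True if `name` can be imported in the current environment.

    Note: if this script sits in the research folder, the local object_detection
    folder is importable even without STEP-7, so this check is only a rough signal.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run from the research folder, inside the activated conda environment.
    print("uncompiled protos:", missing_pb2() or "none")
    print("object_detection importable:", package_installed())
```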

STEP-9 Paste all the content of the utils folder into the research folder-

The following files and folders are present in the utils folder-

STEP-10 Paste the faster_rcnn_inception_v2_coco_2018_01_28 model, or any other model downloaded from the model zoo, into the research folder-

Now cd to the research folder and run the following Python script-

python xml_to_csv.py
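The contents of xml_to_csv.py are not shown here, so the following is only a sketch of the idea: labelImg saves one Pascal VOC XML file per image, and the script flattens those files into the images/train_labels.csv and images/test_labels.csv that STEP-11 reads. The column layout, the images/train and images/test folders holding the XML files, and the helper names parse_voc and folder_to_csv are assumptions.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed CSV layout: one row per labelled object.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def parse_voc(xml_data) -> List[Tuple]:
    """Convert one labelImg (Pascal VOC) annotation, given as str or bytes, into rows."""
    root = ET.fromstring(xml_data)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")]
        rows.append((filename, width, height, obj.findtext("name"), *coords))
    return rows


def folder_to_csv(xml_dir: str, csv_path: str) -> int:
    """Write every *.xml in xml_dir to csv_path; returns the number of object rows."""
    rows = []
    for path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
        with open(path, "rb") as f:
            rows.extend(parse_voc(f.read()))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    for split in ("train", "test"):
        n = folder_to_csv(os.path.join("images", split),
                          os.path.join("images", split + "_labels.csv"))
        print(split, n, "objects")
```

An image with several labelled objects produces several rows that share the same filename.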

STEP-11 Run the following to generate train and test records-

From the research folder-

python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record

python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
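To build the records, the class names in the CSV files have to be turned into integer ids, and those ids must agree with the label map file that label_map_path points to in STEP-13 (num_classes should equal the number of classes). Below is a sketch of that bookkeeping, not the contents of generate_tfrecord.py; CLASSES, class_ids, label_map_pbtxt, group_by_image and the training/labelmap.pbtxt file name are hypothetical. Label map ids start at 1 because id 0 is reserved for the background.

```python
import os
from collections import OrderedDict
from typing import Dict, List

# Hypothetical class list: replace it with the exact label names you used in labelImg.
CLASSES = ["cat", "dog"]


def class_ids(classes: List[str]) -> Dict[str, int]:
    """Give each class an integer id starting at 1 (id 0 is reserved for the background)."""
    return {name: i for i, name in enumerate(classes, start=1)}


def label_map_pbtxt(classes: List[str]) -> str:
    """Render the same ids in the label map text format used by TFOD."""
    return "\n".join("item {\n  id: %d\n  name: '%s'\n}\n" % (i, name)
                     for name, i in class_ids(classes).items())


def group_by_image(rows: List[dict]) -> "OrderedDict[str, List[dict]]":
    """Group CSV rows by filename so that all boxes of one image stay together."""
    grouped = OrderedDict()
    for row in rows:
        grouped.setdefault(row["filename"], []).append(row)
    return grouped


if __name__ == "__main__":
    os.makedirs("training", exist_ok=True)
    # Hypothetical file name; point label_map_path in STEP-13 at whatever you choose.
    with open(os.path.join("training", "labelmap.pbtxt"), "w") as f:
        f.write(label_map_pbtxt(CLASSES))
```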

STEP-12 Copy the YOUR_MODEL.config file from research/object_detection/samples/configs/ into research/training-

The config file shown here is for faster_rcnn_inception_v2_coco_2018_01_28. If you downloaded any other model, you’ll see the config file for YOUR_MODEL_NAME instead, as shown below-

STEP-13 Update num_classes, fine_tune_checkpoint and num_steps, plus input_path and label_map_path for both train_input_reader and eval_input_reader-

Changes to be made in the config file are highlighted below. You must update the value of those keys in the config file.

Always verify YOUR_MODEL_NAME before using the config file.
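The STEP-13 edits can also be applied with a small script instead of by hand. This is a minimal sketch that treats the config as plain text and rewrites the value on the first matching `key: ...` line; because input_path and label_map_path occur in both reader blocks, it searches from the matching top-level block header. set_value and update_config are hypothetical helpers, and the checkpoint, step count and label map path in the usage part are example values only.

```python
import re


def set_value(config: str, key: str, value, block: str = None) -> str:
    """Replace the value on the first `key: ...` line, searching from `block` if given.

    Strings are written quoted and numbers as-is; use forward slashes in paths.
    """
    start = 0
    if block is not None:
        header = re.search(r"^%s\b" % re.escape(block), config, re.M)
        if header is None:
            raise KeyError("block %s not found" % block)
        start = header.start()
    line = re.compile(r"^([ \t]*%s:[ \t]*)[^\n]*" % re.escape(key), re.M)
    match = line.search(config, start)
    if match is None:
        raise KeyError("%s not found" % key)
    new = '"%s"' % value if isinstance(value, str) else str(value)
    return config[:match.start()] + match.group(1) + new + config[match.end():]


def update_config(config: str, num_classes: int, checkpoint: str, num_steps: int,
                  label_map: str, train_record: str = "train.record",
                  test_record: str = "test.record") -> str:
    """Apply the STEP-13 edits to the text of a pipeline config."""
    config = set_value(config, "num_classes", num_classes)
    config = set_value(config, "fine_tune_checkpoint", checkpoint)
    config = set_value(config, "num_steps", num_steps)
    for block, record in (("train_input_reader", train_record),
                          ("eval_input_reader", test_record)):
        config = set_value(config, "input_path", record, block)
        config = set_value(config, "label_map_path", label_map, block)
    return config


if __name__ == "__main__":
    path = "training/YOUR_MODEL.config"  # hypothetical: the file copied in STEP-12
    with open(path) as f:
        text = f.read()
    text = update_config(text, num_classes=2,
                         checkpoint="faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt",
                         num_steps=20000, label_map="training/labelmap.pbtxt")
    with open(path, "w") as f:
        f.write(text)
```

Searching from the block header, rather than replacing every occurrence, keeps train.record in train_input_reader and test.record in eval_input_reader.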

STEP-14 From research/object_detection/legacy/ copy train.py to the research folder-

The legacy folder contains train.py, as shown below -

STEP-15 Copy the deployment and nets folders from research/slim into the research folder-

The slim folder contains the following folders-
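STEP-14 and STEP-15 are plain file copies, so they can be scripted. Here is a minimal sketch using only the standard library, assuming the folder layout described above and that it is run from the research folder; planned_copies and copy_training_files are hypothetical helper names.

```python
import os
import shutil
from typing import List, Tuple


def planned_copies(research: str = ".") -> List[Tuple[str, str]]:
    """(source, destination) pairs for STEP-14 and STEP-15."""
    pairs = [(os.path.join(research, "object_detection", "legacy", "train.py"),
              os.path.join(research, "train.py"))]
    for name in ("deployment", "nets"):
        pairs.append((os.path.join(research, "slim", name), os.path.join(research, name)))
    return pairs


def copy_training_files(research: str = ".") -> None:
    """Copy train.py and the slim folders into research, skipping folders already there."""
    for src, dst in planned_copies(research):
        if os.path.isdir(src):
            if not os.path.exists(dst):  # copytree raises an error if dst already exists
                shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)


if __name__ == "__main__":
    copy_training_files()
```

Existing deployment or nets folders are left untouched rather than overwritten, so rerunning the script is harmless.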

STEP-16 Now run the following command from the research folder. This will start training on your local system-

Copy the command, replace YOUR_MODEL.config with your own model’s config file name (for example, the config for faster_rcnn_inception_v2_coco_2018_01_28) and run it in a command prompt or terminal. Make sure you are in the research folder.

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/YOUR_MODEL.config

Note: Always run all the commands from the research folder.
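Because every path in the training command is relative to the research folder, running it from anywhere else fails. A hypothetical pre-flight script can check that the outputs of the earlier steps are in place before launching STEP-16. The list of required files is an assumption based on the steps in this guide (your label map file, for example, is not checked), and training_command simply mirrors the command shown above, using the current Python interpreter.

```python
import os
import subprocess
import sys
from typing import List


def training_command(config_name: str) -> List[str]:
    """The STEP-16 command as an argument list; config_name is e.g. 'YOUR_MODEL.config'."""
    return [sys.executable, "train.py", "--logtostderr", "--train_dir=training/",
            "--pipeline_config_path=training/" + config_name]


def missing_inputs(config_name: str, research: str = ".") -> List[str]:
    """Files and folders from earlier steps that are not found under `research`."""
    needed = ["train.py", "train.record", "test.record", "deployment", "nets",
              os.path.join("training", config_name)]
    return [p for p in needed if not os.path.exists(os.path.join(research, p))]


if __name__ == "__main__":
    config = sys.argv[1] if len(sys.argv) > 1 else "YOUR_MODEL.config"
    missing = missing_inputs(config)
    if missing:
        sys.exit("Missing (are you in the research folder?): " + ", ".join(missing))
    subprocess.run(training_command(config), check=True)
```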

