ML MODEL TO DETECT THE BIGGEST OBJECT IN AN IMAGE — PART 1

1 - Drawing a bounding box around the largest object in an image: getting the image data ready for analysis.

(Read Part 2 here)

Welcome to Part 2 of fast.ai, where we will deal with single object detection. Before we start, I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI.

This part assumes a good understanding of Part 1. Feel free to explore the first part of this series in the following order:

  1. Dog Vs Cat Image Classification
  2. Dog Breed Image Classification
  3. Multi-label Image Classification
  4. Time Series Analysis using Neural Network
  5. NLP- Sentiment Analysis on IMDB Movie Dataset
  6. Basic of Movie Recommendation System
  7. Collaborative Filtering from Scratch
  8. Collaborative Filtering using Neural Network
  9. Writing Philosophy like Nietzsche
  10. Performance of Different Neural Network on Cifar-10 dataset

This blog post has been divided into two parts.

The dataset we will be using is PASCAL VOC (2007 version).

Let's get our hands dirty with the coding part.


As with all machine learning projects, there are three things to focus on:-

  1. Provide data.
  2. Pick a suitable architecture.
  3. Choose a loss function.

Step 1 focuses on getting the data into proper shape so that we can do analysis on top of it.

STEP 1:- Classify and localize the largest object in each image. This step involves:-

1.1. INSTALL THE PACKAGES

Let's install the packages and download the data using the commands shown below.

# Install the packages
# !pip install https://github.com/fastai/fastai/archive/master.zip
!pip install fastai==0.7.0
!pip install torchtext==0.2.3
!pip install opencv-python
!apt update && apt install -y libsm6 libxext6
!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl 
!pip3 install torchvision

# Download the Data to the required folder
!mkdir data
!wget http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar -P data/
!wget https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip -P data/
!tar -xf data/VOCtrainval_06-Nov-2007.tar -C data/
!unzip data/PASCAL_VOC.zip -d data/
!rm -rf data/PASCAL_VOC.zip data/VOCtrainval_06-Nov-2007.tar

%matplotlib inline
%reload_ext autoreload
%autoreload 2

!pip install Pillow

from fastai.conv_learner import *
from fastai.dataset import *

from pathlib import Path
import json
import PIL
from matplotlib import patches, patheffects

Let's check what's present in our data. We will be using the Python 3 standard library pathlib for our paths and file access.

1.2. KNOW YOUR DATA USING Pathlib OBJECT.

The data folder contains different versions of Pascal VOC .

PATH = Path('data')
list((PATH/'PASCAL_VOC').iterdir())
# iterdir() helps in iterating through the directory of PASCAL_VOC

This file contains the images, type, annotations, and categories keys. To make use of tab completion, save the key names in appropriately named variables.

IMAGES,ANNOTATIONS,CATEGORIES = ['images', 'annotations', 'categories']

Let's see in detail what each of these contains.

For easy access, let's convert the important pieces into dictionaries and lists using comprehensions.

FILE_NAME,ID,IMG_ID,CATEGORY_ID,BBOX = 'file_name','id','image_id','category_id','bbox'

training_json = json.load((PATH/'PASCAL_VOC'/'pascal_train2007.json').open())
# Load the training-set annotation JSON (one of the files listed by iterdir() above)

categories = {o[ID]:o['name'] for o in training_json[CATEGORIES]}
# categories is a dictionary mapping each class id to its class name.
# Lets check out all of the 20 categories using the command below
categories
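For reference, Pascal VOC has twenty object classes. The authoritative ids come from the JSON itself; as a sketch (assuming, as in the course data, that ids run from 1 to 20 in alphabetical order), the resulting dictionary should look like this:

```python
# The twenty Pascal VOC classes in their usual alphabetical order.
# The 1..20 id assignment is an assumption for illustration; the JSON is authoritative.
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow',
    'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

expected_categories = {i + 1: name for i, name in enumerate(VOC_CLASSES)}
```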

training_filenames = {o[ID]:o[FILE_NAME] for o in training_json[IMAGES]}
training_filenames 
# contains the id and the filename of the images.

training_ids = [o[ID] for o in training_json[IMAGES]]
training_ids 
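The drawing functions later in this post look annotations up by image id through a `training_annotations` dictionary. That step is not shown in the snippets above, so here is a sketch of how it can be built from the COCO-style `annotations` list (the `hw_bb` converter is introduced in section 1.3 and repeated here for self-containment; the `ignore`-flag handling follows the course notebook, and `sample_json` is made-up data):

```python
import collections
import numpy as np

def hw_bb(bb):
    # VOC (x, y, width, height) -> fastai (top row, left col, bottom row, right col)
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def get_training_annotations(training_json):
    # Map each image id to a list of (bounding box, class id) pairs.
    annotations = collections.defaultdict(list)
    for o in training_json['annotations']:
        if not o.get('ignore', 0):  # skip boxes flagged as 'ignore'
            annotations[o['image_id']].append((hw_bb(o['bbox']), o['category_id']))
    return annotations

# Hypothetical minimal JSON, just to show the shape of the result:
sample_json = {'annotations': [
    {'image_id': 12, 'bbox': [155, 96, 196, 174], 'category_id': 7, 'ignore': 0},
]}
training_annotations = get_training_annotations(sample_json)
```

With the real `training_json`, the same call gives the lookup table that `draw_idx` below indexes into.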
# This is a list comprehension.

Now, let's check out the folder where we have all the images.

list((PATH/'VOCdevkit'/'VOC2007').iterdir())
# The JPEGImages directory is the one with all the images in it.

JPEGS = 'VOCdevkit/VOC2007/JPEGImages'
IMG_PATH = PATH/JPEGS
# Set the path of the Images as IMG_PATH
list(IMG_PATH.iterdir())[:5]
# Check out all the Images in the Path

Note:- Each image has a unique id associated with it, as shown above.

1.3. BOUNDING BOX

The main objective here is to bring our bounding boxes into a proper format that can be used for plotting. The bounding box coordinates are present in the annotations.

A bounding box is a box around the objects in an Image.

In the original VOC annotations, the bounding box coordinates represent (column, row, width, height), i.e. the top-left x and y followed by the box's width and height.

SUMMARY OF THE USEFUL IMAGE RELATED INFORMATION

Let's get into the details of the annotations of a particular image.

fastai represents a box as (top-left row, top-left column, bottom-right row, bottom-right column), so the hw_bb() function converts a VOC-format box into fastai format. Some plotting libraries expect VOC-format bounding boxes, so the bb_hw() function resets the dimensions back into the original format:

def hw_bb(bb): return np.array([bb[1],bb[0],bb[3]+bb[1]-1,bb[2]+bb[0]-1])
# Convert a VOC (x, y, width, height) box into fastai format.

bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)

def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])
# Convert a fastai-format box back to VOC format; used below when plotting.
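As a quick sanity check (a self-contained sketch reusing the example box above), converting a VOC box to fastai format and back should recover the original values:

```python
import numpy as np

def hw_bb(bb): return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])
def bb_hw(a):  return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])

bb_voc = [155, 96, 196, 174]      # x, y, width, height
bb_fastai = hw_bb(bb_voc)         # top row, left col, bottom row, right col
assert list(bb_hw(bb_fastai)) == bb_voc  # round trip recovers the VOC box
```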

1.4. PLOTTING OF THE BOUNDING BOX AROUND THE OBJECT

Now we will focus on drawing a bounding box around the largest object in an image. We will build the plot in steps, as separate functions, with each step serving a definite purpose towards creating the plot. Let's look at each step first, and then at the overall flow.

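The `draw_im` function below relies on a few plotting helpers from the course notebook: `show_img`, `draw_rect`, and `draw_text`, plus a `draw_outline` utility. Here is a sketch of them, assuming matplotlib (exact signatures and defaults may differ slightly from the original notebook):

```python
import matplotlib.pyplot as plt
from matplotlib import patches, patheffects

def show_img(im, figsize=None, ax=None):
    # Show an image without axis ticks and return the axes for further drawing.
    if not ax: fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax

def draw_outline(o, lw):
    # Add a black outline so white boxes and text stay visible on any background.
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])

def draw_rect(ax, b):
    # b is (x, y, width, height), i.e. the bb_hw() output.
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                           edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    # Write the class label at the top-left corner of the box.
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)
```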

Let's wrap up the flow steps in functions, as shown below:-

def draw_im(im, ann):
    ax = show_img(im, figsize=(16,8))
    for b,c in ann: # Destructure the annotations into bbox and class
        b = bb_hw(b) # Convert it into appropriate coordinates
        draw_rect(ax, b) # Draw rectangle bbox around it.
        draw_text(ax, b[:2], categories[c], sz=16) 
        # Write some text around it

def draw_idx(i):
    im_a = training_annotations[i] # Grab the annotations with the help of the image id.
    im = open_image(IMG_PATH/training_filenames[i]) # Open that Image
    print(im.shape) # Print its shape
    draw_im(im, im_a) # Call the draw and print its text

draw_idx(17) 
# Draw an image of a particular index.

This is how we locate the objects in the images. The next step is to classify the largest item in each image, which we will discuss in detail in the next blog post.

A big shout-out to Anwesh Satapathy and Sharwon Pius for illustrating this problem in a simple way. Please check out their GitHub repos and the simplified roadmap to single object detection.


If you have any queries, feel free to shoot them to @ashiskumarpanda on Twitter, or ask on the fast.ai forums.

If you like this post and see the 👏 👏 button, feel free to do the needful 😄😄😄😄😄.

It is a really good feeling to be appreciated by Jeremy Howard. Check out what he has to say about my fast.ai Part 1 blogs:


Great summary of the 2018 version of https://t.co/aQsW5afov6 - thanks for sharing @ashiskumarpanda ! https://t.co/jVUzpzp4EO

— @jeremyphoward
