You can find me on Twitter @bhutanisanyam1, or connect with me on LinkedIn here. Here and here are two articles on my Learning Path to Self-Driving Cars.

You can find the Markdown File Here

You can find the Lecture 1 Notes here. Lecture 2 Notes can be found here. Lecture 3 Notes can be found here. Lecture 5 Notes can be found here.

These are the Lecture 4 notes for the MIT 6.S094: Deep Learning for Self-Driving Cars course (2018), taught by Lex Fridman.

All Images are from the Lecture Slides.

Computer Vision: Teaching Computers to See.

Computer Vision, as of today, is Deep Learning: the majority of our successes in understanding images utilise neural networks.

Raw sensory data: for the machine, images are just numbers.

These 1-channel or 3-channel numerical arrays are taken as input by the neural network, which produces an output by regressing a value or by classifying the image into various categories.
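As a minimal sketch (using NumPy and a synthetic image in place of a real file), this is all an image is to a machine:

```python
import numpy as np

# A grayscale image is a 2D array (1 channel); an RGB image is a 3D array (3 channels).
# Here we build a toy 4x4 RGB image; in practice you would load one from disk.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)  # (4, 4, 3): height x width x channels
print(image[0, 0])  # the top-left pixel is just three numbers: [R G B]

# Networks usually take these values normalised to [0, 1] as floats.
x = image.astype(np.float32) / 255.0
```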

We must be careful about our assumptions of what is easy and what is hard in perception.

Human Vision vs Computer Vision

  1. The structure of the visual cortex is layered. As information passes from the eyes to the brain, higher and higher order representations are formed. This is the inspiration behind deep neural networks for images: higher and higher level representations are formed through the layers. The early layers take in raw pixels and find edges; later layers find more abstract features by combining these edges; the final layers find higher-order semantic meaning.

  2. Computer Vision is hard, even for Deep Learning:

Image classification Pipeline:

There is a bin for each class, and each bin contains many example images of that class. Task: bin a new image into one of these classes.
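To make the task concrete, here is a sketch of the simplest possible pipeline: a nearest-neighbour classifier that bins a new image by comparing raw pixels against the stored examples. The data here is synthetic; this is an illustration of the idea, not the course's code.

```python
import numpy as np

# Synthetic "bins": 100 labelled example images per class, flattened to vectors.
num_classes, per_class, dim = 3, 100, 32 * 32 * 3
train_x = np.random.rand(num_classes * per_class, dim).astype(np.float32)
train_y = np.repeat(np.arange(num_classes), per_class)

def classify(image_vec):
    # L1 distance between the new image and every stored example.
    dists = np.abs(train_x - image_vec).sum(axis=1)
    return train_y[np.argmin(dists)]  # label of the closest example

new_image = np.random.rand(dim).astype(np.float32)
print("Predicted class:", classify(new_image))
```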

Famous Datasets:

Trivial Example:

Working of Neural Networks:
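As a rough illustration of a network's forward pass (a sketch with random, untrained weights; `dim` and `num_classes` are arbitrary), a single fully connected layer maps a flattened image to class probabilities:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One fully connected layer: every input pixel connects to every class score.
dim, num_classes = 32 * 32 * 3, 10
W = np.random.randn(num_classes, dim) * 0.01  # learned during training; random here
b = np.zeros(num_classes)

x = np.random.rand(dim)    # a flattened input image
scores = W @ x + b         # raw class scores
probs = softmax(scores)    # probabilities over the categories
print(probs.argmax(), probs.max())
```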

Convolutional Neural Networks

When a neural network is tasked with learning a complex task from large amounts of data with a large number of objects, CNNs work efficiently.

‘Trick: Spatial Invariance’: an object in the top left corner of an image is the same as that object in the bottom right corner. So we learn the same features across the entire image.

Convolution operation: unlike fully connected layers, a third dimension of depth is present. The blocks take 3D input volumes and produce 3D output volumes.

They take a slice of the image, a ‘window’, and slide it across the image, applying the same weights to each slice/window to generate outputs. We can make many such filters.

The parameters of each filter are shared (if a feature is useful in one place, it is useful everywhere). This reduces the number of parameters significantly by re-using spatial features.
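Below is a minimal single-channel sketch of this sliding-window operation in NumPy. Real convolutional layers stack many such filters over 3D volumes, but the weight sharing is the same: one kernel reused at every position.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel (filter) over the image: same weights everywhere."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]   # the 'window' / slice
            out[i, j] = (window * kernel).sum()  # same weights at every position
    return out

image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # a simple vertical-edge filter
feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)  # (6, 6): one activation per window position
```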

Example:

Convolution

ImageNet Case Study

Object Detection

Note: CNNs produce a pixel-level heat map of activations based on the convolutions.
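A sketch (assuming PyTorch) of why this happens: a network built only from convolutions, with no flattening or fully connected layer, keeps its spatial dimensions, so its output is naturally a heat map. The architecture and sizes here are made up for illustration.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional net: with no flatten / fully connected layer,
# the output keeps spatial dimensions, giving a heat map of activations.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),  # 1x1 conv: a per-location class score
)

image = torch.rand(1, 3, 64, 64)  # batch of one RGB image
heatmap = net(image)
print(heatmap.shape)              # torch.Size([1, 1, 64, 64]): one score per pixel
```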

Scene understanding

Key Aspects of Segmentation

ResNet-DUC 2017:

FlowNet

The methods discussed so far disregard temporal dynamics, which are relevant in robotics.

Challenge: Segmentation of images through time.
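One sketch of how optical flow can help with this challenge: given a dense flow field (e.g. from a FlowNet-style model), a segmentation mask can be warped from one frame to the next. This uses OpenCV's `remap` with random stand-in data; it illustrates the idea, not the SegFuse implementation.

```python
import numpy as np
import cv2

# Stand-in inputs: a segmentation mask for frame t and the optical flow
# from frame t to frame t+1 (in practice, predicted by a flow network).
h, w = 120, 160
seg_t = np.random.randint(0, 5, size=(h, w)).astype(np.float32)  # class ids
flow = np.random.randn(h, w, 2).astype(np.float32)               # (dx, dy) per pixel

# Build sampling maps: each pixel in frame t+1 looks back along the flow.
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x - flow[..., 0]).astype(np.float32)
map_y = (grid_y - flow[..., 1]).astype(np.float32)

# Nearest-neighbour interpolation so class ids are not blended.
seg_t1 = cv2.remap(seg_t, map_x, map_y, cv2.INTER_NEAREST)
print(seg_t1.shape)  # the mask propagated to the next frame
```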

FlowNet 2.0 (2016):

SegFuse

Dataset:

Task:

You can find me on Twitter @bhutanisanyam1, or connect with me on LinkedIn here. Here and here are two articles on my Learning Path to Self-Driving Cars.

Subscribe to my Newsletter for a weekly curated list of Deep Learning and Computer Vision articles.