You can find me on Twitter @bhutanisanyam1, or connect with me on LinkedIn here. Here and here are two articles on my Learning Path to Self-Driving Cars.

If you want to read more Tutorials/Notes, please check this post out

You can find the Markdown File Here

These are the Lecture 1 notes for the MIT 6.S094: Deep Learning for Self-Driving Cars course (2018), taught by Lex Fridman.

Lecture 2 Notes can be found here
Lecture 3 Notes can be found here
Lecture 4 Notes can be found here
Lecture 5 Notes can be found here

All images are from the Lecture slides.

Deep learning: a set of techniques that has worked well for AI tasks in recent years, thanks to advances in research and in GPU capabilities. Self-driving cars (SDCs) are systems that can utilize these techniques.

The instructors are working on developing cars that understand the environment both inside and outside the car.

Competitions:

Why Self Driving Cars?

Goal: applying data-driven learning methods to autonomous vehicles.

It is the biggest integration of personal robots yet.

Autonomous vehicle: better thought of as a personal robot than as a pure perception-control system. These systems will need assistance from humans via a transfer of control in difficult situations. A truly perceptive system that handles dynamic environments with human-level competence may be a few decades away.

Cognitive load: a fully connected CNN takes in the raw 3D input to estimate the driver's cognitive load, body pose, and drowsiness.

Argument: full autonomy requires achieving human-level intelligence in some domains.

Human-Centred Artificial Intelligence Approach

Proposal: Consider human presence in design of every algorithm.

Why Deep Learning?

Deep learning performs really well with huge amounts of data. Since human lives depend directly on these machines, techniques that learn from real-world data are needed.

What is Deep Learning?

AI: Ability to accomplish complex goals.

Understanding/Reasoning: Ability to turn complex information into simple and useful information.

Deep learning (representation learning or feature learning) takes raw information without any meaning attached and constructs hierarchical representations from which insights can be generated.

It is the branch of AI most capable of deriving structure from data, and from that structure, insights.

Representation Learning

Neural Networks.

Loosely inspired by biological neurons in the human brain.

  1. Stacking: human neural networks are not stacked; ANNs are stacked in layers.
  2. Order: no order (human) vs. ordered layers (ANN).
  3. Timing: asynchronous (human) vs. synchronous (ANN) learning.
  4. Learning algorithm: unknown (human) vs. backpropagation (ANN).
  5. Processing speed: slower (human neurons) vs. faster (ANN units).
  6. Power consumption: efficient (human) vs. inefficient (ANN).

Similarity: both perform distributed computation on a large scale.

A single neuron is simple; connecting many such units enables much more complex behaviour.

Neuron:

  1. A neuron takes inputs along a set of edges, each with a weight.
  2. Each input is multiplied by its edge's weight and the products are summed.
  3. A bias is added.
  4. A non-linear activation function determines the neuron's output (see the sketch below).
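
A minimal sketch of this computation in Python (the inputs, weights, bias, and choice of ReLU as the non-linearity below are made up for illustration):

```python
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b   # multiply inputs by their edge weights, sum, add bias
    return max(0.0, z)     # a non-linear activation (ReLU here) gives the output

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.2])   # weights on the edges
b = 0.1                          # bias
print(neuron(x, w, b))           # weighted sum is -0.42, ReLU clips it to 0.0
```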

Combinations of NNs:

  1. Feed-forward NN: successful in computer vision.
  2. Recurrent NN: feeds back into itself and has memory. Successful on time-series data; closer to the human brain (and hence harder to train).

Universality: a neural network can learn to approximate any function with just one hidden layer.*

*Given good algorithms.

Limitation: the bottleneck lies not in the power of the networks, but in the learning methods.

Categories of DL

  1. Supervised Learning: human annotation of the data is needed.
  2. Augmented Supervised Learning: a human + machine approach.
  3. Semi-Supervised Learning: a mix of annotated and unannotated data.
  4. Unsupervised Learning: machine only, with no annotation.
  5. Reinforcement Learning: machine only, learning from sparse rewards.

Currently in use: 1 and 2.

The future (and better) categories: 3, 4, and 5.

DL Impact Spaces:

  1. Defining and solving a particular problem, e.g. Boston housing price estimation.
  2. General-purpose intelligence (or almost): reinforcement and unsupervised learning.

Supervised Learning

Training phase: 1. Input data 2. Labels 3. Train on the (data, label) pairs

Testing stage: 1. New data 2. Input it to the learned system 3. Produce an output
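
A minimal sketch of these two phases using scikit-learn (the data and labels below are made up; any supervised model would follow the same fit/predict pattern):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training phase: input data + labels -> learned model
X_train = np.array([[1.0], [2.0], [3.0]])   # made-up input data
y_train = np.array([2.0, 4.0, 6.0])         # made-up labels
model = LinearRegression().fit(X_train, y_train)

# Testing stage: new data in -> output out
X_new = np.array([[4.0]])
print(model.predict(X_new))                 # ~[8.0]
```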

Learning

What can we do with DL?

  1. One-to-One Mapping
  2. One-to-Many
  3. Many-to-Many
  4. Asynchronous Many-to-Many

Terminologies:

Neural Network Operations:

Activation Functions

  1. Sigmoid. Cons: vanishing gradients, not zero-centred.
  2. Tanh. Cons: vanishing gradients.
  3. ReLU. Cons: not zero-centred.
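
Minimal Python definitions of the three functions, for reference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1); saturates, not zero-centred

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1); zero-centred but still saturates

def relu(x):
    return np.maximum(0.0, x)         # max(0, x); no saturation for x > 0, not zero-centred
```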

Vanishing gradients: when the gradient flowing through the network becomes very small, the weights barely update and learning is slow.
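
For example, the sigmoid's gradient is sigmoid(z) * (1 - sigmoid(z)); a quick check shows how fast it shrinks once the input leaves the region around zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(z, s * (1 - s))   # 0.25, ~0.105, ~0.0066, ~0.000045
```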

Backpropagation

The learning process of the NN. Goal: update the weights and biases to decrease the loss function.

Subtasks:

  1. Forward pass to compute network output and error.
  2. Backward pass to compute gradients.
  3. A fraction of the weight’s gradient is subtracted from the weight.

Since the process is modular, it’s parallelizable.
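
A minimal sketch of these three subtasks for a single sigmoid neuron with a squared-error loss (all values below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])    # made-up inputs
y = 1.0                      # made-up target label
w = np.array([0.2, 0.3])     # weights
b = 0.0                      # bias
lr = 0.1                     # learning rate

for step in range(100):
    # 1. Forward pass: compute the network output and the error.
    out = sigmoid(np.dot(w, x) + b)
    loss = 0.5 * (out - y) ** 2

    # 2. Backward pass: compute gradients via the chain rule.
    dz = (out - y) * out * (1 - out)   # dL/d(pre-activation)
    dw = dz * x                        # dL/dw
    db = dz                            # dL/db

    # 3. Subtract a fraction of each gradient from the parameters.
    w -= lr * dw
    b -= lr * db
```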

Learning

Learning is an optimization process.

Goal: To minimize the Loss Function by updating weights and biases.

Techniques used: Mini-batch Gradient Descent and Stochastic Gradient Descent.
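
A minimal sketch of how mini-batch gradient descent samples the data (the dataset, batch size, and the elided update step are placeholders):

```python
import numpy as np

N, batch_size = 1000, 32            # made-up dataset size and batch size
X = np.random.randn(N, 3)           # placeholder inputs
Y = np.random.randn(N)              # placeholder targets

for epoch in range(10):
    idx = np.random.permutation(N)              # reshuffle every epoch
    for start in range(0, N, batch_size):
        batch = idx[start:start + batch_size]   # indices of one mini-batch
        x_b, y_b = X[batch], Y[batch]
        # ... forward pass, backward pass, and weight update on this batch ...
```

With batch_size = 1 this becomes stochastic gradient descent; larger batches trade noisier updates for better hardware utilization.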

Learning Challenge

Regularization:

Techniques that help the network generalise.

Dropout: randomly remove some of the nodes (along with their incoming and outgoing edges).

Goal: To help generalise better.
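
A minimal sketch of (inverted) dropout applied to one layer's activations at training time; the drop probability `p_drop` is an assumed value:

```python
import numpy as np

def dropout(activations, p_drop=0.5):
    # Zero out each unit with probability p_drop, then rescale the
    # survivors so the expected activation stays unchanged at test time.
    mask = np.random.rand(*activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

a = np.array([0.3, 1.2, 0.7, 2.0])
print(dropout(a))   # roughly half the entries zeroed, the rest scaled by 2
```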

Regularization Weight Penalty:

L2 penalty:

  1. Keeps the weights small unless the error derivative is very big.
  2. Prevents the network from fitting sampling error.
  3. Gives a smoother model.
  4. For two similar inputs, the weight gets distributed between them.

L1 penalty:

  1. Allows a few weights to remain large.
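
A minimal sketch of how an L2 penalty enters a gradient-descent update (the penalty strength `lam` and the stand-in gradient are made up):

```python
import numpy as np

lam, lr = 1e-3, 0.1                # made-up penalty strength and learning rate
w = np.random.randn(10)            # weights
grad_loss = np.random.randn(10)    # stand-in for dL/dw from backpropagation

# The L2 penalty adds lam * w to the gradient ("weight decay"), pulling
# every weight towards zero unless the error derivative pushes back.
w -= lr * (grad_loss + lam * w)
```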

Neural Network Playground: an interactive tool to play around with these techniques and practise.

Deep Learning Breakthroughs

What Changed?

  1. Compute Power increased.
  2. Large Organised Datasets available.
  3. Algorithms and Research in GPU Utilization.
  4. Software and Infrastructure.
  5. Financial Backing.

DL is hard

Human Comparison:

  1. Human vision: developed over 540,000,000 years of data.
  2. Bipedal movement: 230,000,000 years of data.
  3. Abstract thought: 100,000 years of data.

Neural Net:

  1. Adding distortion to the pixel data causes incorrect predictions.
  2. Vision problems: variability in illumination, pose variation, occlusion, intra-class variation.

Object Recognition/Classification:

Goal: input an image and predict an output label.

ImageNet: 14M+ images across 21.8k+ categories.

Competition: ILSVRC (ImageNet Large Scale Visual Recognition Challenge).

AlexNet (2012) made a significant jump in accuracy.

ResNet (2015): human-level performance was surpassed.

Subtle examples show that DL is still distant from 'human generalisation ability'.

Same Architecture, Many Applications: we can change the prediction layer to output as many classes as the task requires (see the sketch below).
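
A minimal sketch of this idea with a pretrained ResNet in PyTorch/torchvision (the number of classes is made up; only the final layer changes):

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 5                           # made-up number of target classes
model = models.resnet18(pretrained=True)  # same architecture and weights
# Swap only the final prediction layer to match the new task's classes.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```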

FCNN (Fully Convolutional Neural Network):

Takes an image as input and produces an image as output, assigning a class to every pixel.

Goal: Image to Image Mapping.

Use-Cases:

Major Breakthroughs

Current Drawbacks

Current Challenges:

Re: Why DL?

It's an opportunity to apply these techniques to real-world problems effectively (and DL is the most effective at them).

You can find me on Twitter @bhutanisanyam1, or connect with me on LinkedIn here. Here and here are two articles on my Learning Path to Self-Driving Cars.

Subscribe to my Newsletter for a weekly curated list of Deep Learning and Computer Vision articles.