I have written articles where we can classify different types of images such as:

Iris genus classification|DeepCognition| Azure ML studio_Kingdom:Plantae Clade:Angiosperms Order:Asparagales Family:Iridaceae Subfamily:Iridoideae Tribe:Irideae Genus:Iris_towardsdatascience.com

How good is ‘your’ selfie?_Check out how good is your selfie here!_medium.com

How To Make AI That Classifies Dog Breeds,DeepLearningStudio_Love dogs?_hackernoon.com

Or.. like generating stories using RNNs…

Generate stories using RNNs |pure Mathematics with code|:_Hi reader!_hackernoon.com

So I thought of writing an article which explains how to classify different sounds using AI.

In this article, we’ll see how to prepare a dataset for sound classification and how to use it for our Deep Learning model.

DATA SET:

We are going to use dataset from Urban Sound Classification Challenge. This dataset consist of 8700+ excerpts of sounds from 10 different sources in ‘.wav’ format.

Sources of Sound:

The size of this dataset is around 5.6GB which made me a bit reluctant to train a model on it. So I have written a python script which can be used to decrease the size of dataset. Basically, the dataset contains around 600 excerpts of sound from each source. I have reduced them to 170, making dataset to around 1.1GB. You can download the script from my repo.

Manik9/Urban-Sound_Urban-Sound - Urban Sound classification using Neural Nets_github.com

Platform to train the program

Training a program which requires high computational power such as Deep Learning model, I prefer to use Deep Learning Studio’s(DLS) jupyter notebooks. It provides Amazon Deep Learning Instances with GPU which can be used to train the model.

Check it out here.

Home_We would like to invite you to join Deep Cognition's team at Booth# 1035 at the GPU Technology Conference March 26-29…_deepcognition.ai

Upload Dataset

Upload the ‘Urban sound’ dataset in datasets folder

Start DLS’s jupyter notebooks

Audio processing

In case of images, we generally pass the values of pixels to our model. In case of Audio too, we need to pass some sort of numerical values which represents our audio.

•librosa is a library in python which can be used for audio pre-processing.

source: Wikipedia

Model Architecture

Each example contains 40 columns i.e(1x40). So 1150 example contains 1150x40. Transpose of this is passed to the model.

Architecture of our model

We get a 1x10 output which represents score of each class

The architecture shown above is replicated in code below.

Training

After 100 epochs:

Training Results

Our trained model obtained an accuracy of 83.04% on validation set, which is quite good as we even reduced the size of dataset to (1/3)rd. We can still improve the accuracy of this model by using CNNs. We’ll see that in another article.

Thanks for Reading 😄

If you have liked this article do 👏 and share this. Follow me on LinkedIn and Medium

Manik Soni - Machine Learning Intern - HEAD Infotech India Pvt ltd - Ace2three.com | LinkedIn_View Manik Soni's profile on LinkedIn, the world's largest professional community. Manik has 3 jobs listed on their…_www.linkedin.com

Manik Soni - Medium_Read writing from Manik Soni on Medium. Machine Learning Researcher. Every day, Manik Soni and thousands of other…_medium.com