In this research project, we used several machine learning techniques to classify 15 frog species from recordings of their calls. Our proposed approach achieved 96% classification accuracy on a dataset of 3667 recorded frog calls.
SVM For Audio Segmentation:
We trained an SVM to automatically distinguish frog calls from silence or noise. The SVM outputs the start and stop times of each predicted frog call. This information is used to cut out the part of the signal that is most probably a frog call and pass it to the CNN, which classifies the frog species.
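As a rough illustration of how start and stop times could be derived, the sketch below assumes the SVM labels fixed-length frames of the recording as 0 (silence or noise) or 1 (frog call), and merges consecutive call frames into time segments. The function name `frames_to_segments`, the frame labels, and the `hop_seconds` value are hypothetical and not taken from the project.

```python
from typing import List, Tuple


def frames_to_segments(labels: List[int], hop_seconds: float) -> List[Tuple[float, float]]:
    """Merge consecutive frames labelled 1 (call) into (start, stop) times in seconds."""
    segments = []
    start = None  # index of the first frame of the call currently being tracked
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a call begins at this frame
        elif label != 1 and start is not None:
            # the call ended at the previous frame; close the segment
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        # the call runs to the end of the recording
        segments.append((start * hop_seconds, len(labels) * hop_seconds))
    return segments


# Hypothetical per-frame SVM predictions: 0 = silence/noise, 1 = frog call
labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
segments = frames_to_segments(labels, hop_seconds=0.5)
print(segments)  # [(1.0, 2.5), (3.5, 4.5)]
```

Each returned pair could then be used to slice the audio signal before it is passed to the classifier.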
CNN for Classification:
The network has two convolutional layers with max pooling, followed by one fully connected layer and a softmax layer that converts scores to probabilities.
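The final step, converting scores to probabilities, can be sketched in isolation. The example below is a minimal softmax in plain Python; it does not reproduce the convolutional layers, and the score values are made up for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores from the fully connected layer for three classes
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
# The predicted class is the index with the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

In the classifier described here, the score vector would have one entry per species, so 15 entries.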
We trained the network 5 times, randomly selecting 10% of the dataset as a validation set and 25% as a test set.
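One way to draw such a split is sketched below. It assumes that the remaining 65% of the recordings are used for training and that a fresh random split is drawn for each of the 5 runs; neither detail is stated explicitly above. The function name `split_indices` and the seeds are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    n: int,
    val_frac: float = 0.10,
    test_frac: float = 0.25,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train, validation, and test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = int(n * val_frac)    # fractional sizes are rounded down
    n_test = int(n * test_frac)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]  # assumption: the rest is used for training
    return train, val, test


# Assumption: one independent random split per training run
for run in range(5):
    train, val, test = split_indices(3667, seed=run)
    print(run, len(train), len(val), len(test))
```

With 3667 recordings this gives 366 validation, 916 test, and 2385 training examples per run.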
The average test accuracy over the 5 training runs was 96%.