Open-Source Audio-based Data Science Projects (With Tutorials)


If you are a student or a professional looking for open-source audio-based data science projects, this article is here to help you.

The audio-based data science projects listed below are ordered by the level of experience required. All of them can be implemented in Python.

Audio-based Data Science Projects

1. Speech Emotion Recognition – Tutorial

Our voices often reflect our emotions through tone and pitch. This project builds on that idea and attempts to recognize human emotions and affective states from speech.

A summary of the steps involved in this project:
1. Download and load the data from here.
2. Extract important features.
3. Split the dataset into training and testing sets.
4. Initialize an MLPClassifier and train the model.

Read the full article by Data-Flair here.

Full code for Speech Emotion Recognition in Python:

import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

#Extract features (mfcc, chroma, mel) from a sound file
def extract_feature(file_name, mfcc, chroma, mel):
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate=sound_file.samplerate
        if chroma:
            stft=np.abs(librosa.stft(X))
        result=np.array([])
        if mfcc:
            mfccs=np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
            result=np.hstack((result, mfccs))
        if chroma:
            chroma=np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T,axis=0)
            result=np.hstack((result, chroma))
        if mel:
            mel=np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T,axis=0)
            result=np.hstack((result, mel))
    return result

#Emotions in the RAVDESS dataset
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

#Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']

#Load the data and extract features for each sound file
def load_data(test_size=0.2):
    x,y=[],[]
    for file in glob.glob("path_to_folder\\Actor_*\\*.wav"):
        file_name=os.path.basename(file)
        emotion=emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature=extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)

#Split the dataset
x_train,x_test,y_train,y_test=load_data(test_size=0.25)

#Initialize the Multi Layer Perceptron Classifier
model=MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, hidden_layer_sizes=(300,), learning_rate='adaptive', max_iter=500)

#Train the model
model.fit(x_train,y_train)

#Predict for the test set
y_pred=model.predict(x_test)

#Calculate the accuracy of our model
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)

#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))

2. Speech Recognizer – Video Tutorial, GitHub

Recognizing speech plays an important role in how we communicate with our devices. Technologies such as Google Assistant, Siri, and Alexa rely heavily on speech recognition. This project builds a simple speech recognizer to identify spoken digits. A neural network is built using TFLearn, a high-level library built on top of TensorFlow, and is then trained and evaluated on a labeled dataset of spoken digits. Watch the explanation by Siraj Raval:

Full code for the speech recognizer in Python:

from __future__ import division, print_function, absolute_import
import tflearn
import speech_data
import tensorflow as tf

learning_rate = 0.0001
training_iters = 300000  # steps
batch_size = 64

width = 20  # mfcc features
height = 80  # (max) length of utterance
classes = 10  # digits

batch = word_batch = speech_data.mfcc_batch_generator(batch_size)
X, Y = next(batch)
trainX, trainY = X, Y
testX, testY = X, Y #overfit for now

# Network building
net = tflearn.input_data([None, width, height])
net = tflearn.lstm(net, 128, dropout=0.8)
net = tflearn.fully_connected(net, classes, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='categorical_crossentropy')
# Training

### add this "fix" for tensorflow version errors
col = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for x in col:
    tf.add_to_collection(tf.GraphKeys.VARIABLES, x ) 


model = tflearn.DNN(net, tensorboard_verbose=0)
# Train in repeated rounds of 10 epochs; bound the loop so the model gets saved afterwards
for _ in range(10):  # increase the number of rounds for better accuracy
    model.fit(trainX, trainY, n_epoch=10, validation_set=(testX, testY), show_metric=True,
              batch_size=batch_size)
    _y = model.predict(X)
model.save("tflearn.lstm.model")
print(_y)  # predicted digit probabilities for the batch
print(Y)   # ground-truth one-hot labels

3. Music Instrument Classifier – Video Tutorial, GitHub

Classifying musical instruments might seem like a simple task for the human ear, but it is far more complicated for computers. In this project, Seth Adams builds a musical instrument classifier that can identify 10 different musical instruments, using Convolutional Neural Networks and Recurrent Neural Networks for the classification.

Watch the entire playlist.
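
The playlist walks through the full pipeline in detail. As a rough illustration of the core idea, here is a minimal sketch (not taken from the tutorial) that converts short clips into log-scaled mel spectrograms and classifies them with a small Keras CNN; the sampling rate, clip length, file path, and layer sizes are illustrative assumptions.

import numpy as np
import librosa
from tensorflow.keras import layers, models

# Convert a short clip into a log-scaled mel spectrogram "image"
def wav_to_melspec(path, sr=16000, duration=1.0, n_mels=128):
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))       # pad/trim to a fixed length
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)[..., np.newaxis]              # add a channel axis for the CNN

# A small CNN over the (n_mels, time, 1) spectrogram
def build_cnn(input_shape, n_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),  # one output per instrument
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Example usage (hypothetical file name):
# x = wav_to_melspec("guitar_clip.wav")
# model = build_cnn(x.shape)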

4. Neural Network Voices – Video Explanation, GitHub

What if you could imitate a famous celebrity’s voice or sing like a famous singer? This project started with the goal of converting someone’s voice into the voice of a famous celebrity: Kate Winslet. Simply put, it is voice style transfer. The project implements deep neural networks trained on more than two hours of audiobook sentences read by Kate Winslet. It was made by Dabi Ahn and Kyubyong Park, and the following video by Siraj Raval explains how the project works:

5. Music Unmixing – Written Tutorial, Colab Notebook

Music unmixing is the process of separating the different instruments in a piece of music, and it is one of the most interesting problems in audio analysis. Music separation is a cornerstone problem for many applications in the entertainment industry. This tutorial was presented at EUSIPCO 2019 and covers the topic both from a theoretical perspective and through an interactive demonstration of how to implement the described ideas in practice.
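
The tutorial itself covers modern, model-based separation in depth. As a toy illustration of splitting a mixture into components (not the tutorial's method), the sketch below uses librosa's harmonic–percussive source separation; the input file name is an illustrative assumption.

import librosa
import soundfile as sf

# Load the mixture (file name is a placeholder)
y, sr = librosa.load("mix.wav", sr=None)

# Median-filtering based harmonic-percussive source separation
y_harmonic, y_percussive = librosa.effects.hpss(y)

# Write the two stems to disk
sf.write("harmonic.wav", y_harmonic, sr)      # sustained / pitched content
sf.write("percussive.wav", y_percussive, sr)  # drums / transient content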


In Conclusion:

How many of the above projects have you tried? Do you have any recommendations for us to include in the above list? Let us know.

Also, if you are trying to start or advance your career in the field of Computer Vision, you might like this article on “Open-Source Computer Vision Projects (With Tutorials)”.



Do you want to learn Python, Data Science, and Machine Learning while getting certified? Here are some best-selling DataCamp courses that we recommend you enroll in:

  1. Introduction to Python (Free Course) - 1,000,000+ students already enrolled!
  2. Introduction to Data Science in Python - 400,000+ students already enrolled!
  3. Introduction to TensorFlow for Deep Learning with Python - 90,000+ students already enrolled!
  4. Data Science and Machine Learning Bootcamp with R - 70,000+ students already enrolled!
