WebRTC Voice Activity Detection using Python

Voice Activity Detection (VAD) classifies short pieces of audio as voiced (containing speech) or unvoiced.

The Web Real-Time Communication (WebRTC) project includes a voice activity detector that you can call from Python. In this article, you will learn how to perform WebRTC Voice Activity Detection using Python.

Ready to get started? Let us start by installing the necessary Python library for this project.

Getting Started with WebRTC Voice Activity Detection in Python

To perform WebRTC Voice Activity Detection using Python, you can install the py-webrtcvad library using Python's package manager, pip.

The py-webrtcvad library is a Python interface to the WebRTC Voice Activity Detector from Google and is compatible with Python 2 and Python 3. It is free to use and is well suited to telephony and speech recognition.

To install the py-webrtcvad library, open up your command line/terminal and run the following command:

pip install webrtcvad

This will install the latest version of the py-webrtcvad library on your machine. You can then import it in Python using a Python IDE or Python Shell by writing the following line of code:

import webrtcvad

If this line runs without an error, you have successfully installed and imported py-webrtcvad. Note that the project is called py-webrtcvad, but the package is installed and imported under the name webrtcvad.



WebRTC Voice Activity Detection in Python Example

Here’s an example of how Voice Activity Detection can be performed in Python:

# Import the py-webrtcvad library
import webrtcvad

# Initialize a vad object
vad = webrtcvad.Vad()

# Run the VAD on a 10 ms frame of silence at a 16000 Hz sampling rate
sample_rate = 16000
frame_duration = 10  # in ms

# Creating an audio frame of silence
frame = b'\x00\x00' * int(sample_rate * frame_duration / 1000)

# Detecting speech
print(f'Contains speech: {vad.is_speech(frame, sample_rate)}')
# Output: Contains speech: False

As you can see, the is_speech() method of the Vad object reports whether a single audio frame contains speech. You can call this method on each frame of recorded audio in your own projects.
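The frame passed to is_speech() has to match what the detector expects. According to the py-webrtcvad README, the WebRTC VAD accepts only 16-bit mono PCM audio sampled at 8000, 16000, 32000, or 48000 Hz, in frames of 10, 20, or 30 ms, and the Vad object has an aggressiveness mode from 0 (least aggressive about filtering out non-speech) to 3 (most aggressive). The sketch below works out the frame size in bytes for a given configuration. The frame_size_bytes helper and the constant names are made up for this illustration and are not part of the library.

```python
# Sample rates and frame durations accepted by the WebRTC VAD,
# as listed in the py-webrtcvad README.
SUPPORTED_RATES = (8000, 16000, 32000, 48000)
SUPPORTED_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def frame_size_bytes(sample_rate, frame_ms):
    """Return the size in bytes of one frame (hypothetical helper)."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in SUPPORTED_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


# A 30 ms frame of silence at 16000 Hz: 480 samples, 2 bytes each
silence = b"\x00" * frame_size_bytes(16000, 30)
print(len(silence))  # 960

# The detector itself is only used if the library is installed
try:
    import webrtcvad
except ImportError:
    webrtcvad = None

if webrtcvad is not None:
    vad = webrtcvad.Vad()
    vad.set_mode(3)  # 3 is the most aggressive setting
    print(f"Contains speech: {vad.is_speech(silence, 16000)}")
```

Checking the rate and duration before building frames gives a readable error message up front instead of a failure inside the detector.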

See example.py from the library’s GitHub repository for a more detailed example that will process a .wav file, find the voiced segments, and write each one as a separate .wav.
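To give a rough idea of what finding voiced segments involves, here is a simplified sketch. It is not the code from example.py: it only splits raw 16-bit mono PCM bytes into fixed-size frames and groups consecutive frames flagged as speech into (start, end) times in milliseconds, without any smoothing and without reading or writing .wav files. The function names split_frames, voiced_segments, and nonzero_is_speech are hypothetical. The classifier is passed in as a function, so the grouping logic can be tried out with a stand-in before plugging in vad.is_speech.

```python
def split_frames(pcm, sample_rate, frame_ms):
    """Yield consecutive fixed-size frames from 16-bit mono PCM bytes."""
    size = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    # A trailing partial frame is dropped
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]


def voiced_segments(pcm, sample_rate, frame_ms, is_speech):
    """Return (start_ms, end_ms) for each run of frames flagged as speech."""
    segments = []
    run_start = None
    t = 0  # start time of the current frame, in ms
    for frame in split_frames(pcm, sample_rate, frame_ms):
        if is_speech(frame, sample_rate):
            if run_start is None:
                run_start = t
        elif run_start is not None:
            segments.append((run_start, t))
            run_start = None
        t += frame_ms
    if run_start is not None:
        segments.append((run_start, t))
    return segments


def nonzero_is_speech(frame, sample_rate):
    """Stand-in classifier for the demo: any non-zero byte counts as speech."""
    return any(frame)


# Synthetic audio at 8000 Hz with 10 ms frames (160 bytes per frame):
# two silent frames, three non-silent frames, one silent frame
quiet = b"\x00" * 160
loud = b"\x10\x00" * 80
demo_pcm = quiet * 2 + loud * 3 + quiet
print(voiced_segments(demo_pcm, 8000, 10, nonzero_is_speech))  # [(20, 50)]

# With py-webrtcvad installed, the real detector can be passed in instead
try:
    import webrtcvad
except ImportError:
    webrtcvad = None

if webrtcvad is not None:
    vad = webrtcvad.Vad()
    vad.set_mode(2)
    silent_pcm = b"\x00" * (960 * 10)  # ten 30 ms frames at 16000 Hz
    print(voiced_segments(silent_pcm, 16000, 30, vad.is_speech))
```

On real recordings, frame-by-frame decisions tend to flicker, so a practical version would usually smooth them over a short window before cutting segments.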

In Conclusion

You now know how to perform WebRTC Voice Activity Detection using Python. If you have any questions, please feel free to comment down below and we will get back to you.

