Phonation features¶

Created on Jul 21 2017

@author: J. C. Vasquez-Correa

class phonation.Phonation¶

Compute phonation features from sustained vowels and continuous speech.

For continuous speech, the features are computed over voiced segments

Seven descriptors are computed:

First derivative of the fundamental Frequency
Second derivative of the fundamental Frequency
Jitter
Shimmer
Amplitude perturbation quotient
Pitch perturbation quotient
Logaritmic Energy

Static or dynamic matrices can be computed:

Static matrix is formed with 29 features formed with (seven descriptors) x (4 functionals: mean, std, skewness, kurtosis) + degree of Unvoiced

Dynamic matrix is formed with the seven descriptors computed for frames of 40 ms.

Notes:

In dynamic features the first 11 frames of each recording are not considered to be able to stack the APQ and PPQ descriptors with the remaining ones.
The fundamental frequency is computed the RAPT algorithm. To use the PRAAT method, change the “self.pitch method” variable in the class constructor.

Script is called as follows

>>> python phonation.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Examples command line:

>>> python phonation.py "../audios/001_a1_PCGITA.wav" "phonationfeaturesAst.txt" "true" "true" "txt"
>>> python phonation.py "../audios/098_u1_PCGITA.wav" "phonationfeaturesUst.csv" "true" "true" "csv"
>>> python phonation.py "../audios/098_u1_PCGITA.wav" "phonationfeaturesUdyn.pt" "false" "true" "torch"

>>> python phonation.py "../audios/" "phonationfeaturesst.txt" "true" "false" "txt"
>>> python phonation.py "../audios/" "phonationfeaturesst.csv" "true" "false" "csv"
>>> python phonation.py "../audios/" "phonationfeaturesdyn.pt" "false" "false" "torch"

Examples directly in Python

>>> from disvoice.phonation import Phonation
>>> phonation=Phonation()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features=phonation.extract_features_file(file_audio, static, plots=True, fmt="numpy")
>>> features2=phonation.extract_features_file(file_audio, static, plots=True, fmt="dataframe")
>>> features3=phonation.extract_features_file(file_audio, dynamic, plots=True, fmt="torch")

>>> path_audios="../audios/"
>>> features1=phonation.extract_features_path(path_audios, static, plots=False, fmt="numpy")
>>> features2=phonation.extract_features_path(path_audios, static, plots=False, fmt="torch")
>>> features3=phonation.extract_features_path(path_audios, static, plots=False, fmt="dataframe")

extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')¶

Extract the phonation features from an audio file

Parameters:	audio – .wav audio file. static – whether to compute and return statistic functionals over the feature matrix, or return the feature matrix computed over frames plots – timeshift to extract the features fmt – format to return the features (npy, dataframe, torch, kaldi) kaldi_file – file to store kaldi features, only valid when fmt==”kaldi”
Returns:	features computed from the audio file.

>>> phonation=Phonation()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features1=phonation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=phonation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=phonation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> phonation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")

extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')¶

Extract the phonation features for audios inside a path

Parameters:

path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
static – whether to compute and return statistic functionals over the feature matrix, or return the feature matrix computed over frames
plots – timeshift to extract the features
fmt – format to return the features (npy, dataframe, torch, kaldi)
kaldi_file – file to store kaldifeatures, only valid when fmt==”kaldi”

Returns:

features computed from the audio file.

>>> phonation=Phonation()
>>> path_audio="../audios/"
>>> features1=phonation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=phonation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=phonation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> phonation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")

plot_phon(data_audio, fs, F0, logE)¶

Plots of the phonation features

Parameters:	data_audio – speech signal. fs – sampling frequency F0 – contour of the fundamental frequency logE – contour of the log-energy
Returns:	plots of the phonation features.