Articulation features

Created on Jul 21 2017

@author: J. C. Vasquez-Correa

class articulation.Articulation

Compute articulation features from continuous speech.

122 descriptors are computed (a naming sketch follows the list):

1-22. Bark band energies in onset transitions (22 BBE).

23-34. Mel frequency cepstral coefficients in onset transitions (12 MFCC onset)

35-46. First derivative of the MFCCs in onset transitions (12 DMFCC onset)

47-58. Second derivative of the MFCCs in onset transitions (12 DDMFCC onset)

59-80. Bark band energies in offset transitions (22 BBE).

81-92. Mel frequency cepstral coefficients in offset transitions (12 MFCC offset)

93-104. First derivative of the MFCCs in offset transitions (12 DMFCC offset)

105-116. Second derivative of the MFCCs in offset transitions (12 DDMFCC offset)

117. First formant frequency
118. First derivative of the first formant frequency
119. Second derivative of the first formant frequency
120. Second formant frequency
121. First derivative of the second formant frequency
122. Second derivative of the second formant frequency
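
The index ranges above can be mapped to human-readable feature names. The following is a minimal sketch of such a mapping; the names are illustrative and are not the column labels used by the library.

>>> groups = [("BBE_onset", 22), ("MFCC_onset", 12), ("DMFCC_onset", 12), ("DDMFCC_onset", 12),
...           ("BBE_offset", 22), ("MFCC_offset", 12), ("DMFCC_offset", 12), ("DDMFCC_offset", 12),
...           ("F1", 1), ("DF1", 1), ("DDF1", 1), ("F2", 1), ("DF2", 1), ("DDF2", 1)]
>>> names = [f"{name}_{i+1}" if count > 1 else name for name, count in groups for i in range(count)]
>>> len(names)
122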

Static or dynamic matrices can be computed:

The static matrix is formed with 488 features: (122 descriptors) x (4 functionals: mean, std, skewness, kurtosis).

The dynamic matrix is formed with the 58 descriptors (22 BBE, 12 MFCC, 12 DMFCC, 12 DDMFCC) computed for 40 ms frames with a time shift of 20 ms in onset transitions.
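
As a concrete illustration of how the 488 static features arise, the sketch below applies the four functionals to a placeholder descriptor matrix (random values standing in for the real frame-level descriptors); the ordering of the functionals here is illustrative only.

>>> import numpy as np
>>> from scipy.stats import skew, kurtosis
>>> X = np.random.randn(250, 122)  # placeholder: 250 frames x 122 frame-level descriptors
>>> static_features = np.hstack([X.mean(axis=0), X.std(axis=0), skew(X, axis=0), kurtosis(X, axis=0)])
>>> static_features.shape
(488,)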

Notes:

1. The first two frames of each recording are not considered for the dynamic analysis, in order to be able to stack the derivatives of the MFCCs.
2. The fundamental frequency is computed with the PRAAT algorithm. To use the RAPT method, change the self.pitch_method variable in the class constructor.
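
A minimal sketch of switching the pitch estimation method mentioned in Note 2; the attribute name and the value "rapt" are assumptions based on the note, so check the class constructor before relying on them.

>>> articulation = Articulation()
>>> articulation.pitch_method = "rapt"  # assumed attribute and value; the default corresponds to PRAAT
>>> features = articulation.extract_features_file("../audios/001_ddk1_PCGITA.wav", static=True, plots=False, fmt="npy")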

The script is called as follows:

>>> python articulation.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Examples from the command line:

>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.txt" "true" "true" txt
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.csv" "true" "true" csv
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.pt" "true" "true" torch
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.txt" "false" "true" txt
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.csv" "false" "true" csv
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.pt" "false" "true" torch

Examples directly in Python:

>>> articulation=Articulation()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=articulation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the articulation features from an audio file

Parameters:
  • audio – .wav audio file.
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the extracted features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store the kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio file.

>>> articulation=Articulation()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=articulation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
>>> path_audio="../audios/"
>>> features1=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=articulation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
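
The shapes of the returned matrices follow the description above: one row of 488 statistics in the static case, and one row of 58 descriptors per onset-transition frame in the dynamic case. A sketch of inspecting them (the exact shapes may differ slightly depending on the library version):

>>> features_static = articulation.extract_features_file(file_audio, static=True, plots=False, fmt="npy")
>>> features_static.shape   # expected: (1, 488) or (488,), i.e. 4 functionals x 122 descriptors
>>> features_dynamic = articulation.extract_features_file(file_audio, static=False, plots=False, fmt="npy")
>>> features_dynamic.shape  # expected: (number of onset-transition frames, 58)
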
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the articulation features for all audio files inside a path

Parameters:
  • path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the extracted features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store the kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio files in the path.

>>> articulation=Articulation()
>>> path_audio="../audios/"
>>> features1=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=articulation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
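
When a whole folder is processed with static=True, the result can be handled as a table with one row per audio file. A minimal sketch assuming the dataframe format (the exact columns, e.g. whether a file-identifier column is included, depend on the library version):

>>> features = articulation.extract_features_path(path_audio, static=True, plots=False, fmt="dataframe")
>>> features.shape              # expected: roughly (number of .wav files in path_audio, 488)
>>> features.to_csv("articulation_static_all.csv", index=False)  # persist for later analysis
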
plot_art(data_audio, fs, F0, F1, F2, segmentsOn, segmentsOff)

Plots the articulation features.

Parameters:
  • data_audio – speech signal.
  • fs – sampling frequency
  • F0 – contour of the fundamental frequency
  • F1 – contour of the 1st formant
  • F2 – contour of the 2nd formant
  • segmentsOn – list with the onset segments
  • segmentsOff – list with the offset segments
Returns:

plots of the articulation features.
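
plot_art is called internally when plots=True is passed to the extraction methods. The snippet below is not the library's implementation; it is an illustrative, self-contained sketch (with synthetic data) of the kind of figure produced: the waveform with onset/offset segments highlighted and the F0/F1/F2 contours.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> fs = 16000
>>> t = np.arange(0, 1.0, 1.0 / fs)
>>> data_audio = np.sin(2 * np.pi * 120 * t) * np.hanning(t.size)           # synthetic signal standing in for speech
>>> tc = np.linspace(0, 1, 100)                                             # time axis of the contours
>>> F0 = 120 + 10 * np.sin(2 * np.pi * 2 * tc)                              # synthetic F0 contour
>>> F1 = 500 + 50 * np.random.randn(100)                                    # synthetic first-formant contour
>>> F2 = 1500 + 80 * np.random.randn(100)                                   # synthetic second-formant contour
>>> fig, ax = plt.subplots(2, 1, figsize=(8, 5))
>>> ax[0].plot(t, data_audio, color="gray")
>>> ax[0].axvspan(0.30, 0.34, alpha=0.3, color="b", label="onset segment")  # stand-in onset transition
>>> ax[0].axvspan(0.60, 0.64, alpha=0.3, color="r", label="offset segment") # stand-in offset transition
>>> ax[0].set_xlabel("time (s)"); ax[0].legend()
>>> ax[1].plot(tc, F0, label="F0"); ax[1].plot(tc, F1, label="F1"); ax[1].plot(tc, F2, label="F2")
>>> ax[1].set_xlabel("time (s)"); ax[1].set_ylabel("frequency (Hz)"); ax[1].legend()
>>> plt.tight_layout(); plt.show()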