Articulation features

Created on Jul 21 2017

@author: J. C. Vasquez-Correa

class articulation.Articulation

Compute articulation features from continuous speech.

122 descriptors are computed (a naming sketch follows the list):

1-22. Bark band energies in onset transitions (22 BBE).

23-34. Mel frequency cepstral coefficients in onset transitions (12 MFCC onset)

35-46. First derivative of the MFCCs in onset transitions (12 DMFCC onset)

47-58. Second derivative of the MFCCs in onset transitions (12 DDMFCC onset)

59-80. Bark band energies in offset transitions (22 BBE).

81-92. Mel frequency cepstral coefficients in offset transitions (12 MFCC offset)

93-104. First derivative of the MFCCs in offset transitions (12 DMFCC offset)

105-116. Second derivative of the MFCCs in offset transitions (12 DDMFCC offset)

117. First formant frequency
118. First derivative of the first formant frequency
119. Second derivative of the first formant frequency
120. Second formant frequency
121. First derivative of the second formant frequency
122. Second derivative of the second formant frequency
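
The index ranges above can be mapped to human-readable feature names. The following is a minimal sketch of such a mapping; the names are illustrative and are not the column labels used by the library.

>>> groups = [("BBE_onset", 22), ("MFCC_onset", 12), ("DMFCC_onset", 12), ("DDMFCC_onset", 12),
...           ("BBE_offset", 22), ("MFCC_offset", 12), ("DMFCC_offset", 12), ("DDMFCC_offset", 12),
...           ("F1", 1), ("DF1", 1), ("DDF1", 1), ("F2", 1), ("DF2", 1), ("DDF2", 1)]
>>> names = [f"{name}_{i+1}" if count > 1 else name for name, count in groups for i in range(count)]
>>> len(names)
122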

Static or dynamic matrices can be computed:

The static matrix is formed with 488 features: (122 descriptors) x (4 functionals: mean, std, skewness, kurtosis).

The dynamic matrix is formed with the 58 descriptors (22 BBE, 12 MFCC, 12 DMFCC, 12 DDMFCC) computed for 40 ms frames with a time shift of 20 ms in onset transitions.
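
As a concrete illustration of how the 488 static features arise, the sketch below applies the four functionals to a placeholder descriptor matrix (random values standing in for the real frame-level descriptors); the ordering of the functionals here is illustrative only.

>>> import numpy as np
>>> from scipy.stats import skew, kurtosis
>>> X = np.random.randn(250, 122)  # placeholder: 250 frames x 122 frame-level descriptors
>>> static_features = np.hstack([X.mean(axis=0), X.std(axis=0), skew(X, axis=0), kurtosis(X, axis=0)])
>>> static_features.shape
(488,)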

Notes:

1. The first two frames of each recording are not considered for the dynamic analysis, in order to be able to stack the derivatives of the MFCCs.
2. The fundamental frequency is computed with the PRAAT algorithm. To use the RAPT method, change the self.pitch_method variable in the class constructor.
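
A minimal sketch of switching the pitch estimation method mentioned in Note 2; the attribute name and the value "rapt" are assumptions based on the note, so check the class constructor before relying on them.

>>> articulation = Articulation()
>>> articulation.pitch_method = "rapt"  # assumed attribute and value; the default corresponds to PRAAT
>>> features = articulation.extract_features_file("../audios/001_ddk1_PCGITA.wav", static=True, plots=False, fmt="npy")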

The script is called as follows:

>>> python articulation.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Examples from the command line:

>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.txt" "true" "true" txt
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.csv" "true" "true" csv
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKst.pt" "true" "true" torch
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.txt" "false" "true" txt
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.csv" "false" "true" csv
>>> python articulation.py "../audios/001_ddk1_PCGITA.wav" "articulation_featuresDDKdyn.pt" "false" "true" torch

Examples directly in Python:

>>> articulation=Articulation()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=articulation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the articulation features from an audio file

Parameters:
  • audio – .wav audio file.
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the extracted features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store the kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio file.

>>> articulation=Articulation()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=articulation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=articulation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
>>> path_audio="../audios/"
>>> features1=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=articulation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
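
The shapes of the returned matrices follow the description above: one row of 488 statistics in the static case, and one row of 58 descriptors per onset-transition frame in the dynamic case. A sketch of inspecting them (the exact shapes may differ slightly depending on the library version):

>>> features_static = articulation.extract_features_file(file_audio, static=True, plots=False, fmt="npy")
>>> features_static.shape   # expected: (1, 488) or (488,), i.e. 4 functionals x 122 descriptors
>>> features_dynamic = articulation.extract_features_file(file_audio, static=False, plots=False, fmt="npy")
>>> features_dynamic.shape  # expected: (number of onset-transition frames, 58)
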
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the articulation features for all audio files inside a path

Parameters:
  • path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the extracted features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store the kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio files in the path.

>>> articulation=Articulation()
>>> path_audio="../audios/"
>>> features1=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=articulation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=articulation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> articulation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
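
When a whole folder is processed with static=True, the result can be handled as a table with one row per audio file. A minimal sketch assuming the dataframe format (the exact columns, e.g. whether a file-identifier column is included, depend on the library version):

>>> features = articulation.extract_features_path(path_audio, static=True, plots=False, fmt="dataframe")
>>> features.shape              # expected: roughly (number of .wav files in path_audio, 488)
>>> features.to_csv("articulation_static_all.csv", index=False)  # persist for later analysis
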
plot_art(data_audio, fs, F0, F1, F2, segmentsOn, segmentsOff)

Plots the articulation features.

Parameters:
  • data_audio – speech signal.
  • fs – sampling frequency
  • F0 – contour of the fundamental frequency
  • F1 – contour of the 1st formant
  • F2 – contour of the 2nd formant
  • segmentsOn – list with the onset segments
  • segmentsOff – list with the offset segments
Returns:

plots of the articulation features.
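
plot_art is called internally when plots=True is passed to the extraction methods. The snippet below is not the library's implementation; it is an illustrative, self-contained sketch (with synthetic data) of the kind of figure produced: the waveform with onset/offset segments highlighted and the F0/F1/F2 contours.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> fs = 16000
>>> t = np.arange(0, 1.0, 1.0 / fs)
>>> data_audio = np.sin(2 * np.pi * 120 * t) * np.hanning(t.size)           # synthetic signal standing in for speech
>>> tc = np.linspace(0, 1, 100)                                             # time axis of the contours
>>> F0 = 120 + 10 * np.sin(2 * np.pi * 2 * tc)                              # synthetic F0 contour
>>> F1 = 500 + 50 * np.random.randn(100)                                    # synthetic first-formant contour
>>> F2 = 1500 + 80 * np.random.randn(100)                                   # synthetic second-formant contour
>>> fig, ax = plt.subplots(2, 1, figsize=(8, 5))
>>> ax[0].plot(t, data_audio, color="gray")
>>> ax[0].axvspan(0.30, 0.34, alpha=0.3, color="b", label="onset segment")  # stand-in onset transition
>>> ax[0].axvspan(0.60, 0.64, alpha=0.3, color="r", label="offset segment") # stand-in offset transition
>>> ax[0].set_xlabel("time (s)"); ax[0].legend()
>>> ax[1].plot(tc, F0, label="F0"); ax[1].plot(tc, F1, label="F1"); ax[1].plot(tc, F2, label="F2")
>>> ax[1].set_xlabel("time (s)"); ax[1].set_ylabel("frequency (Hz)"); ax[1].legend()
>>> plt.tight_layout(); plt.show()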