Phonation features

Phonation features

Created on Jul 21 2017

@author: J. C. Vasquez-Correa

class phonation.Phonation

Compute phonation features from sustained vowels and continuous speech.

For continuous speech, the features are computed over voiced segments

Seven descriptors are computed:

  1. First derivative of the fundamental Frequency
  2. Second derivative of the fundamental Frequency
  3. Jitter
  4. Shimmer
  5. Amplitude perturbation quotient
  6. Pitch perturbation quotient
  7. Logaritmic Energy

Static or dynamic matrices can be computed:

Static matrix is formed with 29 features formed with (seven descriptors) x (4 functionals: mean, std, skewness, kurtosis) + degree of Unvoiced

Dynamic matrix is formed with the seven descriptors computed for frames of 40 ms.

Notes:

  1. In dynamic features the first 11 frames of each recording are not considered to be able to stack the APQ and PPQ descriptors with the remaining ones.
  2. The fundamental frequency is computed the RAPT algorithm. To use the PRAAT method, change the “self.pitch method” variable in the class constructor.

Script is called as follows

>>> python phonation.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Examples command line:

>>> python phonation.py "../audios/001_a1_PCGITA.wav" "phonationfeaturesAst.txt" "true" "true" "txt"
>>> python phonation.py "../audios/098_u1_PCGITA.wav" "phonationfeaturesUst.csv" "true" "true" "csv"
>>> python phonation.py "../audios/098_u1_PCGITA.wav" "phonationfeaturesUdyn.pt" "false" "true" "torch"
>>> python phonation.py "../audios/" "phonationfeaturesst.txt" "true" "false" "txt"
>>> python phonation.py "../audios/" "phonationfeaturesst.csv" "true" "false" "csv"
>>> python phonation.py "../audios/" "phonationfeaturesdyn.pt" "false" "false" "torch"

Examples directly in Python

>>> from disvoice.phonation import Phonation
>>> phonation=Phonation()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features=phonation.extract_features_file(file_audio, static, plots=True, fmt="numpy")
>>> features2=phonation.extract_features_file(file_audio, static, plots=True, fmt="dataframe")
>>> features3=phonation.extract_features_file(file_audio, dynamic, plots=True, fmt="torch")
>>> path_audios="../audios/"
>>> features1=phonation.extract_features_path(path_audios, static, plots=False, fmt="numpy")
>>> features2=phonation.extract_features_path(path_audios, static, plots=False, fmt="torch")
>>> features3=phonation.extract_features_path(path_audios, static, plots=False, fmt="dataframe")
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the phonation features from an audio file

Parameters:
  • audio – .wav audio file.
  • static – whether to compute and return statistic functionals over the feature matrix, or return the feature matrix computed over frames
  • plots – timeshift to extract the features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features, only valid when fmt==”kaldi”
Returns:

features computed from the audio file.

>>> phonation=Phonation()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features1=phonation.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=phonation.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=phonation.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> phonation.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the phonation features for audios inside a path

Parameters:
  • path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
  • static – whether to compute and return statistic functionals over the feature matrix, or return the feature matrix computed over frames
  • plots – timeshift to extract the features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldifeatures, only valid when fmt==”kaldi”
Returns:

features computed from the audio file.

>>> phonation=Phonation()
>>> path_audio="../audios/"
>>> features1=phonation.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=phonation.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=phonation.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> phonation.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
plot_phon(data_audio, fs, F0, logE)

Plots of the phonation features

Parameters:
  • data_audio – speech signal.
  • fs – sampling frequency
  • F0 – contour of the fundamental frequency
  • logE – contour of the log-energy
Returns:

plots of the phonation features.