Prosody features

Created on Jul 21 2017, Modified Apr 10 2018.

@author: J. C. Vasquez-Correa, T. Arias-Vergara, J. S. Guerrero

class prosody.Prosody

Compute prosody features from continuous speech based on duration, fundamental frequency (F0) and energy. Either a static or a dynamic feature matrix can be computed. The static matrix is formed by 103 features, which include:

1-6 F0-contour: Avg., Std., Max., Min., Skewness, Kurtosis

7-12 Tilt of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

13-18 MSE of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

19-24 F0 on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

25-30 F0 on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

31-34 Energy contour for voiced segments: Avg., Std., Skewness, Kurtosis

35-38 Tilt of a linear estimation of the energy contour for voiced segments: Avg., Std., Skewness, Kurtosis

39-42 MSE of a linear estimation of the energy contour for voiced segments: Avg., Std., Skewness, Kurtosis

43-48 Energy on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

49-54 Energy on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

55-58 Energy contour for unvoiced segments: Avg., Std., Skewness, Kurtosis

59-62 Tilt of a linear estimation of the energy contour for unvoiced segments: Avg., Std., Skewness, Kurtosis

63-66 MSE of a linear estimation of the energy contour for unvoiced segments: Avg., Std., Skewness, Kurtosis

67-72 Energy on the first unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

73-78 Energy on the last unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis

79 Voiced rate: Number of voiced segments per second

80-85 Duration of voiced segments: Avg., Std., Max., Min., Skewness, Kurtosis

86-91 Duration of unvoiced segments: Avg., Std., Max., Min., Skewness, Kurtosis

92-97 Duration of pauses: Avg., Std., Max., Min., Skewness, Kurtosis

98-103 Duration ratios: Pause/(Voiced+Unvoiced), Pause/Unvoiced, Unvoiced/(Voiced+Unvoiced), Voiced/(Voiced+Unvoiced), Voiced/Pause, Unvoiced/Pause
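Most groups above apply the same six functionals (Avg., Std., Max., Min., Skewness, Kurtosis) to a contour or to a per-segment quantity, and features 7-18 summarize the tilt (slope) and MSE of a straight line fitted to F0 in each voiced segment. The NumPy sketch below illustrates both ideas. The helper names `functionals`, `linear_fit_tilt_mse` and `f0_tilt_mse_features` are hypothetical, and the estimator choices (population moments, excess kurtosis, frame index as the time axis) are assumptions, not the library's confirmed implementation.

```python
import numpy as np


def functionals(x, with_extrema=True):
    """Avg., Std., (Max., Min.,) Skewness, Kurtosis of a 1-D array.

    Assumption: population (biased) moments and excess kurtosis; the
    library's exact estimators may differ.
    """
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    if sd > 0:
        z = (x - mu) / sd
        skew, kurt = np.mean(z ** 3), np.mean(z ** 4) - 3.0
    else:
        skew = kurt = 0.0  # constant input: higher moments undefined, report 0
    if with_extrema:
        return np.array([mu, sd, x.max(), x.min(), skew, kurt])
    return np.array([mu, sd, skew, kurt])


def linear_fit_tilt_mse(contour):
    """Slope (tilt) and mean squared error of a least-squares line fit.

    The time axis is the frame index (assumption); needs at least 2 frames.
    """
    y = np.asarray(contour, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    mse = np.mean((y - (slope * t + intercept)) ** 2)
    return slope, mse


def f0_tilt_mse_features(voiced_f0_segments):
    """Hypothetical helper for features 7-18: functionals of per-segment tilt and MSE."""
    tilts, mses = zip(*(linear_fit_tilt_mse(seg) for seg in voiced_f0_segments))
    return np.concatenate([functionals(tilts), functionals(mses)])
```

For example, `f0_tilt_mse_features([[100, 102, 104, 106], [120, 118, 116]])` returns 12 values whose first entry (mean tilt of +2 and -2 Hz per frame) is close to zero.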

The dynamic matrix is formed by 13 features computed for each voiced segment:

1-6. Coefficients of a fifth-degree Lagrange polynomial that models the F0 contour

7-12. Coefficients of a fifth-degree Lagrange polynomial that models the energy contour

13. Duration of the voiced segment
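A fifth-degree polynomial has six coefficients, so two contours plus the duration give the 13 values per voiced segment. The sketch below builds one such row per segment. It is an illustration under assumptions: a least-squares fit with `numpy.polyfit` stands in for the polynomial model named above, the frame index is used as the time axis, and `segment_dynamic_features` and `dynamic_matrix` are hypothetical helpers, not part of the `Prosody` class.

```python
import numpy as np

DEGREE = 5  # six coefficients per contour


def segment_dynamic_features(f0_segment, energy_segment, duration_s, degree=DEGREE):
    """13-dim row: 6 F0 coefficients, 6 energy coefficients, duration (seconds).

    Assumption: numpy.polyfit (least squares, highest power first) over the
    frame index stands in for the polynomial model of the documentation.
    """
    f0 = np.asarray(f0_segment, dtype=float)
    en = np.asarray(energy_segment, dtype=float)
    if len(f0) <= degree or len(en) <= degree:
        raise ValueError("need more than `degree` frames to fit the polynomial")
    f0_coeffs = np.polyfit(np.arange(len(f0)), f0, degree)
    en_coeffs = np.polyfit(np.arange(len(en)), en, degree)
    return np.concatenate([f0_coeffs, en_coeffs, [duration_s]])


def dynamic_matrix(segments):
    """Stack one row per voiced segment into an (N, 13) array.

    `segments` is a list of (f0_segment, energy_segment, duration_s) tuples.
    """
    return np.vstack([segment_dynamic_features(*seg) for seg in segments])
```

Short segments are rejected here because a degree-5 fit needs at least six frames; how the library handles such segments is not stated in this documentation.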

Dynamic prosody features are based on Najim Dehak, “Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification”, 2007

The script is called as follows:

>>> python prosody.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>

Command-line examples:

>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesAst.txt" "true" "true" "txt"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUst.csv" "true" "true" "csv"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn.pt" "false" "true" "torch"
>>> python prosody.py "../audios/" "prosodyfeaturesst.txt" "true" "false" "txt"
>>> python prosody.py "../audios/" "prosodyfeaturesst.csv" "true" "false" "csv"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.csv" "false" "false" "csv"
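The five positional arguments map onto the keyword arguments of the Python API shown in the following examples. A minimal sketch of that mapping, assuming that a folder argument is dispatched to `extract_features_path` and a file argument to `extract_features_file`; `parse_cli_args` and `parse_bool` are hypothetical helpers, and the real `prosody.py` may parse its arguments differently.

```python
import os

USAGE = ("usage: python prosody.py <file_or_folder_audio> <file_features> "
         "<static (true or false)> <plots (true or false)> "
         "<format (csv, txt, npy, kaldi, torch)>")


def parse_bool(text):
    """Convert the "true"/"false" strings used on the command line to bool."""
    value = text.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {text!r}")
    return value == "true"


def parse_cli_args(argv):
    """Map the five positional arguments to keyword arguments (hypothetical helper)."""
    if len(argv) != 5:
        raise SystemExit(USAGE)
    audio, file_features, static, plots, fmt = argv
    fmt = fmt.strip().lower()
    if fmt not in ("csv", "txt", "npy", "kaldi", "torch"):
        raise ValueError(f"unsupported format {fmt!r}")
    return {
        "audio": audio,
        "file_features": file_features,
        "static": parse_bool(static),
        "plots": parse_bool(plots),
        "fmt": fmt,
        # Assumption: folders go to extract_features_path, files to extract_features_file
        "is_folder": os.path.isdir(audio),
    }
```

In a script this would be called as `parse_cli_args(sys.argv[1:])`.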

Examples in Python:

>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")

extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the prosody features from an audio file

Parameters:
  • audio – .wav audio file.
  • static – if True, compute and return the 103 static features (statistical functionals); if False, return the dynamic feature matrix computed over voiced segments
  • plots – if True, plot the prosody features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio file.

>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")

extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the prosody features for audios inside a path

Parameters:
  • path_audio – directory containing the .wav audio files, sampled at 16 kHz
  • static – if True, compute and return the 103 static features (statistical functionals); if False, return the dynamic feature matrix computed over voiced segments
  • plots – if True, plot the prosody features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio files in the directory.

>>> prosody=Prosody()
>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
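When features are extracted for a whole directory, static output has one row per file, while dynamic output has one row per voiced segment, so rows must be traced back to their source file. One plausible way to aggregate per-file results is sketched below. `stack_features_from_dir` is a hypothetical helper and `extract` is a stand-in for a per-file extractor such as `extract_features_file(..., fmt="npy")`; this is not the library's own code.

```python
import os

import numpy as np


def stack_features_from_dir(path_audio, extract):
    """Apply `extract(file_path)` to every .wav file and stack the results.

    Returns (features, ids): ids[i] is the file name that produced row i.
    A 1-D result (static features) becomes one row; a 2-D result
    (dynamic features, one row per voiced segment) keeps all its rows.
    """
    files = sorted(f for f in os.listdir(path_audio) if f.lower().endswith(".wav"))
    if not files:
        raise FileNotFoundError(f"no .wav files in {path_audio!r}")
    feats, ids = [], []
    for name in files:
        feat = np.atleast_2d(extract(os.path.join(path_audio, name)))
        feats.append(feat)
        ids.extend([name] * feat.shape[0])  # label every row with its file
    return np.vstack(feats), np.array(ids)
```

Keeping the file name next to each row makes it straightforward to build a labeled table (for example, a CSV) or to group dynamic rows by recording.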

plot_pros(data_audio, fs, F0, segmentsV, segmentsU, F0_features)

Plot the prosody features.

Parameters:
  • data_audio – speech signal.
  • fs – sampling frequency
  • F0 – contour of the fundamental frequency
  • segmentsV – list with the voiced segments
  • segmentsU – list with the unvoiced segments
  • F0_features – vector with F0-based features
Returns:

plots of the prosody features.
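`plot_pros` receives lists of voiced and unvoiced segments. A minimal sketch of how such segments can be derived from an F0 contour is shown below, assuming the pitch tracker marks unvoiced frames with F0 = 0. `voiced_unvoiced_runs` is a hypothetical helper; it returns frame-index ranges and does not separate pauses from unvoiced speech, which the static features above treat as distinct.

```python
def voiced_unvoiced_runs(f0):
    """Split an F0 contour into runs of voiced (F0 > 0) and unvoiced (F0 == 0) frames.

    Returns two lists of (start, end) frame-index pairs, end exclusive.
    Assumes unvoiced frames are marked with F0 = 0.
    """
    voiced, unvoiced = [], []
    start = 0
    for i in range(1, len(f0) + 1):
        # Close the current run at the end of the contour or when voicing changes
        if i == len(f0) or (f0[i] > 0) != (f0[start] > 0):
            (voiced if f0[start] > 0 else unvoiced).append((start, i))
            start = i
    return voiced, unvoiced
```

For `f0 = [0, 0, 120, 122, 0, 130, 0]` this yields voiced runs `[(2, 4), (5, 6)]` and unvoiced runs `[(0, 2), (4, 5), (6, 7)]`; multiplying the indices by the frame step converts them to time.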

prosody_dynamic(audio)

Extract the dynamic prosody features from an audio file

Parameters:
  • audio – .wav audio file.
Returns:

array (N, 13) with the prosody features extracted from the audio file, where N is the number of voiced segments.

>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_dynamic(file_audio)

prosody_static(audio, plots)

Extract the static prosody features from an audio file

Parameters:
  • audio – .wav audio file.
  • plots – if True, plot the prosody features
Returns:

array with the 103 prosody features

>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_static(file_audio, plots=True)
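Among the 103 static features, the voiced rate (feature 79) and the duration ratios (features 98-103) reduce to simple arithmetic on segment durations. A minimal sketch, assuming each ratio is computed from the summed durations of each segment class and the voiced rate is taken over the total recording length; `duration_ratio_features` and the `eps` guard are hypothetical, not the library's confirmed definitions.

```python
def duration_ratio_features(voiced_durs, unvoiced_durs, pause_durs, total_s):
    """Voiced rate (feature 79) and duration ratios (features 98-103).

    Inputs are segment durations in seconds. Assumption: ratios use the
    summed duration of each class; eps avoids division by zero.
    """
    v, u, p = sum(voiced_durs), sum(unvoiced_durs), sum(pause_durs)
    eps = 1e-12
    voiced_rate = len(voiced_durs) / total_s  # voiced segments per second
    ratios = [
        p / (v + u + eps),  # Pause/(Voiced+Unvoiced)
        p / (u + eps),      # Pause/Unvoiced
        u / (v + u + eps),  # Unvoiced/(Voiced+Unvoiced)
        v / (v + u + eps),  # Voiced/(Voiced+Unvoiced)
        v / (p + eps),      # Voiced/Pause
        u / (p + eps),      # Unvoiced/Pause
    ]
    return voiced_rate, ratios
```

With two 0.5 s voiced segments, two 0.25 s unvoiced segments and one 0.5 s pause in a 2.5 s recording, the voiced rate is 0.8 segments per second and Voiced/Pause is 2.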