Prosody features
Created on Jul 21 2017, Modified Apr 10 2018.
@author: J. C. Vasquez-Correa, T. Arias-Vergara, J. S. Guerrero
- class prosody.Prosody
Compute prosody features from continuous speech based on duration, fundamental frequency (F0), and energy. Either a static or a dynamic feature matrix can be computed. The static matrix is formed with 103 features and includes:
1-6 F0-contour: Avg., Std., Max., Min., Skewness, Kurtosis
7-12 Tilt of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
13-18 MSE of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
19-24 F0 on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
25-30 F0 on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
31-34 energy-contour for voiced segments: Avg., Std., Skewness, Kurtosis
35-38 Tilt of a linear estimation of energy contour for V segments: Avg., Std., Skewness, Kurtosis
39-42 MSE of a linear estimation of energy contour for V segments: Avg., Std., Skewness, Kurtosis
43-48 energy on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
49-54 energy on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
55-58 energy-contour for unvoiced segments: Avg., Std., Skewness, Kurtosis
59-62 Tilt of a linear estimation of energy contour for U segments: Avg., Std., Skewness, Kurtosis
63-66 MSE of a linear estimation of energy contour for U segments: Avg., Std., Skewness, Kurtosis
67-72 energy on the first unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
73-78 energy on the last unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
79 Voiced rate: Number of voiced segments per second
80-85 Duration of Voiced: Avg., Std., Max., Min., Skewness, Kurtosis
86-91 Duration of Unvoiced: Avg., Std., Max., Min., Skewness, Kurtosis
92-97 Duration of Pauses: Avg., Std., Max., Min., Skewness, Kurtosis
98-103 Duration ratios: Pause/(Voiced+Unvoiced), Pause/Unvoiced, Unvoiced/(Voiced+Unvoiced), Voiced/(Voiced+Unvoiced), Voiced/Pause, Unvoiced/Pause
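The tilt and MSE entries in the list above can be illustrated with a minimal sketch: fit a straight line to the contour of each voiced segment, keep the slope (tilt) and the mean squared error of the fit, then summarize those per-segment values with the six functionals. The function names, the sample F0 values, and the use of excess kurtosis are assumptions made for illustration; this is not the module's own implementation.

```python
import numpy as np

def linear_tilt_and_mse(contour):
    """Fit a straight line to one segment; return (slope, mean squared error)."""
    contour = np.asarray(contour, dtype=float)
    t = np.arange(len(contour))
    slope, intercept = np.polyfit(t, contour, 1)
    residual = contour - (slope * t + intercept)
    return float(slope), float(np.mean(residual ** 2))

def six_functionals(values):
    """Avg., Std., Max., Min., Skewness, Kurtosis of a 1-D sequence."""
    x = np.asarray(values, dtype=float)
    mean, std = x.mean(), x.std()
    # Guard against a zero standard deviation (constant input)
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness = float(np.mean(z ** 3))
    kurt = float(np.mean(z ** 4) - 3.0)  # excess kurtosis (an assumption)
    return np.array([mean, std, x.max(), x.min(), skewness, kurt])

# Hypothetical F0 values (Hz) for three voiced segments
segments = [
    [100.0, 102.0, 104.0, 106.0],          # rising
    [120.0, 118.0, 116.0, 114.0, 112.0],   # falling
    [110.0, 110.0, 110.0],                 # flat
]
tilts, mses = zip(*(linear_tilt_and_mse(s) for s in segments))
tilt_features = six_functionals(tilts)  # would correspond to features 7-12
mse_features = six_functionals(mses)    # would correspond to features 13-18
```

The same two helpers would apply to the energy contour of voiced and unvoiced segments, keeping only four of the six functionals where the list above specifies Avg., Std., Skewness, Kurtosis.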
The dynamic matrix is formed with 13 features computed for each voiced segment and contains:
1-6. Coefficients of 5-degree Lagrange polynomial to model F0 contour
7-12. Coefficients of 5-degree Lagrange polynomial to model energy contour
13. Duration of the voiced segment
Dynamic prosody features are based on Najim Dehak, “Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification”, 2007
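As a rough sketch of how a 13-dimensional vector of this shape could be assembled for one voiced segment, the example below fits a degree-5 polynomial to the F0 and energy contours and appends the segment duration. It uses a least-squares fit via `numpy.polyfit` as a stand-in for the Lagrange polynomial mentioned above; the function name, the normalized time axis, and the 10 ms frame shift are assumptions, not details taken from the module.

```python
import numpy as np

def segment_dynamic_features(f0, energy, frame_shift_s, degree=5):
    """Return 6 F0 coefficients, 6 energy coefficients and the duration."""
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    # A normalized time axis in [0, 1] keeps the fit well conditioned
    t_f0 = np.linspace(0.0, 1.0, len(f0))
    t_en = np.linspace(0.0, 1.0, len(energy))
    f0_coef = np.polyfit(t_f0, f0, degree)      # highest power first
    en_coef = np.polyfit(t_en, energy, degree)  # highest power first
    duration = len(f0) * frame_shift_s          # seconds
    return np.hstack([f0_coef, en_coef, duration])

# Synthetic contours for one voiced segment of 20 frames (hypothetical data)
t = np.linspace(0.0, 1.0, 20)
f0 = 100.0 + 30.0 * t ** 2   # F0 in Hz
energy = 0.5 - 0.2 * t       # arbitrary energy units
vec = segment_dynamic_features(f0, energy, frame_shift_s=0.01)
```

A degree-5 fit needs at least six frames per segment; shorter voiced segments would have to be skipped or handled separately.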
Script is called as follows
>>> python prosody.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>
Command-line examples:
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesAst.txt" "true" "true" "txt"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUst.csv" "true" "true" "csv"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn.pt" "false" "true" "torch"

>>> python prosody.py "../audios/" "prosodyfeaturesst.txt" "true" "false" "txt"
>>> python prosody.py "../audios/" "prosodyfeaturesst.csv" "true" "false" "csv"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.csv" "false" "false" "csv"
Examples directly in Python
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")

>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
- extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')
Extract the prosody features from an audio file
- Parameters:
audio – .wav audio file.
static – whether to compute and return statistical functionals over the feature matrix (True), or return the feature matrix itself (False)
plots – whether to plot the extracted features (True or False)
fmt – format in which to return the features (npy, dataframe, torch, kaldi)
kaldi_file – file in which to store the kaldi features; only used when fmt=="kaldi"
- Returns:
features computed from the audio file.
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
- extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')
Extract the prosody features for audios inside a path
- Parameters:
path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
static – whether to compute and return statistical functionals over the feature matrix (True), or return the feature matrix itself (False)
plots – whether to plot the extracted features (True or False)
fmt – format in which to return the features (npy, dataframe, torch, kaldi)
kaldi_file – file in which to store the kaldi features; only used when fmt=="kaldi"
- Returns:
features computed from the audio files in the directory.
>>> prosody=Prosody()
>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
- plot_pros(data_audio, fs, F0, segmentsV, segmentsU, F0_features)
Plots of the prosody features
- Parameters:
data_audio – speech signal.
fs – sampling frequency
F0 – contour of the fundamental frequency
segmentsV – list with the voiced segments
segmentsU – list with the unvoiced segments
F0_features – vector with f0-based features
- Returns:
plots of the prosody features.
- prosody_dynamic(audio)
Extract the dynamic prosody features from an audio file
- Parameters:
audio – .wav audio file.
- Returns:
array of shape (N, 13) with the prosody features extracted from the audio file, where N is the number of voiced segments
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_dynamic(file_audio)
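A short sketch of how the returned (N, 13) array might be split into its three parts. It assumes the columns follow the order of the dynamic feature list (F0 coefficients, energy coefficients, duration), and it uses a placeholder array rather than real output from prosody_dynamic.

```python
import numpy as np

# Placeholder standing in for the array returned by prosody_dynamic
n_segments = 4
features = np.arange(n_segments * 13, dtype=float).reshape(n_segments, 13)

# Assumed column layout: 1-6 F0 coefficients, 7-12 energy coefficients, 13 duration
f0_coefs = features[:, 0:6]       # shape (N, 6)
energy_coefs = features[:, 6:12]  # shape (N, 6)
durations = features[:, 12]       # shape (N,)
```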
- prosody_static(audio, plots)
Extract the static prosody features from an audio file
- Parameters:
audio – .wav audio file.
plots – whether to plot the extracted features (True or False)
- Returns:
array with the 103 prosody features
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_static(file_audio, plots=True)
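Because the static vector is addressed by position, a name-to-index map can make it easier to work with. The sketch below rebuilds the 103 positions from the static feature list in the class description. The label strings are hypothetical (the module may use different names), and a zero vector stands in for the output of prosody_static.

```python
import numpy as np

SIX = ["avg", "std", "max", "min", "skewness", "kurtosis"]
FOUR = ["avg", "std", "skewness", "kurtosis"]
RATIOS = [
    "pause/(voiced+unvoiced)", "pause/unvoiced",
    "unvoiced/(voiced+unvoiced)", "voiced/(voiced+unvoiced)",
    "voiced/pause", "unvoiced/pause",
]

# (group label, statistics) in the documented order; labels are illustrative
GROUPS = [
    ("F0", SIX), ("F0 tilt", SIX), ("F0 MSE", SIX),
    ("F0 first voiced", SIX), ("F0 last voiced", SIX),
    ("energy voiced", FOUR), ("energy tilt voiced", FOUR),
    ("energy MSE voiced", FOUR),
    ("energy first voiced", SIX), ("energy last voiced", SIX),
    ("energy unvoiced", FOUR), ("energy tilt unvoiced", FOUR),
    ("energy MSE unvoiced", FOUR),
    ("energy first unvoiced", SIX), ("energy last unvoiced", SIX),
    ("voiced rate", [""]),
    ("duration voiced", SIX), ("duration unvoiced", SIX),
    ("duration pause", SIX),
    ("duration ratio", RATIOS),
]

names = [f"{group} {stat}".strip() for group, stats in GROUPS for stat in stats]

# Placeholder standing in for the 103-element array from prosody_static
features = np.zeros(103)
by_name = dict(zip(names, features))
voiced_rate = by_name["voiced rate"]  # documented feature 79 (index 78)
```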