Prosody features
Created on Jul 21 2017, Modified Apr 10 2018.
@author: J. C. Vasquez-Correa, T. Arias-Vergara, J. S. Guerrero
-
class prosody.Prosody
Compute prosody features from continuous speech based on duration, fundamental frequency, and energy. Either a static or a dynamic feature matrix can be computed. The static matrix is formed with 103 features and includes:
1-6 F0-contour: Avg., Std., Max., Min., Skewness, Kurtosis
7-12 Tilt of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
13-18 MSE of a linear estimation of F0 for each voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
19-24 F0 on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
25-30 F0 on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
31-34 energy-contour for voiced segments: Avg., Std., Skewness, Kurtosis
35-38 Tilt of a linear estimation of energy contour for voiced segments: Avg., Std., Skewness, Kurtosis
39-42 MSE of a linear estimation of energy contour for voiced segments: Avg., Std., Skewness, Kurtosis
43-48 energy on the first voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
49-54 energy on the last voiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
55-58 energy-contour for unvoiced segments: Avg., Std., Skewness, Kurtosis
59-62 Tilt of a linear estimation of energy contour for unvoiced segments: Avg., Std., Skewness, Kurtosis
63-66 MSE of a linear estimation of energy contour for unvoiced segments: Avg., Std., Skewness, Kurtosis
67-72 energy on the first unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
73-78 energy on the last unvoiced segment: Avg., Std., Max., Min., Skewness, Kurtosis
79 Voiced rate: Number of voiced segments per second
80-85 Duration of Voiced: Avg., Std., Max., Min., Skewness, Kurtosis
86-91 Duration of Unvoiced: Avg., Std., Max., Min., Skewness, Kurtosis
92-97 Duration of Pauses: Avg., Std., Max., Min., Skewness, Kurtosis
98-103 Duration ratios: Pause/(Voiced+Unvoiced), Pause/Unvoiced, Unvoiced/(Voiced+Unvoiced), Voiced/(Voiced+Unvoiced), Voiced/Pause, Unvoiced/Pause
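The pattern behind features 7-18 (a line fitted to each voiced segment, then statistics taken across segments) can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `functionals` and `linear_fit`, the use of population variance, the excess-kurtosis convention, and the toy F0 values are all assumptions.

```python
import math


def functionals(values):
    """Return [Avg., Std., Max., Min., Skewness, Kurtosis] of a sequence."""
    n = len(values)
    mean = sum(values) / n
    # Population variance (divide by n); the library may use another convention.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        skew, kurt = 0.0, 0.0  # degenerate case: all values are identical
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        # Excess kurtosis (normal distribution -> 0); this is an assumption.
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    return [mean, std, max(values), min(values), skew, kurt]


def linear_fit(contour):
    """Least-squares line over the frame index; returns (tilt, mse)."""
    n = len(contour)
    x_mean = (n - 1) / 2
    y_mean = sum(contour) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(contour))
    tilt = sxy / sxx if sxx else 0.0
    intercept = y_mean - tilt * x_mean
    mse = sum((y - (intercept + tilt * x)) ** 2 for x, y in enumerate(contour)) / n
    return tilt, mse


# Toy F0 values in Hz for three voiced segments (made up for illustration).
segments = [
    [100.0, 102.0, 104.0, 106.0],
    [120.0, 118.0, 116.0],
    [110.0, 110.0, 110.0, 110.0, 110.0],
]
fits = [linear_fit(segment) for segment in segments]
tilt_features = functionals([tilt for tilt, _ in fits])  # analogue of features 7-12
mse_features = functionals([mse for _, mse in fits])     # analogue of features 13-18
print(tilt_features)
print(mse_features)
```

The same two steps apply to the energy contour, where only four functionals (Avg., Std., Skewness, Kurtosis) are kept for the contour, tilt, and MSE groups.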
The dynamic matrix is formed with 13 features computed for each voiced segment and contains:
1-6. Coefficients of a 5-degree Lagrange polynomial to model the F0 contour
7-12. Coefficients of a 5-degree Lagrange polynomial to model the energy contour
13. Duration of the voiced segment
Dynamic prosody features are based on Najim Dehak, “Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification”, 2007
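As a rough illustration of how one 13-dimensional row could be assembled for a voiced segment, the sketch below fits a degree-5 polynomial to each contour and appends the segment duration. It is written under several assumptions: an ordinary least-squares fit stands in for the "Lagrange polynomial" named above, the time axis is rescaled to [-1, 1], coefficients are ordered from highest to lowest power, and `polyfit`, `dynamic_row`, and `frame_shift` are hypothetical names that do not come from the library.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit solved through the normal equations.

    Returns the coefficients ordered from highest to lowest power.
    """
    m = degree + 1
    # Normal equations A c = b for the monomial basis x^0 .. x^degree.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coefs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coefs[j] for j in range(i + 1, m))
        coefs[i] = (b[i] - tail) / A[i][i]
    return coefs[::-1]


def dynamic_row(f0, energy, frame_shift=0.01):
    """Build one 13-value row: 6 F0 coefficients, 6 energy coefficients, duration.

    Needs at least 6 frames; frame_shift (seconds per frame) is an assumed value.
    """
    n = len(f0)
    # Rescale frame indices to [-1, 1] to keep the fit well conditioned.
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    duration = n * frame_shift
    return polyfit(xs, f0, 5) + polyfit(xs, energy, 5) + [duration]


# Toy voiced segment of 12 frames: linearly rising F0, constant energy.
f0 = [120.0 + 3.0 * i for i in range(12)]
energy = [0.3] * 12
row = dynamic_row(f0, energy)
print(len(row))
```

Stacking one such row per voiced segment gives a matrix of shape (N, 13), where N is the number of voiced segments.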
The script is called as follows:
>>> python prosody.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)>
Command-line examples:
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesAst.txt" "true" "true" "txt"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUst.csv" "true" "true" "csv"
>>> python prosody.py "../audios/001_ddk1_PCGITA.wav" "prosodyfeaturesUdyn.pt" "false" "true" "torch"

>>> python prosody.py "../audios/" "prosodyfeaturesst.txt" "true" "false" "txt"
>>> python prosody.py "../audios/" "prosodyfeaturesst.csv" "true" "false" "csv"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.pt" "false" "false" "torch"
>>> python prosody.py "../audios/" "prosodyfeaturesdyn.csv" "false" "false" "csv"
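The command line passes "true"/"false" as strings, while the Python methods take booleans. The sketch below shows one way the five positional arguments might be mapped to named values. `parse_cli_args` and `to_bool` are hypothetical helpers written for illustration; they do not describe how prosody.py actually parses its arguments.

```python
ALLOWED_FORMATS = {"csv", "txt", "npy", "kaldi", "torch"}  # formats listed in the usage line


def to_bool(text):
    """Convert the strings "true"/"false" (any case) to a bool."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {text!r}")
    return lowered == "true"


def parse_cli_args(argv):
    """Hypothetical helper: map the five positional arguments to named values."""
    if len(argv) != 5:
        raise ValueError(
            "usage: <file_or_folder_audio> <file_features> <static> <plots> <format>"
        )
    source, output, static, plots, fmt = argv
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return {
        "audio": source,
        "file_features": output,
        "static": to_bool(static),
        "plots": to_bool(plots),
        "fmt": fmt,
    }


args = parse_cli_args(["../audios/", "prosodyfeaturesdyn.csv", "false", "false", "csv"])
print(args)
```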
Examples directly in Python
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")

>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
-
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')
Extract the prosody features from an audio file.
Parameters:
- audio – .wav audio file.
- static – whether to compute and return statistical functionals over the feature matrix, or return the feature matrix computed over frames
- plots – whether to plot the prosody features (True or False)
- fmt – format to return the features (npy, dataframe, torch, kaldi)
- kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns: features computed from the audio file.
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=prosody.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=prosody.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
-
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')
Extract the prosody features for the audio files inside a directory.
Parameters:
- path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
- static – whether to compute and return statistical functionals over the feature matrix, or return the feature matrix computed over frames
- plots – whether to plot the prosody features (True or False)
- fmt – format to return the features (npy, dataframe, torch, kaldi)
- kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns: features computed from the audio files in the directory.
>>> prosody=Prosody()
>>> path_audio="../audios/"
>>> features1=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=prosody.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=prosody.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> prosody.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
-
plot_pros(data_audio, fs, F0, segmentsV, segmentsU, F0_features)
Plots of the prosody features.
Parameters:
- data_audio – speech signal.
- fs – sampling frequency
- F0 – contour of the fundamental frequency
- segmentsV – list with the voiced segments
- segmentsU – list with the unvoiced segments
- F0_features – vector with f0-based features
Returns: plots of the prosody features.
-
prosody_dynamic(audio)
Extract the dynamic prosody features from an audio file.
Parameters: audio – .wav audio file.
Returns: array (N, 13) with the prosody features extracted from an audio file, where N is the number of voiced segments.
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_dynamic(file_audio)
-
prosody_static(audio, plots)
Extract the static prosody features from an audio file.
Parameters:
- audio – .wav audio file.
- plots – whether to plot the prosody features (True or False)
Returns: array with the 103 prosody features
>>> prosody=Prosody()
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features=prosody.prosody_static(file_audio, plots=True)
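Because the static vector is documented by 1-based index ranges, a small lookup table can make it easier to pull out one group of features. The sketch below encodes the ranges from the static feature list above. The group labels and the helpers `group_of` and `select_group` are illustrative names chosen here, not part of the library, and the placeholder vector stands in for a real result of prosody_static.

```python
# (first index, last index, label), 1-based and inclusive, as in the static list.
STATIC_GROUPS = [
    (1, 6, "F0 contour"),
    (7, 12, "F0 tilt per voiced segment"),
    (13, 18, "F0 linear-fit MSE per voiced segment"),
    (19, 24, "F0, first voiced segment"),
    (25, 30, "F0, last voiced segment"),
    (31, 34, "energy contour, voiced"),
    (35, 38, "energy tilt, voiced"),
    (39, 42, "energy linear-fit MSE, voiced"),
    (43, 48, "energy, first voiced segment"),
    (49, 54, "energy, last voiced segment"),
    (55, 58, "energy contour, unvoiced"),
    (59, 62, "energy tilt, unvoiced"),
    (63, 66, "energy linear-fit MSE, unvoiced"),
    (67, 72, "energy, first unvoiced segment"),
    (73, 78, "energy, last unvoiced segment"),
    (79, 79, "voiced rate"),
    (80, 85, "duration of voiced"),
    (86, 91, "duration of unvoiced"),
    (92, 97, "duration of pauses"),
    (98, 103, "duration ratios"),
]


def group_of(index):
    """Return the group label for a 1-based feature index."""
    for start, end, label in STATIC_GROUPS:
        if start <= index <= end:
            return label
    raise IndexError(f"feature index out of range: {index}")


def select_group(features, label):
    """Slice a 103-element vector (list or 1-D array) down to one group."""
    for start, end, name in STATIC_GROUPS:
        if name == label:
            return features[start - 1:end]  # convert 1-based inclusive to a slice
    raise KeyError(label)


# Placeholder vector whose values equal their 1-based index.
features = list(range(1, 104))
pauses = select_group(features, "duration of pauses")
print(group_of(79), pauses)
```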