Glottal features

class glottal.Glottal

Compute features based on the glottal source reconstruction from sustained vowels and continuous speech.

For continuous speech, the features are computed over voiced segments.

Nine descriptors are computed:

  1. Variability of the time between consecutive glottal closure instants (GCI)
  2. Average opening quotient (OQ) over consecutive glottal cycles: ratio of the opening-phase duration to the duration of the glottal cycle
  3. Variability of the opening quotient (OQ) over consecutive glottal cycles
  4. Average normalized amplitude quotient (NAQ) over consecutive glottal cycles: ratio of the amplitude quotient to the duration of the glottal cycle
  5. Variability of the normalized amplitude quotient (NAQ) over consecutive glottal cycles
  6. Average H1H2: difference in amplitude between the first two harmonics of the glottal flow signal
  7. Variability of H1H2
  8. Average harmonic richness factor (HRF): ratio of the sum of the amplitudes of the harmonics above the fundamental to the amplitude of the fundamental
  9. Variability of HRF
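As a rough illustration of descriptor 1, the sketch below derives the cycle durations from a sequence of GCIs and summarizes them. It assumes the GCIs are given in samples and uses the standard deviation as the variability measure; both choices are assumptions for illustration, and `gci_period_stats` is a hypothetical helper, not part of DisVoice.

```python
import numpy as np


def gci_period_stats(gci_samples, fs):
    """Mean and standard deviation (in seconds) of the intervals between GCIs.

    gci_samples: increasing glottal closure instants, in samples.
    fs: sampling frequency, in Hz.
    """
    gci = np.asarray(gci_samples, dtype=float)
    if gci.size < 3:
        raise ValueError("need at least three GCIs to measure variability")
    periods = np.diff(gci) / fs  # duration of each glottal cycle, in seconds
    return periods.mean(), periods.std()


fs = 16000
gci = [0, 160, 322, 480, 641]  # hypothetical GCIs of a roughly 100 Hz voice
mean_t0, std_t0 = gci_period_stats(gci, fs)
print(f"mean T0 = {mean_t0 * 1000:.3f} ms, std = {std_t0 * 1000:.3f} ms")
```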

Static or dynamic matrices can be computed:

The static matrix contains 36 features: 9 descriptors x 4 functionals (mean, standard deviation, skewness, kurtosis).

The dynamic matrix contains the 9 descriptors computed over frames of 200 ms with a time shift of 50 ms.
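A minimal sketch of how a dynamic matrix could be collapsed into the 36 static features, and of how 200 ms frames with a 50 ms shift tile a signal. The helpers `frame_starts` and `static_functionals` are hypothetical, the order of the 36 values (all means, then all standard deviations, skewnesses and kurtoses) is an assumption, and the random matrix only stands in for real descriptors.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def frame_starts(n_samples, fs, win_s=0.2, shift_s=0.05):
    """Start indices of 200 ms frames with a 50 ms shift that fit in the signal."""
    win, shift = int(win_s * fs), int(shift_s * fs)
    if n_samples < win:
        return np.array([], dtype=int)
    return np.arange(0, n_samples - win + 1, shift)


def static_functionals(dynamic):
    """Collapse an (n_frames, 9) dynamic matrix into 36 static features."""
    dynamic = np.asarray(dynamic, dtype=float)
    return np.hstack([
        dynamic.mean(axis=0),      # 9 means
        dynamic.std(axis=0),       # 9 standard deviations
        skew(dynamic, axis=0),     # 9 skewness values
        kurtosis(dynamic, axis=0), # 9 kurtosis values
    ])


rng = np.random.default_rng(0)
dyn = rng.normal(size=(40, 9))       # stand-in for 40 frames x 9 descriptors
feats = static_functionals(dyn)
starts = frame_starts(16000, 16000)  # 1 s of audio sampled at 16 kHz
print(feats.shape, len(starts))
```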

Notes:

  1. The fundamental frequency is computed using the RAPT algorithm.

Command-line usage:

>>> python glottal.py <file_or_folder_audio> <file_features> <dynamic_or_static> <plots (true, false)> <format (csv, txt, npy, kaldi, torch)>

Command-line examples:

>>> python glottal.py "../audios/001_a1_PCGITA.wav" "glottalfeaturesAst.txt" "static" "true" "txt"
>>> python glottal.py "../audios/098_u1_PCGITA.wav" "glottalfeaturesUst.csv" "static" "true" "csv"
>>> python glottal.py "../audios/098_u1_PCGITA.wav" "glottalfeaturesUst.ark" "dynamic" "true" "kaldi"
>>> python glottal.py "../audios/098_u1_PCGITA.wav" "glottalfeaturesUst.pt" "dynamic" "true" "torch"

Examples in Python:

>>> from disvoice.glottal import Glottal
>>> glottal=Glottal()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features1=glottal.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=glottal.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=glottal.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> path_audios="../audios/"
>>> features1=glottal.extract_features_path(path_audios, static=True, plots=False, fmt="npy")
>>> features2=glottal.extract_features_path(path_audios, static=True, plots=False, fmt="torch")
>>> features3=glottal.extract_features_path(path_audios, static=True, plots=False, fmt="dataframe")
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the glottal features from an audio file.

Parameters:
  • audio – .wav audio file.
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the glottal features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio file.

>>> glottal=Glottal()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features1=glottal.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=glottal.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=glottal.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> glottal.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the glottal features for the audio files inside a path.

Parameters:
  • path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
  • static – whether to compute and return statistical functionals over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the glottal features
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features, only valid when fmt=="kaldi"
Returns:

features computed from the audio files in the path.

>>> glottal=Glottal()
>>> path_audio="../audios/"
>>> features1=glottal.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=glottal.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=glottal.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> glottal.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
extract_glottal_signal(x, fs)

Extract the glottal flow and the glottal flow derivative signals.

Parameters:
  • x – data from the speech signal.
  • fs – sampling frequency
Returns:
  • glottal signal
  • derivative of the glottal signal
  • glottal closure instants

>>> from scipy.io.wavfile import read
>>> glottal=Glottal()
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> fs, data_audio=read(file_audio)
>>> glottal_flow, glottal_flow_deriv, GCIs=glottal.extract_glottal_signal(data_audio, fs)
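The glottal flow and its derivative are related by time integration. The sketch below recovers a flow from a derivative with SciPy's `cumulative_trapezoid` and removes the unknown integration constant by subtracting the mean. It only illustrates that relation; it is not claimed to be how `extract_glottal_signal` computes its outputs, and `flow_from_derivative` is a hypothetical helper.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid


def flow_from_derivative(gfd, fs):
    """Integrate a glottal flow derivative to obtain a glottal flow estimate.

    The integration constant is unknown, so the result is only defined up to
    an offset; here the offset is removed by subtracting the mean.
    """
    flow = cumulative_trapezoid(gfd, dx=1.0 / fs, initial=0.0)
    return flow - flow.mean()


fs = 16000
t = np.arange(fs) / fs
gfd = 2 * np.pi * 100 * np.cos(2 * np.pi * 100 * t)  # derivative of sin(2*pi*100*t)
flow = flow_from_derivative(gfd, fs)
```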
plot_glottal(data_audio, fs, GCI, glottal_flow, glottal_sig)

Plots of the glottal features

Parameters:
  • data_audio – speech signal.
  • fs – sampling frequency
  • GCI – glottal closure instants
  • glottal_flow – glottal flow
  • glottal_sig – reconstructed glottal signal
Returns:

plots of the glottal features.
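For a similar figure outside DisVoice, here is a hypothetical matplotlib sketch that draws the speech signal with vertical markers at the GCIs above the glottal flow. The layout, the helper name `plot_speech_gci_flow` and the output path are assumptions; it does not reproduce the plots produced by `plot_glottal`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file-only backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def plot_speech_gci_flow(x, fs, gci, flow, path):
    """Two stacked panels: speech with GCI markers, then the glottal flow."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
    ax1.plot(t, x, color="k", linewidth=0.8)
    ax1.vlines(np.asarray(gci) / fs, x.min(), x.max(), colors="r", linewidth=0.5)
    ax1.set_ylabel("Speech")
    ax2.plot(t, flow, linewidth=0.8)
    ax2.set_ylabel("Glottal flow")
    ax2.set_xlabel("Time (s)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


fs = 16000
x = np.sin(2 * np.pi * 100 * np.arange(1600) / fs)  # synthetic stand-in for speech
gci = np.arange(0, 1600, 160)                       # hypothetical GCIs
out = plot_speech_gci_flow(x, fs, gci, np.cumsum(x) / fs,
                           os.path.join(tempfile.gettempdir(), "glottal_sketch.png"))
```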

glottal.SE_VQ_varF0(x, fs, f0=None)

Function to extract GCIs using an adapted version of the SEDREAMS algorithm which is optimised for non-modal voice qualities (SE-VQ). Ncand maximum peaks are selected from the LP-residual signal in the interval defined by the mean-based signal.

A dynamic programming algorithm is then used to select the optimal path of GCI locations. Then a post-processing method, using the output of a resonator applied to the residual signal, is carried out to remove false positives occurring in creaky speech regions.

Note that this method differs slightly from the standard SE-VQ algorithm in that the mean-based signal is calculated using a variable window length.

This is set using an f0 contour interpolated over unvoiced regions and heavily smoothed. This is particularly useful for speech involving large f0 excursions (i.e. very expressive speech).
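The f0 preprocessing described above can be sketched as follows, assuming unvoiced frames are marked with f0 = 0, linear interpolation across them, and a median filter followed by a moving average as the "heavy" smoothing. The filter choices, the kernel length and the helper name `interpolate_and_smooth_f0` are assumptions, not the settings used by `SE_VQ_varF0`.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter1d


def interpolate_and_smooth_f0(f0, kernel=11):
    """Fill unvoiced frames (f0 == 0) by interpolation, then smooth heavily."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("the f0 contour has no voiced frames")
    idx = np.arange(len(f0))
    # Linear interpolation across unvoiced gaps; edges hold the nearest voiced value.
    filled = np.interp(idx, idx[voiced], f0[voiced])
    # Median filter suppresses isolated outliers such as octave jumps.
    smoothed = median_filter(filled, size=kernel, mode="nearest")
    # Moving average for the final heavy smoothing.
    return uniform_filter1d(smoothed, size=kernel, mode="nearest")


contour = [0, 0, 100, 100, 0, 0, 0, 120, 120, 0]  # hypothetical f0 per frame, in Hz
smoothed_f0 = interpolate_and_smooth_f0(contour, kernel=3)
print(smoothed_f0)
```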

Parameters:
  • x – speech signal (in samples)
  • fs – sampling frequency (Hz)
  • f0 – f0 contour (optional); if not given, it is computed using the RAPT algorithm
Returns:

GCI – glottal closure instants (in samples)
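A simplified sketch of the two signals that SE-VQ starts from: the LP residual and the mean-based signal. It assumes a single global LPC fit by the autocorrelation method (the actual algorithm works frame by frame) and a fixed Blackman window of roughly 1.75 mean pitch periods, the value usually quoted for SEDREAMS, instead of the variable window described above. Peak picking, dynamic programming and post-processing are not shown, and all helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter


def lpc_inverse_filter(x, order):
    """Inverse-filter coefficients [1, -a1, ..., -ap], autocorrelation method."""
    xw = x * np.hanning(len(x))
    r = np.correlate(xw, xw, mode="full")[len(xw) - 1:len(xw) + order]  # lags 0..order
    a = np.linalg.solve(toeplitz(r[:order]), r[1:order + 1])  # normal equations
    return np.concatenate(([1.0], -a))


def lp_residual(x, order):
    """Prediction error of a single global LPC fit (a simplification)."""
    x = np.asarray(x, dtype=float)
    return lfilter(lpc_inverse_filter(x, order), [1.0], x)


def mean_based_signal(x, fs, mean_f0):
    """Blackman-weighted moving average over about 1.75 mean pitch periods."""
    n = int(round(1.75 * fs / mean_f0)) | 1  # odd length keeps the window centred
    w = np.blackman(n)
    return np.convolve(np.asarray(x, dtype=float), w / w.sum(), mode="same")


fs = 16000
e = np.zeros(1600)
e[::160] = 1.0                           # synthetic 100 Hz impulse train
x = lfilter([1.0], [1.0, -1.3, 0.8], e)  # one synthetic resonance
res = lp_residual(x, order=2)
mbs = mean_based_signal(x, fs, mean_f0=100)
```

On this synthetic signal the residual peaks at the excitation impulses, which is the property that makes it useful for locating GCI candidates.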

References:
Kane, J., and Gobl, C. (2013). ‘Evaluation of glottal closure instant detection in a range of voice qualities’, Speech Communication 55(2), pp. 295-314.

ORIGINAL FUNCTION WAS CODED BY JOHN KANE AT THE PHONETICS AND SPEECH LAB IN TRINITY COLLEGE DUBLIN IN 2013.

THE SEDREAMS FUNCTION WAS CODED BY THOMAS DRUGMAN OF THE UNIVERSITY OF MONS.

THE CODE WAS TRANSLATED TO PYTHON AND ADAPTED BY J. C. Vasquez-Correa AT THE PATTERN RECOGNITION LAB, UNIVERSITY OF ERLANGEN-NUREMBERG, GERMANY, AND THE UNIVERSITY OF ANTIOQUIA, COLOMBIA. JCAMILO.VASQUEZ@UDEA.EDU.CO https://jcvasquezc.github.io

glottal.IAIF(x, fs, GCI)

Function to carry out iterative and adaptive inverse filtering (IAIF) (Alku et al., 1992).

Parameters:
  • x – speech signal (in samples)
  • fs – sampling frequency (in Hz)
  • GCI – Glottal closure instants (in samples)
Returns:

glottal flow derivative estimate
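To make the idea of inverse filtering concrete, here is a heavily simplified, single-pass sketch. It is not IAIF itself: it removes a first-order estimate of the glottal spectral tilt, fits an all-pole vocal-tract model by LPC, inverse filters the speech with that model to obtain a flow-derivative estimate, and applies a leaky integrator to get a flow estimate. It omits the high-pass filtering, the second iteration and the GCI-based processing of the real method; the LPC order rule, the leak factor and the helper names are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter


def lpc_inverse_filter(x, order):
    """Inverse-filter coefficients [1, -a1, ..., -ap], autocorrelation method."""
    xw = x * np.hanning(len(x))
    r = np.correlate(xw, xw, mode="full")[len(xw) - 1:len(xw) + order]
    a = np.linalg.solve(toeplitz(r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))


def simple_inverse_filter(x, fs, leak=0.99):
    """Single-pass glottal estimate (a simplification, NOT the full IAIF).

    Returns (flow_derivative_estimate, flow_estimate).
    """
    x = np.asarray(x, dtype=float)
    vt_order = 2 * round(fs / 2000) + 4        # assumed vocal-tract LPC order
    tilt = lpc_inverse_filter(x, 1)            # first-order model of glottal tilt
    x_flat = lfilter(tilt, [1.0], x)           # speech with the tilt removed
    vt = lpc_inverse_filter(x_flat, vt_order)  # all-pole vocal-tract model
    gfd = lfilter(vt, [1.0], x)                # remove the vocal tract from the speech
    flow = lfilter([1.0], [1.0, -leak], gfd)   # leaky integration gives a flow estimate
    return gfd, flow


fs = 16000
rng = np.random.default_rng(0)
e = np.zeros(1600)
e[::160] = 1.0  # synthetic 100 Hz excitation
x = lfilter([1.0], [1.0, -1.3, 0.8], e) + 1e-3 * rng.standard_normal(1600)
gfd, flow = simple_inverse_filter(x, fs)
```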

Function coded by John Kane at the Phonetics and Speech Lab, Trinity College Dublin, August 2012.

THE CODE WAS TRANSLATED TO PYTHON AND ADAPTED BY J. C. Vasquez-Correa AT THE PATTERN RECOGNITION LAB, UNIVERSITY OF ERLANGEN-NUREMBERG, GERMANY, AND THE UNIVERSITY OF ANTIOQUIA, COLOMBIA. JCAMILO.VASQUEZ@UDEA.EDU.CO https://jcvasquezc.github.io

glottal.get_vq_params(gf, gfd, fs, GCI)

Function to estimate the glottal parameters: NAQ, QOQ, H1-H2, and HRF.

This function can be used to estimate a range of conventional glottal source parameters often used in the literature: the normalized amplitude quotient (NAQ), the quasi-open quotient (QOQ), the difference in amplitude of the first two harmonics of the differentiated glottal source spectrum (H1-H2), and the harmonic richness factor (HRF).
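A minimal per-cycle sketch of NAQ and QOQ. It assumes cycles are delimited by consecutive GCIs, the derivative is a per-sample difference (so NAQ is dimensionless with the period in samples), NAQ = f_ac / (d_peak · T) with f_ac the peak-to-peak flow amplitude and d_peak the magnitude of the negative derivative peak, and QOQ as the fraction of the cycle where the flow exceeds 50% of its peak-to-peak amplitude. The helper name and the 50% level are assumptions; `get_vq_params` may differ in detail.

```python
import numpy as np


def naq_qoq(gf, gfd, gci):
    """Per-cycle NAQ and QOQ between consecutive GCIs (indices in samples).

    gf: glottal flow; gfd: its per-sample derivative (same length as gf).
    """
    naq, qoq = [], []
    for start, stop in zip(gci[:-1], gci[1:]):
        flow, dflow = gf[start:stop], gfd[start:stop]
        period = stop - start                # cycle length, in samples
        f_ac = flow.max() - flow.min()       # peak-to-peak flow amplitude
        d_peak = abs(dflow.min())            # negative peak of the derivative
        naq.append(f_ac / (d_peak * period))
        threshold = flow.min() + 0.5 * f_ac  # assumed 50% quasi-open level
        qoq.append(np.count_nonzero(flow > threshold) / period)
    return np.array(naq), np.array(qoq)


T = 160                                      # 100 Hz at 16 kHz
n = np.arange(5 * T)
gf = 0.5 * (1 - np.cos(2 * np.pi * n / T))   # synthetic raised-cosine flow
gfd = np.diff(gf, prepend=gf[0])             # per-sample derivative
gci = list(range(0, 5 * T + 1, T))           # cycle boundaries
naq, qoq = naq_qoq(gf, gfd, gci)
```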

Parameters:
  • gf – [samples] [N] Glottal flow estimation
  • gfd – [samples] [N] Glottal flow derivative estimation
  • fs – [Hz] [1] sampling frequency
  • GCI – [samples] [M] Glottal closure instants
Returns:
  • NAQ – [s, samples] [Mx2] normalized amplitude quotient
  • QOQ – [s, samples] [Mx2] quasi-open quotient
  • H1H2 – [s, dB] [Mx2] difference in glottal harmonic amplitude
  • HRF – [s, samples] [Mx2] harmonic richness factor
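H1-H2 and HRF can be sketched from the magnitude spectrum of a segment with known f0. The sketch assumes a Hann window, a search of ±10% of f0 around each harmonic, H1-H2 = 20·log10(A1/A2) and HRF = 20·log10(ΣA_k for k ≥ 2, divided by A1), in the sense of Childers and Lee (1991). The helper names and the number of harmonics are assumptions, not the exact procedure of `get_vq_params`.

```python
import numpy as np


def harmonic_amplitudes(x, fs, f0, n_harm):
    """Spectral peak magnitude near each of the first n_harm harmonics of f0."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    amps = []
    for k in range(1, n_harm + 1):
        band = np.abs(freqs - k * f0) <= 0.1 * f0  # search +-10% of f0
        amps.append(spec[band].max())
    return np.array(amps)


def h1h2_hrf_db(x, fs, f0, n_harm=10):
    """H1-H2 and HRF, both in dB."""
    a = harmonic_amplitudes(x, fs, f0, n_harm)
    h1h2 = 20 * np.log10(a[0] / a[1])
    hrf = 20 * np.log10(a[1:].sum() / a[0])
    return h1h2, hrf


fs = 16000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
     + 0.25 * np.sin(2 * np.pi * 300 * t))  # synthetic signal with three harmonics
h1h2, hrf = h1h2_hrf_db(x, fs, f0=100, n_harm=3)
```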

References:

[1] Alku, P., Bäckström, T., and Vilkman, E. Normalized amplitude quotient for parameterization of the glottal flow. Journal of the Acoustical Society of America, 112(2):701-710, 2002.

[2] Hacki, T. Klassifizierung von Glottisdysfunktionen mit Hilfe der Elektroglottographie. Folia Phoniatrica, pages 43-48, 1989.

[3] Alku, P., Strik, H., and Vilkman, E. Parabolic spectral parameter - A new method for quantification of the glottal flow. Speech Communication, 22(1):67-79, 1997.

[4] Hanson, H. M. Glottal characteristics of female speakers: Acoustic correlates. Journal of the Acoustical Society of America, 101(1):466-481, 1997.

[5] Childers, D. G. and Lee, C. K. Voice quality factors: Analysis, synthesis and perception. Journal of the Acoustical Society of America, 90(5):2394-2410, 1991.

Function coded by John Kane at the Phonetics and Speech Lab, Trinity College Dublin, August 2012.

THE CODE WAS TRANSLATED TO PYTHON AND ADAPTED BY J. C. Vasquez-Correa AT THE PATTERN RECOGNITION LAB, UNIVERSITY OF ERLANGEN-NUREMBERG, GERMANY, AND THE UNIVERSITY OF ANTIOQUIA, COLOMBIA. JCAMILO.VASQUEZ@UDEA.EDU.CO https://jcvasquezc.github.io