Edinburgh Speech Tools Library
Chapter 3. Executable Programs

pitchmark: Find instants of glottal closure in a laryngograph file


Synopsis

pitchmark [input file] -o [output file] [options]

Summary: pitchmark laryngograph (lx) files

[-h] [-itype string] [-n int] [-f int] [-ibo string] [-iswap] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-otype string {ascii}] [-S float] [-o ofile] [-lx_lf int] [-lx_lo int] [-lx_hf int] [-lx_ho int] [-df_lf int] [-df_lo int] [-med_o int] [-mean_o int] [-inv] [-fill] [-min float] [-max float] [-def float] [-pm ifile] [-f0 ofile] [-end float] [-wave_end] [-inter] [-style string]

pitchmark locates instants of glottal closure in a laryngograph waveform and performs post-processing to produce evenly spaced pitchmarks. EST does not currently provide any means of pitchmarking a speech waveform. Pitchmarking is performed by calling the pitchmark() function, which carries out the following operations:

Double low-pass filter the signal. This removes high-frequency noise from the signal. The parameter lx_lf specifies the low-pass cutoff frequency, and lx_lo specifies the filter order. Double filtering (passing the waveform through the filter, then reversing it and passing it through again) is used to remove any phase shift between the input and output of the filtering operation.
Double high-pass filter the signal. This removes the very low frequency swell that is often observed in laryngograph waveforms. The parameter lx_hf specifies the high-pass cutoff frequency, and lx_ho specifies the order. Double filtering is again used to remove any phase shift between the input and output of the filtering operation.
Calculate the delta signal. The filtered waveform is differentiated using the delta() function.
Low-pass filter the delta signal. Some noise may still be present in the signal, and this is removed by further low-pass filtering. Experimentation has shown that simple mean smoothing is often more effective than FIR smoothing at this point. The parameter mo (set with -mean_o) specifies the size of the mean smoothing window. If FIR smoothing is chosen, df_lf specifies the low-pass cutoff frequency and df_lo specifies the order, and double filtering is again used to avoid phase distortion.
Pick zero crossings. Finally, simple zero-crossing detection on the smoothed delta signal locates the pitchmarks themselves.
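The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the EST implementation: the windowed-sinc (Hamming) FIR design, the high-pass built by spectral inversion, the first difference standing in for delta(), the centred moving average, and the choice of positive-to-negative crossings (which direction marks closure depends on signal polarity, hence -inv) are all assumptions. The function names (fir_lowpass, double_filter, pitchmark_sketch and so on) are hypothetical.

```python
import math


def fir_lowpass(cutoff_hz, order, sample_rate):
    """Windowed-sinc low-pass FIR (Hamming window), order + 1 taps, unity DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = order / 2.0
    taps = []
    for i in range(order + 1):
        x = i - mid
        h = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / order)
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]


def fir_highpass(cutoff_hz, order, sample_rate):
    """High-pass by spectral inversion of the low-pass; an even order gives a centre tap."""
    if order % 2:
        order += 1
    taps = [-t for t in fir_lowpass(cutoff_hz, order, sample_rate)]
    taps[order // 2] += 1.0
    return taps


def convolve_full(x, taps):
    """Full linear convolution: len(x) + len(taps) - 1 output samples."""
    out = [0.0] * (len(x) + len(taps) - 1)
    for n, xv in enumerate(x):
        for k, t in enumerate(taps):
            out[n + k] += xv * t
    return out


def double_filter(x, taps):
    """Filter forwards, reverse, filter again and reverse back: the phase shifts cancel.

    Full convolutions are kept so no samples are lost at the ends; the result is
    trimmed back to len(x) samples aligned with the input.
    """
    forward = convolve_full(x, taps)
    backward = convolve_full(forward[::-1], taps)[::-1]
    delay = len(taps) - 1
    return backward[delay:delay + len(x)]


def delta(x):
    """First difference d[n] = x[n + 1] - x[n] (assumed stand-in for EST's delta())."""
    return [x[n + 1] - x[n] for n in range(len(x) - 1)]


def mean_smooth(x, window):
    """Centred moving average over `window` samples, truncated at the edges."""
    half = window // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def negative_zero_crossings(x, sample_rate):
    """Times (s) where x changes from positive to zero or negative."""
    return [n / sample_rate for n in range(1, len(x)) if x[n - 1] > 0 >= x[n]]


def pitchmark_sketch(lx, sample_rate, lx_lf, lx_lo, lx_hf, lx_ho, mean_o):
    sig = double_filter(lx, fir_lowpass(lx_lf, lx_lo, sample_rate))    # step 1
    sig = double_filter(sig, fir_highpass(lx_hf, lx_ho, sample_rate))  # step 2
    d = delta(sig)                                                      # step 3
    d = mean_smooth(d, mean_o)                                          # step 4
    return negative_zero_crossings(d, sample_rate)                      # step 5


# Usage: a 100 Hz sine sampled at 2 kHz stands in for an lx recording.
# The parameter values are illustrative, not pitchmark's defaults.
sr = 2000
lx = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
marks = pitchmark_sketch(lx, sr, lx_lf=400, lx_lo=40, lx_hf=10, lx_ho=200, mean_o=3)
```

On this synthetic input the marks should fall about 0.01 seconds apart, one per cycle. Real laryngograph recordings need their parameters tuned, which is what the -inter output is for.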

pitchmark also performs post-processing on the pitchmarks. This can be used to eliminate pitchmarks that occur too close together, or to insert estimated, evenly spaced pitchmarks in unvoiced regions. The -fill option switches this facility on, and -min, -max, -def, -end and -wave_end control its operation.
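A minimal Python sketch of this kind of post-processing follows. It is not EST's algorithm: the function name fill_pitchmarks and its rules are assumptions chosen for illustration. A mark that follows the previously kept mark by less than the -min period is dropped, the -end time (if given) becomes the last pitchmark, and any gap longer than the -max period is split into equal periods close to the -def value but never longer than -max.

```python
import math


def fill_pitchmarks(marks, min_p, max_p, def_p, end=None):
    """Illustrative -fill style post-processing; not EST's exact algorithm.

    marks: ascending pitchmark times in seconds.
    min_p, max_p, def_p: periods in seconds, in the roles of -min, -max and -def
    (assumes min_p <= def_p <= max_p).
    end: optional time of the last pitchmark, in the role of -end / -wave_end.
    """
    # Drop any mark that follows the previously kept mark by less than min_p.
    kept = []
    for t in marks:
        if not kept or t - kept[-1] >= min_p:
            kept.append(t)

    # The end time, if given, becomes the final pitchmark.
    if end is not None and (not kept or end - kept[-1] >= min_p):
        kept.append(end)

    # Split every gap longer than max_p into equal periods close to def_p,
    # using enough periods that none exceeds max_p.
    filled = kept[:1]
    for prev, nxt in zip(kept, kept[1:]):
        gap = nxt - prev
        if gap > max_p:
            n = max(round(gap / def_p), math.ceil(gap / max_p))
            step = gap / n
            filled.extend(prev + i * step for i in range(1, n))
        filled.append(nxt)
    return filled


# Usage: 0.105 is too close to 0.10 and is removed; the long gaps after 0.12
# (up to 0.20, then on to the end time 0.25) are filled at about 0.01 s.
out = fill_pitchmarks([0.10, 0.105, 0.11, 0.12, 0.20], 0.008, 0.02, 0.01, end=0.25)
```

Choosing the larger of the two interval counts keeps the inserted periods near the default while guaranteeing the maximum is respected.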

OPTIONS

-h
Options help
-itype
string Input file type (optional). Setting it to raw indicates that the input file has no header. Other file types can also be given, but this is rarely necessary because the type of every supported format is determined automatically from the file header. Unheadered input files are assumed to contain 16-bit shorts. Supported types are nist, est, esps, snd, riff, aiff, audlab, raw, ascii
-n
int Number of channels in an unheadered input file
-f
int Sample rate in Hertz for an unheadered input file
-ibo
string Input byte order in an unheadered input file: possibilities are MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian)
-iswap
Swap bytes (for use with an unheadered input file)
-istype
string Sample type in an unheadered input file: short, mulaw, byte, ascii
-c
string Select a single channel (numbered from 0). Waveforms can have multiple channels; this option extracts a single channel for processing and discards the rest.
-start
float Extract sub-wave starting at this time, specified in seconds
-end
float Extract sub-wave ending at this time, specified in seconds
-from
int Extract sub-wave starting at this sample point
-to
int Extract sub-wave ending at this sample point
-otype
string Output file type (default: ascii). Supported types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, xmg, xgraph, ema, ema_swapped, ascii, label
-S
float Frame spacing of output in seconds. If this is different from the internal spacing, the contour is resampled at this spacing
-o
ofile Output filename, defaults to stdout
-lx_lf
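int Cutoff frequency of the low-pass filter applied to the lx signal
-lx_lo
int Order of the lx low-pass filter
-lx_hf
int Cutoff frequency of the high-pass filter applied to the lx signal
-lx_ho
int Order of the lx high-pass filter
-df_lf
int Cutoff frequency of the low-pass filter applied to the delta signal
-df_lo
int Order of the delta-signal low-pass filter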
-med_o
int median smoothing order
-mean_o
int mean smoothing order
-inv
Invert polarity of lx signal. Often the lx signal is upside down. This option inverts the signal prior to processing.
-fill
Insert and remove pitchmarks according to the min, max and def period values. It is often desirable to place limits on the spacing of the pitchmarks. This option enforces a minimum and maximum pitch period (specified by -min and -max). If the maximum pitch period is small enough, this ensures that unvoiced regions have evenly spaced pitchmarks
-min
float Minimum allowed pitch period, in seconds
-max
float Maximum allowed pitch period, in seconds
-def
float Default pitch period in seconds, used as a guide to the length of pitch periods in unvoiced sections
-pm
ifile Input is a raw pitchmark file. This option is used to perform filling operations on an existing set of pitchmarks
-f0
ofile Calculate F0 from pitchmarks and save to file
-end
float Specify the end time of the last pitchmark, for use with the -fill option
-wave_end
Use the end of a waveform to determine where the last pitchmark should be. The waveform file is read only to find its end; no other processing is performed
-inter
Output intermediate waveforms. This will output the signal at various stages of processing. Examination of these waveforms is extremely useful in setting the parameters for similar waveforms
-style
string "track" or "lab"
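For unheadered (raw) input, options such as -n, -f, -ibo, -iswap, -c and -start/-end together describe how the bytes should be interpreted. The Python sketch below shows one plausible reading of those descriptions for 16-bit short samples. The helper names are hypothetical, and the interleaved channel layout and the exclusive end point are assumptions, not documented EST behaviour.

```python
import struct
import sys


def read_unheadered_shorts(data, n_channels=1, byte_order="native", swap=False):
    """Decode headerless 16-bit (short) samples into one list per channel.

    byte_order takes the -ibo values "MSB" (big endian), "LSB" (little endian),
    "native" or "nonnative"; swap=True flips it, in the spirit of -iswap.
    Multi-channel data is assumed to be interleaved frame by frame.
    """
    native_big = sys.byteorder == "big"
    big = {"MSB": True, "LSB": False,
           "native": native_big, "nonnative": not native_big}[byte_order]
    if swap:
        big = not big
    frames = len(data) // (2 * n_channels)  # ignore any trailing partial frame
    count = frames * n_channels
    samples = struct.unpack((">" if big else "<") + "%dh" % count, data[:2 * count])
    # Channel c is every n_channels-th sample starting at offset c (cf. -c).
    return [list(samples[c::n_channels]) for c in range(n_channels)]


def sub_wave(samples, sample_rate, start=None, end=None):
    """Cut out [start, end) given in seconds, as -start/-end describe.

    -from/-to would index samples directly. Whether EST's end point is
    inclusive is not stated, so this sketch treats it as exclusive.
    """
    first = 0 if start is None else int(round(start * sample_rate))
    last = len(samples) if end is None else int(round(end * sample_rate))
    return samples[first:last]


# Usage: two interleaved big-endian channels holding the frames (1, -2), (300, -400).
raw = struct.pack(">4h", 1, -2, 300, -400)
left, right = read_unheadered_shorts(raw, n_channels=2, byte_order="MSB")
```

Reading the same bytes with the wrong byte order turns the sample 1 into 256, which is the kind of error -ibo and -iswap exist to correct.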


Examples

Basic Pitchmarking

$ pitchmark kdt_010.lar -o kdt_010.pm -otype est


Pitchmarking with unvoiced regions filled

The following command fills unvoiced regions with pitch periods of about 0.01 seconds. It also post-processes the set of pitchmarks so that no period is longer than 0.02 seconds or shorter than 0.003 seconds. The -wave_end option specifies a final unvoiced region extending to the end of the waveform.

$ pitchmark kdt_010.lar -o kdt_010.pm -otype est -fill -min 0.003 \
    -max 0.02 -def 0.01 -wave_end
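Pitchmarks such as these can also be turned into an F0 contour (the -f0 option), optionally resampled at a fixed frame spacing (-S). The Python sketch below illustrates one plausible way to do this. It assumes that each period is the distance between consecutive pitchmarks, places 1/period at the later mark of each pair, and resamples by linear interpolation. These conventions and the function names are assumptions for illustration, not a description of what pitchmark computes.

```python
import bisect


def f0_from_pitchmarks(marks):
    """(time, F0 in Hz) pairs: 1 / period, placed at the later mark of each pair."""
    return [(b, 1.0 / (b - a)) for a, b in zip(marks, marks[1:]) if b > a]


def resample(contour, spacing):
    """Linearly interpolate a non-empty, time-ordered (time, value) contour
    onto frames `spacing` seconds apart."""
    times = [t for t, _ in contour]
    values = [v for _, v in contour]
    n_frames = int((times[-1] - times[0]) / spacing + 1e-9) + 1
    out = []
    for n in range(n_frames):
        t = times[0] + n * spacing
        i = bisect.bisect_right(times, t) - 1  # last knot at or before t
        if i >= len(times) - 1:
            out.append((t, values[-1]))
            continue
        frac = (t - times[i]) / (times[i + 1] - times[i])
        out.append((t, values[i] + frac * (values[i + 1] - values[i])))
    return out


# Usage: periods of 10 ms, 10 ms and 5 ms give F0 values of about 100, 100 and 200 Hz.
contour = f0_from_pitchmarks([0.0, 0.01, 0.02, 0.025])
frames = resample(contour, 0.005)
```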
