sig2fv [input file] -o [output file] [-h] [-itype string] [-n int] [-f int] [-ibo string] [-iswap] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-otype string {ascii}] [-S float] [-o ofile] [-shift float] [-factor float] [-pm ifile] [-coefs string] [-delta string] [-acc string] [-window_type string] [-lpc_order int] [-ref_order int] [-cep_order int] [-melcep_order int] [-fbank_order int] [-preemph float] [-lifter float] [-usepower] [-include_c0] [-order string]
sig2fv performs signal processing feature vector analysis on speech waveforms. The following types of analysis are provided:
Linear prediction (LPC)
Cepstrum coding from LPC coefficients
Mel scale cepstrum coding via fbank
Mel scale log filterbank analysis
Line spectral frequencies
Linear prediction reflection coefficients
Root mean square energy
Power
Fundamental frequency (pitch)
Calculation of delta and acceleration coefficients of all of the above
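The two simplest measures in this list, power and root mean square energy, are computed frame by frame. The sketch below assumes that power is the mean squared sample value of a frame and that RMS energy is its square root. sig2fv's exact normalisation is not documented here and may differ.

```python
import math

def frame_power(frame: list[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition of power)."""
    return sum(x * x for x in frame) / len(frame)

def frame_rms_energy(frame: list[float]) -> float:
    """Root mean square energy: square root of the mean squared amplitude."""
    return math.sqrt(frame_power(frame))

frame = [3.0, -4.0, 3.0, -4.0]
print(frame_power(frame))       # 12.5
print(frame_rms_energy(frame))  # about 3.536
```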
-h Options help
-itype string Input file type (optional). Setting it to raw indicates that the input file has no header. Other types can be given, but this is rarely necessary because every supported format can be identified automatically from the file's header. Unheadered input files are assumed to contain 16-bit shorts. Supported types are: nist, est, esps, snd, riff, aiff, audlab, raw, ascii
-n int Number of channels in an unheadered input file
-f int Sample rate in Hertz for an unheadered input file
-ibo string Input byte order in an unheadered input file. Possibilities are MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian).
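For unheadered input, the byte order determines how each pair of bytes becomes a 16-bit short sample. The sketch below decodes signed shorts with Python's `struct` module to show what MSB and LSB mean in practice. The helper `decode_shorts` is hypothetical and is not part of the Speech Tools. It handles only the MSB and LSB cases, not native or nonnative.

```python
import struct

def decode_shorts(data: bytes, byte_order: str) -> list[int]:
    """Decode 16-bit signed samples; byte_order is 'MSB' (big endian) or 'LSB' (little endian)."""
    if len(data) % 2:
        raise ValueError("odd number of bytes for 16-bit samples")
    prefix = {"MSB": ">", "LSB": "<"}[byte_order]  # struct byte-order codes
    count = len(data) // 2
    return list(struct.unpack(f"{prefix}{count}h", data))

raw = bytes([0x01, 0x00, 0xFF, 0xFE])
print(decode_shorts(raw, "MSB"))  # [256, -2]
print(decode_shorts(raw, "LSB"))  # [1, -257]
```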
-iswap Swap bytes. (For use on an unheadered input file)
-istype string Sample type in an unheadered input file: short, mulaw, byte, ascii
-c string Select a single channel (numbered from 0). Waveforms can have multiple channels; this option extracts a single channel for processing and discards the rest.
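Multichannel samples are usually stored interleaved. Under that assumption, selecting channel c of an n-channel waveform means taking every n-th sample, starting at offset c. A minimal sketch follows; the helper name is hypothetical.

```python
def extract_channel(samples: list[int], num_channels: int, channel: int) -> list[int]:
    """Return one channel (numbered from 0) from interleaved multichannel samples."""
    if not 0 <= channel < num_channels:
        raise ValueError("channel out of range")
    return samples[channel::num_channels]

stereo = [10, -10, 20, -20, 30, -30]  # interleaved: ch0, ch1, ch0, ch1, ...
print(extract_channel(stereo, 2, 0))  # [10, 20, 30]
print(extract_channel(stereo, 2, 1))  # [-10, -20, -30]
```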
-start float Extract sub-wave starting at this time, specified in seconds
-end float Extract sub-wave ending at this time, specified in seconds
-from int Extract sub-wave starting at this sample point
-to int Extract sub-wave ending at this sample point
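The time-based options (-start, -end) and the sample-based options (-from, -to) are related through the sample rate. The sketch below assumes rounding to the nearest sample and treats the end index as exclusive. sig2fv's own rounding and end-point handling are not specified here, and both helper names are hypothetical.

```python
def seconds_to_samples(start: float, end: float, sample_rate: int) -> tuple[int, int]:
    """Convert a [start, end) range in seconds to sample indices (assumed rounding)."""
    return round(start * sample_rate), round(end * sample_rate)

def extract_subwave(samples: list[float], start: float, end: float, sample_rate: int) -> list[float]:
    """Return the samples between start and end seconds (end exclusive, by assumption)."""
    first, last = seconds_to_samples(start, end, sample_rate)
    return samples[first:last]

print(seconds_to_samples(0.5, 1.25, 16000))  # (8000, 20000)
```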
-otype string {ascii} Output file type; if unspecified, ascii is assumed. Supported types are: none, esps, est, est_binary, htk, htk_fbank, htk_mfcc, htk_user, htk_discrete, xmg, xgraph, ema, ema_swapped, ascii, label
-S float Frame spacing of the output in seconds. If this differs from the internal analysis spacing, the contour is resampled at this spacing.
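When the output spacing differs from the analysis spacing, the track has to be resampled. The sketch below uses linear interpolation between neighbouring frames of a single-channel track. Linear interpolation is an assumed method chosen for illustration; sig2fv may resample differently. The function name is hypothetical.

```python
def resample_track(values: list[float], old_spacing: float, new_spacing: float) -> list[float]:
    """Linearly interpolate a track from old_spacing to new_spacing (both in seconds).

    Input frame i is assumed to lie at time i * old_spacing.
    """
    if len(values) < 2:
        return list(values)
    duration = (len(values) - 1) * old_spacing
    n_out = int(duration / new_spacing + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * new_spacing / old_spacing  # fractional index into the input track
        i = min(int(pos), len(values) - 2)
        frac = pos - i
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out

# 10 ms analysis frames resampled to 5 ms output frames
print(resample_track([0.0, 1.0, 4.0], 0.010, 0.005))  # approximately [0.0, 0.5, 1.0, 2.5, 4.0]
```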
-o ofile Output filename, defaults to stdout
-shift float Frame spacing in seconds for fixed-frame analysis. This need not be the same as the output file spacing; the -S option can be used to resample the track before saving. Default: 0.010
-factor float Frame lengths are FACTOR times the local pitch period. Default: 2.000
-pm ifile Pitch mark file name. This specifies the positions of the analysis frames for pitch-synchronous analysis. Pitchmark files are standard track files; the channel information is ignored and only the time positions are used.
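In pitch-synchronous analysis, each frame is placed at a pitchmark and spans FACTOR times the local pitch period. The sketch below makes two assumptions for illustration: the local period at a mark is the distance to the previous mark (the first mark uses the distance to the next one), and each frame is centred on its mark. Neither is documented sig2fv behaviour, and the function name is hypothetical.

```python
def pitch_sync_frames(pitchmarks: list[float], factor: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds of frames centred on each pitchmark.

    pitchmarks must be in ascending order and contain at least two marks.
    """
    if len(pitchmarks) < 2:
        raise ValueError("need at least two pitchmarks to estimate a period")
    frames = []
    for i, t in enumerate(pitchmarks):
        # Assumed local period: gap to the previous mark (or the next, for the first mark)
        period = t - pitchmarks[i - 1] if i > 0 else pitchmarks[1] - t
        half = factor * period / 2.0
        frames.append((t - half, t + half))
    return frames

for start, end in pitch_sync_frames([0.100, 0.108, 0.116, 0.125]):
    print(f"{start:.3f} {end:.3f}")
```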
-coefs string List of basic types of processing required. Permissible types are: lpc (linear predictive coding), cep (cepstrum coding from LPC coefficients), melcep (Mel scale cepstrum coding via fbank), fbank (Mel scale log filterbank analysis), lsf (line spectral frequencies), ref (linear prediction reflection coefficients), power, f0 (fundamental frequency) and energy (root mean square energy).
-delta string List of delta types of processing required. The corresponding basic type does not need to be specified for this option to work. Permissible types are the same as for -coefs: lpc, cep, melcep, fbank, lsf, ref, power, f0 and energy.
-acc string List of acceleration (delta delta) processing required. The corresponding basic type does not need to be specified for this option to work. Permissible types are the same as for -coefs: lpc, cep, melcep, fbank, lsf, ref, power, f0 and energy.
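Delta and acceleration coefficients estimate the time derivatives of a coefficient track. A common estimate is the regression formula used in HTK, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2) for n = 1..N. Acceleration coefficients come from applying the same operation to the deltas. Whether sig2fv uses this formula, or what window size it uses, is an assumption; the sketch below uses N = 2 and repeats the edge frames at the ends of the track.

```python
def deltas(track: list[list[float]], window: int = 2) -> list[list[float]]:
    """Regression-based deltas for a track given as a list of coefficient vectors.

    Frames beyond either end of the track are replaced by the nearest edge frame.
    """
    num_frames = len(track)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        d = [0.0] * len(track[t])
        for n in range(1, window + 1):
            ahead = track[min(t + n, num_frames - 1)]
            behind = track[max(t - n, 0)]
            for j in range(len(d)):
                d[j] += n * (ahead[j] - behind[j])
        result.append([v / denom for v in d])
    return result

def accelerations(track: list[list[float]], window: int = 2) -> list[list[float]]:
    """Acceleration (delta delta) coefficients: the deltas of the deltas."""
    return deltas(deltas(track, window), window)

ramp = [[float(t)] for t in range(6)]  # one coefficient rising by 1 per frame
print(deltas(ramp))  # [[0.5], [0.8], [1.0], [1.0], [0.8], [0.5]]
```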
-window_type string Type of window applied to the waveform. Permissible types are: none (unknown window type), rectangle (rectangular window), triangle (triangular window), hanning (Hanning window) and hamming (Hamming window). Default: hamming
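The window shapes can be written down directly. The sketch below uses the standard symmetric N-point definitions of the rectangular, triangular, Hanning and Hamming windows. sig2fv may use a periodic variant or a different normalisation, so treat this as an illustration rather than its implementation.

```python
import math

def make_window(kind: str, n: int) -> list[float]:
    """Return an n-point window of the given kind (symmetric definitions)."""
    if n == 1:
        return [1.0]
    if kind == "rectangle":
        return [1.0] * n
    if kind == "triangle":
        return [1.0 - abs(2.0 * i / (n - 1) - 1.0) for i in range(n)]
    if kind == "hanning":
        return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    if kind == "hamming":
        return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    raise ValueError(f"unknown window type: {kind}")

def apply_window(frame: list[float], kind: str = "hamming") -> list[float]:
    """Multiply a frame sample by sample by the chosen window (hamming is the default above)."""
    return [x * w for x, w in zip(frame, make_window(kind, len(frame)))]
```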
-lpc_order int Order of lpc analysis.
-ref_order int Order of lpc reflection coefficient analysis.
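Linear prediction coefficients and reflection coefficients of a given order are commonly computed from a frame's autocorrelation using the Levinson-Durbin recursion, which yields both sets at once. The sketch below is that textbook algorithm, not code taken from the Speech Tools. It uses the predictor convention x[n] ~ sum_k a_k x[n-k], and it assumes a non-silent frame, because r[0] must be nonzero. Sign conventions for reflection coefficients vary between tools.

```python
def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """Unnormalised autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r: list[float], order: int):
    """Return (lpc, reflection, error) from autocorrelation r[0..order].

    lpc[k-1] is a_k in the predictor x[n] ~ sum_k a_k x[n-k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    refl = []
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # i-th reflection coefficient
        refl.append(k)
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= 1.0 - k * k  # remaining prediction error energy
    return a[1:], refl, error

print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], [0.5, 0.0], 0.75)
```

For a real frame one would call, for example, `levinson_durbin(autocorrelation(windowed_frame, 16), 16)`, where `windowed_frame` is a hypothetical windowed, pre-emphasised frame.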
-cep_order int Order of lpc cepstral analysis.
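Cepstral coefficients can be derived from LPC coefficients without a Fourier transform, using the standard recursion for an all-pole model 1 / (1 - sum_k a_k z^-k):

- c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k} for n <= p
- c_n = sum_{k=n-p}^{n-1} (k/n) c_k a_{n-k} for n > p

The sketch below follows the same predictor convention as a Levinson-Durbin analysis that returns a_1..a_p. It omits c_0, the log gain term. Whether sig2fv computes its cep output exactly this way is an assumption.

```python
def lpc_to_cepstrum(lpc: list[float], num_ceps: int) -> list[float]:
    """Cepstral coefficients c_1..c_num_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    lpc[k-1] holds a_k. The log gain term c_0 is not computed.
    """
    p = len(lpc)
    c = [0.0] * (num_ceps + 1)  # c[0] is unused
    for n in range(1, num_ceps + 1):
        total = lpc[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            total += (k / n) * c[k] * lpc[n - k - 1]
        c[n] = total
    return c[1:]

# For a single pole a, the exact cepstrum is c_n = a**n / n
print(lpc_to_cepstrum([0.5], 3))  # approximately [0.5, 0.125, 0.0417]
```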
-melcep_order int Order of Mel cepstral analysis.
-fbank_order int Order of filter bank analysis.
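Mel scale filterbank analysis places triangular filters at even spacing on the mel scale, and Mel cepstra are then derived from the log filterbank outputs. One widely used mel formula is mel(f) = 2595 log10(1 + f/700). It is an assumption that sig2fv uses exactly this formula. The sketch computes the centre frequencies of a hypothetical bank of `num_filters` filters between 0 Hz and the Nyquist frequency.

```python
import math

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_centres(num_filters: int, sample_rate: float) -> list[float]:
    """Centre frequencies in Hz of num_filters triangular filters spaced evenly in mel."""
    top = hz_to_mel(sample_rate / 2.0)
    step = top / (num_filters + 1)  # leave room for the edges of the outer filters
    return [mel_to_hz(step * (i + 1)) for i in range(num_filters)]

print([round(f) for f in filter_centres(4, 16000)])
```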
-preemph float Perform pre-emphasis with this factor.
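Pre-emphasis is conventionally the first-order filter y[n] = x[n] - factor * x[n-1], which boosts high frequencies relative to low ones. The examples below use factors of 0.5 and 0.96. In the sketch, the first output sample is simply a copy of the input. That boundary handling is an assumption, and sig2fv may treat the first sample differently.

```python
def preemphasise(samples: list[float], factor: float) -> list[float]:
    """First-order pre-emphasis: y[n] = x[n] - factor * x[n-1], with y[0] = x[0]."""
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - factor * prev)
    return out

# A constant (DC) signal is attenuated after the first sample
print(preemphasise([1.0, 1.0, 1.0, 1.0], 0.5))  # [1.0, 0.5, 0.5, 0.5]
```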
-lifter float Lifter coefficient.
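This page does not say which lifter sig2fv applies. A common choice, used for example in HTK, is the sinusoidal lifter c'_n = (1 + (L/2) sin(pi n / L)) c_n, where L is the lifter coefficient. The sketch assumes that form and rescales the cepstral coefficients c_1..c_N.

```python
import math

def sinusoidal_lifter(ceps: list[float], lifter: float) -> list[float]:
    """Apply c'_n = (1 + L/2 * sin(pi * n / L)) * c_n, where ceps[0] holds c_1.

    A lifter value of 0 or less leaves the coefficients unchanged.
    """
    if lifter <= 0:
        return list(ceps)
    return [(1.0 + lifter / 2.0 * math.sin(math.pi * n / lifter)) * c
            for n, c in enumerate(ceps, start=1)]
```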
-usepower Use power rather than energy in filter bank analysis.
-include_c0 Include cepstral coefficient 0.
-order string Order of analyses.
Fixed-frame basic linear prediction: to produce a set of linear prediction coefficients every 10 ms, using pre-emphasis and saving in EST format:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5
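Pitch-synchronous linear prediction, using the pitchmarks in kdt_010.pm to position the analysis frames:
$ sig2fv kdt_010.wav -pm kdt_010.pm -o kdt_010.lpc -coefs "lpc" -otype est -shift 0.01 -preemph 0.5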
F0, linear prediction and cepstral coefficients:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "f0 lpc cep" -otype est -shift 0.01
Energy, linear prediction and cepstral coefficients, with a 10 ms frame shift during analysis but a 5 ms frame shift in the output file:
$ sig2fv kdt_010.wav -o kdt_010.lpc -coefs "energy lpc cep" -otype est -S 0.005 -shift 0.01
Delta and acceleration coefficients can be calculated even if their base form is not required. This produces normal energy coefficients and cepstral delta coefficients:
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "energy" -delta "cep" -otype est
Mel-scaled cepstra with delta and acceleration coefficients, as is common in speech recognition:
$ sig2fv ../kdt_010.wav -o kdt_010.lpc -coefs "melcep" -delta "melcep" -acc "melcep" -otype est -preemph 0.96