dp Usage: [-vocab string] [-place_holder string] [-show_cost ] [-o string] [-p int] [-i float] [-d float] [-s float] [-cost_matrix string]
dp provides simple dynamic programming to find the lowest cost alignment of two symbol sequences. Possible uses include:
Label alignment (e.g. speech recogniser output scoring)
-vocab string file containing vocabulary
-place_holder string which vocab item is the place holder (default is )
-show_cost show cost of matching path
-o string output file
-p int 'beam' width Either:
-i float insertion cost
-d float deletion cost
-s float substitution cost Or:
-cost_matrix string file containing cost matrix
Align two symbol sequences:
$ dp -vocab vocab.file "a b c" "a b d c" -i 1 -d 2 -s 3 |
where vocab.file contains "a b c d"
Or, using a full cost matrix:
$ dp -vocab vocab2.file -cost_matrix foo "a b c" "a b d c" |
where vocab2.file contains "a b c d " and the file foo contains:
0 3 3 3 2 3 0 3 3 2 3 3 0 3 2 3 3 3 0 2 1 1 1 1 0 |
Each row of foo shows the cost of replacing an input symbol with each symbol in the vocabulary to match an output symbol. Each row corresponds to an item in the vocabulary (in the order they appear in the vocabulary file). In the example, replacing 'a' with 'a' costs 0, replacing 'a' with any of 'b' 'c' or 'd' costs 3 (a substitution), and replacing 'a' with the place holder symbol 'null' costs 2 (a deletion). The cost of replacing 'null' with anything other than 'null' costs 1 (an insertion). The costs of 1,2 and 3 used here are only for illustration. The cost matrix meed not have the form above - for example, replacing 'a' with 'a' need not cost 0. The entries in foo are read as floats.