dp Perform dynamic programming on label sequences

Table of Contents
Synopsis
Options
Examples

Synopsis

dp Usage: [-vocab string] [-place_holder string] [-show_cost ] [-o string] [-p int] [-i float] [-d float] [-s float] [-cost_matrix string]

dp provides simple dynamic programming to find the lowest cost alignment of two symbol sequences. Possible uses include:

  • Label alignment (e.g. speech recogniser output scoring)

The costs of inserting/deleting/substituting symbols can be given either with command line options, or as a file containing a full matrix of costs. In the former case, all insertions will have the same cost. In the latter case, each row of the file contains the cost of replacing a symbol in sequence 1 with each possible symbol in the vocabulary to align with sequence 2, including a special "place holder" symbol used for insertions/deletions. See the examples below. The place holder can be redefined. The output is an EST utterance with three Relations: the first two are the input sequences, and the third shows the alignment found by dynamic programming.

Options

-vocab

string file containing vocabulary

-place_holder

string which vocab item is the place holder (default is )

-show_cost

show cost of matching path

-o

string output file

-p

int 'beam' width Either:

-i

float insertion cost

-d

float deletion cost

-s

float substitution cost Or:

-cost_matrix

string file containing cost matrix

Examples

Align two symbol sequences:

$ dp -vocab vocab.file "a b c" "a b d c" -i 1 -d 2 -s 3

where vocab.file contains "a b c d"

Or, using a full cost matrix:

$ dp -vocab vocab2.file -cost_matrix foo "a b c" "a b d c"

where vocab2.file contains "a b c d " and the file foo contains:

0 3 3 3 2

3 0 3 3 2

3 3 0 3 2

3 3 3 0 2

1 1 1 1 0

Each row of foo shows the cost of replacing an input symbol with each symbol in the vocabulary to match an output symbol. Each row corresponds to an item in the vocabulary (in the order they appear in the vocabulary file). In the example, replacing 'a' with 'a' costs 0, replacing 'a' with any of 'b' 'c' or 'd' costs 3 (a substitution), and replacing 'a' with the place holder symbol 'null' costs 2 (a deletion). The cost of replacing 'null' with anything other than 'null' costs 1 (an insertion). The costs of 1,2 and 3 used here are only for illustration. The cost matrix meed not have the form above - for example, replacing 'a' with 'a' need not cost 0. The entries in foo are read as floats.