class EST_SCFG
A class representing a stochastic context free grammar (SCFG).
This class includes the representation of the grammar itself and methods for training and testing it against a corpus.
At present, only grammars in Chomsky Normal Form are supported. That is, rules may be binary or unary. If binary, the mother and both daughters are nonterminals; if unary, the mother must be a nonterminal and the daughter a terminal symbol.
The terminal and nonterminal symbol sets are derived automatically from the LISP representation of the rules at initialization time and are represented as \Ref{EST_Discrete}s. The distinguished symbol is assumed to be the mother of the first rule in the given grammar.
find_terms_nonterms()
void find_terms_nonterms(EST_StrList &nt, EST_StrList &t, LISP rules)
Find the terminals and nonterminals in the given grammar, adding them to the appropriate given string lists.
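To illustrate how the two symbol sets fall out of a CNF rule list, here is a minimal sketch. It replaces the LISP rule list with a hypothetical `RuleSpec` struct and collects into `std::set` so each symbol is recorded once; `RuleSpec` and `find_terms_nonterms_sketch` are illustrative names, not part of EST.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one CNF rule; EST reads rules from LISP instead.
struct RuleSpec {
    double prob;
    std::string mother;
    std::vector<std::string> daughters;  // one terminal, or two nonterminals
};

// Mothers and binary daughters are nonterminals; the daughter of a unary
// rule is a terminal. The distinguished symbol would be rules.front().mother.
void find_terms_nonterms_sketch(const std::vector<RuleSpec> &rules,
                                std::set<std::string> &nt,
                                std::set<std::string> &t) {
    for (const RuleSpec &r : rules) {
        nt.insert(r.mother);
        if (r.daughters.size() == 2) {
            nt.insert(r.daughters[0]);
            nt.insert(r.daughters[1]);
        } else if (r.daughters.size() == 1) {
            t.insert(r.daughters[0]);
        }
    }
}
```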
class EST_SCFG_Rule
A stochastic context free grammar rule.
At present only two types of rule are supported: {\tt est\_scfg\_binary\_rule} and {\tt est\_scfg\_unary\_rule}. This is sufficient for representing grammars in Chomsky Normal Form. Each rule also has a probability associated with it. Terminals and nonterminals are represented as ints, using the \Ref{EST_Discrete}s in \Ref{EST_SCFG} to reference the actual alphabets.
Although this class includes a ``probability'', nothing in the rule itself enforces it to be a true probability. It is the responsibility of the classes that use this rule to enforce that condition if desired.
EST_SCFG_Rule(const EST_SCFG_Rule &r)
int mother() const
int daughter1() const
In a unary rule this is a terminal; in a binary rule it is a nonterminal.
void set_rule(double prob, int p, int m)
void set_rule(double prob, int p, int q, int r)
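Since the rule does not itself enforce a true probability, a class using it may renormalize. The sketch below assumes a hypothetical `CnfRule` struct whose two `set_rule` overloads mirror the unary and binary forms above (reading `p` as the mother, and `m` or `q`, `r` as the daughters, which is an assumption), plus a hypothetical `normalize_by_mother` that rescales rules sharing a mother so they sum to 1.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for EST_SCFG_Rule: symbols are ints indexing
// separate nonterminal and terminal alphabets.
struct CnfRule {
    double prob = 0.0;
    int mother = -1;
    int d1 = -1;  // terminal (unary) or first nonterminal (binary)
    int d2 = -1;  // -1 marks a unary rule
    void set_rule(double pr, int m, int t) { prob = pr; mother = m; d1 = t; d2 = -1; }
    void set_rule(double pr, int m, int q, int r) { prob = pr; mother = m; d1 = q; d2 = r; }
    bool is_unary() const { return d2 < 0; }
};

// Rescale weights so that all rules with the same mother sum to 1.
void normalize_by_mother(std::vector<CnfRule> &rules) {
    std::map<int, double> total;
    for (const CnfRule &r : rules) total[r.mother] += r.prob;
    for (CnfRule &r : rules)
        if (total[r.mother] > 0.0) r.prob /= total[r.mother];
}
```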
class EST_SCFG_traintest : public EST_SCFG
A class used to train (and test) SCFGs; it is an extension of \Ref{EST_SCFG}.
This offers an implementation of Pereira and Schabes, ``Inside-Outside Reestimation from Partially Bracketed Corpora,'' ACL 1992.
An SCFG may be trained from a corpus (optionally containing brackets) over a series of passes, reestimating the grammar probabilities after each pass. This basically extends the \Ref{EST_SCFG} class, adding support for a bracketed corpus and various indexes for efficient use of the grammar.
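The quantity at the heart of inside-outside reestimation is the inside probability of each span. The following is a minimal sketch of that computation for a CNF grammar, assuming a hypothetical dense table layout (`DenseCnf`) with nonterminal 0 as the distinguished symbol. It omits the outside pass, the reestimation step, and the Pereira and Schabes bracket constraint (which would zero out spans crossing a corpus bracket), so it is not EST's implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense CNF grammar; nonterminal 0 is the distinguished symbol.
struct DenseCnf {
    int n_nt, n_t;
    std::vector<std::vector<std::vector<double>>> binary;  // binary[A][B][C] = P(A -> B C)
    std::vector<std::vector<double>> unary;                // unary[A][w]   = P(A -> w)
    DenseCnf(int nnt, int nt)
        : n_nt(nnt), n_t(nt),
          binary(nnt, std::vector<std::vector<double>>(nnt, std::vector<double>(nnt, 0.0))),
          unary(nnt, std::vector<double>(nt, 0.0)) {}
};

// in[i][j][A] = P(A =>* words[i..j]), spans inclusive. Returns P(sentence).
double inside_prob(const DenseCnf &g, const std::vector<int> &words) {
    const int n = static_cast<int>(words.size());
    if (n == 0) return 0.0;
    std::vector<std::vector<std::vector<double>>> in(
        n, std::vector<std::vector<double>>(n, std::vector<double>(g.n_nt, 0.0)));
    for (int i = 0; i < n; ++i)                 // length-1 spans: unary rules
        for (int a = 0; a < g.n_nt; ++a)
            in[i][i][a] = g.unary[a][words[i]];
    for (int len = 2; len <= n; ++len)          // longer spans: binary rules
        for (int i = 0; i + len - 1 < n; ++i) {
            const int j = i + len - 1;
            for (int k = i; k < j; ++k)         // split point
                for (int a = 0; a < g.n_nt; ++a)
                    for (int b = 0; b < g.n_nt; ++b)
                        for (int c = 0; c < g.n_nt; ++c)
                            in[i][j][a] += g.binary[a][b][c] * in[i][k][b] * in[k + 1][j][c];
        }
    return in[0][n - 1][0];
}
```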
void test_corpus()
Test the current grammar against the current corpus and print a summary.
Only the cross-entropy measure is given.
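One common way to compute such a cross-entropy figure is as bits per word over the test corpus. The sketch below assumes that definition, and a hypothetical `cross_entropy_bits` helper that receives each sentence's probability (for example its inside probability) and its length; EST's exact normalization is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cross entropy in bits per word: -(sum over sentences of log2 P(s)) / total words.
// A sentence with probability 0 makes the result infinite.
double cross_entropy_bits(const std::vector<double> &sent_probs,
                          const std::vector<int> &sent_lengths) {
    double log_sum = 0.0;
    long words = 0;
    for (size_t i = 0; i < sent_probs.size(); ++i) {
        log_sum += std::log2(sent_probs[i]);
        words += sent_lengths[i];
    }
    return words > 0 ? -log_sum / words : 0.0;
}
```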
void test_crossbrackets()
Test the current grammar against the current corpus.
The summary includes the percentage of cross-bracketing accuracy and the percentage of fully correct parses.
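Two brackets cross when they overlap but neither contains the other. The sketch below shows that test on half-open word spans, with a hypothetical `bracket_accuracy` that reports the percentage of parse brackets crossing no reference bracket. That is a common definition of bracketing accuracy, assumed here rather than taken from EST.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Span = std::pair<int, int>;  // half-open word span [start, end)

// True when the spans overlap but neither contains the other.
bool crosses(const Span &a, const Span &b) {
    return (a.first < b.first && b.first < a.second && a.second < b.second) ||
           (b.first < a.first && a.first < b.second && b.second < a.second);
}

// Percentage of parse brackets that cross no bracket of the reference.
double bracket_accuracy(const std::vector<Span> &parse, const std::vector<Span> &gold) {
    if (parse.empty()) return 100.0;
    int ok = 0;
    for (const Span &p : parse) {
        bool crossed = false;
        for (const Span &g : gold)
            if (crosses(p, g)) { crossed = true; break; }
        if (!crossed) ++ok;
    }
    return 100.0 * ok / parse.size();
}
```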
void load_corpus(const EST_String &filename)
Load a corpus from the given file.
Each sentence in the corpus should be enclosed in parentheses. Additional parentheses may be used to denote phrasing within a sentence. The corpus is read using the LISP reader, so LISP conventions apply; notably, single quotes should appear within double quotes.
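A sentence such as `((the cat) (sat down))` carries both the words and the phrase brackets. The sketch below is a hypothetical `read_bracketed` that turns one such sentence into a word list and half-open bracket spans. It is a hand-written tokenizer standing in for EST's LISP reader and handles no quoting.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Reads one bracketed sentence; returns false if the parentheses are unbalanced.
bool read_bracketed(const std::string &s, std::vector<std::string> &words,
                    std::vector<std::pair<int, int>> &spans) {
    std::vector<int> open;  // word index at each unmatched '('
    std::string tok;
    auto flush = [&]() {
        if (!tok.empty()) { words.push_back(tok); tok.clear(); }
    };
    for (char c : s) {
        if (c == '(') {
            flush();
            open.push_back(static_cast<int>(words.size()));
        } else if (c == ')') {
            flush();
            if (open.empty()) return false;
            spans.emplace_back(open.back(), static_cast<int>(words.size()));
            open.pop_back();
        } else if (c == ' ' || c == '\t' || c == '\n') {
            flush();
        } else {
            tok += c;
        }
    }
    flush();
    return open.empty();
}
```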
void train_inout(int passes, int startpass, int checkpoint, int spread, const EST_String &outfile)
Train a grammar using the loaded corpus.
Parameters:
passes: the number of training passes desired.
startpass: the pass from which to start.
checkpoint: save the grammar every n passes.
spread: the percentage of the corpus to use on each pass; successive passes cycle through the corpus.
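The `spread` behaviour can be pictured as a sliding window over the corpus. The sketch below assumes each pass takes the next `spread` percent of sentences, wrapping around at the end; `sentences_for_pass` is a hypothetical helper, and EST's actual selection rule may differ.

```cpp
#include <cassert>
#include <vector>

// Indices of the sentences used on a given pass, assuming a wrapping window.
std::vector<int> sentences_for_pass(int pass, int spread, int corpus_size) {
    std::vector<int> ids;
    if (corpus_size <= 0) return ids;
    int per_pass = corpus_size * spread / 100;
    if (per_pass < 1) per_pass = 1;
    if (per_pass > corpus_size) per_pass = corpus_size;
    const int start = (pass * per_pass) % corpus_size;
    for (int k = 0; k < per_pass; ++k)
        ids.push_back((start + k) % corpus_size);
    return ids;
}
```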
class EST_WFST
A class representing a weighted finite-state transducer.
transition()
int transition(int state, int in, int out) const
Find the (first) new state given in and out symbols.
transition()
int transition(int state, const EST_String &in, const EST_String &out) const
Find the (first) new state given in and out strings.
transition()
int transition(int state, const EST_String &inout) const
Find the (first) new state given an in/out string.
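The "first new state" lookup amounts to a scan of a state's outgoing arcs. The sketch below assumes a hypothetical `MiniWFST` that stores arcs per state and returns -1 when no arc matches; the -1 convention and the arc layout are assumptions, not EST's internals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical arc: input/output symbol ids, a weight and a destination state.
struct Arc { int in, out; float weight; int dest; };

struct MiniWFST {
    std::vector<std::vector<Arc>> arcs;  // arcs[state] = outgoing arcs

    // Destination of the first arc matching (in, out), or -1 if none matches.
    int transition(int state, int in, int out) const {
        for (const Arc &a : arcs[state])
            if (a.in == in && a.out == out) return a.dest;
        return -1;
    }
};
```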
transduce()
int transduce(int state, const EST_String &in, EST_String &out) const
Transduce in to out (strings), starting from state.
transduce()
void transduce(int state, int in, wfst_translist &out) const
Transduce in to a list of transitions.
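The two `transduce` forms can be related as follows: one lists every arc reading a given input, the other follows arcs along a whole input sequence. In this sketch, `transduce_one` and `transduce_seq` are hypothetical names, weights are omitted, and taking the first matching arc at each step is an assumption.

```cpp
#include <cassert>
#include <vector>

struct Arc { int in, out, dest; };
using Arcs = std::vector<std::vector<Arc>>;  // arcs[state] = outgoing arcs

// Every arc leaving `state` that reads `in`.
std::vector<Arc> transduce_one(const Arcs &arcs, int state, int in) {
    std::vector<Arc> res;
    for (const Arc &a : arcs[state])
        if (a.in == in) res.push_back(a);
    return res;
}

// Follow the first matching arc for each input symbol, appending its output.
// Returns the state reached, or -1 if some input has no arc.
int transduce_seq(const Arcs &arcs, int state, const std::vector<int> &in,
                  std::vector<int> &out) {
    for (int sym : in) {
        std::vector<Arc> cand = transduce_one(arcs, state, sym);
        if (cand.empty()) return -1;
        out.push_back(cand.front().out);
        state = cand.front().dest;
    }
    return state;
}
```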
transition_all()
void transition_all(int state, int in, int out, EST_WFST_MultiState *ms) const
Find all possible transitions for the given state/input/output.
ms_type()
enum wfst_state_type ms_type(EST_WFST_MultiState *ms) const
Given a multi-state, return its type (final, ok, error).
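A multi-state can be read as the set of states a nondeterministic machine may currently occupy. The sketch below models it with `std::set<int>` in a hypothetical `NWFST`; the classification rule (empty set is an error, any final member makes it final, otherwise ok) is an assumed reading of the three types listed above.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Arc { int in, out, dest; };
enum class MsType { Error, Ok, Final };

struct NWFST {
    std::vector<std::vector<Arc>> arcs;  // arcs[state] = outgoing arcs
    std::set<int> finals;

    // Add every state reachable from `state` by an arc labelled in:out.
    void transition_all(int state, int in, int out, std::set<int> &ms) const {
        for (const Arc &a : arcs[state])
            if (a.in == in && a.out == out) ms.insert(a.dest);
    }

    // Assumed classification of a multi-state.
    MsType ms_type(const std::set<int> &ms) const {
        if (ms.empty()) return MsType::Error;
        for (int s : ms)
            if (finals.count(s)) return MsType::Final;
        return MsType::Ok;
    }
};
```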
build_and_transition()
void build_and_transition(int start, int end, LISP conjunctions)
Basic conjunction constructor.
build_or_transition()
void build_or_transition(int start, int end, LISP disjunctions)
Basic disjunction constructor.
intersection()
void intersection(EST_TList<EST_WFST> &wl)
Build the intersection of all WFSTs in the given list. The new WFST recognizes only the strings that are recognized by all WFSTs in the given list.
intersection()
void intersection(const EST_WFST &a, const EST_WFST &b)
Build the intersection of WFSTs a and b. The new WFST recognizes only the strings that are recognized by both a and b.
uunion()
void uunion(EST_TList<EST_WFST> &wl)
Build the union of all WFSTs in the given list. The new WFST recognizes only the strings that are recognized by at least one WFST in the given list.
uunion()
void uunion(const EST_WFST &a, const EST_WFST &b)
Build the union of WFSTs a and b. The new WFST recognizes only the strings that are recognized by either a or b.
compose()
void compose(const EST_WFST &a, const EST_WFST &b)
Build a new WFST by composition of a and b. Outputs of a are fed to b, giving a new WFST that has a's input language and b's output set. a's output and b's input alphabets must be the same.
difference()
void difference(const EST_WFST &a, const EST_WFST &b)
Build a WFST that accepts only the strings accepted by a that are not also accepted by b.
apply_multistate()
EST_WFST_MultiState *apply_multistate(const EST_WFST &wfst, EST_WFST_MultiState *ms, int in, int out) const
Transduce a multi-state given in and out symbols.
add_epsilon_reachable()
void add_epsilon_reachable(EST_WFST_MultiState *ms) const
Extend a multi-state with epsilon-reachable states.
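Extending a multi-state with epsilon-reachable states is an epsilon-closure computation. The sketch below uses a worklist over `std::set<int>`. It assumes `-1` marks epsilon and that an arc counts as an epsilon move only when both its input and output are epsilon; both are assumptions, and the free function `add_epsilon_reachable` here is hypothetical, not the EST member.

```cpp
#include <cassert>
#include <set>
#include <vector>

const int EPS = -1;  // assumed epsilon label
struct Arc { int in, out, dest; };

// Add to `ms` every state reachable through epsilon:epsilon arcs.
void add_epsilon_reachable(const std::vector<std::vector<Arc>> &arcs, std::set<int> &ms) {
    std::vector<int> todo(ms.begin(), ms.end());
    while (!todo.empty()) {
        const int s = todo.back();
        todo.pop_back();
        for (const Arc &a : arcs[s])
            if (a.in == EPS && a.out == EPS && ms.insert(a.dest).second)
                todo.push_back(a.dest);  // newly added: explore it too
    }
}
```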
operator=()
EST_WFST &operator=(const EST_WFST &a)