OpenFst Advanced Usage
Below are a variety of topics covered in greater depth or of more specialized interest than found in the Quick Tour. Reading the
Quick Tour first is recommended.
Arc Iterators
An arc iterator
is used to access the transitions leaving an FST state. It has the form:
template <class F>
class ArcIterator {
typedef typename F::Arc Arc;
typedef typename Arc::StateId StateId;
public:
ArcIterator(const &F fst, StateId s);
// End of iterator?
bool Done() const;
// Current arc (when !Done)
const Arc& Value() const;
// Advance to next arc (when !Done)
void Next();
// Return current position
size_t Position();
// Return to initial position
void Reset();
// Arc access by position
void Seek(size_t pos);
};
It is templated on the Fst class
F
to allow efficient specializations but defaults to a generic version on the abstract
base
Fst class.
See
here for conventions that arc iterator use must respect.
All current OpenFst library
Seek()
methods are constant time.
An example use of an arc iterator is shown
here.
A
MutableArcIterator
is similar to an
ArcIterator
except its
constructor takes a pointer to a
MutableFst
and it additionally has a
SetValue()
method.
Arc Filters
Arc filters are accepted by various operations to control which arcs are transitioned. An arc filter has the form:
template <class Arc>
class SomeArcFilter {
public:
// Return true iff arc is to be transitioned.
bool operator()(const Arc &arc) const;
};
Pre-defined arc filters include:
Name |
Description |
|
AnyArcFilter |
Accept all arcs |
|
EpsilonArcFilter |
Accept only arcs with input and output epsilons |
|
InputEpsilonArcFilter |
Accept only arcs with input epsilons |
|
OutputEpsilonArcFilter |
Accept only arcs with output epsilons |
|
Arcs
An
Arc
is a type that represents an FST transition from a given source state. It specifies an input label, an output label, a weight, and a destination state ID and it has a type name. In particular, it has the following form:
struct SomeArc {
typedef W Weight;
typedef L Label;
typedef S StateId;
static const string &Type();
Label ilabel;
Label olabel;
Weight weight;
StateId nextstate;
};
where
W
is a valid
weight type, and
L
and
S
are signed integral types.
The following arc types are defined in the OpenFst library:
Name |
Label Type |
State ID Type |
Weight Type |
Registered |
GallicArc<A, S> |
A::Label |
A::StateId |
GallicWeight<A::Label, A::Weight, S> |
|
LexicographicArc<W1, W2> |
int |
int |
LexicographicWeight<W1, W2> |
|
LogArc |
int |
int |
LogWeight |
|
MinMaxArc |
int |
int |
MinMaxWeight |
|
ProductArc<W1, W2> |
int |
int |
ProductWeight<W1, W2> |
|
StdArc |
int |
int |
TropicalWeight |
|
StringArc<S> |
int |
int |
StringWeight<int, S> |
|
Additional arc information:
Base Fsts
Every
Fst
must specify an initial state, the final weights, arc and epsilon counts per states, an Fst type name, the Fst's
properties, how to copy, read and write the Fst, and the input and output symbol tables (if any). In particular, the base
Fst
class has the inteface:
template <class A>
class Fst {
public:
typedef A Arc;
typedef typename A::Weight Weight;
typedef typename A::StateId StateId;
// Initial state
virtual StateId Start() const = 0;
// States's final weight
virtual Final(StateId) const = 0:
// State's arc count
virtual NumArcs(StateId) const = 0;
// States's input epsilon count
virtual NumInputEpsilons(StateId) const = 0;
// State's output epsilon count
virtual NumOutputEpsilons(StateId) const = 0;
// If test=false, return stored properties bits for mask (some poss. unknown)
// If test=true, return property bits for mask (computing o.w. unknown)
virtual Properties(uint64 mask, bool test) const = 0;
// Fst type name
virtual const string& Type() const = 0;
// Get a copy of this Fst
virtual Fst<A> *Copy() const = 0;
// Read an Fst from an input stream; returns NULL on error
static Fst<A> *Read(istream &strm, const FstReadOptions &opts);
// Read an Fst from a file; return NULL on error
// Empty filename reads from standard input
static Fst<A> *Read(const string &filename);
// Write an Fst to an output stream; return false on error
virtual bool Write(ostream &strm, const FstWriteOptions &opts);
// Write an Fst to a file; return false on error
// Empty filename writes to standard output
virtual bool Write(const string &filename);
// Return input label symbol table; return NULL if not specified
virtual const SymbolTable* InputSymbols() const = 0;
// Return output label symbol table; return NULL if not specified
virtual const SymbolTable* OutputSymbols() const = 0;
};
Fst
is an abstract class (note the pure virtual methods). All OpenFst FSTs must meet this interface.
The companion
state iterator and
arc iterator classes
provide access to the states and transitions of the FST.
Caching
Command Line Flags
OpenFst has several global options in the library proper that most users can ignore, leaving them with their default values:
Option |
Type |
Default |
Description |
FLAGS_fst_compat_symbols |
bool |
true |
Require symbol tables to match when appropriate |
FLAGS_fst_default_cache_gc |
bool |
true |
Enable garbage collection of cached Fsts |
FLAGS_fst_default_cache_gc_limit |
int64 |
1048576 |
Byte size that triggers garbage collection of cached Fsts |
FLAGS_fst_pair_parentheses |
string |
"" |
Characters enclosing the first weight of a printed pair weight (and derived classes) to ensure proper I/O of nested pair weights; must have size 0 (none) or 2 (open and close parenthesis) |
FLAGS_fst_pair_separator |
string |
"," |
Character separator between printed pair weights; must be a single character |
FLAGS_fst_verify_properties |
bool |
false |
Verify Fst properites are correctly set when queried |
The first ensures the arguments of binary FST operations (e.g.
composition) have compatible symbol tables (e..g output symbol table matches
input symbol table for composition).
The second two are used to control the
caching of expanded state and arc information found in most of the on-the-fly Fst classes; the default values should normally be satisfactory. The next two are used
to control the text formating of
ProductWeight and other weight pairs. The last is used to ensure that the
properties of an FST have been correctly set; is is used for debugging only, since it incurs considerable computational cost.
In each of the Fst distribution installed binaries, the above options, as well as any of those defined specific to the binary, can be set from the command line using e.g.
--fst_default_cache_gc=false
or
--fst_pair_parenthesis="("
. Additionally, the option
--help
and
--v=N
(where N = 0,1,2,..) will print out usage information and
set the verbosity level of logging, respectively. The flag processing is modeled after the Google
gflags package.
In a user-defined binary, the command line options processing will all also work if the user calls:
SetFlags(usage, &argc, &argv, true);
In that case, the user can set his own flags as well, following the conventions in
<fst/flags.h>.
Alternatively, the user can process options in his own way and directly assign to any of the above global options if he wishes to modify their defaults.
Composition Filters
A
composition filter determines which matches are allowed to proceed in
composition. The basic filters handle correct epsilon matching. Their interface is:
template <class A>
class SomeComposeFilter {
public:
typedef A Arc;
typedef ... FilterState;
// Required constructor.
SomeComposeFilter(const Fst<A> &fst1, const Fst<A> &fst2);
// Return start state of filter.
FilterState Start() const;
// Specifies current composition state.
void SetState(StateId s1, StateId s2, const FilterState &f);
// Apply filter at current composition state to these transitions.
// If an arc label to be matched is kNolabel, then that side does not consume a symbol.
// Returns the new filter state or, if disallowed, FilterState::NoState().
// The filter is permitted to modify its inputs, e.g. for optimizations.
FilterState FilterArc(A *arc1, A *arc2) const;
// Apply filter at current composition state to these final weights
// (cf. superfinal transitions). The filter may modify its inputs,
// e.g. for optimizations.
void FilterFinal(Weight *final1, Weight *final2) const;
};
The filter's state is represented by the type
SomeComposeFilter::FilterState
, defined by the filter and stored in the composition
state table tuple. It has the form:
class SomeFilterState {
public:
// Required constructors
SomeFilterState();
SomeFilterState(const SomeFilterState &f);
// An invalid filter state.
static const SomeFilterState NoState();
// Maps state to integer for hashing.
size_t Hash() const;
// Equality of filter states.
bool operator==(const SomeFilterState &f) const;
// Inequality of filter states.
bool operator!=(const SomeFilterState &f) const;
// Assignment to filter states.
SomeFilterState& operator=(const SomeFilterState& f);
};
The following composition filters are defined in the OpenFst library:
Name |
Description |
|
SequenceComposeFilter |
Requires epsilons on FST1 to be read before epsilons on FST2 |
|
AltSequenceComposeFilter |
Requires epsilons on FST2 to be read before epsilons on FST1 |
|
MatchComposeFilter |
Requires epsilons on FST1 to be matched with epsilons on FST2 whenever possible |
|
SequenceComposeFilter
is the default composition filter. It can be changed by using the version of
ComposeFst
that accepts
ComposeFstOptions
.
[bad link?]
Expanded Fsts
An
ExpandedFst
is an
Fst that has an additional method that specifies the state count as well as methods to copy and read the expanded FST. In particular, an
ExpandedFst
class has the interface:
template <class A>
class ExpandedFst : public Fst<class A> {
public:
typedef A Arc;
typedef typename A::StateId StateId;
// State count
StateId NumStates();
// Get a copy of this ExpandedFst
virtual ExpandedFst<A> *Copy() const = 0;
// Read an ExpandedFst from an input stream; returns NULL on error
static ExpandedFst<A> *Read(istream &strm, const FstReadOptions &opts);
// Read an ExpandedFst from a file; return NULL on error
// Empty filename reads from standard input
static ExpandedFst<A> *Read(const string &filename);
};
ExpandedFst
is an abstract class (note the pure virtual methods). Examples are
VectorFst
and
ConstFst
FST Input/Output
The following code:
VectorFst<A> ifst;
...
ifst.Write("a.fst");
VectorFst<A> *ofst = VectorFst<A>::Read("a.fst");
writes and reads a defined Fst type (
VectorFst
) and arc type (
A
) to and from a file in a straight-forward way.
Library Registration
The call:
Fst<Arc> *fst = Fst<A>::Read("a.fst");
reads the same
VectorFst
from the file as above, but returns a base
Fst
. This form, useful for code that works generically for different Fst types,
can not work unless the Fst and arc type are
library-registered. Some arc types (see
here) are already registered
for all the Fst types defined in the OpenFst library. Other arc type
A
and Fst type
F
pairs can be registered with the following call:
REGISTER_FST(F, A);
To avoid code bloat in a given program, registering arc types, in particular, should be used sparingly.
Main Registration
In the above examples, the user provided the arc type as a template parameter. However, the call:
$ fstinfo a.fst
works e.g. for both
StdArc
and
LogArc
arcs. This is accomplished by calling in
main(argc, argv)
:
return CALL_FST_MAIN(InfoMain, argc, argv);
where:
template <class Arc>
int InfoMain(int argc; char **argv, istream &strm, const FstReadOptions &opts) {
Fst<Arc> *fst = Fst<Arc>::Read(strm, opts);
...
return 0;
}
is a templated `main' function that does the arc-specific work.
CALL_FST_MAIN
passes to
InfoMain
the command line arguments and an opened stream to an
Fst
(opened from the first argument or standard input if no arguments).
CALL_FST_MAIN
does the type dispatch by examining the Fst's header and then passing on the (partially-read) input stream, which can be used by
InfoMain
to read in the actual
Fst
. This dispatch works only with arc and main function pairs that have been
main-registered. Each OpenFst distribution binary registers its templated main function with the arc types marked registered
here. An arc type
A
and main function
M
pair can be registered with the following call:
REGISTER_FST_MAIN(M, A);
To avoid code bloat in a given program, registering arc types should be used sparingly.
To use main registration in your own program, you need to include additionally
<fst/main.h>
and link additionally to
libfstmain.so
.
FST Dynamic Shared Objects
The examples above show how users can modify programs to be able to read new arc and Fst types. However, it would not be ideal to have to do so
for all the distribution binaries or other existing programs. Instead, this can be done more easily with
dynamic shared objects (DSOs).
To add a new Fst type,
MyFst
with
MyFst::Type()
=
"my_fst"
, use the code:
extern "C" {
void my_fst_init() {
// Register some arc types with this Fst type
REGISTER_FST(MyFst, StdArc);
REGISTER_FST(MyFst, LogArc);
}
}
compiled into a dynamic shared object
my_fst.so
. If
my_fst.so
can be found in the
LD_LIBRARY_PATH
(or equivalent), you should
be able to read the new Fst type with existing programs.
To add a new arc type,
MyArc
with
MyArc::Type()
=
"my_arc"
, use the code:
extern "C" {
void my_arc_init() {
// Register some Fst types with this arc type
REGISTER_FST(VectorFst, MyArc);
REGISTER_FST(ConstFst, MyArc);
// Register the OpenFst binaries with this arc type
REGISTER_FST_MAINS(MyArc);
// Register some other main with this arc type
REGISTER_FST_MAIN(SomeMain, MyArc);
}
}
compiled into a dynamic shared object
my_arc.so
. If can be found in
LD_LIBRARY_PATH
(or equivalent), you should
be able to read the new arc type with existing programs.
Mappers
Matchers
Matchers can find and iterate through requested labels at
FST states. In the simplest form, these are just some associative
map or search keyed on labels. More generally, they may
implement matching special labels that represent sets of labels
such as ρ (rest), σ (all) or φ (fail).
The Matcher interface is:
// Specifies matcher action.
enum MatchType {
MATCH_INPUT, // Match input label.
MATCH_OUTPUT, // Match output label.
MATCH_NONE, // Match nothing.
MATCH_UNKNOWN, // Match type unknown.
};
template <class F>
class SomeMatcher {
public:
typedef F FST;
typedef F::Arc Arc;
typedef typename Arc::StateId StateId;
typedef typename Arc::Label Label;
typedef typename Arc::Weight Weight;
// Required constructors.
SomeMatcher(const F &fst, MatchType type);
SomeMatcher(const SomeMatcher &matcher);
// Returns the match type that can be provided (depending on
// compatibility of the input FST). It is either
// the requested match type, MATCH_NONE, or MATCH_UNKNOWN.
// If 'test' is false, a constant time test is performed, but
// MATCH_UNKNOWN may be returned. If 'test' is true,
// a definite answer is returned, but may involve more costly
// computation (e.g., visiting the Fst).
MatchType Type(bool test) const;
// Specifies the current state.
void SetState(StateId s);
// This finds matches to a label at the current state.
// Returns true if a match found. kNoLabel matches any
// 'non-consuming' transitions, e.g., epsilon transitions,
// which do not require a matching symbol.
bool Find(Label label);
// These iterate through any matches found:
// No more matches.
bool Done() const;
// Current arc (when !Done)
const A& Value() const;
// Advance to next arc (when !Done)
void Next();
// This specifies the known Fst properties as viewed from this
// mapper. It takes as argument the input Fst's known properties.
uint64 Properties(uint64 props) const;
};
The following matchers are defined in the OpenFst library:
Name |
Description |
|
SortedMatcher |
Binary search on sorted input |
|
RhoMatcher<M> |
ρ symbol handling; templated on underlying matcher |
|
SigmaMatcher<M> |
σ symbol handling; templated on underlying matcher |
|
PhiMatcher<M> |
φ symbol handling; templated on underlying matcher |
|
SortedMatcher
requires the underlying Fst be sorted on the appropriate side.
It matches epsilons explicitly (as if it were any other symbol) and as if the current FST state had a self-loop
Arc(kNoLabel, 0, Weight::One(), current_state)
if the
match_type
is
MATCH_INPUT
and
Arc(0, kNoLabel, Weight::One(), current_state)
if the
match_type
is
MATCH_OUTPUT
. This and
composition filter implement composition epsilon handling.
The special symbols referenced above are explained by this table:
|
Consumes no symbol |
Consumes symbol |
Matches all |
ε |
σ |
Matches rest |
φ |
ρ |
The template
Matcher<F>
selects the pre-designated matcher for
Fst
type
F
; it is typically
SortedMatcher
.
Composition uses this matcher by default. It can be changed by using the version of
ComposeFst
that accepts
ComposeFstOptions
.
[bad link?]
An example use of a matcher is
here.
Mutable Fsts
A
MutableFst
is an
ExpandedFst that has additional methods that specifiy
how to set the start state, final weights,
properties and the input and output symbols, how to add and delete states and arcs, as well as methods to copy and read the mutable FST. In particular, a
MutableFst
class has the interface:
template <class A>
class MutableFst : public ExpandedFst<class A> {
public:
typedef A Arc;
typedef typename A::StateId StateId;
typedef typename A::Weight Weight;
// Set the initial state
virtual void SetStart(StateId) = 0;
// Set the initial state
virtual void SetFinal(StateId, Weight) = 0;
// Set property bits wrt mask
virtual void SetProperties(uint64 props, uint64 mask) = 0;
// Add a state, return its ID
virtual StateId AddState() = 0;
// Add an arc to state
virtual void AddArc(StateId, const A &arc) = 0;
// Delete some states
virtual void DeleteStates(const vector<StateId>&) = 0;
// Delete all states
virtual void DeleteStates() = 0;
// Delete some arcs at state
virtual void DeleteArcs(StateId, size_t n) = 0;
// Delete all arcs at state
virtual void DeleteArcs(StateId) = 0;
// Get a copy of this MutableFst
virtual MutableFst<A> *Copy() const = 0;
// Read an MutableFst from an input stream; returns NULL on error
static MutableFst<A> *Read(istream &strm, const FstReadOptions &opts);
// Read an MutableFst from a file; return NULL on error
// Empty filename reads from standard input
static MutableFst<A> *Read(const string &filename);
// Set input label symbol table; NULL signifies not unspecified
virtual void SetInputSymbols(const SymbolTable* isyms) = 0;
// Set output label symbol table; NULL signifies not unspecified
virtual void SetOutputSymbols(const SymbolTable* osyms) = 0;
};
MutableFst
is an abstract class (note the pure virtual methods). An example is
VectorFst
.
The companion
mutable arc iterator class provides access to and modification of the transitions of the FST
Natural Orders
The
natural order ≤ associated with a
semiring is defined as a ≤ b iff a ⊕ b = a.
In the OpenFst library, we define the strict version of this order as:
template <class W>
NaturalLess() {
bool operator()(const W &w1, const W &w2) const {
return (Plus(w1, w2) == w1) && w1 != w2;
}
};
An order is
left monotonic w.r.t a semring iff
a ≤ b ⇒ ∀c,
c ⊕ a ≤ c ⊕ b and
c ⊗ a ≤ c ⊗ b;
right monotonic is defined similarly.
An order is negative iff
1 ≤
0.
The natural order is a left (right) monotonic and negative partial order iff the semiring is
idempotent
and left (right)
distributive. It is a
total order iff the semiring has the
path property. See Mohri, "Semiring Framework and Algorithms for Shortest-Distance Problems",
Journal of Automata, Languages and Combinatorics 7(3):321-350, 2002.
This is the default total order (under the requirements above) that we use for the shortest path and pruning algorithms. This order is the
natural one to use given that it generally needs to be total, monotonic and. negative:
total so that all weights can be compared,
monotonic so there is a practical algorithm, and
negative so that the "free" weight
1
is preferred to the "disallowed" weight
0.
Properties
Each
Fst
has associated with it a set of stored properties that assert facts about it. These are queried in an FST with the
Properties()
method and set in a
MutableFst
with the
SetProperties()
method. OpenFst library operations use these properties to optimize their performance. OpenFst library operations and mutable FSTs attempt to preserve as much
property information in their results as possible without significant added computation.
Some properties are binary - they are either true or false. For each such property, there is a single stored bit that is set
if true and not set if false. The binary
Fst
properties are:
Other properties are trinary - they are either true, false or unknown. For each such property, there are two stored bits;
one is set if true, the other is set if false and neither is set if unknown.
Type |
Name |
Description |
Acceptor |
kAcceptor |
Input and output label are equal for each arc |
|
kNotAcceptor |
Input and output label are not equal for some arc |
Accessible |
kAccessible |
All states reachable from the initial state |
|
kNotAccessible |
Not all states reachable from the initial state |
|
kCoAccessible |
All states can reach a final state |
|
kNotCoAccessible |
Not all states can reach a final state |
Cyclic |
kCyclic |
Has cycles |
|
kAcyclic |
Has no cycles |
|
kInitialCyclic |
Has cycles containing the initial state |
|
KInitialAcyclic |
Has no cycles containing the initial state |
Deterministic |
kIDeterministic |
Input labels are unique leaving each state |
|
kNonIDeterministic |
Input labels are not unique leaving some state |
|
kODeterministic |
Output labels are unique leaving each state |
|
kNonODeterministic |
Output labels are not unique leaving some state |
Epsilons |
kEpsilons |
Has input/output epsilons |
|
KNoEpsilons |
Has no input/output epsilons |
|
kIEpsilons |
Has input epsilons |
|
KNoIEpsilons |
Has no input epsilons |
|
kOEpsilons |
Has output epsilons |
|
KNoOEpsilons |
Has no output epsilons |
Sorted |
kILabelSorted |
Input labels sorted for each state |
|
kNotILabelSorted |
Input labels not sorted for each state |
|
kOLabelSorted |
Output labels sorted for each state |
|
kNotOLabelSorted |
Output labels not sorted for each state |
|
kTopSorted |
States topologically sorted |
|
kNotTopSorted |
States not topologically sorted |
Weighted |
kWeighted |
Non-trivial arc or final weights |
|
kNotWeighted |
Only trivial arc and final weights |
The call
fst.Properties(mask, false)
returns the stored property bits set in the mask bits; some properties
may be unknown.
The call
fst.Properties(mask, true)
returns the stored property bits set in the mask bits after
computing and updating any of those set in the mask that are unknown.
State Iterators
A state iterator
is used to access the states of an FST. It has the form:
template <class F>
class StateIterator {
typedef typename F::Arc Arc;
typedef typename Arc::StateId StateId;
public:
StateIterator(const &F fst);
// End of iterator?
bool Done() const;
// Current state ID (when !Done)
StateId Value() const;
// Advance to next state (when !Done)
void Next();
// Return to initial position
void Reset();
};
It is templated on the Fst class
F
to allow efficient specializations but defaults to a generic version on the abstract
base
Fst class.
See
here for conventions that state iterator use must respect.
An example use of a state iterator is shown
here.
State Queues
State queues are used by, among others, the
shortest path and
shortest distance algorithms and by the
Visit operation. A
state queue
has the form:
template <class StateId>
class SomeQueue {
public:
// Ctr: may need args (e.g., Fst, comparator) for some queues
SomeQueue(...);
// Returns the head of the queue
StateId Head() const;
// Inserts a state
void Enqueue(StateId s);
// Removes the head of the queue
void Dequeue();
// Updates ordering of state s when weight changes, if necessary
void Update(StateId s);
// Does the queue contain no elements?
bool Empty() const;
// Remove all states from queue
void Clear();
};
Pre-defined state queues include:
Queue |
Description |
|
AutoQueue |
Automatically-selected from Fst properties |
|
FifoQueue |
First-In, first-Out |
|
LifoQueue |
Last-In, first-Out |
|
SccQueue |
Component graph top-ordered meta-queue |
|
ShortestFirstQueue |
Priority (least weight) |
|
StateOrderQueue |
State-ID ordered |
|
TopOrderQueue |
Topologically ordered |
|
Some queues accept
arc filters to control which transitions are explored.
State Tables
State tables determine the bijective mapping between state
tuples (e.g. in
composition triples of two FST states and a
composition filter state) and their corresponding state IDs.
They are classes, templated on state tuples, of the form:
template <class T>
class SomeStateTable {
typedef typename T StateTuple;
// Required constructors.
SomeStateTable();
// Lookup state ID by tuple. If it doesn't exist, then add it.
StateId FindState(const StateTuple &);
// Lookup state tuple by state ID.
const StateTuple<StateId> &Tuple(StateId) const;
};
A state tuple has the form:
template <class S>
struct SomeStateTuple {
typedef typename S StateId;
// Required constructor.
SomeStateTuple();
// Data
...
};
A specific state tuple is a
ComposeStateTuple
[bad link?] that has data members
StateId state_id1
,
StateId state_id2
, and
FilterState filter_state
.
The following state tables are defined in the OpenFst library:
Name |
Description |
|
HashStateTable |
Hash map implementation |
|
CompactHashStateTable |
Hash set implementation |
|
VectorStateTable |
Vector implementation |
|
VectorHashStateTable |
Vector and hash set implementation |
|
ErasableStateTable |
Deque implementation - permits erasures |
|
Different state tables provide different time and space tradeoffs for applications.
Composition state tables are defined using state tables with
ComposeStateTuple
. They are the principal data structure used by
composition other than the result
cache.
The following composition state tables are defined in the OpenFst library:
Name |
State Table |
Description |
|
GenericComposeStateTable |
CompactHashStateTable |
General-purpose choice |
|
ProductComposeStateTable |
VectorStateTable |
Efficient when the composition state space is densely populated |
|
StringDetComposeStateTable |
VectorStateTable |
Efficient when FST1 is a string and FST2 is deterministic |
|
DetStringComposeStateTable |
VectorStateTable |
Efficient when FST1 is deterministic and FST2 is a string |
|
EraseableComposeStateTable |
ErasableStateTable |
Allows composition state tuple erasure |
|
GenericComposeStateTable
is the default composition state table. It can be changed by using the version of
ComposeFst
that accepts
ComposeFstOptions
.
[bad link?]
Symbol Tables
User-defined Fst Arcs and Weights
A user may define his own weight type so long as it meets the necessary
requirements.
A user may define his own arc type so long as has the right
form. Some Fst I/O with new arc types requires
registration.
User-defined Fst Classes
Visitors
The simplest way to traverse an FST is in state order using a
state iterator.
A very general traversal method is to use:
Visit(fst, visitor, queue); [bad link?]
where the
visitor
object specfies the actions taken in the traversal while the
state queue
object specifies the traversal order. A
visitor
has the form:
// Visitor Interface - class determines actions taken during a visit.
// If any of the boolean member functions return false, the visit is
// aborted by first calling FinishState() on all unfinished (grey)
// states and then calling FinishVisit().
template <class Arc>
class SomeVisitor {
public:
typedef typename Arc::StateId StateId;
SomeVisitor(T *return_data);
// Invoked before visit
void InitVisit(const Fst &fst);
// Invoked when state discovered (2nd arg is visitation root)
bool InitState(StateId s, StateId root);
// Invoked when arc to white/undiscovered state examined
bool WhiteArc(StateId s, const Arc &a);
// Invoked when arc to grey/unfinished state examined
bool GreyArc(StateId s, const Arc &a);
// Invoked when arc to black/finished state examined
bool BlackArc(StateId s, const Arc &a);
// Invoked when state finished
void FinishState(StateId s);
// Invoked after visit
void FinishVisit();
};
While a depth-first search can be implemented
using
Visit()
with the
LifoQueue()
, it is often better to use the more specialized
DFSVisit() [bad link?]
in
<fst/dfs-visit.h> since it is somewhat more space-efficient and the specialized visitor interface described there has additional funcitionality for a DFS.
Pre-defined FST visitors include:
The visit operations optionally accept
arc filters to control which transitions are explored.
Weights
A
Weight
is a type that is used to represent the cost of taking transitions in an FST.
The following basic weight templates are defined in the OpenFst library:
Semiring |
Name |
Set |
⊕ (Plus) |
⊗ (Times) |
0 (Zero) |
1 (One) |
Notes |
|
Lexicographic |
LexicographicWeight<W1, W2> |
W1 X W2 |
min |
⊗W1 X ⊗W2 |
(0W1,0W2) |
(1W1,1W2) |
min: lexicographic order w.r.t. W1 and W2 natural orders |
|
Log |
LogWeightTpl<T> |
[-∞, ∞] |
-log(e-x + e-y) |
+ |
∞ |
0 |
T : floating point |
|
MinMax |
MinMaxWeightTpl<T> |
[-∞, ∞] |
min |
max |
∞ |
-∞ |
T : floating point |
|
Product |
ProductWeight<W1, W2> |
W1 X W2 |
⊕W1 X ⊕W2 |
⊗W1 X ⊗W2 |
(0W1,0W2) |
(1W1,1W2) |
|
|
String |
StringWeight<L, STRING_LEFT> |
L* ∪ {∞} |
longest com. prefix |
⋅ |
∞ |
ε |
L : signed integral |
|
|
StringWeight<L, STRING_RIGHT> |
L* ∪ {∞} |
longest com. suffix |
⋅ |
∞ |
ε |
L : signed integral |
|
Tropical |
TropicalWeightTpl<T> |
[-∞, ∞] |
min |
+ |
∞ |
0 |
T : floating point |
|
The following weight types have been defined in the OpenFst library in terms of the above:
Name |
Type |
GallicWeight<L, W, S> |
ProductWeight<StringWeight<L, S>, W> |
LogWeight |
LogWeightTpl<float> |
MinMaxWeight |
MinMaxWeightTpl<float> |
TropicalWeight |
TropicalWeightTpl<float> |
Weight pairs, such as
ProductWeight
and
LexicographicWeight
, can use
command line flags to control
their textual formatting.
FLAGS_fst_pair_weight_separator
is printed between the weights (default: ",").
FLAGS_fst_piar_parentheses
(default: "")
brackets the pair; if you create complex nested pairs (i.e.,
tuples), they may need to be printed with non-empty brackets (e.g. "()") to ensure correct parsing if read back in.
Additional weight information:
--
MichaelRiley - 27 Feb 2009