There are several ways to "optimize" a weighted finite-state transducer (WFST).
The C++ template function `fst::Optimize`

underlies Pynini's `optimize`

instance method and Thrax's `Optimize`

function. But what exactly does this make
more optimal [sic]? How does it accomplish that? And what, if anything, can we
assert about the resulting transducer?

Tolstoy wrote that "All happy families are alike; each unhappy family is unhappy in its own way", and that's true of WFSTs too.

- WFSTs may have identically labeled
*multi-arcs*, arcs with the same source state, destination state, input label, and output-label.**Arc-sum mapping**merges such arcs using ⊕ to combine their weights, making the WFST*multi-arc-free*. - WFST operations (including composition, concatenation, and union) introduce
*epsilon-transitions*, arcs with epsilon input and output labels (e.g., all but two arcs here).**Epsilon-removal**removes these transitions. - A WFST is
*deterministic*if for each state, there is never more than one arc with a given label leaving that state (e.g., state 1 here).**Determinization**makes an WFST deterministic. While this can actually increase the size of the WFST, it is a prerequisite for the next algorithm. - A WFST is
*minimal*if it has the minimum number of states possible for expressing its relation.**Minimization**modifies a deterministic WFST's topology so that it becomes minimal (e.g., the examples here).

`fst::Optimize`

checks whether the WFST is known to be epsilon-free; if not,
epsilon-removal is performed. Then, arc-sum mapping is performed. Then, it
attempts to perform minimization, determinizing along the way, if necessary, as
follows. If the WFST is (known to be) an acyclic acceptor but not (known to be)
deterministic, we determinize it, then minimize it. This works because any
acyclic transducer (over a *zero-sum-free* semiring) is determinizable (Mohri
2009).

However, if the WFST is a transducer, it may be non-functional (i.e., it
describes a one-to-many relation); non-functional transducers can be
determinized, but this is quite inefficient. And, if the WFST has weighted
cycles (whether or not it's an acceptor or a transducer), it may lack the
*twins property*, in which case the determinization algorithm will never
terminate. However, in such a case, it is still usually possible to reduce the
number of arcs and states in the WFST (Allauzen et al. 2004): the WFST is
determinized and minimized *as if it were an acceptor* (and if the WFST has
weighted cycles, *as if it were an unweighted acceptor*). (Alternatively, we
could have checked for the twins property and/or for non-functionalism, but both
checks are themselves quite expensive, so we have chosen not to.) In that case,
it is also necessary to perform another round of arc-sum mapping as the
aforementioned trick may may introduce new multi-arcs.

Once we have performed optimization on a WFST, it is free of multi-arcs and unnecessary epsilon-epsilon arcs. And, if the WFST was an acyclic acceptor, it is also deterministic and minimal.

We **cannot** be sure that it is smaller (in terms of the number of states and
arcs) than the input.

Pynini and Thrax silently perform two other types of WFST optimization. A
cross-product transducer—what you get when you do `t("foo", "bar")`

in
Pynini or `"foo" : "bar"`

in Thrax—is constructed as follows:

- Replace the output labels of the left-hand side argument with epsilons,
- replace the input labels of the right-hand side argument with epsilons, and
- concatenate the result.

When both arguments are in fact strings represented as WFSTs, this will produce as much as twice as many states and arcs as strictly necessary. Pynini and Thrax therefore perform label-pushing and epsilon-removal to remove these unnecessary arcs and states. (If either of the components in the cross-product isn't a string, this isn't guaranteed to help, so we don't bother.)

When computing the difference of two WFSTs, the right-hand side must be epsilon-free and deterministic. If it is not known to have these properties, Thrax and Pynini perform epsilon-removal and attempt determinization.

Allauzen, C., Mohri, M., Riley, M., and Roark, B. 2004. A generalized
construction of integrated speech recognition transducers. In *ICASSP*, 761-764.

Mohri, M. 2009. Weighted automata algorithms. In M. Droste, W. Kuich, and H.
Vogler, *Handbook of weighted automata*, pages 213-254. New York: Springer.

Edit | Attach | ~~Watch~~ | Print version | History: r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions

Topic revision: r2 - 2017-07-05 - KyleGorman

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback