Specialty operators
This describes specialty FST functions for grammar compilation.
fst::Containment
The containment (Beesley & Karttunen 2003: 48f.) of a relation A (with
respect to some alphabet) is the union of all relations that contain some path
through A. We can implement this in Thrax as follows:
func Containment[A, sigma_star] {
return sigma_star A sigma_star;
}
where sigma_star represents the closure over the alphabet.
If A is a transducer, than its containment can be thought of as a single
unconditioned rewrite operation. The fst::CDRewrite function is a
generalization of this rewriting operation which permits repeated rewrites
and conditional restrictions.
fst::CrossProduct
The cross-product operation generates a transducer from two acceptors A,
B. The resulting transducer represents a relation mapping from any string in
A to any string in B (hence the name "cross-product").
In the case that A is a transducer, it is implicitly reinterpreted as the
acceptor given by its input projection; similarly, in the case that B is a
transducer, it is implicitly reintrepreted as the acceptor given by its output
projection.
The user may also specify a super-final weight to be concatenated to the
resulting transducer to assign a non-Zero cost to the transduction.
The algorithm is as follows:
- Replace all output labels of A with epsilon.
- Replace all input labels of B with epsilon.
- Concatenate (1) and (2).
- If a final weight (other than semiring One, the default), is specified, create a superfinal state with the desired final weight and concatenate it with (3).
- Push the labels of (3-4) towards the initial state.
- Perform epsilon-removal on (5).
fst::LenientlyCompose
The lenient composition (Karttunen 1998) of two FSTs A, B is simply
the priority union (to be defined) of (A ∘ B) and A. The priority union
of two FSTs C, D is similar to the
union
of C and D
except that the relations in C have "priority" over those in D. For
example, let it be the case that:
C(a) → b
D(a) → c
The ("vanilla") union of C, D is then the relation
a → { b, c }, whereas the
priority union of C, D is the narrower relation a → c. We
can implement this in Thrax as follows:
func PriorityUnion[C, D, sigma_star] {
return C | ((sigma_star - RmEpsilon[Project[C, 'input']]) @ D);
}
where sigma_star represents the closure over the alphabet.
The lenient composition of FSTs A, B is simply the priority union of
(A ∘ B), A. We can implement this in Thrax as follows:
func LenientlyCompose[A, B, sigma_start] {
return PriorityUnion[A @ B, A, sigma_star];
}
where sigma_star represents the closure over the alphabet.
fst::Repeat
The repeat operation is a generalization of
closure. A path
through the closure of an FST A is valid if it can be
generated by taking zero or more paths through A.
The first argument to fst::Repeat represents a lower bound on the number
of paths through the input FST required to form a valid path in the output FST.
The second argument represents the upper bound on the number of paths through
the input FST; by convention, 0 is used to indicate an infinite upper bound.
Naturally, the lower bound must be less than or equal to the upper bound.
We can then derive the following familiar operations as special cases:
- To compute "star"-closure, use arguments
0, 0 .
- To compute "plus"-closure, use arguments
1, 0 .
- To generate exactly k concatenations of an FST, use arguments
k, k .
References
Beesley, K. R., and Karttunen, L. 2003. Finite state morphology. Stanford, CA:
CSLI.
Karttunen, L. 1998. The proper treatment of Optimality Theory in computational
phonology. In Proc. FSMNLP, 1-12. |