Symbol table operations
This section describes the functions provided for operating on symbol tables.
Each FST arc has an input (ilabel) and output (olabel) label. Symbol tables
can be used to map between these labels and actual strings (which may be bytes,
Unicode codepoints, phones, words, etc.). See the symbol table documentation
for more information.
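As a rough illustration of this label-to-string mapping, the following Python sketch models a symbol table as a pair of dictionaries. The class ToySymbolTable and its methods are hypothetical stand-ins chosen for this example; they are not the OpenFst or Pynini API.

```python
class ToySymbolTable:
    """Hypothetical bidirectional map between integer labels and symbols."""

    def __init__(self):
        self._label_to_symbol = {}
        self._symbol_to_label = {}

    def add_symbol(self, symbol, label):
        # Record the pairing in both directions.
        self._label_to_symbol[label] = symbol
        self._symbol_to_label[symbol] = label

    def find_symbol(self, label):
        return self._label_to_symbol[label]

    def find_label(self, symbol):
        return self._symbol_to_label[symbol]


table = ToySymbolTable()
table.add_symbol("<eps>", 0)  # Label 0 is conventionally reserved for epsilon.
table.add_symbol("cat", 1)
table.add_symbol("dog", 2)

# An arc is modeled here as an (ilabel, olabel) pair; the table recovers
# the strings the integer labels stand for.
arc = (1, 2)
decoded = (table.find_symbol(arc[0]), table.find_symbol(arc[1]))
```

In this toy model the arc carries only integers, and the table is needed only when converting to or from human-readable strings.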
fst::MergeSymbols
The function fst::MergeSymbols takes two mutable FST arguments and an enum
specifying how the tables are to be merged:
- MERGE_INPUT_SYMBOLS: merges the input tables of the input FSTs.
- MERGE_OUTPUT_SYMBOLS: merges the output tables of the input FSTs.
- MERGE_INPUT_AND_OUTPUT_SYMBOLS: merges both input and output tables of the input FSTs (e.g., for intersection, union, etc.).
- MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS: merges the left-hand side's output symbols with the right-hand side's input symbols (e.g., for composition).
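To make the four modes concrete, here is a minimal Python sketch of which pair of tables each mode selects, following the enum names. The MergeMode enum, its numeric values, and the tables_to_merge helper are illustrative assumptions, not part of the library; each FST is modeled as a dict with "input" and "output" entries.

```python
from enum import Enum


class MergeMode(Enum):
    # Names mirror the enum values listed above; the numbers are arbitrary.
    MERGE_INPUT_SYMBOLS = 1
    MERGE_OUTPUT_SYMBOLS = 2
    MERGE_INPUT_AND_OUTPUT_SYMBOLS = 3
    MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS = 4


def tables_to_merge(left, right, mode):
    """Returns the (left table, right table) pairs selected by `mode`."""
    if mode is MergeMode.MERGE_INPUT_SYMBOLS:
        return [(left["input"], right["input"])]
    if mode is MergeMode.MERGE_OUTPUT_SYMBOLS:
        return [(left["output"], right["output"])]
    if mode is MergeMode.MERGE_INPUT_AND_OUTPUT_SYMBOLS:
        return [(left["input"], right["input"]),
                (left["output"], right["output"])]
    if mode is MergeMode.MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS:
        return [(left["output"], right["input"])]
    raise ValueError(f"unknown merge mode: {mode!r}")


# Placeholder strings stand in for the actual symbol tables.
left = {"input": "left_in", "output": "left_out"}
right = {"input": "right_in", "output": "right_out"}

composition_pairs = tables_to_merge(
    left, right, MergeMode.MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS)
both_pairs = tables_to_merge(
    left, right, MergeMode.MERGE_INPUT_AND_OUTPUT_SYMBOLS)
```

The composition mode pairs the left output table with the right input table because composition matches the left FST's output labels against the right FST's input labels.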
Asked to merge two tables (which themselves may be null), the algorithm proceeds
as follows:
- If one or both tables are null, do no work.
- If the tables' ("labeled") checksums match, do no work.
- Otherwise, merge the two tables by adding the symbols of the first table to the second, and the symbols of the second table to the first.
Only in the last case is there the possibility of a labeling conflict (i.e., the
two tables map separate labels to the same symbol, or separate symbols to the
same label). In the case of conflict, the second FST may require relabeling. The
fst::MergeSymbols function does this automatically so long as the flag
--fst_relabel_symbol_conflicts is set to true (the default). However, if
relabeling is required to resolve a conflict but this flag is set to false,
fst::MergeSymbols logs a warning and returns false to indicate failure.
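The merge procedure and its conflict handling can be sketched in Python as follows. This is a toy model under stated assumptions, not the library's implementation: tables are plain symbol-to-label dicts, dictionary equality stands in for the labeled checksum comparison, the relabel_conflicts parameter plays the role of the --fst_relabel_symbol_conflicts flag, and conflicting symbols are given the next unused integer label (the actual policy for choosing new labels is not described here).

```python
def merge_tables(first, second, relabel_conflicts=True):
    """Toy merge of two symbol -> label dicts.

    Returns (ok, merged, relabeling), where `merged` is the shared table and
    `relabeling` maps old labels of the second table to their new labels.
    """
    if first is None or second is None:
        return True, None, {}  # A null table: no work.
    if first == second:  # Stand-in for the labeled checksum comparison.
        return True, dict(first), {}
    merged = dict(first)
    used = set(first.values())
    next_free = max(used | set(second.values())) + 1
    relabeling = {}
    for symbol, label in sorted(second.items(), key=lambda item: item[1]):
        if symbol in merged:
            if merged[symbol] != label:
                # Same symbol under a different label: adopt the first
                # table's label.
                relabeling[label] = merged[symbol]
        elif label in used:
            # Label already names another symbol: assign a fresh label.
            merged[symbol] = next_free
            used.add(next_free)
            relabeling[label] = next_free
            next_free += 1
        else:
            merged[symbol] = label
            used.add(label)
    if relabeling and not relabel_conflicts:
        return False, None, {}  # Conflict found, but relabeling is disabled.
    return True, merged, relabeling


first = {"<eps>": 0, "a": 1, "b": 2}
second = {"<eps>": 0, "b": 1, "c": 2}

ok, merged, relabeling = merge_tables(first, second)

# Arc labels of the second FST are rewritten through the relabeling map.
second_labels = [1, 2]
relabeled = [relabeling.get(label, label) for label in second_labels]

failed, _, _ = merge_tables(first, second, relabel_conflicts=False)
```

Here the second table conflicts with the first on both counts: "b" carries a different label, and label 2 already names a different symbol. With relabeling enabled the second table's labels are remapped; with it disabled the function reports failure instead.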
Symbol table merging in Pynini
The above function is used extensively in Pynini to ensure symbol table
compatibility for core rational operations like composition, intersection, and
union. This is done automatically.