OpenFst Forum 2015 Archive

converting standard lattice format into and FST file

Hi i have Standard lattice format files, and i want to convert them into WFST to use them in Kaldi each file contains more than one NULL node, which were used in htk to decrease number of arcs, now if i want to convert those files to FST What should those null nodes be in the symbol file Not that i am working with the specific symbol file as input symbol file and output symbol file in order to finish the process

Minimization for fsts with negative weights

--

I get weird results when I try to minimize a tropical weight fst with negative weights as shown below. Additionally, the minimization takes quite long (around 20 seconds) although the fst only has 2 states.

$cat bar.txt 0 1 1 1 -5 1 1 2 2 -1 1 0$ fstcompile bar.txt bar.ofst
$fstminimize bar.ofst bar.min.ofst$ fstprint bar.min.ofst
0      1      1      1      -16777220
1      1      2      2
1      16777216


As far as I understand, this fst is already minimal so fstminimize shouldn't really do anything.

If I convert all weights to positive weights, the problem vanishes

$cat baz.txt 0 1 1 1 5 1 1 2 2 1 1 0$ fstcompile baz.txt baz.ofst
$fstminimize baz.ofst baz.min.ofst$ fstprint baz.min.ofst
0      1      1      1      5
1      1      2      2      1
1


DoganCan - 2015-09-02 - 22:29

Hi Miikka,

The minimization algorithm first applies weight pushing to normalize the distribution of weights along the paths of the input automaton. Weight pushing algorithm requires shortest distance from any state to final states to be defined. The example automaton with negative weights violates this condition due to the negative weight cycle.

You can look into Jason Eisner's WFSA minimization algorithm if you need to minimize automata with negative weights, particularly if they have negative weight cycles.

Cheers, Dogan

MiikkaSilfverberg - 2015-09-03 - 06:59

Thanks for the explanation Dogan I'll look into Eisner's minimization algorithm. Do you happen to know about freely available implementations?

DoganCan - 2015-09-03 - 22:34

Hi Miikka

I don't know if there is a publicly available implementation of Eisner's algorithm.

Cheers, Dogan

Gentle Intro to Lazy FST Operations

KennethRBeesley - 2015-08-13 - 16:18

Where can I find a gentle introduction to lazy FST operations? When and how to use them? Tradeoffs in size and performance?

MichaelRiley - 2016-05-27 - 10:18

there is teh OpenFSt library paper ref'ed under background material topic here and there is some comments on efficiency in the efficiency topic.

Wildcard arcs

RussellMoore - 2015-08-10 - 09:58

Hi, sorry if this has been asked before. I am using OpenFST via the command line interface, so I want to build my FSTs as text files before compiling them. I would like to know if it's possible to add an arc that will:

• accept any as-yet unspecified symbol as input
• output the same symbol

So in my head I imagine something like:

0 1 a a [wgt_a]
0 1 * * [wgt_other]


Where the asterisks represent "any other symbol", i.e. that arc operates as a catch-all for any symbol that is not "a".

(Update 17:35 - I think what I want is a "rho" matcher. Can I specify this inside a text file? Or is there another way to do it?)

Thanks!

DoganCan - 2015-08-11 - 22:59

Hi Russell,

In OpenFst, special symbol information is not stored in the FSTs themselves, but are handled by the FST operations that accept these symbols as arguments. Unfortunately, special symbol matching operations are not available through the command line interface of standard OpenFst binaries. What you can do instead is to use a custom FST composition binary that can handle rho symbols.

Check out https://github.com/dogancan/lexicon-unk. There you will find a tool called fstphirhocompose which allows bare bones matching of phi and rho symbols on the second FST. This tool won't do what you need out of the box since you want matched symbols to be copied to the output side. To fix that, simply change each occurrence of MATCHER_REWRITE_NEVER with MATCHER_REWRITE_ALWAYS in fstphirhocompose.cc before compiling the code. You use this tool exactly the same way you would use fstcompose, however you can also specify the labels corresponding to the phi and rho symbols in the second FST through the command line interface. Take a look at the run.sh script in that repo to see how this tool can be used to match special symbols.

Cheers, Dogan

TopSort not working on FST

VivekRangarajan - 2015-07-27 - 15:33

I run into a weird issue when I create a fst in C++ from a string by adding states and arcs. I have a function that does this work. When I write the FST using Write(), the final state is printed at the top. It looks like this:

5
0   1   66   66
0   2   81   81
0   3   2837   2837
1   2   10   10
1   3   749   749
1   4   930   930
2   3   66   66
2   4   324   324
2   5   864   864
3   4   8   8
3   5   146   146
4   5   30   30


fsttopsort on this fst does not sort the states either. I overcame this by just piping fstprint X | sort -k 1 -n | fstcompile

Any ideas as to why this might be happening and how it can be fixed inside the C++ code?

DoganCan - 2015-08-03 - 20:08

Hi Vivek,

It is hard to debug your problem without seeing the code. My best guess is that you are not setting state 0 as the initial state.

Cheers, Dogan

VivekRangarajan - 2015-08-05 - 16:38

Thanks Dogan. I resolved this sometime back and should've added a comment. Your guess was right, I was setting the SetStart to str.size() instead of 0.

Generation of words accepted by a Finite-State Machine

JulioCarrasquel - 2015-06-04 - 15:00

Hello,

I would like to know how to print all the words that can be accepted for a Finite-State Machine. Note that I'm using acceptors. There is a library called Lextools who does the Job, specifically using a function of that library called lexfsmstrings , but unfortunately it's not longer available free by the guys of AT&T.

MichaelRiley - 2016-05-27 - 10:20

You could ask for a very large number of n-best from fstshortestpath.

GaborPinter - 2016-08-01 - 06:46

> all the words... Provided the FSA/FST is acyclic (otherwise it would take somewhat longer), it is relatively easy to write a program in C++ that recursively iterates over all the arcs. With output size of around 10k per fst it was surprisingly fast.

KyleGorman - 2016-10-11 - 19:45

Pynini (pynin.opengrm.org) i also has something called StringPaths which enumerates all paths in an acyclic FST. If you don't want to use Python, you could build something off of the paths.h header therein.

If the FST is not acyclic, it would, of course, take not just "somewhat" longer but infinitely longer

Arcs labeled with strings

JulioCarrasquel - 2015-06-03 - 09:51

Hi again,

I have another question. I need to print using Graphviz a FST whose arc labels could be displayed as letters. I will highly appreciate a solution for this case. Thanks

Julio

JulioCarrasquel - 2015-06-03 - 11:22

I've already found a solution to this too. Thanks.

Putting only one expression on arcs

JulioCarrasquel - 2015-06-01 - 11:12

Hello,

I need to put only one expression in the arcs of my FST since I'm not<br> modelling transducers, but acceptors. <br>

I mean, for example, to put in an arc only "x" instead of "x:y". <br>

If someone has a quick solution to this, I will highly appreciate it.

Regards, <br> Julio Carrasquel <br> juliocarrasquel@gmail.com

JulioCarrasquel - 2015-06-01 - 11:14

I forgot to mention, I'm using in my project the command line library.

JulioCarrasquel - 2015-06-03 - 11:22

Nevermind,

I found a solution using the option --acceptor=true through the command fstdraw.

Thanks in any case.

Expected edit distance

RolandSchwarz - 2015-05-08 - 03:26

Hi,

I was wondering if it is possible using the library to compute the expected edit distance between two distributions over strings.

So, if X and Y are FSAs that contain a single sequence each, and T is my edit distance transducer then shortestdist(X o T o Y) over the tropical SR computes the edit distance.

(a) Now assume X and Y are probabilistic FSAs (log SR), i.e. they define a valid distribution over a set of sequences, how do I compute the expected edit distance, i.e. \sum_x \sum_y P(x) P(y) ED(x,y)?

(b) And finally, given a probabilistic FST that defines a joint distribution for X and Y, how do I compute the expected edit distance over the joint, i.e. \sum_x \sum_y P(x,y) ED(x,y)?

I guess in the case (a) the expectation SR might help? But how do I do it in the case (b)?

Any help would really be appreciated.

Thanks,

Roland

DoganCan - 2015-05-11 - 19:58

Hi Roland,

There is an algorithm by Mohri for computing expected edit distance between weighted automata. See this paper for details.

I implemented a slightly modified version of this algorithm some time ago. I just put together a simple setup based on that implementation and pushed it to GitHub in case you want to take a look at it.

Cheers, Dogan

RolandSchwarz - 2015-05-13 - 05:51

Amazing, thanks, I'll check it out!

Roland

Assigning the symbol table to FSTs in C++

AnuragDas - 2015-05-04 - 08:56

Hi, I am able to build a FST from a string through C++. But the symbol table is showing "None" when i do a fstinfo. I want to add symbol table like "**Byte" which gets added automatically when we do "thraxmakedep" from a grammar file with --save_symbols set = True but through c++ only.

From weighted words to weighted phrase: how to isolate word scores at phrase level?

FrancoisSofilvacer - 2015-04-11 - 13:43

Such an operation must be classical in any concrete usage of FST cascade (OCR, speech, ...) but I struggle to express it in OpenFST.

Take a simple example: the input is a character stream, made of words inter-spaced with blanks. A non-weighted and functional lexicon FST maps sequences of letters to words. To deal with uncertainty, a weighted and non-functional extension of it maps a sequence of letters to word/score pairs.

For handling a sequence of words, a concatenation of WFSTlex is needed, resulting in multiplying (in the weight semiring) word scores but eventually, the contribution of each word is lost: the only score available encompasses the entire input.

So a language model FST, further down in the cascade has no access to individual word scores, preventing it to compute a context-dependent weighted average for example, associating a word x with a different coefficient depending on it preceding y or z.

Compiling with mingw32

BenoitFavre - 2015-02-26 - 07:44

In case you want to compile openfst 1.4.1 with mingw (for cross-compiling it for windows from unix, for example). You need to use a mmap implementation such as mman-win32 and set LDFLAGS to use it.

make LDFLAGS=-lmman

Here is the patch:

diff -rupN old/src/lib/mapped-file.cc new/src/lib/mapped-file.cc

old/src/lib/mapped-file.cc 2014-04-30 00:15:18.000000000 +0200 +++ new/src/lib/mapped-file.cc 2015-02-26 08:53:46.339405863 +0100 @@ -20,6 +20,10 @@ #include <fcntl.h> #include <unistd.h>

+#if (defined _WIN32 || defined _WIN64 || defined WINDOWS || defined MINGW32) +#include <windows.h> +#endif + namespace fst {

// Alignment required for mapping structures (in bytes.) Regions of memory @@ -76,7 +80,13 @@ MappedFile* MappedFile::Map(istream* s, size_t pos = spos; int fd = open(source.c_str(), O_RDONLY); if (fd = -1) { +#if (defined _WIN32 || defined _WIN64 || defined WINDOWS || defined MINGW32) + SYSTEM_INFO system_info; + GetSystemInfo(&system_info); + int pagesize = system_info.dwPageSize; +#else int pagesize = sysconf(_SC_PAGESIZE); +#endif off_t offset = pos % pagesize; off_t upsize = size + offset; void *map = mmap(0, upsize, PROT_READ, MAP_SHARED, fd, pos - offset);

Is there an existing implementation to check if one node is reachable from another in an fst ?

KostyaK - 2015-02-14 - 17:49

Hi! I used with etalon code to do lookahead composition: <verbatim> typedef fst::StdVectorFst GRAPH; GRAPH _graph1, _graph2; //... fst::StdOLabelLookAheadFst graph1Look(_graph1); std::auto_ptr<GRAPH> graph2TrueRelabel(_graph2.Copy(true)); fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true); fst::ArcSort(graph2TrueRelabel.get(), fst::StdILabelCompare()); std::auto_ptr<GRAPH> res(new GRAPH); fst::Compose(graph1Look, *graph2TrueRelabel.get(), res.get()); </verbatim> It's work great, but i want to do lookahead composition without PushWeightsComposeFilter for example. First, I try to repeat lookahead composition without class wrappers: <verbatim> typedef fst::StdVectorFst GRAPH; GRAPH _graph1, _graph2; //... typedef fst::LabelLookAheadMatcher<fst::SortedMatcher<GRAPH>, fst::olabel_lookahead_flags> LOOK_MATCHER; typedef fst::SortedMatcher<GRAPH> SORTED_MATCHER; typedef fst::SequenceComposeFilter<LOOK_MATCHER, SORTED_MATCHER> SEQ_FILTER; typedef fst::LookAheadComposeFilter<SEQ_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> LOOK_COMPOSE_FILTER; typedef fst::PushWeightsComposeFilter<LOOK_COMPOSE_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_WEIGHTS_FILTER; typedef fst::PushLabelsComposeFilter<PUSH_WEIGHTS_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_LABELS_FILTER;

typedef PUSH_LABELS_FILTER COMPOSE_FILTER;

// My compose options fst::CacheOptions cacheOptions; fst::ComposeFstImplOptions<LOOK_MATCHER, SORTED_MATCHER, COMPOSE_FILTER> composeOptions(cacheOptions); composeOptions.matcher1 = new LOOK_MATCHER(_graph1, fst::MATCH_OUTPUT);

//Relabel _graph1 such as fst::StdOLabelLookAheadFst graph1Look(_graph1); LOOK_MATCHER::MatcherData* matcherData = composeOptions.matcher1->GetData(); fst::LabelReachable<Arc> graph1Reachable(matcherData); std::auto_ptr<GRAPH> graph1Relabeled(&_graph1); graph1Reachable.Relabel(graph1Relabeled.get(), false); graph1Relabeled->SetInputSymbols(NULL);

//Relabel _graph2 such as fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true); fst::LabelReachable<Arc> graph2Reachable(matcherData); std::auto_ptr<GRAPH> graph2Relabeled(&_graph2); graph2Reachable.Relabel(graph2Relabeled.get(), true);

fst::ArcSort(graph2Relabeled.get(), fst::StdILabelCompare());

composeOptions.matcher2 = new SORTED_MATCHER(*graph2Relabeled.get(), fst::MATCH_INPUT); composeOptions.filter = new COMPOSE_FILTER(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions.matcher1, composeOptions.matcher2);

std::auto_ptr<GRAPH> composedRes(new GRAPH); *composedRes = fst::ComposeFst<Arc>(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions); </verbatim>

But result of my lookahead composition code is not equal to result of etalon code: my graph is much smaller than etalon. What I'm doing wrong? May be exists other way?

KostyaK - 2015-02-14 - 17:57

<verbatim> test </verbatim>

KostyaK - 2015-02-14 - 18:08

Text formatting don't work?

fstshortestpath --unique?

GezaKiss - 2015-02-10 - 14:50

I used this command, expecting it to produce all unique output strings - but it produced only one. The FST I tried it on contained just the two possible pronunciations of the orthographic form "record", and nothing else (here '}' separates the input and output symbols): r}9r e}E c}k o}3r r}<epsilon> d}d r}9r e}I c}k o}oU r}9r d}d

My questions: 1) What is the meaning of the --unique parameter? 2) How can I get the behavior I want using OpenFst (i.e. all the possible output strings)?

Thanks! Géza

KyleGorman - 2015-09-03 - 16:20

Hi Géza, I assume you've figured this out by now, but just in case: --unique simply indicates that the resulting shortest path FST will contain only distinct paths, but it doesn't specify how many distinct paths will be present. How many paths are returned is controlled by the --nshortest flag, which it defaults to 1.

(Presumably the reason --unique doesn't have the semantics you expected it to is that the machine might be cyclic, in which case termination would not be guaranteed. But any suggestions to improve the documentation would be taken under consideration.)

farcreate bug

DoganCan - 2015-01-15 - 21:36

Hi,

I just wanted to report a small bug in farcreate. When the --file_list_input option is set to true, farcreate skips the first input file on the command line due to an off by one error. Below is the relevant fix.

Cheers, Dogan

diff -ur openfst-1.4.1.orig/src/include/fst/extensions/far/create.h openfst-1.4.1/src/include/fst/extensions/far/create.h
--- openfst-1.4.1.orig/src/include/fst/extensions/far/create.h   2014-04-25 02:55:40.000000000 -0700
+++ openfst-1.4.1/src/include/fst/extensions/far/create.h   2014-04-25 02:57:48.000000000 -0700
@@ -48,7 +48,7 @@

vector<string> inputs;
if (file_list_input) {
-    for (int i = 1; i < in_fnames.size(); ++i) {
+    for (int i = 0; i < in_fnames.size(); ++i) {
ifstream istrm(in_fnames[i].c_str());
string str;
while (getline(istrm, str))


KyleGorman - 2015-09-10 - 19:49

Hi, thanks for that bug report. A fix should be in the next release.