OpenFst Forum 2014 Archive

--++ Using LabelLookAheadMatcher on input fst and PhiMatcher on output fst in LookAheadComposeFilter

OliverW - 2014-12-10 - 16:19

Hi,

I'm trying to use the LabelLookAheadMatcher on an input fst (a lexicon transducer) and the PhiMatcher on an output fst (a custom ngram fst) in a LookAheadComposeFilter.

The setup is as follows:

// Phimatcher typedef fst::PhiMatcher<fst::SortedMatcher<fst::Fst<fst::LogArc> > > PM;

// some definitions for the lookahead matcher class
OLookAhead {
 public: typedef fst::LogArc ARC;
 typedef fst::Fst<ARC> FST;
 typedef fst::SortedMatcher<FST> SM;
 typedef fst::LabelLookAheadMatcher<SM, fst::kOutputLookAheadMatcher | fst::kLookAheadEpsilons | fst::kLookAheadWeight | fst::kLookAheadPrefix | fst::kLookAheadNonEpsilonPrefix> OLAM;
 typedef fst::AltSequenceComposeFilter<OLAM, PM> ASCF;
 typedef fst::LookAheadComposeFilter<ASCF, OLAM, PM, fst::MATCH_OUTPUT> LACF;
 typedef fst::LabelReachableData<typename ARC::Label> LRD;
 typedef fst::LabelReachable<ARC, fst::DefaultAccumulator<ARC>, LRD> Reachable;
};

// the lookahed matcher
OLookAhead::OLAM *OLAM12 = new OLookAhead::OLAM(Input_Unk_Lex_OSort, fst::MATCH_OUTPUT);

// create relabler to be used to relabel LanguageModelFST input labels
OLookAhead::Reachable RD(OLAM12->GetData());

// create phi matcher with relabeled phi symbol
PM *PM22 = new PM(LanguageModelFST, fst::MATCH_INPUT, RD.Relabel(PHI_SYMBOLID), false);

// the lookahead filter
OLookAhead::LACF *LACF = new OLookAhead::LACF(Input_Unk_Lex_OSort, LanguageModelFST, OLAM12, PM22);

// the composition options
fst::ComposeFstImplOptions<OLookAhead::OLAM, PM, OLookAhead::LACF> copts2(fst::CacheOptions(), OLAM12, PM22, LACF);

// do the composition
fst::ComposeFst<fst::LogArc> Input_Unk_Lex_LM(Input_Unk_Lex_OSort, LanguageModelFST, copts2);

The input labels of my language model fst are relabeled according to the relabler. Unfortunately the composition result only contains one start state.

Interestingly if I leave out the relabling of the PHI symbol and the Language model input arc, I get an almost complete composition result the only part missing are the last states wich are reached by a transition with <eps> on the output arc.

Examples for the fsts can be found in here: https://www.dropbox.com/sh/dt7jayrw8upw9dg/AACOS99CeBvnGkc2xbowQnMTa?dl=0

in_lex_lm: the composition result in_lex: the input fst (lexicon) lm: the output fst (relabeled ngram) lm.orig: lm without relabeling Relabeling.txt: the used label mapping

I would be happy if someone can help me with my problem. Is it possible that the phi matcher dois not work with the lookahead? I could not get my head around it but thinking about it, it seems quite complicated. How does the lookahed for the phi label behave?

Oliver

Log In

Special arcs from the command line?

StevenBedrick - 2014-12-01 - 17:34

Hello-

Is there any way to use phi, rho, or sigma arcs with the OpenFST command-line tools? I know it's possible using the C++ API, but it'd be lovely to be able to use them as easily as we currently use epsilon arcs with fstcompile and friends.

-SB

Log In

Error when compile openfst 1.4.1

NguyenHieu - 2014-11-23 - 23:49

I have compiled openfst 1.4.1 on ubuntu 13.10_64bit, every thing done well but when I use command: "fstinfo --help", it annouce error: "fstinfo: error while loading shared libraries: libfstscript.so.0: cannot open shared object file: No such file or directory". Anyone can help me to solve this problem

DoganCan - 2014-11-24 - 21:27

Hi Nguyen,

Did you do the "make install" step? That would explain why this library can not be found on your system even if you did not specify a custom installation prefix during configuration.

Alternatively, if you specified an installation prefix during configuration, you may need to add that path to LD_LIBRARY_PATH.

Hope these help.

Cheers, Dogan

Log In

Minimizing Transducers

AnA - 2014-11-21 - 12:21

Is there a way to minimize a finite state transducer, s.t. the minimized transducer has the same input/output behavior as the original transducer?

Example:

0 1 1 1
0 1 0 1
1 2 0 0
1 2 1 0
2 1 1 0
2 1 0 0
(a.fsm)

For this example I would expect to get the following minimized transducer:

0 1 0 1
0 1 1 1
1 1 1 0
1 1 0 0

If I run

  fstcompile < a.fsm > b.fsm
  fstminimize b.fsm > c.fsm
I get an empty transducer instead.

DoganCan - 2014-11-24 - 21:10

Hi,

The reason you are getting an empty transducer is because your input transducer does not have any final states.

Looking at the minimization result you are expecting, I'm guessing your input should look like the following.

0 1 1 1
0 1 0 1
1 2 0 0
1 2 1 0
1
2 1 1 0
2 1 0 0
2

Cheers, Dogan

Log In

Determinize. Is this a bug ?

OlivierB - 2014-10-06 - 06:50

Hi,

When I determinize this small (5 states, 7 arcs) Finite state recognizer, the resulting finite state machine is made of 48653930 states and 72980894 arcs and use 1.7 G of disk space.

Is this really the expected behaviour ? If so, can someone give me an explanation for this ? Thanks, Olivier

<verbatim> 0 1 1 1 10 0 3 1 1 10 1 2 2 2 10 1 2 1 1 1 3 1 1 1 3 4 2 2 4 3 1 1 </verbatim>

OlivierB - 2014-10-06 - 06:59

New try, here is the input FST, with hopefully better formating :

0 1 1 1 10 <br>

0 3 1 1 10 <br>

1 2 2 2 10 <br>

1 <br>

2 1 1 1 <br>

3 1 1 1 <br>

3 4 2 2 <br>

4 3 1 1 <br>

CyrilAllauzen - 2014-12-01 - 17:35

This is indeed the expected behaviour. In theory, the algorithm should not terminate but it converges due to floating point issues. The reason for that is that your machine does not have the twins property so there is no guaranty that the algorithm terminates. You can read more about this in the references on the determinize page. If you are really interested in the subject, you can also look at Daniel Kirsten's paper: your machine is polynomially ambiguous and does not have the clone property, so the determinization algorithm will not terminate.

KennethRBeesley - 2015-08-28 - 18:15

It would be very convenient (for me at least) if the OpenFst library included a boolean HasTwinsProperty(A) function that would return true if and only if FST A had that property. Before trying to determinize any FST with cycles, OlivierB and I could simply call HasTwinsProperty(A) to make sure that the algorithm would terminate. (I.e., I'm not suggesting that OpenFst itself should routinely call HasTwinsProperty(A) and store the result in the properties of each FST. I know that it's a rather expensive test to perform. Rather, it would just be available for end-users like me to call explicitly when we need to know.)

Log In

FST Permute ?

SriramVenkatapathy - 2014-08-20 - 11:04

I remember using a python library sometime in 2009 (it was the AT&T version i think) which had a function to permute an FST given the window of permutation. I don't find it in OpenFst, neither can I find again the AT&T library that had this function. Does any have any pointers ?

Log In

Problems using farcompilestrings and far_reader?

EstherJudd - 2014-08-09 - 13:11

I am trying to create pronunciations for a long list of words to evaluate the g2p module. I figured the fastest way to do it is to create a FAR archive of all the words and then in c++ compose each FST in the FAR archive with the trained g2p model FST. But I am running into trouble with extracting each FST in c++. It says it finds a bad FST header <unspecified>.

Does anybody have similar code that I can derive from? Or tell me the correct flags to compile the FAR archive? Right now I use farcompilestrings --symbols=en.isyms --keep_symbols=1 test.txt test.far

The c++ code is borrowed from /usr/local/include/thrax/evaluator.h

EstherJudd - 2014-08-17 - 02:01

I got a little bit further here because I found out I forgot to include /usr/local/include/libfstfarscripts.so as a linker file (I did hav libfstfar.so but that wasn't enough. Now I am stuck with the call to the function ReassignSymbols which is defined in thrax/evaluator.h. It says it has not been declared in the scope of main(). I added #include <thrax/evaluator.h> to the .cc file but that is not enough. What is the best way in c++ to include a function defined in a header file like that?

PaulDixon - 2014-08-18 - 06:33

How about this? https://gist.github.com/edobashira/8183f3ef2e7431345241 It composes the model FST with every FST in the input FAR file and writes the output to another FAR file. For farcompilestrings I just used the symbols flags and the program assumes none of the FSTs have symbols tables and everything is in the tropical semiring.

I'm compiling on OSX 10.9 so you might need slightly different compiler switches for your platform

EstherJudd - 2014-08-19 - 14:08

Thanks, I was able to compile the program on Ubuntu by slightly changing the Makefile. I have one question though, how do you do it with symbol tables? The input far has only letters and the model fst goes from letters to phonemes, so the output is phonemes.

PaulDixon - 2014-08-19 - 15:22

If the symbol table in the far file and the letter symbol table of the g2p model are the same then compose won't throw an error and everything should just work.

EstherJudd - 2014-08-19 - 15:45

The problem is that when you compile a far file, the symbol table only is added to the first FST and you have to make sure you transfer it to the rest. I have been trying to use code from thrax/evaluator.h to use a function called ReassignSymbols but haven't figured out quite how to pull that off yet.

PaulDixon - 2014-08-20 - 02:05

OK, how about just compiling the far --keep_symbols=false and compile the model --keep_isymbols=false

EstherJudd - 2014-08-20 - 03:02

I will give that a try.

EstherJudd - 2014-08-20 - 07:13

That was it! Thanks so much for your help. Now I just have to figure out how to extract the output strings into a text file but that shouldn't be too hard.
Log In

fstcompile to fstdraw

AndreSchlichting - 2014-07-25 - 11:09

Is it possible to compile to draw a fst that has this format?

0   1   3005   0   4.71582,57,115
0   2   3783   0   0,57,115
0   3   0      1   -2.39746,71,347
...

I mean because of the weights: Tks.

DoganCan - 2014-07-25 - 16:21

Hi Andre,

Maybe something like this?

https://github.com/dogancan/tropical_LT_tropical_LT_tropical-arc

Cheers, Dogan

Log In

Double Weighted arcs with composition lookahead filter

DominicHowell - 2014-07-25 - 06:00

Hi I am using the additional lattice4 library to provide two-weighted arcs. It works great but I am having a problem using the composition lookahead filter on WFSTs of the lattice4 arc type. I was just wondering if anyone had ever used composition filters on WFSTs with user-defined weights? Any help would be much appreciated.

PaulDixon - 2014-07-29 - 16:09

Try this (taken from matcher-fst.h)
typedef MatcherFst<ConstFst<KaldiLatticeArc>,
          LabelLookAheadMatcher<SortedMatcher<ConstFst<KaldiLatticeArc> >,
                                olabel_lookahead_flags,
                                FastLogAccumulator<KaldiLatticeArc> >,
                                olabel_lookahead_fst_type,
                                LabelLookAheadRelabeler<KaldiLatticeArc> >
          KaldiLatticeOLabelLookAheadFst;
 

And for shell tools building this into the shared object (from extensions/lookahead/olabel_lookahead-fst.cc) compile this into the lattice4 shared object code

 static FstRegisterer<KaldiLatticeOLabelLookAheadFst>
  OLabelLookAheadFst_KaldiLatticeArc_registerer;
 
Log In

Determinize, multithreading

AlexZatv - 2014-07-22 - 16:21

Hello! I am using Determinize on VectorFst, in multi-threaded program, in OpenFst port for Windows. But, unfortunately, sometimes it fails. May be someone here have experience in multithreading in OpenFst and can help?

Log In

Deleting a specific arc in an FST

VinodhR - 2014-06-25 - 17:55

Hi,

I am iterating all the arcs of an FST.


for(StateIterator<VectorFst <StdArc>>; siter(fstnew); !siter.Done(); siter.Next()) {
         for(MutableArcIterator<VectorFst <StdArc> > aiter(&fstnew,siter.Value()); !aiter.Done(); aiter.Next()) {
                 StdArc arc = aiter.Value();

                 <<delete current arc>>
       }
}

I see that DeleteArcs has been overloaded with an additional parameter n. What does this n refer to ?

How do I delete the current arc that is being iterated ? or for that matter any specific arc in an FST ?

V

DoganCan - 2014-06-25 - 21:06

Hi Vinodh,

OpenFst does not provide a convenience method for deleting specific arcs from a VectorFst. I believe this has to do with the fact that these arcs are kept in a std::vector and the erase operation is inefficient for this data type, i.e. erasing an entry from a std::vector is linear in the size of the vector. As a workaround, you can construct a new fst, which includes only the arcs you want to keep, while iterating through the arcs of the original.

You can use DeleteArcs to delete all arcs at a state or the last n arcs added to a state.

Cheers, Dogan

Log In

Using fstreplace after fst composition

AndreSchlichting - 2014-06-24 - 16:07

Is it possible to use fstreplace tool in FSTs after composing them with some other FSTs?

For example, I have a grammar G with some symbols I want to replace later in it, let's say, S1 and S2. I have also a lexicon L. The symbols S1 and S2 have two other grammar FSTs GS1 and GS2.

I am trying:

fstcompose L G > LG

fstcompose L GS1 > LGS1

fstcompose L GS2 > LGS2

fstreplace LG 'label' LGS1 'label1' LGS2 'label2' > LG1
But this is getting a lot different if I do:
fstreplace G 'label' GS1 'label1' GS2 'label2' > G2

fstcompose L G2 > LG2
LG2 is correct, I can use it for my purposes, but I can't make it right the first way, any idea why?

DoganCan - 2014-06-25 - 03:37

Hi Andre,

There were a few inconsistencies in your example commands. I edited your question and tried to fix them to the best of my understanding. Please check the question and confirm that the example commands match your case.

The most likely reason why LG1 and LG2 differ is because G includes arcs labeled with S1 and S2 while L does not. When you compose L with G, the paths in G which contain S1 and S2 do not match any path in L. I believe you can handle these symbols by adding two simple loops to L, e.g. assuming 0 is both the initial state and a final state of L, you would add

0 0 <eps> S1
0 0 <eps> S2
These loops will match S1 and S2 during composition, hence preserve the paths which include these symbols in LG.

Cheers, Dogan

AndreSchlichting - 2014-06-25 - 08:10

Hey, Dogan, thanks for your answer.

Actually L contains the symbols S1 and S2 exactly like you sugested.

0   1   <eps>   <eps>   0.693147182
0   2   sil   <eps>   0.693147182
1   4   <eps>   S1
1   5   <eps>   S2
1   1   sil   _silence_
1   6   k   w6
1   12   d   w3
1   1   o^   w1   0.693147182
1   3   o^   w1   0.693147182
1   15   g   w5
1   18   t   w2
1   21   t   w4
1   1   #0   #0
1
2   1   #3   <eps>
3   2   sil   <eps>
4   1   #1   <eps>   0.693147182
4   3   #1   <eps>   0.693147182
5   1   #2   <eps>   0.693147182
5   3   #2   <eps>   0.693147182
6   7   a   <eps>
7   8   x   <eps>
8   9   o^   <eps>
9   10   R   <eps>
10   11   o   <eps>
11   1   S   <eps>   0.693147182
11   3   S   <eps>   0.693147182
12   13   o^   <eps>
13   14   i   <eps>
14   1   z   <eps>   0.693147182
14   3   z   <eps>   0.693147182
15   16   a'   <eps>
16   17   t   <eps>
17   1   S   <eps>   0.693147182
17   3   S   <eps>   0.693147182
18   19   e^   <eps>
19   20   N   <eps>
20   1   o_   <eps>   0.693147182
20   3   o_   <eps>   0.693147182
21   22   r   <eps>
22   23   e^   <eps>
23   1   S   <eps>   0.693147182
23   3   S   <eps>   0.693147182

The symbols S1 and S2 will still be in LG after composition because I want to replace them later, isn't it correct? But in fact GS1 and GS2 do not contain the symbols S1 and S2. They are instead what I want to use to replace later.

G

0   1   #0   <eps>   227.955917
0   4   w1   w1
1   2   S1   S1   1.60943794
1   3   S2   S2   1.60943794
1   4   w1   w1   1.60943794
1   5   w2   w2   1.60943794
1   1.60943794
2   1   #0   <eps>   227.955917
2
3   1   #0   <eps>   227.955917
3   2   S1   S1
4   1   #0   <eps>   227.955917
4   5   w2   w2
5   1   #0   <eps>   227.955917
5   3   S2   S2

GS1

0   1   #0   <eps>   16.534359
0   2   w6   w6   0.693147182
0   3   w5   w5   0.693147182
1   2   w6   w6   1.38629436
1   3   w5   w5   1.38629436
1   0.693147182
2   1   #0   <eps>   227.955917
2
3   1   #0   <eps>   227.955917
3

GS2

0   1   #0   <eps>   16.534359
0   2   w3   w3   0.693147182
0   3   w4   w4   0.693147182
1   2   w3   w3   1.38629436
1   3   w4   w4   1.38629436
1   0.693147182
2   1   #0   <eps>   227.955917
2
3   1   #0   <eps>   227.955917
3

By the way, I use the option epsilon_on_replace with fstreplace.

Thanks.

DoganCan - 2014-06-25 - 20:36

Hi Andre,

The difference is due to the structure of the lexicon. There are two issues as far as I can tell.

  1. The loops for handling S1 and S2 symbols are introducing extra disambiguation and silence symbols on the input side. As a result, LG1 will always include these extra symbols on its input side. On the other hand, G2 does not include S1 and S2 symbols. Hence these dummy loops are never traversed while composing L with G2. Make sure these dummy paths do not introduce extra symbols on the input side.
  2. An optional silence is allowed at the beginning. These optional silences will make their way through to LGS1 and LGS2 and create additional silence symbols on the input side of LG1. You might want to use a slightly different lexicon, which does not allow these extra silences at the beginning, while constructing LGS1 and LGS2.

Cheers, Dogan

Log In

Phi-matcher in PDT

AntoineRaux - 2014-06-23 - 20:04

Is it possible to use specialized matchers in a PDT? Specifically, I want to create a PDT from an RTN (using pdtreplace) where some of the FSTs in the RTN have phi transitions. Is this possible with the current version of the pdt extension, and if so how?

BenSnyder - 2014-08-14 - 12:43

I've been grappling with this questions too. I've created a PhiParenMatcher class (inheriting from PhiMatcher with the <M> template specialized to be a PDT ParenMatcher). The main problem is that the phi match fails when the destination state of the phi arc doesn't have an explicit match to the requested match_label, but does have a non-consuming match. This commonly happens in a PDT because the destination state has outgoing arcs labelled with parentheses, which are non-consuming and treated analogously to epsilons.

In fact, I think this is a more general problem with the PhiMatcher, even with a standard FST -- the phi match doesn't work properly when the destination state of the phi arc has an outgoing epsilon arc. I will update if I come up with a solution.

Log In

Encoding and composing a character-to-token lexicon transducer with UNKs

AaronDunlop - 2014-06-19 - 17:12

Is there a relatively standard or straightforward way to build a lexicon transducer (i.e., a character-to-token transducer) which handles unknown words, outputting an appropriate UNK symbol? The lexicon examples at http://www.openfst.org/twiki/bin/view/FST/FstExamples are pretty straightforward, and we can extend the lexicon script to construct a trie transducer for an arbitrary corpus (and handle spaces as word delimiters). But the transducer can't rewrite input that includes unobserved words.

My intuition is that generalizing further to handle UNKs would be similar to encoding a language model with backoff transitions (Allauzen et al., 2003). But I'm confused about the exact encoding, and how to compose the resulting transducer with other transducers in a chain, using the appropriate matchers (my guess is we want either a phi or a rho matcher, but I'm not certain which).

For example, consider a system consisting of 3 transducers: * F: A transducer which passes through all characters unchanged, or collapses sequences of repeated characters at some cost. (Note that this transducer is not deterministic, since it doesn't require that a repeated sequence be collapsed).<br> * G: Character-to-token lexicon transducer<br> * H: Word-level n-gram model<br>

In theory, H(G(F(x))) would normalize input text, deleting duplicated characters as appropriate (e.g., if the character-deletion costs and language model probabilities balance properly, it might change 'pair off access' to 'pair of aces' but leave 'turn off access' unchanged).

F will naturally include arcs with epsilon outputs to delete duplicate characters. But it seems like those epsilon outputs aren't semantically the same as the epsilon inputs in H that encode LM backoffs, or likely to analogous epsilon inputs in G. Both compositions (F o G) and (F o G) o H will require handling backoff arcs. That means we need to use a phi or rho matcher, correct? Can we perfom these compositions with the command-line fstcompose tool? Or does phi/rho matching require custom C++ code?

Of course, we could avoid the problem by using a character-level language model instead (in which case we'd only need 2 transducers, and could let OpenGRM encode the LM backoffs), but I'd like to incorporate the longer distance dependencies of a word-level model.

If there are standard tools to do this that I'm missing, I'd love pointers. Thanks in advance.

DoganCan - 2014-06-19 - 22:54

Hi Aaron,

I'm not quite sure about the exact transduction you are looking for. Is the following an accurate description of what you have in mind?

  1. Input is a sequence of characters where word boundaries are marked with whitespace characters, i.e. there is no confusion about where one word starts and ends.
  2. Characters may repeat and deleting repeated characters by paying its cost is OK.
  3. There may be character sequences which can not be mapped to any word in the vocabulary even after character deletion. These should be mapped to UNK.

Could you give an example input and the desired output? Ideally something which covers all above cases.

Cheers, Dogan

AaronDunlop - 2014-06-20 - 19:09

Dogan,

I'm specifically interested in the character-to-token (lexicon) transducer that handles unknown words. The duplicate-deletion transducer is an example of a character-level transducer I might combine with it (the transducer I'm working with currently learns character-level mutations from data, and I want to support other character transducers as well, such as something hand-coded in Thrax).

To combine any of those character transducers with a word-level language model, I need a lexicon transducer in between.

So a trivial example chain (that only handles the word 'it') might be:

1. A character-mutation transducer including i:i and t:t pass-through arcs (at 0 cost), and t:<eps> at some cost

2. A lexicon which accepts i t <space> and outputs 'it'. (sorry, I can't figure out how to include images for these transducers). If any character other than 'i' appears at the start state, it should take a failure transition to an UNK state (at some cost); similarly for any character other than 't' in the 'i' state; the UNK state would have a self-loop for any character other than <space>, and an arc for <space> back to the start, outputting <unk>.

I can create transducer 2 with explicit arcs for all characters in the vocabulary, but of course that's very large for a sizable (word) vocabulary (millions of arcs). So it's impractical to compose with the language model (And it would presumably be much larger if we handled characters outside of the English ASCII range). I know language models use failure arcs to avoid encode backoff transitions compactly. My intuition is that a lexicon with UNK support would look similar. But I haven't figured out how to do that encoding, or what matchers to use (if a specific matcher is needed) for composition and inference. I probably need to go back and read Cyril, Mehryar, and Brian's 2003 language modeling paper in more detail. But it seemed like a problem that might already have been solved.

Did that help clarify my question at all?

Thanks, Aaron

DoganCan - 2014-06-20 - 21:54

Hi Aaron

Does the following lexicon for the vocabulary {man, mars} match what you have in mind? Note that phi matches rest of symbols but does not consume any symbols, while rho matches rest of symbols and consumes a symbol.

Cheers, Dogan

DoganCan - 2014-06-21 - 07:05

Hi Aaron,

I just pushed an example setup to github at the address

https://github.com/dogancan/lexicon-unk

It includes a simple script for constructing the lexicon and a rudimentary fst composition tool with phi and rho matching on the input side of the second (right-hand) input fst.

Cheers, Dogan

EvaHasler - 2015-08-11 - 11:27

Hi Dogan

I have a question that is related to your example from above. I am using your example to set up a rho matcher and it works fine. However, when the second lattice also contains arcs with epsilons as input symbols, it does not work: FATAL: ComposeFst: 2nd argument requires matching but cannot.

How do I fix it to be able to compose with epsilons as well?

Thanks, Eva

Log In

Compile openfst 1.4.1 on 64-bit Ubuntu VM?

EstherJudd - 2014-06-17 - 12:26

When I try to compile openfst on Ubuntu everything seems to compile just fine. But then I try to run a program such as fstinfo and I get the following error:

fstinfo: error while loading shared libraries: libfstscript.so.1: cannot open shared object file: No such file or directory

I have tried configuring with --enable-shared on, I have tried downgrading from gcc 4.8.2 to gcc-4.4. There seems to be no difference. The libraries are installed in /usr/local/lib/fst. I have even tried adding the directory to the path in my bash login-script. What else can I do?

PaulDixon - 2014-06-17 - 16:07

Have you tried adding /usr/local/lib/ to the LD_LIBRARY_PATH environment variable?

EstherJudd - 2014-06-17 - 17:22

That helped. Dumb of me to not think of that. Now I'm just grappling with compiling thrax. I get an internal compiler error whenever I try.

PravinBhosale - 2015-10-21 - 02:47

how to add LD_LIBRARY_PATH environment variable
Log In

Is there a simple way to add weighted arcs from final nodes to initial node?

JihoonKim - 2014-06-09 - 00:50

I have an FST which has more than 10000 final nodes. I'd like to add weighted arcs from these final nodes to initial node. Closure operation seems not to support weighted arcs. Is there any simple way to do that?

Thanks in advance.

JihoonKim - 2014-06-09 - 04:03

To clear up, I want the arcs with epsilon input/output.

PaulDixon - 2014-06-09 - 13:32

How about just iterating over all the states and adding the additional arcs? Untested code:

template<class Arc>
void AddFinalStartArc(MutableFst<Arc> *fst, typename Arc::Weight weight) {
typedef typename Arc::Weight Weight;
for (StateIterator<Arc> siter(*fst); !siter.Done(); siter.Next())
  if (fst->Final(siter.Value()) != Weight::Zero())
    fst->AddArc(siter.Value(), Arc(0, 0, weight, fst->Start());
}
Log In

Compile error OpenFST 1.4.1

RolandSchwarz - 2014-05-22 - 08:22

Hi, I'm trying to install the latest version on Windows using Cygwin. All previous versions that I tried (last was 1.3.3) compiled without problems. In this version I get the following compile error:

/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT determinize.lo -MD -MP -MF .deps/determinize.Tpo -c -o determinize.lo determinize.cc libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT determinize.lo -MD -MP -MF .deps/determinize.Tpo -c determinize.cc -DDLL_EXPORT -DPIC -o .libs/determinize.o as: .libs/determinize.o: too many sections (45066) /tmp/ccUSIrLz.s: Assembler messages: /tmp/ccUSIrLz.s: Fatal error: can't write .libs/determinize.o: File too big as: .libs/determinize.o: too many sections (45066) /tmp/ccUSIrLz.s: Fatal error: can't close .libs/determinize.o: File too big Makefile:390: recipe for target 'determinize.lo' failed

Any idea what has changed that it doesn't compile anymore? From what I understand there is little I can do on my side to make it work? Any help appreciated as usual.

Thanks,

Roland

PaulDixon - 2014-05-24 - 14:52

PaulDixon - 2014-05-24 - 14:59

I saw a similar error about too many segments when compiling version 1.4.1 on Visual Studio. In VS there is a switch /bigobj that worked for me but there doesn't seem to be an equivalent for Cygwin. You could try compiling without debugging information to try reduce the file size. I'm not familiar with Cygwin but would using a 64bit compiler help? http://mingw-w64.sourceforge.net/

You also try using the VS compiled version, I put a link to the solution in the contrib section. They should run fine under Cygwin

DanielPovey - 2014-05-24 - 15:38

Someone reported on the Kaldi forum that if you configure with --enable-static --disable-shared , it works.

DanielPovey - 2014-05-28 - 15:00

Kaisheng Yao posted this to kaldi-users:

I found the following link http://stackoverflow.com/questions/16596876/object-file-has-too-many-sections It is related to using template. Another way to resolve this is turning on optimization. In my case I use

EXTRA_CXXFLAGS += -Wno-sign-compare -O1 in src\lat\Makefile

I therefore changed the following in src\kaldi.mk and compilation was successful after this change

CXXFLAGS = -msse -msse2 -Wall -I.. -DKALDI_DOUBLEPRECISION=0 -fPIC -DHAVE_POSIX_MEMALIGN -DHAVE_CLAPACK -I ../../tools/CLAPACK/ -Wno-sign-compare -Winit-self -I ../../tools/CLAPACK/ -I $(FSTROOT)/include $(EXTRA_CXXFLAGS) -O1

ChristopherBader - 2014-09-19 - 15:13

Just to clarify DP's comment:

--enable-static --disabled shared does NOT work

Turning on C++ optimization DOES work.

Put CXXFLAGS="-O" at the end of the configuration command, e.g.

./configure CXXFLAGS="-O"

Log In

Multiple score on the arc...?

ZeeshanAhmed - 2014-04-08 - 15:41

Hi, Is it possible to define multiple scores on the arcs? e.g. acoustic and language modeling scores at the same time. Would determinization and minimization algorithms work in this situation as is?

Thanks.

PaulDixon - 2014-04-09 - 05:09

Yes to both. This is described in the OpenFst paper http://openfst.org/twiki/pub/FST/FstBackground/ciaa.pdf. Kaldi also has a LatticeWeight type which will do exactly what you want. Here is minimal standalone example that will build an OpenFst shared object and enable the type in the shell tools. https://gist.github.com/edobashira/10244848

ZeeshanAhmed - 2014-05-10 - 12:37

Hi Paul, I compiled the standalone version on openfst.1.3.4. However, I am getting the following error when run the following test command. Can you help me what could be the problem?
$ fstcompile --arc_type=lattice4 test.txt ERROR: GenericRegister::GetEntry : lookup failed in shared object: lattice4-arc.so FATAL: No operation found for "CompileFst" on arc type lattice4

DoganCan - 2014-05-11 - 19:06

Hi Zeeshan,

Did you build the shared object lattice4-arc.so using Paul's gist? It seems like it is not in the working directory or in the dynamic library search path LD_LIBRARY_PATH (DYLD_LIBRARY_PATH on OS X).

Cheers, Dogan

ZeeshanAhmed - 2014-05-13 - 17:57

Hi Dogan,

Thanks for the reply. It worked fine on pure Linux system but on cygwin it's not working though the lattice4-arc.dll is in the working directory. The problem is that on cygwin shared libraries for openFst are not built. Any idea on how to build openFst shared libraries on cygwin?

Thanks, Zeeshan

DoganCan - 2014-05-13 - 18:26

Hi Zeeshan,

I have no idea how things work under cygwin. That being said, are you sure your problem is related to shared libraries not being built? I think your problem might have to do with the shared object file being named lattice4-arc.dll instead of lattice4-arc.so. Try renaming that file to lattice4-arc.so. If that does not work, try lattice4-arc.so.dll. Good luck.

Cheers, Dogan

Log In

Error installing openGRM on Mac 10.9.2

SepidehA - 2014-04-05 - 13:38

Hi all, I have installed OpenFST successfully on my mac 10.9.2 with --enable-far. But then when I want to configure my OpenGRM for installing I get " checking fst/fst.h usability... no checking fst/fst.h presence... no checking for fst/fst.h... no configure: error: fst/fst.h header not found" but my fst.h is there in /usr/local/include any ideas?

Thanks

CyrilAllauzen - 2014-04-07 - 14:53

Hi,

It seems this is due by Apple changing the default C++ standard library in the latest version of Xcode (version 5).

CXX=-stdlib=libstdc++ ./configure
should fix this.

EstherJudd - 2014-05-07 - 13:03

I have the same problem. The command above does not fix it. I had it working on a different machine previously but just started working on my iMac with 10.9.2 and the latest version of Xcode installed.

PaulDixon - 2014-05-08 - 01:15

The latest c++11 version of OpenFst (1.4.1) and OpenGrm Ngram (1.21) should just compile without any extra flags or environment variables.

EstherJudd - 2014-05-08 - 11:43

Then why when I run ./configure do I get the following messages? fat binaries are installed in /usr/local/bin and it is in my path

checking fst/fst.h usability... no checking fst/fst.h presence... no checking for fst/fst.h... no configure: error: fst/fst.h header not found

AaronDunlop - 2014-05-08 - 16:10

Here's what worked for me on Mavericks (10.9.2)

In the extracted opengrm-ngram-1.2.1 directory:

# Replace imports of 'unordered_map' with 'tr1/unordered_map' perl -pi -e 's#<unordered_map>#<tr1/unordered_map>#' src/include/ngram/ngram-shrink.h src/include/ngram/ngram-marginalize.h src/include/ngram/ngram-count.h

# Configure with Clang arguments ./configure --prefix=/usr/local CXX="clang++ -std=c++11 -stdlib=libstdc++"

make sudo make install

Note: To configure OpenFST, I used:

./configure --prefix=/usr/local --enable-compact-fsts --enable-const-fsts --enable-far=yes --enable-lookahead-fsts --enable-pdt CXX="clang++ -std=c++11 -stdlib=libstdc++"

Of course, your mileage may vary. But maybe someone will find that helpful.

CyrilAllauzen - 2014-05-12 - 10:48

CyrilAllauzen - 2014-05-12 - 10:55

Aaron: My original message was about compiling OpenGrm NGram 1.1.x. With version 1.2.1, you shouldn't need to -stdlib=libstdc++. Have you tried compiling without it?

Esther: Could you clarify which version of OpenFst and OpenGrm NGram you were using? Which version of Xcode? Also could you look at config.log and see if you're seeing anything suspicious there. Also, if you've recently installed Xcode5, you might need to accept the license: /Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -license accept.

AaronDunlop - 2014-05-14 - 17:59

Cyril: Compiling OpenFST without -stdlib=libstdc++ seems to work, but configuring OpenGRM without that flag fails.

./configure --prefix=/usr/local CXX="clang++ -std=c++11" ... checking for fst/fst.h... no configure: error: fst/fst.h header not found

AliceWalton - 2014-06-13 - 00:41

I have the same problem when I try to configure Thrax. It shows configure: error: fst/fst.h header not found Using the flag CXX="clang++ -std=c++11" does not fix it.

In OpenFst, if I configure with CXX="clang++ -std=c++11 -stdlib=libstdc++" as suggested by Aaron, I got a fatal error of unordered_map not found. Well I can configure and install OpenFst without this flag but I cannot configure Thrax.

Thank you very much for your help.

JatishKhanna - 2015-09-26 - 05:33

Below would be helpful for the above thread-

http://www.cslu.ogi.edu/~sproatr/Courses/TextNorm/tutorial.html

In short: 1. Install - openfst-1.5.0 or higher version http://www.openfst.org/twiki/bin/view/FST/WebHome

2. Set the LD_LIBRARY_PATH=/usr/local/bin [all the library *.so by default to be installed ]

3. Clean up if already tried [ make clean ]

4. Try installing it back [make && make install]

smile walah Done

Log In

Intersection gives a FATAL error

ErikAxelson - 2014-03-07 - 04:03

Hi all,

I have two transducers with no symbol tables:

0       0       9      9      0
0       0       6      6      0
0       1       8      8      0
0       1       7      7      0
0       1       3      3      0
1       1       9      9      0
1       1       6      6      0
1       2       8      8      0
2       2       9      9      0
2       2       6      6      0
2       3       4      4      0
3       3       9      9      0
3       3       6      6      0
3       3       8      8      0
3       3       7      7      0
3       3       3      3      0
3       4       5      5      0
4       4       9      9      0
4       4       6      6      0
4       5       8      8      0
5       5       9      9      0
5       5       6      6      0
5       6       8      8      0
5       6       7      7      0
5       6       3      3      0
6       6       9      9      0
6       6       6      6      0
6       0

and

0       0       5      5      0
0       0       4      4      0
0       1       3      3      0
1       2       6      6      0
2       3       5      5      0
2       3       4      4      0
2       2       3      3      0
2       4       9      9      0
2       2       8      8      0
2       2       7      7      0
3       3       5      5      0
3       3       4      4      0
3       2       3      3      0
3       2       8      8      0
3       2       7      7      0
4       4       5      5      0
4       4       4      4      0
4       5       3      3      0
5       5       5      5      0
5       5       4      4      0
5       0

When I try to intersect them (t1 is the first transducer and t2 the second):

    RmEpsilon(t1);
    RmEpsilon(t2);

    ArcSort(t1, OLabelCompare<StdArc>());
    ArcSort(t2, ILabelCompare<StdArc>());

    EncodeMapper<StdArc> encoder(kEncodeLabels,ENCODE);
    EncodeFst<StdArc> enc1(*t1, &encoder);
    EncodeFst<StdArc> enc2(*t2, &encoder);

    IntersectFst<StdArc> intersect(enc1, enc2);

I get an error message "FATAL: ComposeFst: 1st argument cannot match on output labels and 2nd argument cannot match on input labels (sort?)."

What am I doing wrong?

PaulDixon - 2014-03-08 - 09:10

You need to sort the encoded fsts before the intersection.

ErikAxelson - 2014-03-12 - 04:50

It seems that using Encode instead of EncodeFst solves the problem, because then I'm able to sort the arcs before intersection:

Encode<StdArc>(t1, &encoder); Encode<StdArc>(t2, &encoder);

ArcSort(t1, OLabelCompare<StdArc>()); ArcSort(t2, ILabelCompare<StdArc>());

IntersectFst<StdArc> intersect(*t1, *t2);

Log In

OpenFst website back up

CyrilAllauzen - 2014-02-14 - 18:20

The OpenFst website was down for a few days due to some hardware issue. The site is back up but some things (such as the doxygen links) are not working properly yet. We're working on resolving the remaining issues.

Log In

Linking OpenFst

OliverAdams - 29 Jan 2014 - 21:09

I've built OpenFst in my home directory.

I'm aiming to compile the following toy code:

1 #include <fst/fstlib.h> 2 3 int main(void) { 4 fst::StdVectorFst fst; 5 }

So far I've run a number of commands that are variations of the following theme:

g++ main.cc -I/home/oadams/openfst-1.3.4/src/include/ -L/home/oadams/openfst-1.3.4/src/lib/.libs/libfst.so

I get various undefined reference errors. I feel I should probably have append the commands with -l , but for the obvious variations I get errors along the lines of:

/usr/bin/ld: cannot find -lopenfst collect2: error: ld returned 1 exit status

How can I get it to compile? Any help would be greatly appreciated.

Log In

Bug in weight-class.h: ConvertKeyToSoFilename() ?

FedericoFlego - 26 Jan 2014 - 06:54

Hello,

I've got a problem running the fstprune binary with my user defined arc type.

It works fine if I do something like:

> fstprune myfst.fst > /dev/null

but if I do

> fstprune --weight=8 myfst.fst > /dev/null

I get

ERROR: GenericRegister::GetEntry : tropical_LT_tropical.so: cannot open shared object file: No such file or directory Segmentation fault

I believe this is due to function ConvertKeyToSoFilename() in weight-class.h:

virtual string ConvertKeyToSoFilename(const string &key) const { return key + ".so"; }

which should probably be:

virtual string ConvertKeyToSoFilename(const string& key) const { string legal_type(key); ConvertToLegalCSymbol(&legal_type);

return legal_type + "-arc.so"; }

Thanks a lot for your feedback!

Federico

DoganCan - 27 Jan 2014 - 03:19

Hi Federico,

I believe, you need a shared object for your user defined weight type (see attached code) as well. Make sure the the DSO is named tropical_LT_tropical.so and put it somewhere dlopen can find.

Also, you will get the following error with that fstprune call since 8 can not be parsed as a pair weight.

fstprune --weight=8 test.fst > /dev/null
FATAL: StrToWeight: Bad weight = "8", source = WeightClass, line = 0 

Try this instead

fstprune --weight=8,0 test.fst > /dev/null

Cheers, Dogan

#include <fst/arc.h>
#include <fst/script/weight-class.h>

using namespace fst;

namespace fst {
  namespace script {
    REGISTER_FST_WEIGHT(LexicographicWeight<TropicalWeight, TropicalWeight>);
  }
}

FedericoFlego - 27 Jan 2014 - 08:40

Thanks a lot Dogan!

Actually I've got the shared object (named tropical_LT_tropical-arc.so) which is loaded correctly when I do for example fstprint. I see in the code DSOs are generally loaded adding the suffix -arc:

legal_type + "-arc.so"

It's only in src/include/fst/script/weight-class.h

that the suffix is not added. After changing function ConvertKeyToSoFilename() in weight-class.h from:

  virtual string ConvertKeyToSoFilename(const string &key) const {
    return key + ".so";
  }

to:

  virtual string ConvertKeyToSoFilename(const string &key) const {
    //return key + ".so"; //ff257: is this a bug?
    string legal_type(key);
    ConvertToLegalCSymbol(&legal_type);

    return legal_type + "-arc.so";
  }

it worked fine. Thanks for confirming that I need to add REGISTER_FST_WEIGHT. I didn't do it initially because it doesn't seem to be mentioned in the on-line documentation, but after a grep of the src for REGISTER_FST I had found it. Then yes, the last missing bit was to pass 8,0 rather than 0. I've just tried and it works perfectly! smile

Thanks a lot!

Federico

FedericoFlego - 27 Jan 2014 - 08:46

Apologies for the bad formatting... And I don't see an option to fix it frown

DoganCan - 28 Jan 2014 - 03:55

Hi Federico,

I am glad it worked out. Regarding the change you made in the source code, I don't think it is a bug. OpenFst binaries can dynamically load user defined weight types just like user defined arc types. Even though in practice these two mean more or less the same thing, technically these are two different things. In this particular case, fstprune is looking for the user defined weight type (not the arc type) to parse the command line weight parameter. That is why you don't see the same error when you call fstprint (or when you call fstprune without the --weight option) which only needs to load the user defined arc type.

I believe the intended way of handling this issue is by creating another DSO named tropical_LT_tropical.so (in addition to tropical_LT_tropical-arc.so) which only registers the weight type. Patching the library code is not a great idea if you plan to update the OpenFst library later when a new version comes out or if you plan to run your code on different machines -- you will have to remember patching the library -- or if you plan to distribute your code.

Cheers, Dogan

Log In

Is there a general approach to eliminating redundant (higher-weighted) paths in cyclic WFSAs?

AlecS - 16 Jan 2014 - 05:36

Given a simple tropical semiring WFSA, defined by the expression A* | (A / 1)* | (B /2)

wfsa1.JPG

Is there any general way to reduce this to the semantically equivalent WFSA A* | (B /2)

wfsa2.JPG

In other words, how to remove redundant higher-weighted paths in tropical semiring WFSAs with weighted cycles?

Does anyone have a solution to this?

Thanks.

DoganCan - 18 Jan 2014 - 21:12

Hi Alec,

I don't think there is a general solution to your problem provided by the public release of OpenFst at this time. Determinization will do what you want as long as the input automaton has the twins property. The example you gave does not have this property, hence determinization will not work in this case.

Having said that, conclusion section of this paper alludes to a general disambiguation algorithm for weighted automata which does not require the twins property to hold. I believe there is a good chance that algorithm will be a part of the public release of OpenFst once it is published, but this is pure speculation on my end.

Cheers, Dogan

AlecS - 04 Feb 2014 - 05:26

Hi Dogan, Thanks for your input.

Yes I agree, if someone could devise a working disambiguation algorithm for weighted FSAs, that would do the trick.

Also, a determinization algorithm that would determinize all determinizable WFSAs, despite the lack of twins property in certain cases (like the example above), would suffice in my case.

Let’s hope your speculation is correct.

Thanks, Alec

PaulDixon - 07 Feb 2014 - 01:01

mifst toolkit has a determinization algorithm that tries to detect self loops with different weights.
Log In

-- Michael Riley - 2018-05-30

Edit | Attach | Watch | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r1 - 2018-05-30 - MichaelRiley
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback