> > 
META TOPICPARENT 
name="FstForum" 
OpenFst Forum 2008 Archive
Patch for GNU autotools including libtool for OpenFst 20080422
DavidShao  28 Dec 2008  00:41
I have been developing patches to enable an open source software stack of OpenFst 20080422, leptonlib 1.58, iulib subversion revision 123, Tesseract subversion revision 205, and OCRopus subversion revision 1307 to all use libtool to create dynamic linked libraries on all of Mac OS X 10.5.6 using Macports, Ubuntu 8.10, FreeBSD 7, and Windows XP using Cygwin. The patches can be found at OCRopus files with filenames beginning with toautotools. As of today, for OpenFst, the corresponding file would be toautotools_openfst_20080422_20081227.diff As I have stated in a previous post, to apply and use the patch, say to a user subdirectory busr so that one does not need administrator privileges to install,
patch p1 E < ../toautotools_openfst_20080422_20081227.diff
autoreconf install force
./configure prefix=$HOME/busr CPPFLAGS="I/opt/local/include" LDFLAGS="L/opt/local/lib"
make
make check
make install
I have enabled make check using tests already provided in OpenFst so that one can test whether modules can be dlopened. All tests pass for all platforms except Cygwin on Windows XP. Even for Cygwin on Windows XP, this patch enables a dynamic linked library to be built such that OCRopus is able to use it to pass its own OpenFst tests. (There is an additional failure of one OCRopus test, testfstsearch2.lua, on Mac OS X and FreeBSD.)
My preliminary investigation for Cygwin on Windows XP appears to show that something is going wrong with reinterpret_casting the read() function so that it can be recovered using a map in both fst/lib/register.h and fst/bin/main.h. The pointer to this function appears to be immediately recovered if the map is accessed right after it is stored, but a later GetEntry() returns a null pointer.
I believe that using GNU autotools to provide dynamic linked libraries is an essential step to helping more wide distribution of OpenFst through standard platform package or port managers. I think the current patch just about finishes this task other than the above problem for Cygwin. I also have not tested this patch on any 64bit platforms other than an Intel Macbook running Mac OS X.
DavidShao  28 Dec 2008  00:42
Determinize, Sequentialize, Subsequentialize
My colleagues and I are trying to understand the relationship between the OpenFst Determinize() algorithm and the sequentialization and subsequentialization discussed in Mohri's 1997 paper "FiniteState Transducers in Language and Speech Processing".
Any mindtuning that you could offer would be much appreciated.
Ken
The Determinize() algorithm is the algorithm described by Mohri in 1997 Computational Linguistic paper.
In the weighted automaton case, the implementation follows exactly Mohri's paper. The weighted transducer is dealt by considering weighted transducer over seminiring K as weighted automaton over the semiring obtained by taking the cross product of the string semiring and the semiring K.
Cyril
Determinize and functional transducers
The documentation of the Determinize() function in OpenFst states "The transducer must be functional." and deeper down, "N.B. Current version only determinizes functional transducers". Can we expect a future version of Determinize() that is not limited to functional transducers?
Thanks,
Ken
Yes, we indeed plan to have determinization of nonfunctional transducers in a future version.
Best, Cyril
Has anybody written a version of determinization which works for nonfunctional transducers ? Cyril, would you have a preliminary version of that code to give to people looking to implement it a headstart ? Thanks
Request: Determinize in place
Minimize(&A) minimizes a network A "in place", but Determinize(A, &B) takes A and produces as output a new determinized Fst B. Would it be possible to provide a new version of Determinize that determinizes in place, i.e. Determinize(&A)?
Thanks,
Ken
We could add a Determinize(&A). The reason we chose not to so was that the underlying algorithm would still be constructive: create B as the determinization of A, then delete A and assign B to A. Hence, it won't be any more (memory) efficient than using Determinize(A, &B).
Whatever goes on underneath, I venture to suggest that it would be (to some of us at least) intuitive and useful to add a new Determinize(&A) that is parallel to RmEpsilon(&A) and Minimize(&A). From a user's point of view, it often makes sense to perform such operations "in place" on an existing Fst object.
In my own work (a programming language built on top of OpenFst) an assignment statement like
$net = a*b+[wz]? ;
causes the righthand side to be evaluated (by calling various functions in the OpenFst library) and in a symbol table the name $net is bound to a Java Fst object that stores (as a Java long int) the pointer to the resulting C++ Fst object. That pointer is my handle from the Java world to the Fst object in the C++ universe. If I then type
$net2 = $net ;
then $net2 becomes an alias for the same Java Fst object that contains the pointer to the original C++ Fst object. (If I really want to make a copy, I've got a copy function that relies on the Map() function in the OpenFst library.)
With RmEpsilon(&A), I can remove the epsilons from the original C++ Fst object in place, and both $net and $net2 still reference the same Java Fst object that still contains the pointer to the original C++ Fst object. If I could (at least superficially) also Determinize() that original Fst object in place, life would be a bit easier for me. With the existing Determinize(A, &B), a new C++ Fst object B is created, which usually causes my language to create a new Java Fst object to wrap it; and I've got to worry about changing the pointer in the original Java Fst object (the value, in the symbol table, of $net and any aliases like $net2) and deleting the original C++ object A to prevent a kind of memory leak.
So I now understand why Determinize(A, &B) is as it is, but I do still suggest that I and perhaps others would find a new Determinize(&A) useful and intuitively parallel to RmEpsilon(&A) and Minimize(&A), so that all could be performed "in place".
Best,
Ken
Hi Ken,
I understand that what you describe is a rather unsastisfying way of wrapping. However, I not sure I understand why this is the only of wrapping. Anyhow, you can easily add a Determinize(&A) as follows:
template <class Arc>
void Determinize(MutableFst<Arc> *fst) {
*fst = DeterminizeFst<Arc>(*fst,
DeterminizeFstOptions(CacheOptions(true, 0)));
}
Best, Cyril
When you iterate through the states in an Fst:
for (StateIterator<StdFst> siter(fst); !siter.Done(); siter.Next()) {
StateId state_id = siter.Value() ;
...
}
is the first state_id returned from the Iterator always the Start state of the Fst?
Thanks,
Ken
The first state returned by the iterator is not guaranteed to be the start state. You should always use the fst.Start() method to determinne the start state,
Best, Cyril
Failure transitions
Hey guys 
Great effort to build this library  there are so many uses  it's very nice to have such knowledgeable people building this for everyone. Thank you!
I'm interested in doing some language modeling with this library which I anticipate will require backoff weights that are commonly going to build paths which will be "better" than the proper path for certain ngrams.
I know you guys implemented failure transitions in the GRM library for LM purposes, and later an algorithm to replace them with epsilon transitions along with some compression heuristics to reduce the size of the final machines to address this exact problem.
My question is: What's the best route to take to implement this for the OpenFST project? If I take this on myself... should I create a new FST class to implement failure transitions? Or can I get away with just building a new Arc class which can trick the search functions for composition?
Thanks again for your help!
Chris
Hi Chris,
There is an undocumented support for failure transitions in composition. This feature is not documented and the handling of failure transitions will change significantly in the next version of the library.
For now, you can label failure transitions with kPhiLabel (you will need to use the C++ library for this, Fsts containing transitions labeled by kPhiLabel cannot be written to a file).
To perform composition with failure transitions, you need to use
ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST1_PHI>())
when fst1 contains transitions labeled by kPhiLabel or ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST2_PHI>())
when fst2 contains transitions labeled by kPhiLabel .
Best, Cyril
Nice... thanks for the head's up... I'm glad I asked!
Out of curiosity, what's the new implementaiton going to be? Something along the lines of a custom Arc/State/Fsm class?
Chris
Cyril 
I noticed that during composition with failure transitions, the composition does not chase down the transitive closure of both epsilon and failure transitions.
You can reproduce the behavior by having a state with a failure transition lead to a state with an epsilon transition  which in turn leads to a state with a real symbol arc on it. For example an FSA with the following makeup: 0 1 2 2 1 2 0 0 2 3 1 1 3
When you do composition with an FST containing only that real symbol (symbol 1 in the above case)  I would expect it to yield a connected machine (failure > epsilon > symbol)  which it does not. Instead it looks through the state with the failure transition and sees no matching symbol (only an epsilon transition) and aborts composition.
I tried to fix the problem, but couldn't quite figure out a good way to do so. The logic to chase down the transitive closure of failure transitions is making assumptions on the structure of the FST (deterministic w.r.t. failure transitions, or at least that they're all equally good, because it takes the first one it finds which yields a match) which probably means any fix I make will be short lived anyway.
If you'd like more information, please let me know, otherwise I'll just try another workaround  but I don't think fixing it in the current implementation is worthwhile... do you?
Chris
Hi Chris,
Indeed, I should have mentioned this: failure and epsilon transitions do not work together. You should also have at most 1 failure transition at a given state.
Cyril
Interactive manual testing of an FST
In the context of an interactive GUI, it might sometimes be useful to build an FST and then test it interactively by typing in individual strings to be generated (or analyzed), getting back the output string(s). If the FST to be tested is MainFST, then one obvious way to do this would be the following:
1. Get the input string typed by the user. 2. Build the input string into a onepath acceptor (call it InputACC) 3. StdVectorFst ResultFST ; Compose(InputACC, MainFST, &ResultFST) ; // compose for generation Project(&ResultFST, PROJECT_OUTPUT) ; // take the output projection if (number of paths in ResultFST not infinite or huge) { // print strings encoded by ResultFST }
This testing would be put in a loop, allowing the user to manually enter as many test input strings as desired before terminating the loop. To implement analysis, one could 1) invert the MainFST before performing the steps above, or 2) do the following:
Compose(MainFST, InputACC, &ResultFST) ; // compose for analysis Project(&ResultFST, PROJECT_INPUT) ; // take the input projection
QUESTION: Is there a better, more efficient, way to implement such manual testing? Perhaps using the lazy operations ComposeFst() and ProjectFst() ?
Thanks,
Ken
This approach looks good to me. Using lazy operations shouldn't improve the efficiency as is.
You could benefit from lazy operations when the number of strings in ResultFst is large and you don't want to display all these strings (or even don't want to compute all of ResultFst). Instead, you could rewrite your code to generate all paths to work for a nonexpanded FST and to stop once a fixed number of strings have been found (you could even ask to the user if he wants to see more strings then).
Best, Cyril
Easy way to query the number of paths in an FST?
If you have an FST, is there an easy way to query the number of paths?
Thanks, Ken 
KennethRBeesley 
13 Jun 2008  14:10 
The natural way is to use the shortestdistance algorithm: 1. Use Map to turn your FST into an FST over the log semiring setting the weight of every transition and the final weight of final states to LogWeight::One(). 2. Call ShortestDistance with reverse == true. Let d be the value return in the distance vector for the initial state. d is then the log of the number of successful paths in the FST.
Best, Cyril 
CyrilAllauzen 
16 Jun 2008  12:56 
Thanks, Cyril. I suspect that my terminology was wrong and/or my C++ skills are still shaky. What I really need is the number of complete paths in an FST, from the start state to a final state. Following these instructions (as best I can), I seem to get the number of paths from the start state to any state (final or not). I hesitate to post my code here, but I'd be more than happy to send it to you or anyone else.
Ken 
KennethRBeesley 
17 Jun 2008  16:48 
The key is to use the reverse option: ShortestDistance(fst, &distance, true) . For any state q, we then have that distance[q] is the sum of the weights of all the paths from q to a final states. Hence, distance[fst.Start()] is what you want (but you should always check before that distance.size() > fst.Start() ).
Cyril 
CyrilAllauzen 
18 Jun 2008  11:37 
Cyril,
I am using the reverse option = true. Perhaps I'm not properly setting the exit weight of the final states to LogWeight::One(). Will the following Map do it?
// Mapper from StdArc to LogArc. // orig. from lib/map.h // struct StdToLogMapper { struct ModifiedStdToLogMapper { typedef StdArc FromArc; typedef LogArc ToArc;
// orig //LogArc operator()(const StdArc &arc) const { // return LogArc(arc.ilabel, arc.olabel, arc.weight.Value(), arc.nextstate); //}
// KRB, change weight of all arcs to LogWeight::One() LogArc operator()(const StdArc &arc) const { return LogArc(arc.ilabel, arc.olabel, LogWeight::One(), arc.nextstate); }
// KRB: also need to change the exit weight of all final states to // LogWeight::One()
// KRB, will this do it? MapFinalAction FinalAction() const { return MAP_REQUIRE_SUPERFINAL; }
// MAP_NO_SUPERFINAL, the final weight is mapped to a final weight // // MAP_ALLOW_SUPERFINAL, final weight mapped to an arc to the // superfinal state, if the result cannot be represented as a final // weight (see below?); superfinal state created only if needed) // // MAP_REQUIRE_SUPERFINAL, final weight mapped to an arc to the // superfinal state, unless the weight can be represented as a final // weight of weight Zero(). superfinal state always created. (N.B. the // weight of the superfinal state is set to Weight::One())
uint64 Properties(uint64 props) const { return props; } };
Thanks,
Ken 
KennethRBeesley 
25 Jun 2008  13:17 
Following instructions from Cyril, here's a pathCount() function, and a trivial main() for testing, that work for me. If it gets garbled in the posting, I'd be glad to forward it by email to anyone interested.
Ken
#include "fst/lib/fstlib.h"
#include <iostream>
// iostream gives access to cout
using namespace fst ;
// From map.h
// Mapper to map all nonZero() weights to One().
//template <class A, class B = A>
//struct RmWeightMapper {
// typedef A FromArc;
// typedef B ToArc;
// typedef typename FromArc::Weight FromWeight;
// typedef typename ToArc::Weight ToWeight;
// B operator()(const A &arc) const {
// ToWeight w = arc.weight != FromWeight::Zero() ?
// ToWeight::One() : ToWeight::Zero();
// return B(arc.ilabel, arc.olabel, w, arc.nextstate);
// }
// MapFinalAction FinalAction() const { return MAP_NO_SUPERFINAL; }
// uint64 Properties(uint64 props) const {
// return props & kWeightInvariantProperties  kUnweighted;
// }
//}
typedef VectorFst<LogArc> LogVectorFst ;
typedef Fst<LogArc> LogFst ;
typedef LogArc::StateId StateId ;
// pathCount return the number of paths in an FST
// (this version assumes that the input is a StdVectorFst)
int pathCount(const StdVectorFst &sfst) {
// if the fst has loops, then the language or relation
// encoded by the FST is infinite, just return 1
if (sfst.Properties(kCyclic, true)) {
return 1 ;
}
// copy the FST input to a new unweighted network,
// over the Log Semiring
LogVectorFst lfst ;
// for each original StdArc, if the weight is _not_
// TropicalWeight::Zero() then change it to LogWeight::One(),
// else (when the weight is TropicalWeight::Zero()) change it to
// LogWeight::Zero(). This is handily done with
// RmWeightMapper(), which is in the library.
Map(sfst, &lfst, RmWeightMapper<StdArc, LogArc>()) ;
vector<LogArc::Weight> distance ;
ShortestDistance(lfst, &distance, true) ;
// from the distance vector, get the weight for the start state
LogArc::Weight w = distance[lfst.Start()] ;
// w.Value() is the log of the number of paths
// cout << "Weight from pathCount(): " << w << endl ;
// so make w positive and get the exp()
int paths = exp((double)(1.0 * w.Value())) ;
// cout << "Paths from pathCount(): " << paths << endl ;
return paths ;
}
// a trivial main() that tests pathCount()
//
int main() {
// Create a little FST over the Tropical Semiring
StdVectorFst fst ;
fst.AddState() ; // will be state 0
fst.SetStart(0) ;
fst.AddState() ; // will be state 1
fst.AddState() ; // will be state 2
// add some arcs
fst.AddArc(0, StdArc(1, 1, 0.693, 1)) ; // arc from 0 to 1
fst.AddArc(0, StdArc(2, 2, 0.693, 1)) ; // arc from 0 to 1
fst.AddArc(0, StdArc(7, 7, 0.0, 1)) ;
fst.AddArc(1, StdArc(3, 3, 0.693, 2)) ; // arc from 1 to 2
fst.AddArc(1, StdArc(4, 4, 0.693, 2)) ; // arc from 1 to 2
fst.AddArc(1, StdArc(5, 5, 0.693, 2)) ;
fst.AddArc(0, StdArc(8, 8, 0.0, 2)) ;
// Uncomment the following line to create a cycle/loop,
// which makes the language or relation infinite.
//fst.AddArc(1, StdArc(5, 5, 0.0, 1)) ;
fst.SetFinal(2, 0.693) ;
int count = pathCount(fst) ;
cout << "Path Count from main(): " << count << endl << endl ;
return 0 ;
}

KennethRBeesley 
03 Jul 2008  16:46 
Hi Ken,
I fixed your post.
Cyril 
CyrilAllauzen 
09 Jul 2008  11:41 
Cyril, Thanks again for your help in implementing the pathcounting algorithm above. Could you or anyone help me understand how the code above returns the number of paths? Why does the weight of the start state, in the Log semiring, correspond to the number of paths.? In larger FSTs, some accuracy seems to be lost. In one of our examples, an FST reported (by the algorithm above) to have 1,048,580 paths, has, in fact 4 fewer paths. Not a big error, but it surprised one of our testers who expected to get an exact path count. Thanks,
Ken 
KennethRBeesley 
24 Jun 2009  19:55 
It uses the shortestdistance algorithm describe in FST.ShortestDistanceDoc. In the real semiring, when the weights of all the arcs are 1, the shortestdistance from the initial state to a state q is the number of paths from the initial state to q. However, since the libary does not implement the real semiring, we have to do the computation in the log semiring. In this very special case where the weights are integer and we always multiply by 1, going to the log actually leads to more numerical errors.
There are two things you can try if you want to get better precision:
1. Use a smaller delta. Shortestdistance (like many algorithms in the library), use a delta parameter to determine whether two weights are "equal". By default, this delta is kDelta. This is to make the algorithm converge faster when they are cycles. Here, since the machine is acyclic, you could get away with a much much smaller delta.
2. Use doubles instead of floats. If reducing significantly delta does not help, the imprecisions might be due to floating point errors. You could try using doubles instead of floats by using ArcTpl<LogWeightTpl > instead of LogArc.
Finally, if this is very important to you, the best thing would probably be to implement the (+,x)semiring over unsigned 64bit integers.
Cyril 
CyrilAllauzen 
25 Jun 2009  17:02 
About Factoring Algorithm of WFST
Hi,
In this paper
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted FiniteState Transducers. In Larry Rabiner and Fred Juang, editors, Handbook on Speech Processing and Speech Communication, Part E: Speech recognition. SpringerVerlag, Heidelberg, Germany, 2008.
The factoring algorithm is mentioned but no enough details.
Can you tell me where to find the code or detailed descrption of the factoring algorithm?
Thanks in advance!
Zhenyu Pan 
ZyPan 
30 May 2008  13:08 
Here is a patch to use CMake for builds (tested on Linux and Mac OS X)
It's minor, but it would be nice to have a SymbolTable::size() function that returns the number of keys. 
MarkusD 
23 Apr 2008  11:06 
Efficient Way of Taking the Union of Several FSTs
Hi,
I'm trying to take the union of several thousands of FSTs. After a while the union becomes quite large and the union operation appears to take more and more time to complete. I tried removing epsilon transitions and determinizing after each union. This helps to reduce the time consumed by the union operation, however this time the overall operation (union+rmepsilon+encode+determinize+decode) takes much more time compared to the former case. I wonder if there is an efficient way of taking the union of several FSTs.
Thanks in advance, Dogan 
DoganCan 
22 Apr 2008  12:57 
Hi,
When dealing with the union of a very large number N of FSTs, a good approach is two proceed in log N steps. The first step consists of computing the union of the (2i)th and (2i+1)th FSTs for each 0 < i < N/2, leading to a set of N/2 transducer for the next step. The final transducer is obtained as the union of 2 intermediary transducers at the log N step.
Best, Cyril 
CyrilAllauzen 
22 Apr 2008  17:58 
possible bug in fstdeterminize in case of log semiring
Hi,
I noticed that the outputs of fstdeterminize and fsmdeterminize from AT&T FSM toolkit do not match in the case of log semiring. Here is a simple fst in log semiring with the mentioned problem and the results of two executables
cat test.txt 4 0 74 0 0.00274658203 4 1 74 0 5.89324951 4 3 20364 0 0.0282592773 4 2 20364 0 3.57971692 4 5 0 1 1.0986321 0 5 0 1 4.69826591e06 0 2 20364 0 3.68112183 0 3 20364 0 0.0255126953 1 5 0 1 1 2 20364 0 2 5 0 1 3 5 0 1 5
fstcompile arc_type=log test.txt  fstdeterminize  fstprint 0 1 0 1 1.0986321 0 2 74 0 1.51862259e05 0 3 20364 0 1.98714133e05 1 2 1 0 1 0.000163570745 2 3 20364 0 0.000163570745 3 1 0 1 4.95321437e05
fsmcompile t s log test.txt  fsmdeterminize  fsmprint 0 1 0 1 1.0986321 0 2 74 0 1.51862259e05 0 3 20364 0 1.98714133e05 1 2 1 0 1 4.6852183e06 2 3 20364 0 4.68526787e06 3 1 0 1 7.15111692e10
I believe a previous discussion titled "fstdeterminize" is about the same issue where fstdeterminize changes the path weights in the log semiring. I tried the suggestion "The issue might be due to numerical instabilities. One solution would then be to use a smaller delta (in DeterminizeFstOptions)."
with much smaller/larger deltas, but the result of fstdeterminize did not change.
Could you please explain what I'm missing or how to correct if this is actually a bug.
Best, Dogan 
DoganCan 
15 Apr 2008  10:44 
Hi,
The output of fsmdeterminize and fstdeterminize do not match because there are significant differences in the implementation. In that example, both outputs seem valid to me.
However, changing the delta should have had an effect on the result. It appears there was a bug and the delta parameter was ignored when determinizing transducers (delta works as it is supposed to for acceptors).
A new version of the library fixing this issue is available on the download page.
Best, Cyril 
CyrilAllauzen 
17 Apr 2008  17:28 
LM using FST
Hi, I used to use GRM to build LMs with FSM, is FST compatible with GRM ? If not, is there an alternative in FST ?
Also, What is the equivalent of farcompilestrings & farprintstrings under FST?
Thanks, Mohamed Noamany 
MohamedNoamany 
03 Apr 2008  20:50 
Hi, Only the text representation of OpenFst and FSM are compatible. So, if you want to use a LM build by GRM in OpenFst, you need to convert it to the OpenFst format using fsmprint / fstcompile : fsmprint lm.fsm  fstcompile > lm.fst There are no equivalent of farcompilestrings/farprintstrings in OpenFst.
Best, Cyril 
CyrilAllauzen 
04 Apr 2008  10:50 

JonathanB 
16 Oct 2009  11:16 
Hi Cyril, Would the far tool in OpenKernel do the trick? (edit) ..mmhh.. I take it back. I just noticed version 1.2 came out and it does have the far extension. cheers 
AMorenoDaniel 
13 Aug 2010  19:36 
compilation gotcha with spaces in directory names
Low priority item for the todo list (which will be likely fixed with autoconf but good to put in the test suite): compiling in a directory with spaces in the name (allowed and quite possible under macos) causes $(PWD) in bin/Makefile to break.
Symptom:
localhost:~/cse733/sp08/794L tutorial materials/fst/OpenFst/fst fosler$ make ( cd lib ; make all ) make[1]: Nothing to be done for `all'. ( cd bin ; make all ) make[1]: * No rule to make target `"/Users/fosler/cse733/sp08/794L', needed by `fstarcsort.o'. Stop. make: * [all] Error 2
Clear and easy workaround is to replace $(PWD) with a hard coded directory with escaped spaces. 
EricFoslerLussier 
26 Mar 2008  21:35 
Input and output alphabets
I've been using nondisjoint but different input and output alphabets, along with keep_osymbols and keep_isymbols.
Two commands in particular are confusing here. fstinvert does not understand that the input and output symbol tables need to be swapped if the result is going to be correct, and fstproject does not understand that the alphabets in the result are going to be either both derived from the osymbols or both derived from the isymbols. Presumably the trouble is that these two commands are outliers compared to the rest of the library, so the infrastructure which they get from the library doesn't quite fit. 
ChrisBrew 
14 Mar 2008  16:40 
This is actually a bug. It should do the right thing with the symbol tables for project and invert. The bug has been fixed and you can download the fixed version of the library on the download page. 
CyrilAllauzen 
14 Mar 2008  19:24 
Thankyou. There's an easy workaround: don't use different alphabets, which actually works for me. But better not to have to. 
ChrisBrew 
15 Mar 2008  10:49 
Thankyou. There's an easy workaround: don't use different alphabets, which actually works for me. But better not to have to. 
ChrisBrew 
15 Mar 2008  10:49 
Actually, a new bug was introduced in fstinvert in the version released on friday. A new version is now available on the download page fixing this bug. 
CyrilAllauzen 
17 Mar 2008  11:51 
Interpreting complement character ranges
I'm implementing a programming language based on regular expressions, which are interpreted to produce finitestate networks. The current interpreter, which is just starting to work, calls functions in the OpenFst library, though I've kept the design modular enough to use other interpreters based on other libraries.
Interpreting character ranges: If I parse and interpret the following statement:
$foo = b [aeiou] t ;
a character range like [aeiou] or [az] can be handled straightforwardly as a union, equivalent to (a 
e 
i 
o 
u) and (a 
b 
c 
d 
... 
z), respectively. No problem. But how, in OpenFst, can I interpret complemented character ranges such as
$bar = a [^aeiou] e ;
???
If . is used in the syntax to represent "any possible character" (denoting the set of all singlecharacter strings), then [^aeiou] is equivalent to (.  (a 
e 
i 
o 
u)), but that just raises the question, How would one interpret the following:
$fum = a . e ;
denoting the set of all threecharacter strings where the first character is 'a', the second character is "any possible character" and the third is 'e' ?
Thanks,
Ken 
KennethRBeesley 
13 Mar 2008  13:12 
Wow. That last message got garbled wherever I used a verticalbar to indicate union. The first garbled example, expanding [aeiou], was (a verticalbar e verticalbar i verticalbar o verticalbar u ). The second, equivalent to [az] was (a verticalbar b verticalbar c verticalbar d verticalbar ... z)
The third, equivalent to [^aeiou] was
(.  ( a verticalbar e verticalbar i verticalbar o verticalbar u))
Ken 
KennethRBeesley 
13 Mar 2008  14:02 
Mindtuning: value of separate input and output symbol tables?
OpenFst allows the use of separate input and output symbol tables for mapping symbol names (strings) to the integers used as labels on an Fst. So, for examples, the label 2 on the input side could be mapped to 'a', and the label 2 on the output side could be mapped to 'b'.
1. What's the perceived value of allowing separate input and output mappings? 2. In practice, is it most common to use the same symbol table for both sides? Thanks,
Ken 
KennethRBeesley 
04 Mar 2008  11:53 
Actually, it is very common to have different symbol tables for each side. For instance, if you a lexicon that maps prononciation of words, i.e. sequence of phonemes, to words. The input symbol table would contains "phonemes" and the output table words. This is a real example from speech recognition (cf. this paper) 
CyrilAllauzen 
04 Mar 2008  11:59 
Yes, that's a good example. Thanks. In my own work, at least, I would tend to want individual alphabetic symbols like 'a', 'b' and 'c' to be stored (by default) on arcs with their standard Unicode code point values, including IPA symbols in a speech application; and any symbols with multicharacter print names (including whole words, as in the speechrecognition example) would be stored with code point values from a Unicode Private Use Area. Unicode currently offers 137,468 Private Use code pointsis that enough for all the words and other multicharacter symbols (e.g. ngrams of phoneme characters) used in a typical speech application? 
KennethRBeesley 
05 Mar 2008  15:43 
Unfortunately, this solution would not be satisfying for speech recognition. Some very large vocabulary speech recognition systems use vocabularies consisting of several million words (i.e. multicharacter symbols).
Allowing users to define their own input and output symbol tables is the most general approach. For your work, the approach you describe may be more than adequate, and the library indeed allows you to have the same symbol table used as both input and output table for all your FSTs.
Cyril 
CyrilAllauzen 
06 Mar 2008  10:55 
"Normalizing" arc/final weights on an FST?
I am using FSTs to represent probabilistic finitestate automata and there are various situations in which I need to "normalize" themthat is, scale the arc weights such that the probability of arcs leading out of a state (plus the probability of state finality) sums to 1 for each state. I am having trouble, however, figuring out how to modify the weights on arcs of an FST. I have tried the following code, and while this modifies the final weight appropriately, it doesn't affect the arc weights at all:
void makeProper(VectorFst<StdArc> &fst) {
using fst::TropicalWeight;
for(StateIterator<VectorFst<StdArc> > siter(fst); !siter.Done(); siter.Next()) {
int s = siter.Value();
float tot = pow(2, 1 * fst.Final(s).Value());
for(ArcIterator<VectorFst<StdArc> > aiter(fst,s); ! aiter.Done(); aiter.Next()) {
StdArc arc = aiter.Value();
float contribution = pow(2, 1 * arc.weight.Value());
tot += contribution;
}
float adjustment = 1 * log(tot) / log(2);
float newFinal = fst.Final(s).Value()  adjustment;
fst.SetFinal(s,TropicalWeight(newFinal));
for(ArcIterator<VectorFst<StdArc> > aiter(fst,s); ! aiter.Done(); aiter.Next()) {
StdArc arc = aiter.Value();
float newWeight = arc.weight.Value()  adjustment;
arc.weight = TropicalWeight(0.0);
}
}
}
How can I modify the arc weights? Or is there some other functionality I haven't noticed yet that performs this normalization on its own?
Many thanks once again,
Roger 
RogerLevy 
02 Mar 2008  23:34 
Ahh...my naive attempt to use ... was not entirely successful. My full query with properlyindented code can be found here:
http://idiom.ucsd.edu/~rlevy/normalize_fst_weights
Roger 
RogerLevy 
02 Mar 2008  23:37 
Hi Roger,
1. In order to modify an arc, you need to use a MutableArcIterator instead of an ArcIterator. Then do aiter.SetValue(arc) at the end of your last for loop. The constructor of aiter should then also take &fst as an argument instead of fst.
2. There is an operation to normalize the weights in the way you describe. It is call pushing: Push(&fst, REWEIGHT_TO_INITIAL)
However, pushing is defined using the semiring operation, hence in your case you'll need to push in the log semiring.
VectorFst<StdArc> fst;
VectorFst<LogArc> log_fst;
Map(fst, &log_fst, StdToLogMapper());
Push(&log_fst, REWEIGHT_TO_INITIAL);
Map(log_fst, &fst, LogToStdMapper());
The total weight will still be present at the initial state. You can remove it by doing after calling Push:
vector<LogWeight> distance;
ShortestDistance(log_fst, &distance, true);
LogWeight total = distance[log_fst>Start()];
MutableArcIterator< VectorFst<LogArc> > aiter(&log_fst, log_fst>Start());
for (; !aiter.Done(); aiter.Next()) {
LogArc arc = aiter.Value();
arc.weight = Divide(arc.weight, total);
aiter.SetValue();
}
P.S. I didn't compile the code so there might be some typos. P.P.S. I fixed the formating of your original post. 
CyrilAllauzen 
04 Mar 2008  11:25 
Many thanks once more, Cyril. (The last call needed to be aiter.SetValue(arc) but that was easy enough to figure out.) The approach I was taking had an error in it, anyway, so Push is much better.
Best
Roger 
RogerLevy 
04 Mar 2008  22:50 
One other question: is there an easy way of making LogArc and LogWeight operate in base 2 logs rather than in natural logs?
Roger 
RogerLevy 
05 Mar 2008  17:52 
Unfortunately, they are no easy way to do this.
Cyril 
CyrilAllauzen 
06 Mar 2008  10:44 
Got it. Also, I notice that RealWeight and RealArc are not yet implemented, is this correct? (they would be convenient for transparent posthoc inspection of normalized logsemiring FSTs.)
Best
Roger 
RogerLevy 
08 Mar 2008  15:56 
No RealWeight yet. The reason was we never used the real semiring for anything else than toy example with the FSM library. But, we might decide to add it to the next release since people seem to request it.
Best, Cyril 
CyrilAllauzen 
11 Mar 2008  11:59 
Natural code for printing all strings accepted by an FST?
I have written some code for printing all strings accepted by an FST together with their weights. (Clearly a bad idea in general, but useful right now for some tests I'm running with very small FSTs.) Since I am new to both C++ and OpenFst I wonder whether there is a more natural way to write this functionality. In particular I wonder:
1) is there a better way of checking whether a state is final?
2) whether there are better ways to implement the accumulation of weights along the path through the FST, perhaps by using a Weight class directly?
void printAllStrings(VectorFst<StdArc> &fst) {
vector<int> str(0);
printAllStringsHelper(fst, 0, str, 0.0);
}
void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, float cost) {
if(fst.Final(state).Value() < 100000)
cout << vectorToString(str) << " with cost " << (cost + fst.Final(state).Value()) << endl;
for(ArcIterator<VectorFst<StdArc> > aiter(fst,state); ! aiter.Done(); aiter.Next()) {
StdArc arc = aiter.Value();
str.push_back(arc.ilabel);
printAllStringsHelper(fst, arc.nextstate, str, cost + arc.weight.Value());
str.pop_back();
}
}
string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ
stringstream s;
s << i;
return s.str();
}
string vectorToString(vector<int> v) {
if(v.size() == 0)
return "<>";
string result = "<" + itos(v[0]);
for(int i = 1; i < v.size(); i++) {
result += "," + itos(v[i]);
}
return result + ">";
}
Many thanks in advance for any pointers!
Roger 
RogerLevy 
01 Mar 2008  21:04 
hmm...TWiki has done some damage here to my doubleequals equality test, to my plusequals assignments, and also has erased some templates...hopefully the code is still reasonably readable! I've posted it at http://idiom.ucsd.edu/~rlevy/print_all_strings
Roger 
RogerLevy 
01 Mar 2008  21:09 
1. Yes! There is a much better to test if a state is final.
if (fst.Final(state) == Weight::Zero()) // if state is not final Where you can replace Weight by the appropriate weight class (even better do a typedef of the current weight to Weight).
2. Indeed, you should try to use the Weight class operations. You should almost never need to use weight.Value(). Even cout << weight and cin >> weight should work since the operators << and >> are defined.
Remember, the operation to combine weight along a path is Times. So in your code you should haveTropicalWeight cost and Times(cost, arc.weight) P.S. I fixed the formatting of your original post 
CyrilAllauzen 
04 Mar 2008  12:12 
Many thanks, Cyril, this is great!
Roger 
RogerLevy 
04 Mar 2008  22:47 
Roger, 1. Do you have a final version of your code to print all paths with their weights?, and 2. Would you consider sharing it? Thanks,
Ken 
KennethRBeesley 
11 Jun 2008  11:52 
Hi Ken,
It looks like the current version of my code is pretty similar to the earlier version:
// a bad idea if there are loops :) void printAllStrings(VectorFst &fst) { vector str(0); TropicalWeight tw(TropicalWeight::One()); printAllStringsHelper(fst, 0, str, tw); }
void printAllStringsHelper(VectorFst &fst, int state, vector str, TropicalWeight cost) { if(fst.Final(state) = TropicalWeight::Zero()) cout << vectorToString(str) << " with cost " << (Times(cost,fst.Final(state))) << endl; for(ArcIterator<VectorFst > aiter(fst,state); ! aiter.Done(); aiter.Next()) { StdArc arc = aiter.Value(); str.push_back(arc.ilabel); printAllStringsHelper(fst, arc.nextstate, str, Times(cost, arc.weight.Value())); str.pop_back(); } }
string vectorToString(vector v) { if(v.size()
0) return "<>"; string result = "<" + itos(v[0]); for(int i = 1; i < v.size(); i++) { result + "," + itos(v[i]); } return result + ">"; }
string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ stringstream s; s << i; return s.str(); }
I have a similar but separate function pair for LogArc FSTs. It would be nice to have a single function pair that works for any arc type, and except for the types the code would be the same, but I don't see how to do it because there isn't a common superclass for LogArc and StdArc. If someone else sees how to do this, let me know!
And please feel free to use this code :)
Roger 
RogerLevy 
11 Jun 2008  20:17 
Ah, I also forgot  I believe that arc.weight.Value() can be replaced by arc.weight, as per Cyril's suggestion earlier. 
RogerLevy 
11 Jun 2008  20:18 
Roger, Many thanks for posting this code. One comment at this point: the initial call to printAllStringsHelper(fst, 0, str, tw) assumes that the start state is state 0. After combining and optimizing fsts, this may not be the case. Better to use printAllStringsHelper(fst, fst.Start(), str, tw).
Ken 
KennethRBeesley 
13 Jun 2008  11:52 
Roger, Another possibility: generalization to print Input vs. Output strings. Your code currently uses arc.ilabel. You could do
enum Projection { PROJECT_INPUT, PROJECT_OUTPUT } ;
void printAllStrings(VectorFst &fst, Projection proj) { ... printAllStringsHelper(fst, fst.Start(), str, tw, proj) ; }
....
if (proj == PROJECT_INPUT) { str.push_back(arc.ilabel) ; } else { str.push_back(arc.olabel) ; }
Ken 
KennethRBeesley 
13 Jun 2008  12:15 
Roger,
In testing for a final state, I think that your code is incorrect (or garbled in the posting). You have if(fst.Final(state) = TropicalWeight::Zero()) but (as I understand it) a state is final if the exit weight is not equal to Weight::Zero(), so if (fst.Final(state) = TropicalWeight::Zero()) with a notequal operator, is what you want. Straighten me out as necessary.
Ken 
KennethRBeesley 
13 Jun 2008  12:37 
Roger, I see that my "correction" got corrupted the same way when posted. The test should be if (fst.Final(state) notEqualTo TropicalWeight::Zero()) where notEqualTo is the usual exclamation mark followed by an equalsign. Ken 
KennethRBeesley 
13 Jun 2008  14:04 
Why is TropicalWeight considered standard?
In another word, why does StdArc use TropicalWeight instead of FloatWeight or LogWeight. Is there some material that explains the significance of TropicalWeight?
Also, what's the reason that there are Plus(), Times() and Divide() implemented for FloatWeight()? Because it is too trivial? 
JiaPu 
29 Feb 2008  18:57 
sorry, type. It should read:
".... there are NOT Plus() ..." 
JiaPu 
29 Feb 2008  18:58 
sorry, typo. It should read:
".... there are NOT Plus() ..." 
JiaPu 
29 Feb 2008  18:58 
StdArc use TropicalWeight and LogArc use LogWeight. StdArc could have been call TropicalArc, but since it is the most commonly used [it correspond to the Viterbi approximation] a shorter name like StdArc was prefered. If you don't like it, you can still do:
typedef StdArc TropicalArc; FloatWeight is just an abstract class from which LogWeight and TropicalWeight are derived. It does not correspond to a semiring, that's why it does not have Plus(), Times(), ... 
CyrilAllauzen 
04 Mar 2008  11:43 
For the significance of the tropical semiring, you can check out the papers in FstBackground or this paper or that one. 
CyrilAllauzen 
04 Mar 2008  11:50 
Cyril,
Please give me an algebra 101. According to the definition of semiring on wiki, float (or rational number) is a semiring. And when authorizing finite state grammar for speech recognition, it is more natural to use float as weight. Although, internally log weight is preferred. 
JiaPu 
04 Mar 2008  15:23 
Float in FloatWeight only means than the underlying representation is a C/C++ float type. TropicalWeight and LogWeight both use C/C++ float to represent real numbers, hence they are both derived from the partially abstract type FloatWeight. Of course, the real number with as operation the usual plus and times is also a semiring. It is not implemented in the library for now because it is not very commonly used. We might add it in a future version. If we do, it will also be as a type derived from FloatWeight.
Float just defines the set, it is not enough to define a semiring. Different times and plus operations would lead to different semiring sharing the same set of elements.
Finally, you didn't have to all the way to wikipedia, there is some documentation on here on this website. 
CyrilAllauzen 
04 Mar 2008  19:10 
Compiler error on Determinize().
When compiling following code:
StdVectorFst in_fst; ... add some states and arcs to in_fst ... StdVectorFst out_fst Determinize(fst, out_fst);
I got compiler error: /Volumes/Data/Projects/OpenFst/macosx/Corpus2SphinxFsg.cpp:82: error: no matching function for call to 'Determinize(fst::StdVectorFst&, fst::StdVectorFst&)'
Since StdVectorFst is derived from Fst, I don't understand that why this doesn't compile. 
JiaPu 
28 Feb 2008  12:48 
Aha, never mind. Just realize that Determinize() expects a pointer instead of a reference on the second argument. But why can't we use reference on both arguments? 
JiaPu 
28 Feb 2008  13:18 
This is one of our coding conventions: input arguments are declared as const references and output arguments are declared as pointers. 
CyrilAllauzen 
29 Feb 2008  14:26 
New Release of OpenFst?
Are there any plans for a new release of OpenFst? If so, I have some requests ...
Ken 
KennethRBeesley 
06 Feb 2008  17:20 
There will be a new release of OpenFst. You're welcome to make some suggestions. 
CyrilAllauzen 
08 Feb 2008  15:20 
Cyril,
Very glad to hear about the new release.
I respectfully request the following:
void Copy(&In, *Out) // create a deep copy of the input automaton
Convenience boolean functions for testing an existing network:
isLabelSorted(&Fst, side) isFunctional isAcceptor isWeighted isEpsilonFree isDeterministic
Finally, a new text inputoutput format that is Unicodecapable. I'll try to write up some more specific suggestions for an XMLbased notation.
Thanks for all your good work, Ken 
KennethRBeesley 
13 Feb 2008  23:53 
Hi Ken,
A deep copy can already be performed using Map() with the IdentityMapper, however this might not be obvious at first so a Copy() could indeed be something useful.StdVectorFst in;
StdVectorFst out;
Map(in, &out, IdentityMapper<StdArc>());
For the boolean functions, I'm not so sure that that: if(isAcceptor(fst)) ... is much simpler than: if(fst.Properties(kAcceptor, true)) ...
Best, Cyril 
CyrilAllauzen 
29 Feb 2008  12:48 
For functional, the issue is that the best algorithms to test for functionality are quadratic in the number of transitions. This prevented us from using it as a property bits, that's why there are no kFunctional. 
CyrilAllauzen 
29 Feb 2008  12:53 
Thanks Cyril,
Your fst.Properties(kAcceptor, true) example was what I needed to get oriented. No boolean function isAcceptor(fst) is needed. Other beginners like me might want to look in fst/lib/properties.h for more property names, including:
kAcceptor kNotAcceptor
kWeighted kNotWeighted
kEpsilons kNoEpsilons
kIDeterministic (deterministic on the input side) kNonIDeterministic
kODeterministic (deterministic on the output side) kNonODeterministic
kILabelSorted (sorted with regard to input labels) kNotILabelSorted
kOLabelSorted (sorted with regard to output labels) kNotOLabelSorted 
KennethRBeesley 
29 Feb 2008  17:06 
Another newbie question
I'm having trouble compiling another example from the Quick Tour page, to get myself going with FST composition. here it is:
#include "fst/lib/fstlib.h"
int main() {
using fst::StdVectorFst; using fst::StdFst; using fst::StdOLabelCompare; using fst::StdILabelCompare; using fst::ArcSort;
StdFst *input = StdFst::Read("input.fst");
StdFst *model = StdFst::Read("model.fst");
ArcSort(input, StdOLabelCompare()); ArcSort(model, StdILabelCompare());
StdVectorFst result;
Compose(*input, *model, &result);
}
Trying to compile gives me the following error:
rlevy@truffle:~/src/C++$ g++ I /local/contrib/cpl/OpenFst Example1.cpp /local/contrib/cpl/OpenFst/fst/lib/libfst.so lpthread Example1.cpp: In function ‘int main()’: Example1.cpp:16: error: no matching function for call to ‘ArcSort(fst::StdFst*&, fst::StdOLabelCompare)’ Example1.cpp:17: error: no matching function for call to ‘ArcSort(fst::StdFst*&, fst::StdILabelCompare)’
It's not obvious to me why it's not matching the function properly  any ideas?
Many thanks!
Roger 
RogerLevy 
01 Feb 2008  01:14 
This does not work because the input of ArcSort needs to be a mutable fst (of type MutableFst). Here, input and model are of type StdFst i.e. Fst and not MutableFst.
There are several solutions for this: 1. Restrict the program to only accepts as input VectorFst which is mutable: StdVectorFst *input = StdVectorFst::Read("input.fst"); StdVectorFst *model = StdVectorFst::Read("model.fst"); Simple but not general.
2. Copy model and input into a VectorFst before sorting: StdVectorFst input2(*input); ArcSort(&input2, StdOLabelCompare());
3. Use onthefly sorting: ArcSortFst StdOLabelCompare> input2(*input, StdOLabelCompare());
Best, Cyril 
CyrilAllauzen 
29 Feb 2008  13:19 
I am getting the same error, except I have already declared mutable FSTs. I would really appreciate some help! My code is of the form:
#include "fst/fstlib.h"
using namespace std; using namespace fst;
int main() { StdVectorFst fst1; StdVectorFst fst2; StdVectorFst fst3;
ArcSort(fst1, StdOLabelCompare()); fst3 = ComposeFst(fst1, fst2);
return 0; }
Specifically, the compile error is:
no matching function for call to 'ArcSort(fst::StdVectorFst&, fst::StdOLabelCompare)'
Thank you very much, Jonathan 
JonathanB 
16 Oct 2009  12:03 
Access control: 