Line: 1 to 1  

Added:  
> > 
OpenFst Forum 2015 Archive
converting standard lattice format into and FST fileMuhammadSalem  20151111  08:11Hi i have Standard lattice format files, and i want to convert them into WFST to use them in Kaldi each file contains more than one NULL node, which were used in htk to decrease number of arcs, now if i want to convert those files to FST What should those null nodes be in the symbol file Not that i am working with the specific symbol file as input symbol file and output symbol file in order to finish the process
Minimization for fsts with negative weights Miikka Silfverberg  20150902I get weird results when I try to minimize a tropical weight fst with negative weights as shown below. Additionally, the minimization takes quite long (around 20 seconds) although the fst only has 2 states.
$ cat bar.txt 0 1 1 1 5 1 1 2 2 1 1 0 $ fstcompile bar.txt bar.ofst $ fstminimize bar.ofst bar.min.ofst $ fstprint bar.min.ofst 0 1 1 1 16777220 1 1 2 2 1 16777216
As far as I understand, this fst is already minimal so If I convert all weights to positive weights, the problem vanishes
$ cat baz.txt 0 1 1 1 5 1 1 2 2 1 1 0 $ fstcompile baz.txt baz.ofst $ fstminimize baz.ofst baz.min.ofst $ fstprint baz.min.ofst 0 1 1 1 5 1 1 2 2 1 1 I'm using the latest stable release (http://openfst.org/twiki/pub/FST/FstDownload/openfst1.5.0.tar.gz).
DoganCan  20150902  22:29Hi Miikka,The minimization algorithm first applies weight pushing to normalize the distribution of weights along the paths of the input automaton. Weight pushing algorithm requires shortest distance from any state to final states to be defined. The example automaton with negative weights violates this condition due to the negative weight cycle. You can look into Jason Eisner's WFSA minimization algorithm if you need to minimize automata with negative weights, particularly if they have negative weight cycles. Cheers, Dogan
MiikkaSilfverberg  20150903  06:59Thanks for the explanation Dogan I'll look into Eisner's minimization algorithm. Do you happen to know about freely available implementations?
DoganCan  20150903  22:34Hi MiikkaI don't know if there is a publicly available implementation of Eisner's algorithm. Cheers, Dogan
Gentle Intro to Lazy FST OperationsKennethRBeesley  20150813  16:18Where can I find a gentle introduction to lazy FST operations? When and how to use them? Tradeoffs in size and performance?
MichaelRiley  20160527  10:18there is teh OpenFSt library paper ref'ed under background material topic here and there is some comments on efficiency in the efficiency topic.
Wildcard arcsRussellMoore  20150810  09:58Hi, sorry if this has been asked before. I am using OpenFST via the command line interface, so I want to build my FSTs as text files before compiling them. I would like to know if it's possible to add an arc that will:
So in my head I imagine something like:
0 1 a a [wgt_a] 0 1 * * [wgt_other] Where the asterisks represent "any other symbol", i.e. that arc operates as a catchall for any symbol that is not "a". (Update 17:35  I think what I want is a "rho" matcher. Can I specify this inside a text file? Or is there another way to do it?) Thanks!
DoganCan  20150811  22:59Hi Russell,
In OpenFst, special symbol information is not stored in the FSTs themselves, but are handled by the FST operations that accept these symbols as arguments. Unfortunately, special symbol matching operations are not available through the command line interface of standard OpenFst binaries. What you can do instead is to use a custom FST composition binary that can handle
Check out https://github.com/dogancan/lexiconunk. There you will find a tool called Cheers, Dogan
TopSort not working on FSTVivekRangarajan  20150727  15:33I run into a weird issue when I create a fst in C++ from a string by adding states and arcs. I have a function that does this work. When I write the FST using Write(), the final state is printed at the top. It looks like this:
5 0 1 66 66 0 2 81 81 0 3 2837 2837 1 2 10 10 1 3 749 749 1 4 930 930 2 3 66 66 2 4 324 324 2 5 864 864 3 4 8 8 3 5 146 146 4 5 30 30 fsttopsort on this fst does not sort the states either. I overcame this by just piping fstprint X  sort k 1 n  fstcompile Any ideas as to why this might be happening and how it can be fixed inside the C++ code?
DoganCan  20150803  20:08Hi Vivek,It is hard to debug your problem without seeing the code. My best guess is that you are not setting state 0 as the initial state. Cheers, Dogan
VivekRangarajan  20150805  16:38Thanks Dogan. I resolved this sometime back and should've added a comment. Your guess was right, I was setting the SetStart to str.size() instead of 0.
Generation of words accepted by a FiniteState MachineJulioCarrasquel  20150604  15:00Hello,I would like to know how to print all the words that can be accepted for a FiniteState Machine. Note that I'm using acceptors. There is a library called Lextools who does the Job, specifically using a function of that library called lexfsmstrings , but unfortunately it's not longer available free by the guys of AT&T.
MichaelRiley  20160527  10:20You could ask for a very large number of nbest from fstshortestpath.
GaborPinter  20160801  06:46> all the words... Provided the FSA/FST is acyclic (otherwise it would take somewhat longer), it is relatively easy to write a program in C++ that recursively iterates over all the arcs. With output size of around 10k per fst it was surprisingly fast.
KyleGorman  20161011  19:45Pynini (pynin.opengrm.org) i also has something called StringPaths which enumerates all paths in an acyclic FST. If you don't want to use Python, you could build something off of the paths.h header therein.If the FST is not acyclic, it would, of course, take not just "somewhat" longer but infinitely longer
Arcs labeled with stringsJulioCarrasquel  20150603  09:51Hi again,I have another question. I need to print using Graphviz a FST whose arc labels could be displayed as letters. I will highly appreciate a solution for this case. Thanks Julio
JulioCarrasquel  20150603  11:22I've already found a solution to this too. Thanks.
Putting only one expression on arcsJulioCarrasquel  20150601  11:12Hello,I need to put only one expression in the arcs of my FST since I'm not<br> modelling transducers, but acceptors. <br> I mean, for example, to put in an arc only "x" instead of "x:y". <br> If someone has a quick solution to this, I will highly appreciate it. Regards, <br> Julio Carrasquel <br> juliocarrasquel@gmail.com
JulioCarrasquel  20150601  11:14I forgot to mention, I'm using in my project the command line library.
JulioCarrasquel  20150603  11:22Nevermind,I found a solution using the option acceptor=true through the command fstdraw. Thanks in any case.
Expected edit distanceRolandSchwarz  20150508  03:26Hi,I was wondering if it is possible using the library to compute the expected edit distance between two distributions over strings. So, if X and Y are FSAs that contain a single sequence each, and T is my edit distance transducer then shortestdist(X o T o Y) over the tropical SR computes the edit distance. (a) Now assume X and Y are probabilistic FSAs (log SR), i.e. they define a valid distribution over a set of sequences, how do I compute the expected edit distance, i.e. \sum_x \sum_y P(x) P(y) ED(x,y)? (b) And finally, given a probabilistic FST that defines a joint distribution for X and Y, how do I compute the expected edit distance over the joint, i.e. \sum_x \sum_y P(x,y) ED(x,y)? I guess in the case (a) the expectation SR might help? But how do I do it in the case (b)? Any help would really be appreciated. Thanks, Roland
DoganCan  20150511  19:58Hi Roland,There is an algorithm by Mohri for computing expected edit distance between weighted automata. See this paper for details. I implemented a slightly modified version of this algorithm some time ago. I just put together a simple setup based on that implementation and pushed it to GitHub in case you want to take a look at it. https://github.com/dogancan/expectededitdistance Cheers, Dogan
RolandSchwarz  20150513  05:51Amazing, thanks, I'll check it out!Roland
Assigning the symbol table to FSTs in C++AnuragDas  20150504  08:56Hi, I am able to build a FST from a string through C++. But the symbol table is showing "None" when i do a fstinfo. I want to add symbol table like "**Byte" which gets added automatically when we do "thraxmakedep" from a grammar file with save_symbols set = True but through c++ only.
From weighted words to weighted phrase: how to isolate word scores at phrase level?FrancoisSofilvacer  20150411  13:43Such an operation must be classical in any concrete usage of FST cascade (OCR, speech, ...) but I struggle to express it in OpenFST.Take a simple example: the input is a character stream, made of words interspaced with blanks. A nonweighted and functional lexicon FST maps sequences of letters to words. To deal with uncertainty, a weighted and nonfunctional extension of it maps a sequence of letters to word/score pairs. For handling a sequence of words, a concatenation of WFSTlex is needed, resulting in multiplying (in the weight semiring) word scores but eventually, the contribution of each word is lost: the only score available encompasses the entire input. So a language model FST, further down in the cascade has no access to individual word scores, preventing it to compute a contextdependent weighted average for example, associating a word x with a different coefficient depending on it preceding y or z.
Compiling with mingw32BenoitFavre  20150226  07:44In case you want to compile openfst 1.4.1 with mingw (for crosscompiling it for windows from unix, for example). You need to use a mmap implementation such as mmanwin32 and set LDFLAGS to use it.make LDFLAGS=lmman Here is the patch: diff rupN old/src/lib/mappedfile.cc new/src/lib/mappedfile.cc old/src/lib/mappedfile.cc 20140430 00:15:18.000000000 +0200 +++ new/src/lib/mappedfile.cc 20150226 08:53:46.339405863 +0100 @@ 20,6 +20,10 @@ #include <fcntl.h> #include <unistd.h> +#if (defined _WIN32  defined _WIN64  defined WINDOWS  defined MINGW32) +#include <windows.h> +#endif + namespace fst { // Alignment required for mapping structures (in bytes.) Regions of memory @@ 76,7 +80,13 @@ MappedFile* MappedFile::Map(istream* s, size_t pos = spos; int fd = open(source.c_str(), O_RDONLY); if (fd = 1) { +#if (defined _WIN32  defined _WIN64  defined WINDOWS  defined MINGW32) + SYSTEM_INFO system_info; + GetSystemInfo(&system_info); + int pagesize = system_info.dwPageSize; +#else int pagesize = sysconf(_SC_PAGESIZE); +#endif off_t offset = pos % pagesize; off_t upsize = size + offset; void *map = mmap(0, upsize, PROT_READ, MAP_SHARED, fd, pos  offset);
Is there an existing implementation to check if one node is reachable from another in an fst ?NatS  20150224  12:44
Lookahead composition without PushWeightsComposeFilterKostyaK  20150214  17:49Hi! I used with etalon code to do lookahead composition: <verbatim> typedef fst::StdVectorFst GRAPH; GRAPH _graph1, _graph2; //... fst::StdOLabelLookAheadFst graph1Look(_graph1); std::auto_ptr<GRAPH> graph2TrueRelabel(_graph2.Copy(true)); fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true); fst::ArcSort(graph2TrueRelabel.get(), fst::StdILabelCompare()); std::auto_ptr<GRAPH> res(new GRAPH); fst::Compose(graph1Look, *graph2TrueRelabel.get(), res.get()); </verbatim> It's work great, but i want to do lookahead composition without PushWeightsComposeFilter for example. First, I try to repeat lookahead composition without class wrappers: <verbatim> typedef fst::StdVectorFst GRAPH; GRAPH _graph1, _graph2; //... typedef fst::LabelLookAheadMatcher<fst::SortedMatcher<GRAPH>, fst::olabel_lookahead_flags> LOOK_MATCHER; typedef fst::SortedMatcher<GRAPH> SORTED_MATCHER; typedef fst::SequenceComposeFilter<LOOK_MATCHER, SORTED_MATCHER> SEQ_FILTER; typedef fst::LookAheadComposeFilter<SEQ_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> LOOK_COMPOSE_FILTER; typedef fst::PushWeightsComposeFilter<LOOK_COMPOSE_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_WEIGHTS_FILTER; typedef fst::PushLabelsComposeFilter<PUSH_WEIGHTS_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_LABELS_FILTER;typedef PUSH_LABELS_FILTER COMPOSE_FILTER; // My compose options fst::CacheOptions cacheOptions; fst::ComposeFstImplOptions<LOOK_MATCHER, SORTED_MATCHER, COMPOSE_FILTER> composeOptions(cacheOptions); composeOptions.matcher1 = new LOOK_MATCHER(_graph1, fst::MATCH_OUTPUT); //Relabel _graph1 such as fst::StdOLabelLookAheadFst graph1Look(_graph1); LOOK_MATCHER::MatcherData* matcherData = composeOptions.matcher1>GetData(); fst::LabelReachable<Arc> graph1Reachable(matcherData); std::auto_ptr<GRAPH> graph1Relabeled(&_graph1); graph1Reachable.Relabel(graph1Relabeled.get(), false); graph1Relabeled>SetInputSymbols(NULL); //Relabel _graph2 such as fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true); fst::LabelReachable<Arc> graph2Reachable(matcherData); std::auto_ptr<GRAPH> graph2Relabeled(&_graph2); graph2Reachable.Relabel(graph2Relabeled.get(), true); fst::ArcSort(graph2Relabeled.get(), fst::StdILabelCompare()); composeOptions.matcher2 = new SORTED_MATCHER(*graph2Relabeled.get(), fst::MATCH_INPUT); composeOptions.filter = new COMPOSE_FILTER(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions.matcher1, composeOptions.matcher2); std::auto_ptr<GRAPH> composedRes(new GRAPH); *composedRes = fst::ComposeFst<Arc>(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions); </verbatim> But result of my lookahead composition code is not equal to result of etalon code: my graph is much smaller than etalon. What I'm doing wrong? May be exists other way?
KostyaK  20150214  17:57<verbatim> test </verbatim>
KostyaK  20150214  18:08Text formatting don't work?Etalon lookahead composition code: http://pastebin.com/wbD2ZudV My lookahead composition code: http://pastebin.com/LjAUdv91
fstshortestpath unique?GezaKiss  20150210  14:50I used this command, expecting it to produce all unique output strings  but it produced only one. The FST I tried it on contained just the two possible pronunciations of the orthographic form "record", and nothing else (here '}' separates the input and output symbols): r}9r e}E c}k o}3r r}<epsilon> d}d r}9r e}I c}k o}oU r}9r d}dMy questions: 1) What is the meaning of the unique parameter? 2) How can I get the behavior I want using OpenFst (i.e. all the possible output strings)? Thanks! Géza
KyleGorman  20150903  16:20Hi Géza, I assume you've figured this out by now, but just in case: unique simply indicates that the resulting shortest path FST will contain only distinct paths, but it doesn't specify how many distinct paths will be present. How many paths are returned is controlled by the nshortest flag, which it defaults to 1.(Presumably the reason unique doesn't have the semantics you expected it to is that the machine might be cyclic, in which case termination would not be guaranteed. But any suggestions to improve the documentation would be taken under consideration.)
farcreate bugDoganCan  20150115  21:36Hi,
I just wanted to report a small bug in Cheers, Dogan
diff ur openfst1.4.1.orig/src/include/fst/extensions/far/create.h openfst1.4.1/src/include/fst/extensions/far/create.h  openfst1.4.1.orig/src/include/fst/extensions/far/create.h 20140425 02:55:40.000000000 0700 +++ openfst1.4.1/src/include/fst/extensions/far/create.h 20140425 02:57:48.000000000 0700 @@ 48,7 +48,7 @@ vector<string> inputs; if (file_list_input) {  for (int i = 1; i < in_fnames.size(); ++i) { + for (int i = 0; i < in_fnames.size(); ++i) { ifstream istrm(in_fnames[i].c_str()); string str; while (getline(istrm, str))
KyleGorman  20150910  19:49Hi, thanks for that bug report. A fix should be in the next release.
