You can use the formatting commands describes in TextFormattingRules in your comment.
If you want to post some code, surround it with <verbatim> and </verbatim> tags.
Auto-linking of WikiWords is now disabled in comments, so you can type VectorFst and it won't result in a broken link.
You now need to use <br> to force new lines in your comment (unless inside verbatim tags). However, a blank line will automatically create a new paragraph.
I am trying to compile Thrax in a Ubuntu VM using VirtualBox. I have gcc 4.8.2 installed and compiled openfst with far and pet enabled and in shared mode. I have 1Gb of RAM dedicated to the VM. If I try ./configure --enable-shared, it fails because I run out of memory. If I try just ./configure and then make, everything seems to compile ok until I get an internal compilation error:
/bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT loader.lo -MD -MP -MF .deps/loader.Tpo -c -o loader.lo `test -f 'walker/loader.cc' || echo './'`walker/loader.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT loader.lo -MD -MP -MF .deps/loader.Tpo -c walker/loader.cc -fPIC -DPIC -o .libs/loader.o
g++: internal compiler error: Killed (program cc1plus)
Try commenting out the lines that refer to Log64Arc in src/include/thrax/function.h, viz
function.h:70:extern Registry<Function<fst::Log64Arc>* > kLog64ArcRegistry;
function.h:87: typedef name<fst::LogArc> Log64Arc ## name; function.h:88: REGISTER_LOGARC_FUNCTION(Log64Arc ## name)
(Obviously be careful in that #define REGISTER_GRM_FUNCTION to leave the continuation "\"s all happy.
The downside is you won't get log64 arcs. The upside is it should be smaller. The fact that it's running out of memory in compiling the loader makes me suspect that may be the problem because for each of the different arc types, all of the templated classes have to be expanded. This should reduce the size, therefore. If that still doesn't work, remove log arcs too. You won't likely be using them. Indeed, for precisely these sorts of issues I have been thinking of disabling those in future versions.
Ok thanks.
So the question is why you aren't getting that by inheritance. This is the first time I've seen this problem and I have no idea where it has suddenly broken.
Hi Richard (etc.), using Thrax 1.1.0 (and with OpenFst 1.3.4 already installed), compilation fails while making the file `ast/identifier-node.cc` due to an issue in the `include/thrax/compat/utils.h` header. Here's the error:
/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT identifier-node.lo -MD -MP -MF .deps/identifier-node.Tpo -c -o identifier-node.lo `test -f 'ast/identifier-node.cc' || echo './'`ast/identifier-node.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT identifier-node.lo -MD -MP -MF .deps/identifier-node.Tpo -c ast/identifier-node.cc -fno-common -DPIC -o .libs/identifier-node.o
In file included from ast/identifier-node.cc:22:
./../include/thrax/compat/utils.h:119:8: error: field has incomplete type
'char []'
char buf[];
^
I presume this is because buf[] doesn't have a length defined (nor is it initialized with a string), and when I change the line to
char buf[1024];
compilation goes through. (I'm not sure this is a sensible default; I spent no time trying to understand what this code is doing.)
I'd include a patch but it's one line.
Kyle
Just remove that line: that variable is not used. Apparently it's a holdover from some earlier implementation, and I just forgot to update it. I'll fix this in the next release.
Hi,
I am currently using thrax to extend my some features of an alignment tool I wrote for my g2p system.
The basic idea is that the user can specify some alignment correspondence rules and optional default penalties, and then these can be incorporated into the EM training process.
At present I have kind of hacked the functionality of the thraxcompiler command tool to read in the grammar, and then return the desired FST+symbol table to the alignment program.
EDIT: Maybe it makes more sense to just provide a couple of snippets:
GetFstFromGrammar
sy = SymbolTable['simple.syms'];
zero = "0".sy : "zero".sy;
units = ( "these're".sy : ( "these're".sy | "[these]" | "[these]" "are".sy ) );
split = ( "[these]" "are".sy : "these're".sy );
sigma = "<sigma>".sy : "<sigma>".sy;
abc = ( "a".sy "b c".sy : "a b b".sy );
export RULES = Optimize[ sigma* ( units | zero | abc ) sigma* ];
Here the 'sigma' is used in combination with a specialized 1-state alignment transducer that relies on RHO and SIGMA matchers.
Is there an alternative or recommended way to do this? It would be great if I could either specify the symbol table just once at the beginning, or automatically infer/generate the whole symbol table and return it - or even better modify the grammar from my C++ application to simply what the user is responsible for doing.
I went through the FAQ but did not notice any answers to these questions.
Thanks for your time.
UPDATE:
I solved this by creating some bindings with pybindgen and then writing a generator that interprets a simplified version of the Thrax grammar, then expands it to the versbose version with the extra quotes and symfile suffixes, etc.
yes (openfst 1.3.4 compiled with --enable-far and some other enable options ), thrax compiled successfully,but compilation fails while making the file `batch_test.c` (extracted form export.tgz), can you me some advice
I'd like to but first I need to understand what is going on. I can't reproduce your error (apparently) and I don't know what batch_test.c is since it's not part of the Thrax distribution. Is this your own code? If so then I need to see EXACTLY what you are doing, including probably your sending me a directory with all of the additional code.
If this is part of the Thrax distribution then please tell me where it is because I can't find it (nor do I remember such a file).
thank you for your reply.
in this page:
http://openfst.cs.nyu.edu/twiki/bin/view/Contrib/ThraxContrib,
you can see
Projects using the OpenGrm Thrax tools:
export.tgz: Grammars and software developed as part of a text normalization class taught at the Center for Spoken Language Understanding, Fall 2011. URL for the course: http://www.cslu.ogi.edu/~sproatr/Courses/TextNorm/
i download "export.tgz" .
there is a file called batch_tester.cc in batch_tester directory(extract from export.tgz)。
Ok that helps. Yes, I did write that, but it wasn't obvious from your query that this is what you were referring to. Please in future give all necessary information when reporting a bug.
In the meantime I will have a look. I do not know off the top of my head what the problem is.
Ok it's the usual nonsense about ordering of shared object libraries. If you do things in this order it should work:
g++ -g -O2 -o batch_tester batch_tester.o -L/usr/local/lib/fst -lm -ldl -lfst -lthrax -Wl,--rpath -Wl,/usr/local/lib/fst -Wl,--rpath -Wl,/usr/local/lib/fst -Wl,--rpath -Wl,/usr/local/lib /usr/local/lib/fst/libfstfar.so
Evidently there is a bug in the configuration of the distribution that was not causing problems before, but is now. I will look into that, but in the meantime, please try linking manually as above.
So far I find thrax a very neat piece of software but I have two questions...
Can I somehow use probability semiring as weights, because it seems Thrax only allows specifying log and tropical semirings? How about the other ones... Or should I somehow postprocess the generated far file?
Another question: I tried to use "fstdraw" on a far file, but got: ERROR: FstHeader::Read: Bad FST header: example.far
Is this a version mismatch?
Sorry, I missed the earlier comment -- for some reason I didn't get email about it.
Unfortunately the restriction to Log and Tropical is due to a similar restriction in the fst library: the real semiring does not come predefined. The best suggestion would be to use Tropical and then just do the obvious e^-cost conversion.
Access control: