OpenGrm Thrax Forum

regex lookahead in Thrax

BernardR - 2015-06-29 - 10:39

Is it possible to do lookahead in Thrax grm files? For example, to require at least one digit, one lowercase letter, and one uppercase letter, as in the regex below:

( (?=.*\d) (?=.*[a-z]) (?=.*[A-Z]) .{6,20} )

Thanks

RichardSproat - 2015-07-06 - 18:01

I'm not sure what you are trying to do, but you may just want to use a CDRewrite rule, which allows you to change one regexp to another in the context of two other regexps that are not considered part of the first two regexps.

BernardR - 2015-07-08 - 15:26

So there is no simple way to use regex lookahead? So Thrax does not support this? Would like to create FSA to detect the pattern described. Thanks.
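
Each lookahead in the pattern above is an extra regular constraint on the whole string, so the pattern describes the intersection of four regular languages: contains a digit, contains a lowercase letter, contains an uppercase letter, and has length 6 to 20. For finite-state acceptors, composition behaves like intersection, so one possible route is to build the four acceptors separately and combine them; whether that is convenient in a Thrax grammar is not confirmed in this thread. The Python sketch below only illustrates the equivalence. The function names are made up for this example, and it assumes the pattern is meant to match the whole string.

```python
import re

# The original pattern, compiled in verbose mode so that its spaces are ignored.
LOOKAHEAD = re.compile(r"( (?=.*\d) (?=.*[a-z]) (?=.*[A-Z]) .{6,20} )", re.VERBOSE)

# One lookahead-free regular expression per constraint.
CONSTRAINTS = [
    re.compile(r".*\d.*"),     # at least one digit
    re.compile(r".*[a-z].*"),  # at least one lowercase letter
    re.compile(r".*[A-Z].*"),  # at least one uppercase letter
    re.compile(r".{6,20}"),    # between 6 and 20 characters
]


def matches_with_lookahead(s):
    """True if the whole string matches the lookahead pattern."""
    return LOOKAHEAD.fullmatch(s) is not None


def matches_by_intersection(s):
    """True if the whole string matches every constraint separately."""
    return all(c.fullmatch(s) is not None for c in CONSTRAINTS)
```

Both functions should agree on any input without newlines, which is the sense in which the lookaheads add nothing beyond intersection.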

User defined symbol tables on PDTs

SofiaK - 2014-12-18 - 09:41

Hi all,

I am new to Thrax and OpenFst and I would appreciate it a lot if you could help me with the following issue. I need to use my own symbol table with a PDT or to be able to extract the symbol table in a non-binary format. So far I was not able to do so as the fst extracted from my far has an empty symbol table.

Let me show you how I worked:

1. I created my grammar, which covers the digits zero to nine, and took a symbol table that I already use with, say, another FST.

numbers_en_US.grm

# Numbers simple grammar for en-US.
# Covers numbers 0 to 9

my_symbol_table=SymbolTable['numbers.txt'];

export PARENS = ("[<s>]" : "[</s>]");

space = " " ;

units = Optimize [ ("zero".my_symbol_table) | ("one".my_symbol_table) | ("two".my_symbol_table) | ("three".my_symbol_table) | ("four".my_symbol_table) | ("five".my_symbol_table) | ("six".my_symbol_table) | ("seven".my_symbol_table) | ("eight".my_symbol_table) | ("nine".my_symbol_table) ];

export NUMBERS = ("[<s>]" (units space)* units "[</s>]")* ;

numbers.txt

eight 0
extra1 1
extra2 2
<eps> 3
five 4
four 5
nine 6
one 7
</s> 8
<s> 9
seven 10
six 11
three 12
two 13
zero 14

2. Then I compiled my grammar, extracted the fst from the far and checked the fst info:

$ fstinfo NUMBERS
fst type vector
arc type standard
input symbol table none
output symbol table none
# of states 12
# of arcs 32
initial state 11
...

3. So as the symbol table is empty, when I test, it is impossible to get rewrites:

$ thraxrewrite-tester --far=numbers_en_US.far --rules=NUMBERS\$PARENS --output_mode=numbers.txt

Input string: one

Rewrite failed.

$ thraxrewrite-tester --far=numbers_en_US.far --rules=NUMBERS\$PARENS

Input string: one

Rewrite failed.

So, any ideas on how to use my symbol table? Or even how to get the internal symbol table in a non-binary format?

Thanks, Sofia

RichardSproat - 2014-12-19 - 10:39

RichardSproat - 2014-12-19 - 10:45

The symbols generated for the PARENS will be in the FST named *StringFstSymbolTable, which you will see if you do a farextract on the far.

But it looks as if you are assuming two symbol tables here, one being your own, the other being the one that will be generated for those extended labels. I think what you want to do is something like this:

export PARENS = ("<s>".my_symbol_table : "</s>".my_symbol_table);

Then you need to run the compiler with the --save_symbols flag. Finally you will need to use the --input_mode and probably the --output_mode flags to thraxrewrite-tester with the argument being your symbol table.

If that still doesn't work, can you send me (rws@google.com) the complete set of files needed to build your target, and I will have a look.

--R
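
One detail of the numbers.txt posted above may be worth checking independently of the flags: OpenFst reserves label 0 for epsilon, while this table assigns 0 to `eight` and 3 to `<eps>`. Whether this contributes to the failed rewrites is not established in the thread. The Python sketch below is only an illustration of such a check; the function names are hypothetical, and it assumes the usual text format of one symbol and one integer label per line.

```python
NUMBERS_TXT = """eight 0
extra1 1
extra2 2
<eps> 3
five 4
four 5
nine 6
one 7
</s> 8
<s> 9
seven 10
six 11
three 12
two 13
zero 14
"""


def parse_symbol_table(text):
    """Parse 'symbol label' lines into a dict, skipping blank lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        symbol, label = fields  # raises ValueError on a malformed line
        table[symbol] = int(label)
    return table


def epsilon_problems(table, epsilon="<eps>"):
    """List conflicts with the convention that label 0 means epsilon."""
    problems = []
    for symbol, label in table.items():
        if label == 0 and symbol != epsilon:
            problems.append(f"label 0 is assigned to {symbol!r}, not {epsilon!r}")
    if epsilon in table and table[epsilon] != 0:
        problems.append(f"{epsilon!r} has label {table[epsilon]}, expected 0")
    return problems
```

Calling `epsilon_problems(parse_symbol_table(NUMBERS_TXT))` reports both conflicts for the table as posted.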

SofiaK - 2014-12-24 - 05:12

Hi Richard, I followed your advice, but the .far I get with my symbol table is completely different from the one without it. That is expected, but "initial state 0" worries me, for example. I will send you my set of files so you can get an idea.

compile error on openSuse 13.1

RogerB - 2014-11-19 - 14:38

Hi, I downloaded openfst 1.4.1 and opengrm-ngram 1.2.1 but the latter won't compile on openSuse 13.1.

./configure says "configure: error: fst/extensions/far/far.h header not found"

However, I find this file at /home/roger/sphinx/openfst-1.4.1/src/include/fst/extensions/far/far.h

Compilation and installation of openfst was successful (as far as I can tell so far).

Do I need to add this path/header file somewhere?

Thanks Roger

RogerB - 2014-11-19 - 16:19

Oh, I found out that openfst must be built with ./configure --enable-far=true

RichardSproat - 2014-11-20 - 09:01

Right, glad you found it.


Russian phonetic transcription rules

AlexisWilpert - 2014-08-22 - 09:40

Hi all, nice to meet you!

Let me introduce myself, as I am new here. My name is Alexis and I am a computational linguist and software developer. I was very excited by the discovery of the Thrax framework, and after a short investigation I decided this was my thing :) I immediately started digging into it, but unfortunately I was not able to find "real-world" examples of usage, which would have simplified my task.

However, I just kept going on. I have been working for Yandex and developing a rule-based system for generating Russian phonetic transcriptions (in the context of speech synthesis). My company has been very generous and allowed me to open source the rules I wrote.

Probably I do not even use half of the power of Thrax, but I managed to write a working rule-based system just sticking to the basics :) I thought this could be useful for someone else (as it would have been for myself at the beginning). That is why I thought I should post here about them. Please take into account that this was my first try with Thrax and that I probably could have written the rules in a much better way if I had had more knowledge.

In case someone is interested, you will find them here: https://github.com/wilpert/RusPhonetizer/tree/master/grammars

Thrax was a wonderfully powerful and easy to use framework for my work, something I did not experience before. I am utterly thankful to the authors for their amazing achievement. And to Yandex for allowing me to share my work.

Thanks to you all and be happy :)

Alexis

RichardSproat - 2014-11-17 - 09:11

RichardSproat - 2014-11-17 - 09:15

Hi Alexis:

Glad it has proved useful to you. Yeah, there are various toy examples around, but not many "real world" examples that I know of that are public, at least not yet.

I'll be happy to take a look sometime at your grammars and send along suggestions if I have any.

Richard Sproat

AlexisWilpert - 2014-11-29 - 12:57

Hi Richard,

yes, it would be great if you would find any time to have a look at my grammars, any feedback would be terribly appreciated!

Thanks again for the software,

Alexis


Error compiling on Ubuntu VM

EstherJudd - 2014-06-19 - 13:02

I am trying to compile Thrax in an Ubuntu VM using VirtualBox. I have gcc 4.8.2 installed and compiled openfst with far and pdt enabled and in shared mode. I have 1 GB of RAM dedicated to the VM. If I try ./configure --enable-shared, it fails because I run out of memory. If I try just ./configure and then make, everything seems to compile OK until I get an internal compiler error:

/bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT loader.lo -MD -MP -MF .deps/loader.Tpo -c -o loader.lo `test -f 'walker/loader.cc' || echo './'`walker/loader.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT loader.lo -MD -MP -MF .deps/loader.Tpo -c walker/loader.cc -fPIC -DPIC -o .libs/loader.o
g++: internal compiler error: Killed (program cc1plus)

RichardSproat - 2014-06-20 - 09:12

Try commenting out the lines that refer to Log64Arc in src/include/thrax/function.h, viz

function.h:70:extern Registry<Function<fst::Log64Arc>* > kLog64ArcRegistry;
function.h:87: typedef name<fst::LogArc> Log64Arc ## name;
function.h:88: REGISTER_LOGARC_FUNCTION(Log64Arc ## name)

(Obviously, be careful in that #define REGISTER_GRM_FUNCTION to leave the continuation "\"s all happy.)

The downside is you won't get log64 arcs. The upside is it should be smaller. The fact that it's running out of memory in compiling the loader makes me suspect that may be the problem because for each of the different arc types, all of the templated classes have to be expanded. This should reduce the size, therefore. If that still doesn't work, remove log arcs too. You won't likely be using them. Indeed, for precisely these sorts of issues I have been thinking of disabling those in future versions.

EstherJudd - 2014-06-20 - 12:22

I did that and also had to comment out similar lines in src/lib/walker/evaluator-specialization.cc (lines 35 and 49-53).

I also tried taking out LogArc and all its mentions in function.h and evaluator-specialization.cc, but I still get an internal compilation error.

LemOmogbai - 2014-11-16 - 11:42

Did you ever get this to work? I have the same problem compiling Thrax.

util/utils.cc 'close' not declared?

StevenBedrick - 2014-02-17 - 18:01

Hello, Richard et al.-

While compiling Thrax 1.1 (against OpenFST 1.3.4 on an Ubuntu 13.10 system), I'm getting the following compilation error:

<pre>
...
/bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT utils.lo -MD -MP -MF .deps/utils.Tpo -c -o utils.lo `test -f 'util/utils.cc' || echo './'`util/utils.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT utils.lo -MD -MP -MF .deps/utils.Tpo -c util/utils.cc -fPIC -DPIC -o .libs/utils.o
util/utils.cc: In function 'bool thrax::Readable(const string&)':
util/utils.cc:139:13: error: 'close' was not declared in this scope
close(fdes);
^
make[3]: *** [utils.lo] Error 1
make[3]: Leaving directory `/home/steven/thrax-1.1.0/src/lib'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/steven/thrax-1.1.0/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/steven/thrax-1.1.0'
make: *** [all] Error 2
</pre>

Any ideas what might be going on here?

StevenBedrick - 2014-02-17 - 18:04

OK, this is ridiculous. Click here to see a Gist:

https://gist.github.com/stevenbedrick/809dbe2c921d745fbcc6

RichardSproat - 2014-02-18 - 09:07

I don't know. I will have to investigate.

RichardSproat - 2014-02-18 - 09:31

Does explicitly including unistd.h help?

StevenBedrick - 2014-02-23 - 23:01

Yup, adding that #include to util/utils.cc does the trick.

RichardSproat - 2014-02-24 - 09:01

RichardSproat - 2014-02-24 - 09:09

Ok thanks.

So the question is why you aren't getting that by inheritance. This is the first time I've seen this problem and I have no idea where it has suddenly broken.


compilation fails

KyleGorman - 05 Nov 2013 - 14:53

Hi Richard (etc.), using Thrax 1.1.0 (and with OpenFst 1.3.4 already installed), compilation fails while making the file `ast/identifier-node.cc` due to an issue in the `include/thrax/compat/utils.h` header. Here's the error:

/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT identifier-node.lo -MD -MP -MF .deps/identifier-node.Tpo -c -o identifier-node.lo `test -f 'ast/identifier-node.cc' || echo './'`ast/identifier-node.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT identifier-node.lo -MD -MP -MF .deps/identifier-node.Tpo -c ast/identifier-node.cc -fno-common -DPIC -o .libs/identifier-node.o
In file included from ast/identifier-node.cc:22:
./../include/thrax/compat/utils.h:119:8: error: field has incomplete type 'char []'
  char buf[];
       ^

I presume this is because buf[] doesn't have a length defined (nor is it initialized with a string), and when I change the line to

char buf[1024];

compilation goes through. (I'm not sure this is a sensible default; I spent no time trying to understand what this code is doing.)

I'd include a patch but it's one line.

Kyle

RichardSproat - 05 Nov 2013 - 16:38

Just remove that line: that variable is not used. Apparently it's a holdover from some earlier implementation, and I just forgot to update it. I'll fix this in the next release.

Recommended way to obtain FST+symbols for use

JosefNovak - 10 Jun 2013 - 09:46

Hi,

I am currently using Thrax to extend some features of an alignment tool I wrote for my g2p system.

The basic idea is that the user can specify some alignment correspondence rules and optional default penalties, and then these can be incorporated into the EM training process.

At present I have kind of hacked the functionality of the thraxcompiler command tool to read in the grammar, and then return the desired FST+symbol table to the alignment program.

EDIT: Maybe it makes more sense to just provide a couple of snippets:

GetFstFromGrammar

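// Compiles a Thrax grammar file and returns a copy of the named rule FST.
// Returns an empty FST if parsing or evaluation fails or the rule is missing.
// FstMap is assumed to be the map type returned by GetFstMap().
template <typename Arc>
VectorFst<Arc> GetFstFromGrammar(const string& input_grammar, const string& rules_name) {
  GrmCompilerSpec<Arc> grammar;
  VectorFst<Arc> rules;  // same arc type as the return value
  if (grammar.ParseFile(input_grammar) && grammar.EvaluateAst()) {
    const GrmManagerSpec<Arc>* manager = grammar.GetGrmManager();
    FstMap fsts = manager->GetFstMap();
    // List every rule found in the grammar.
    for (typename FstMap::const_iterator it = fsts.begin();
         it != fsts.end(); ++it) {
      cout << "Echo: " << it->first << endl;
    }
    // Look the rule up with find() so that a missing name is never dereferenced.
    typename FstMap::const_iterator found = fsts.find(rules_name);
    if (found != fsts.end()) rules = *found->second;
  }
  return rules;
}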

toy.grm

sy = SymbolTable['simple.syms'];

zero  = "0".sy : "zero".sy;
units = ( "these're".sy : ( "these're".sy | "[these]" | "[these]" "are".sy ) );
split = ( "[these]" "are".sy : "these're".sy );
sigma = "<sigma>".sy : "<sigma>".sy;
abc   = ( "a".sy "b c".sy : "a b b".sy );
export RULES = Optimize[ sigma* ( units | zero | abc ) sigma* ];

Here the 'sigma' is used in combination with a specialized 1-state alignment transducer that relies on RHO and SIGMA matchers.

Is there an alternative or recommended way to do this? It would be great if I could either specify the symbol table just once at the beginning, or automatically infer/generate the whole symbol table and return it, or even better, modify the grammar from my C++ application to simplify what the user is responsible for doing.

I went through the FAQ but did not notice any answers to these questions.

Thanks for your time.

UPDATE: I solved this by creating some bindings with pybindgen and then writing a generator that interprets a simplified version of the Thrax grammar, then expands it to the verbose version with the extra quotes, symfile suffixes, etc.
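
The generator itself was not posted. As a rough idea of what such an expansion step could look like, here is a minimal Python sketch that turns plain input/output pairs into the verbose form used in toy.grm above. Every function name, the `r0`, `r1`, ... rule names, and the default arguments are hypothetical, and symbols containing a double quote are not handled.

```python
def quote(symbols, table="sy"):
    """Render a space-separated symbol sequence as a Thrax string bound to a table."""
    return f'"{symbols}".{table}'


def rule(lhs, rhs, table="sy"):
    """Render one input:output pair as a parenthesized transducer expression."""
    return f"( {quote(lhs, table)} : {quote(rhs, table)} )"


def grammar(pairs, symfile="simple.syms", table="sy", export="RULES"):
    """Expand (input, output) pairs into a complete grammar text."""
    lines = [f"{table} = SymbolTable['{symfile}'];"]
    names = []
    for i, (lhs, rhs) in enumerate(pairs):
        name = f"r{i}"
        names.append(name)
        lines.append(f"{name} = {rule(lhs, rhs, table)};")
    # Export the union of all rules, optimized as in toy.grm.
    lines.append(f"export {export} = Optimize[ {' | '.join(names)} ];")
    return "\n".join(lines)
```

With this shape, the user writes only the pairs, for example `grammar([("0", "zero"), ("a b c", "a b b")])`, and the symbol table is named once.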


Need some help, New to "Thrax"

GoudjilKamel - 03 Jan 2013 - 17:29

I am compiling under Ubuntu 12.04 LTS and got the message below at link time:

libtool: link: g++ -g -O2 -o .libs/thraxcompiler compiler.o -L/usr/local/lib/fst -lm -ldl -lfst /usr/local/lib/fst/libfstfar.so ../lib/.libs/libthrax.so -Wl,-rpath -Wl,/usr/local/lib/fst -Wl,-rpath -Wl,/usr/local/lib
../lib/.libs/libthrax.so: undefined reference to `fst::IsSTList(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
../lib/.libs/libthrax.so: undefined reference to `fst::IsSTTable(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: ld returned 1 exit status

RichardSproat - 29 Aug 2013 - 11:47

Did you compile the fst library with the far extension?

DanXu - 08 Jan 2014 - 02:55

I have also encountered the same problem with v1.1.0 (compiling export/batch_test), and I compiled Thrax with far enabled.

RichardSproat - 08 Jan 2014 - 09:06

Yes, but did you also compile the fst library with far enabled?

DanXu - 09 Jan 2014 - 09:53

Yes (openfst 1.3.4 compiled with --enable-far and some other enable options). Thrax compiled successfully, but compilation fails while making the file `batch_test.c` (extracted from export.tgz). Can you give me some advice?

RichardSproat - 10 Jan 2014 - 09:11

I'd like to but first I need to understand what is going on. I can't reproduce your error (apparently) and I don't know what batch_test.c is since it's not part of the Thrax distribution. Is this your own code? If so then I need to see EXACTLY what you are doing, including probably your sending me a directory with all of the additional code.

If this is part of the Thrax distribution then please tell me where it is because I can't find it (nor do I remember such a file).

DanXu - 11 Jan 2014 - 09:18

thank you for your reply.

in this page:

http://openfst.cs.nyu.edu/twiki/bin/view/Contrib/ThraxContrib,

you can see

Projects using the OpenGrm Thrax tools: export.tgz: Grammars and software developed as part of a text normalization class taught at the Center for Spoken Language Understanding, Fall 2011. URL for the course: http://www.cslu.ogi.edu/~sproatr/Courses/TextNorm/

I downloaded "export.tgz". There is a file called batch_tester.cc in the batch_tester directory (extracted from export.tgz).

RichardSproat - 12 Jan 2014 - 09:08

Ok that helps. Yes, I did write that, but it wasn't obvious from your query that this is what you were referring to. Please in future give all necessary information when reporting a bug.

In the meantime I will have a look. I do not know off the top of my head what the problem is.

RichardSproat - 12 Jan 2014 - 13:40

Ok it's the usual nonsense about ordering of shared object libraries. If you do things in this order it should work:

g++ -g -O2 -o batch_tester batch_tester.o -L/usr/local/lib/fst -lm -ldl -lfst -lthrax -Wl,--rpath -Wl,/usr/local/lib/fst -Wl,--rpath -Wl,/usr/local/lib/fst -Wl,--rpath -Wl,/usr/local/lib /usr/local/lib/fst/libfstfar.so

Evidently there is a bug in the configuration of the distribution that was not causing problems before, but is now. I will look into that, but in the meantime, please try linking manually as above.

DanXu - 14 Jan 2014 - 03:47

It works with the command you wrote above, thanks!

RichardSproat - 14 Jan 2014 - 09:01

Ok good, I'll update the tar file. Not sure why it worked before and not now, but I won't think about that.

Weight semiring

LauriLyly - 21 Nov 2012 - 00:34

So far I find thrax a very neat piece of software but I have two questions...

Can I somehow use probability semiring as weights, because it seems Thrax only allows specifying log and tropical semirings? How about the other ones... Or should I somehow postprocess the generated far file?

Another question: I tried to use "fstdraw" on a far file, but got: ERROR: FstHeader::Read: Bad FST header: example.far

Is this a version mismatch?

LauriLyly - 29 Nov 2012 - 07:34

Sorry, obviously my bad, as it's a far and not an fst file :P Still not too familiar. But the weight question still applies ;)

RichardSproat - 29 Nov 2012 - 10:07

Sorry, I missed the earlier comment -- for some reason I didn't get email about it.

Unfortunately the restriction to Log and Tropical is due to a similar restriction in the fst library: the real semiring does not come predefined. The best suggestion would be to use Tropical and then just do the obvious e^-cost conversion.
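
The conversion mentioned above can be sketched in a few lines of Python, assuming the weights are negative natural-log probabilities. The function names are made up for this illustration. Note that this only covers single weights and single paths: the tropical semiring combines alternative paths with min rather than by adding probabilities, so summing over paths would need the log semiring or separate bookkeeping.

```python
import math


def cost_to_prob(cost):
    """Convert a negative-log weight (cost) to a probability."""
    return math.exp(-cost)


def prob_to_cost(prob):
    """Convert a probability in (0, 1] to a negative-log weight."""
    return -math.log(prob)


def path_prob(costs):
    """Probability of one path: costs add along the path, so probabilities multiply."""
    return cost_to_prob(sum(costs))
```

For example, a path whose two arcs each carry the cost of probability 0.5 has a total cost of about 1.386, which converts back to a probability of 0.25.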


-- CyrilAllauzen - 13 Aug 2012

Topic revision: r43 - 2015-07-08 - BernardR
 