The command-line utility ngraminfo prints information about an n-gram model, gathered from the NGramModel class and the underlying FST class.


 ngraminfo [in.mod]


See here for an example use of the command-line utility. At the C++ level, it corresponds to:

 cout << "# of states: " << model->GetFst()->NumStates() << endl;
 cout << "unigram state: " << model->UnigramState() << endl;
 cout << "n-gram order: " << model->HiOrder() << endl;
 cout << "well-formed: " << model->CheckTopology() << endl;
 cout << "normalized: " << model->CheckNormalization() << endl;

and so forth.


The number of unigrams reported differs by one from the count in an ARPA-format version of the model, because the ARPA format includes a unigram for the start symbol <s>, which our model represents as the start state rather than as an n-gram. We include it in our ARPA-format output to be consistent with typical conventions. Note that n-grams ending in the final symbol (</s>) are also not represented as arcs; they are represented by final costs instead. Hence the total number of n-grams is the sum of the number of n-gram arcs and the number of final states. For the precise details of the n-gram format, see here.
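The counting rule above can be illustrated with a small, self-contained sketch. It does not use the NGramModel or FST classes: ToyArc, ToyState, NGramCounts, CountNGrams and BuildToyModel are hypothetical names introduced only for this illustration. It also assumes that label 0 marks a backoff arc and that an infinite final cost marks a non-final state.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical toy stand-ins for FST states and arcs; NOT the library's API.
struct ToyArc {
  int label;      // assumption: 0 marks a backoff (epsilon) arc
  int nextstate;
};

struct ToyState {
  std::vector<ToyArc> arcs;
  double final_cost;  // assumption: infinity means the state is not final
};

struct NGramCounts {
  int ngram_arcs = 0;
  int backoff_arcs = 0;
  int final_states = 0;
  // Total n-grams: one per n-gram arc plus one per final state,
  // since n-grams ending in </s> are stored as final costs.
  int Total() const { return ngram_arcs + final_states; }
};

NGramCounts CountNGrams(const std::vector<ToyState>& states) {
  const double kInf = std::numeric_limits<double>::infinity();
  NGramCounts counts;
  for (const ToyState& state : states) {
    for (const ToyArc& arc : state.arcs) {
      assert(arc.label >= 0);  // labels are non-negative in this sketch
      if (arc.label == 0) {
        ++counts.backoff_arcs;
      } else {
        ++counts.ngram_arcs;
      }
    }
    if (state.final_cost != kInf) ++counts.final_states;
  }
  return counts;
}

// A made-up bigram model over the words a (label 1) and b (label 2).
std::vector<ToyState> BuildToyModel() {
  const double kInf = std::numeric_limits<double>::infinity();
  std::vector<ToyState> states(3);
  // State 0: start state (history <s>); <s> itself has no arc.
  states[0].arcs = {{1, 2}, {0, 1}};  // "<s> a", then backoff to unigram state
  states[0].final_cost = kInf;
  // State 1: unigram state; its final cost encodes the unigram </s>.
  states[1].arcs = {{1, 2}, {2, 1}};  // "a", "b"
  states[1].final_cost = 0.7;
  // State 2: history "a"; its final cost encodes "a </s>".
  states[2].arcs = {{2, 1}, {0, 1}};  // "a b", then backoff to unigram state
  states[2].final_cost = 1.2;
  return states;
}
```

Calling CountNGrams(BuildToyModel()) gives 4 n-gram arcs, 2 backoff arcs and 2 final states, for a total of 6 n-grams: a, b, </s>, <s> a, a b and a </s>. The toy model holds three unigrams (a, b, </s>), whereas an ARPA listing of the same model would show four, because it also lists <s>.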

Topic revision: r5 - 2012-03-04 - BrianRoark