
OpenGrm NGram Library Suggestions

  1. ngramapply takes --verbose, but -v=n (e.g. -v=1) is already provided. (BER: fixed to use --v)
  2. ngramapply appears to be more of a debugging tool than an applicator, given its human-readable output. Perhaps give it a name that suggests this, or at least add a comment in the quick tour saying so and noting that 'fstintersect' (possibly with a phi matcher) is the heavy-duty way to do this.
  3. Improve memory usage of ngramread: it uses 14 GB for a 1.5 GB input
  4. allow unknown (unk) words to contribute to the perplexity measure
  5. random test suite
  6. parameter for the unknown-word probability
  7. normalize based on prefix count, with no additional smoothing
  8. counting from cyclic
  9. giving counts to strings for counting
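
To make items 4, 6 and 7 concrete, here is a minimal Python sketch. It is not OpenGrm code and does not use the library's API; the function names (count_ngrams, normalize_by_prefix, perplexity), the unk_prob and include_unk parameters, and the <s> and </s> padding symbols are all hypothetical and chosen only for illustration. It shows relative-frequency normalization by prefix count with no smoothing, and a perplexity measure where unknown words can either be skipped or scored with a user-supplied probability.

```python
import math
from collections import Counter


def count_ngrams(sentences, order):
    """Count n-grams of length `order` and their (order - 1)-word prefixes."""
    ngram_counts = Counter()
    prefix_counts = Counter()
    for tokens in sentences:
        # Pad with hypothetical start/end symbols so every word has a full history.
        padded = ["<s>"] * (order - 1) + list(tokens) + ["</s>"]
        for i in range(order - 1, len(padded)):
            prefix = tuple(padded[i - order + 1:i])
            ngram_counts[prefix + (padded[i],)] += 1
            prefix_counts[prefix] += 1
    return ngram_counts, prefix_counts


def normalize_by_prefix(ngram_counts, prefix_counts):
    """Item 7: P(w | prefix) = c(prefix, w) / c(prefix), with no smoothing."""
    return {ng: c / prefix_counts[ng[:-1]] for ng, c in ngram_counts.items()}


def perplexity(tokens, unigram_probs, unk_prob, include_unk=True):
    """Items 4 and 6: unigram perplexity with a parameter for unknown words.

    If include_unk is True, each out-of-vocabulary token is scored with
    unk_prob and counted in the token total; otherwise it is skipped.
    Assumption: the caller has already reserved unk_prob of the probability
    mass, e.g. by scaling the known-word probabilities by (1 - unk_prob).
    """
    total_logp = 0.0
    n = 0
    for w in tokens:
        if w in unigram_probs:
            p = unigram_probs[w]
        elif include_unk:
            p = unk_prob
        else:
            continue  # unknown word does not contribute
        total_logp += math.log(p)
        n += 1
    if n == 0:
        raise ValueError("no scorable tokens")
    return math.exp(-total_logp / n)


# Toy data, made up for this example.
corpus = [["a", "b", "a"], ["a", "c"]]
bigram_counts, bigram_prefix_counts = count_ngrams(corpus, order=2)
bigram_probs = normalize_by_prefix(bigram_counts, bigram_prefix_counts)

toy_unigrams = {"a": 0.5, "b": 0.25, "c": 0.25}
ppl_with_unk = perplexity(["a", "b", "z"], toy_unigrams, unk_prob=0.125)
ppl_without_unk = perplexity(["a", "b", "z"], toy_unigrams, unk_prob=0.125,
                             include_unk=False)
```

Because unknown words usually receive low probability, including them tends to raise the reported perplexity, so a real implementation would probably want to report which convention was used alongside the number.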

-- MichaelRiley - 05 Nov 2010

Topic revision: r10 - 2011-11-15 - BrianRoark
 