NGramSymbols

Description

Command-line utility that produces a symbol table from an input text corpus. It creates a symbol entry for every type (distinct token) in the corpus, plus entries for <epsilon> (index 0) and an out-of-vocabulary symbol (last in the symbol table). The --epsilon_symbol and --OOV_symbol options set the labels used for these two special symbols.
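To make the resulting layout concrete, the following Python sketch mimics the behaviour described above on a tiny in-memory corpus. It is an illustration only, not the ngramsymbols implementation: the function names build_symbols and format_symbols are hypothetical, tokens are assumed to be whitespace-delimited, types are assumed to be numbered in order of first appearance, and the output is assumed to follow the tab-separated "symbol, index" text layout used for OpenFst symbol tables.

```python
def build_symbols(lines, epsilon_symbol="<epsilon>", oov_symbol="<UNK>"):
    """Return (symbol, index) pairs: epsilon at 0, corpus types, OOV last.

    Hypothetical helper; ordering by first appearance is an assumption.
    """
    index = {epsilon_symbol: 0}
    for line in lines:
        for token in line.split():
            # Only unseen types receive the next free index.
            index.setdefault(token, len(index))
    # The out-of-vocabulary symbol is added after all corpus types.
    index.setdefault(oov_symbol, len(index))
    return list(index.items())


def format_symbols(pairs):
    """Render one tab-separated 'symbol<TAB>index' line per entry."""
    return "".join(f"{sym}\t{idx}\n" for sym, idx in pairs)


corpus = ["the cat sat", "the dog sat"]
table = build_symbols(corpus)
print(format_symbols(table), end="")
```

Passing different values for epsilon_symbol or oov_symbol in this sketch plays the role of the --epsilon_symbol and --OOV_symbol options.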

Usage

ngramsymbols [--options] [in.txt [out.txt]]
  --epsilon_symbol: type = string, default = <epsilon>
  --OOV_symbol: type = string, default = <UNK>
 

Examples

$ ngramsymbols <earnest.txt >earnest.syms

Caveats

Topic revision: r6 - 2022-08-06 - KyleGorman
 