---+ NGram Model Format

The following gives the encoding of all n-gram models produced by the utilities here, including those with unnormalized counts, as a cyclic weighted finite-state transducer in [[http://www.openfst.org][OpenFst]] format.

An n-gram is a sequence of _k_ symbols: _w<sub>1</sub> ... w<sub>k</sub>_. Let _N_ be the set of n-grams in the model.

   * There is a _unigram_ state in every model, representing the empty string.
   * Every proper prefix of every n-gram in _N_ has an associated state in the model.
   * The state associated with an n-gram _w<sub>1</sub> ... w<sub>k</sub>_ has a backoff transition (labeled with _⟨epsilon⟩_) to the state associated with its suffix _w<sub>2</sub> ... w<sub>k</sub>_.
   * An n-gram _w<sub>1</sub> ... w<sub>k</sub>_ is represented as a transition, labeled with _w<sub>k</sub>_, from the state associated with its prefix _w<sub>1</sub> ... w<sub>k-1</sub>_ to a destination state defined as follows:
      * If _w<sub>1</sub> ... w<sub>k</sub>_ is a proper prefix of an n-gram in the model, then the destination of the transition is the state associated with _w<sub>1</sub> ... w<sub>k</sub>_.
      * Otherwise, the destination of the transition is the state associated with the suffix _w<sub>2</sub> ... w<sub>k</sub>_.
   * Start and end of the sequence are not represented via transitions in the automaton or symbols in the symbol table. Rather:
      * The start state of the automaton encodes the "start of sequence" n-gram prefix (commonly denoted _⟨s⟩_).
      * The end of the sequence (often denoted _⟨/s⟩_) is included in the model through state final weights, i.e., for a state associated with an n-gram prefix _w<sub>1</sub> ... w<sub>k</sub>_, the final weight of that state represents the weight of the n-gram _w<sub>1</sub> ... w<sub>k</sub> ⟨/s⟩_.

-- Main.MichaelRiley - 09 Dec 2011
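
The rules above can be sketched in plain Python as a way to check one's reading of the format. This is only an illustration of the topology, not the library's implementation: it does not use OpenFst, it omits all weights, and the names =build_topology=, =suffix_state=, =EPSILON=, =BOS= and =EOS= are hypothetical. States are identified by their history tuples, so =()= is the unigram state and =("&lt;s&gt;",)= is the start state. One assumption goes beyond the text: if the suffix _w<sub>2</sub> ... w<sub>k</sub>_ has no state, the sketch keeps shortening the history until it finds one.

```python
EPSILON = "<epsilon>"          # label used here for backoff arcs
BOS, EOS = "<s>", "</s>"       # never used as arc labels


def build_topology(ngrams):
    """Return (states, arcs, finals, start) for a set of non-empty n-gram tuples."""
    ngrams = {tuple(g) for g in ngrams}

    # One state per proper prefix of every n-gram; () is the unigram state.
    states = {()}
    for gram in ngrams:
        for i in range(len(gram)):
            states.add(gram[:i])

    def suffix_state(history):
        # Drop the oldest symbol. Assumption (not stated on this page): if
        # that suffix has no state, keep shortening; () always exists.
        history = history[1:]
        while history not in states:
            history = history[1:]
        return history

    arcs = set()    # (source state, label, destination state)
    finals = set()  # states that would carry a final weight

    # Backoff arc from every non-unigram state to its suffix state.
    for state in states:
        if state:
            arcs.add((state, EPSILON, suffix_state(state)))

    for gram in ngrams:
        prefix, last = gram[:-1], gram[-1]
        if last == EOS:
            # w1 ... wk </s> becomes the final weight of state w1 ... wk.
            finals.add(prefix)
        elif gram == (BOS,):
            # Start of sequence is the start state, not a transition.
            continue
        else:
            dest = gram if gram in states else suffix_state(gram)
            arcs.add((prefix, last, dest))

    # Assumes at least one n-gram begins with <s>, so this state exists.
    return states, arcs, finals, (BOS,)


# A toy bigram model, made up for this example.
toy = [
    ("<s>",), ("a",), ("b",), ("</s>",),
    ("<s>", "a"), ("a", "b"), ("b", "</s>"),
]
states, arcs, finals, start = build_topology(toy)
for source, label, dest in sorted(arcs):
    print(source, label, dest)
```

In the toy model the bigram =a b= is not a prefix of any longer n-gram, so its transition leaves state =("a",)= and arrives at the suffix state =("b",)=, while the bigram =b &lt;/s&gt;= shows up only as a final state.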