(2012-03-04, BrianRoark)
---+ NGramPerplexity

---++ Description

Command-line utility that calculates the perplexity of a corpus given a model. Verbose mode reports each word's contribution to the perplexity.

Out-of-vocabulary (OOV) items can be handled in several ways. If the model already contains an OOV token, and that token therefore has probability mass, specify it with the switch =--OOV_symbol=. Every symbol not found in the vocabulary is then mapped to that token. If the model has no OOV symbol with allocated probability mass, the option =--OOV_probability= allocates unigram probability mass to the class of OOVs.

Note that any OOV symbol represents a class of words. To assign an appropriate probability to any single instance, the class probability should be shared among the members of the class. This requires an OOV class size, set with =--OOV_class_size=, which defaults to 10000.

---++ Usage

<verbatim>
ngramperplexity [--options] ngram.fst [in.far [out.txt]]
  --OOV_symbol: type = string, default = ""
  --OOV_class_size: type = double, default = 10000
  --OOV_probability: type = double, default = 0
</verbatim>

---++ Examples

<verbatim>
$ ngramperplexity earnest.aa.mod earnest.ab.far
</verbatim>

---++ Caveats

If no =OOV_symbol= is specified and =OOV_probability= is zero, any OOVs encountered would receive zero probability under these settings, so they are ignored in the perplexity calculation.
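The OOV handling described above can be illustrated with a small Python sketch. It is not the =ngramperplexity= implementation: it assumes a toy unigram model stored as a dictionary, natural logarithms, and no sentence-boundary tokens, and the function name =unigram_perplexity= and its arguments are hypothetical. It also assumes that the class mass is divided by the class size in both the =--OOV_symbol= and =--OOV_probability= cases, which the description suggests but does not state outright, and it leaves in-vocabulary probabilities unchanged when OOV mass is added.

```python
import math


def unigram_perplexity(probs, corpus, oov_symbol="", oov_probability=0.0,
                       oov_class_size=10000.0):
    """Toy perplexity of `corpus` (a list of tokens) under unigram `probs`.

    Returns (perplexity, number_of_skipped_oovs). Illustrative only; the
    parameter names mirror the command-line switches but the arithmetic is
    an assumption, not the tool's documented behaviour.
    """
    log_prob = 0.0
    scored = 0
    skipped = 0
    for word in corpus:
        if word in probs:
            p = probs[word]
        elif oov_symbol and oov_symbol in probs:
            # The model has an OOV token: share its mass across the class.
            p = probs[oov_symbol] / oov_class_size
        elif oov_probability > 0.0:
            # No OOV token: share the explicitly allocated unigram mass.
            p = oov_probability / oov_class_size
        else:
            # Neither option set: the OOV would get zero probability,
            # so it is left out of the calculation (see Caveats).
            skipped += 1
            continue
        log_prob += math.log(p)
        scored += 1
    if scored == 0:
        raise ValueError("no scorable words in corpus")
    return math.exp(-log_prob / scored), skipped
```

With neither option set, an unknown word is skipped and counted rather than scored. With either option set, it contributes its share of the class probability, so a larger class size makes each individual OOV less probable and raises the perplexity.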
Topic revision: r5 - 2012-03-04 - BrianRoark
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.