---+ NGramMerge

---++ Description

---++ Usage

|<verbatim>
ngrammerge [--options] in1.fst in2.fst [out.fst]
 --alpha: type = double, default = 1.0, weight for in1.fst in real semiring
 --beta: type = double, default = 1.0, weight for in2.fst in real semiring
 --normalize: type = bool, default = false, whether to normalize the resulting model
 --use_smoothing: type = bool, default = false, whether to use model smoothing when merging
 --fixedorder: type = bool, default false, whether to merge in the given argument order
</verbatim> | |
|<verbatim>
class NGramMerge(StdMutableFst *infst1, StdMutableFst *infst2, double alpha, double beta);
</verbatim>| |

---++ Examples

Suppose we split our corpus up into two parts, earnest.aa and earnest.ab, e.g., by using split:

<verbatim>
$ split -844 earnest.txt earnest.
</verbatim>

If we count each half independently, we can then merge the counts to get the same counts as derived above from the full corpus (earnest.cnts):

<verbatim>
$ farcompilestrings -symbols=earnest.syms earnest.aa >earnest.aa.far
$ ngramcount -order=5 earnest.aa.far >earnest.aa.cnts
$ farcompilestrings -symbols=earnest.syms earnest.ab >earnest.ab.far
$ ngramcount -order=5 earnest.ab.far >earnest.ab.cnts
$ ngrammerge earnest.aa.cnts earnest.ab.cnts >earnest.merged.cnts
$ fstequal earnest.cnts earnest.merged.cnts
</verbatim>

Note that, unlike our example merging unnormalized counts above, merging two smoothed models that have been built from half a corpus each will result in a different model than one built from the corpus as a whole, due to the smoothing and mixing.

Each of the two model or count FSTs can be weighted, using the _--alpha_ switch for the first input FST, and the _--beta_ switch for the second input FST. These weights are interpreted in the real semiring and both default to one, meaning that by default the original counts or probabilities are not scaled.
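The arithmetic behind count merging and the two weights can be illustrated with a toy sketch. It uses plain Python dictionaries rather than FSTs, and the helper names =ngram_counts= and =merge_counts= are hypothetical, not part of the library. It assumes that n-grams never span two sentences, ignores sentence start and end symbols, and covers unnormalized counts only, not smoothed models.

```python
from collections import Counter

def ngram_counts(sentences, order=2):
    """Count every n-gram up to `order` inside each sentence (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for n in range(1, order + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts

def merge_counts(counts1, counts2, alpha=1.0, beta=1.0):
    """Weighted sum in the real semiring: alpha * c1 + beta * c2 per n-gram."""
    merged = {}
    for ngram in set(counts1) | set(counts2):
        merged[ngram] = alpha * counts1.get(ngram, 0) + beta * counts2.get(ngram, 0)
    return merged

# Toy corpus, split on a sentence boundary into two halves.
corpus = ["a b a", "b a", "a b", "b b a"]
first_half, second_half = corpus[:2], corpus[2:]

whole = ngram_counts(corpus)
counts_aa = ngram_counts(first_half)
counts_ab = ngram_counts(second_half)

# With the default weights (both 1.0) the merged counts match the full-corpus counts.
merged = merge_counts(counts_aa, counts_ab)
assert merged == dict(whole)

# With non-default weights each input's counts are scaled before being summed.
scaled = merge_counts(counts_aa, counts_ab, alpha=3, beta=2)
```

In this sketch the unigram =a= occurs three times in the first half and twice in the second, so its scaled count is 3 * 3 + 2 * 2. Merging smoothed models involves normalization and backoff and is not captured by this simple sum.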

To triple the contribution of the first model and double the contribution of the second:

<verbatim>
$ ngrammerge --alpha=3 --beta=2 earnest.aa.mod earnest.ab.mod >earnest.merged.mod
</verbatim>

---++ Caveats

---++ References

-- Main.MichaelRiley - 09 Dec 2011
Topic revision: r4 - 2011-12-13 - BrianRoark
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.