Difference: NGramShrink (1 vs. 7)

Revision 7: 2012-03-08 - MichaelRiley


Revision 6: 2012-03-04 - BrianRoark


Line: 53 to 53
 K. Seymore and R. Rosenfeld. "Scalable Backoff Language Models", Proc. of International Conference on Speech and Language Processing. 1996.

A. Stolcke. "Entropy-based Pruning of Backoff Language Models", Proc. of DARPA Broadcast News Transcription and Understanding Workshop. 1998.

Deleted:
< -- MichaelRiley - 09 Dec 2011

Revision 5: 2011-12-16 - BrianRoark


Line: 47 to 47
 

Caveats

Changed:
< The input n-gram model must be weight-normalized (the probabilities at each state must sum to 1).
> For relative entropy or Seymore shrinking, the input n-gram model must be weight-normalized (the probabilities at each state must sum to 1). For count pruning, either a normalized model or raw, unnormalized counts can be used.
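To make the count-pruning case concrete, here is a minimal C++ sketch of pruning raw counts with NGramCountPrune. The header paths, the ngram namespace, the file names, and the count_pattern value are illustrative assumptions; the constructor and the ShrinkModel()/GetFst() calls follow the Usage and Examples sections elsewhere on this page.

// Minimal sketch, not taken from this page: count pruning applied directly
// to raw, unnormalized n-gram counts. Header paths, the ngram namespace,
// file names, and the "3+:2" count pattern are assumptions for illustration.
#include <fst/fstlib.h>
#include <ngram/ngram-count-prune.h>

int main() {
  // Read the raw counts as a mutable FST (second argument requests
  // conversion to a mutable type if needed).
  fst::StdMutableFst *counts = fst::StdMutableFst::Read("in.cnts", true);
  if (!counts) return 1;

  // Prune n-grams of order 3 and above whose count is at or below 2;
  // the "order:count" pattern syntax is assumed, not taken from this page.
  ngram::NGramCountPrune ngram(counts, "3+:2");
  ngram.ShrinkModel();
  ngram.GetFst().Write("out.cnts");
  return 0;
}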
 

References

K. Seymore and R. Rosenfeld. "Scalable Backoff Language Models", Proc. of International Conference on Speech and Language Processing. 1996.

Revision 4: 2011-12-14 - MichaelRiley


Line: 39 to 39
 
Changed:
< StdMutableFst *model = StdMutableFst::Read("in.mod");
> StdMutableFst *model = StdMutableFst::Read("in.mod", true);
  NGramRelativeEntropy ngram(model, 1.0e-7);
  ngram.ShrinkModel();
  ngram.GetFst().Write("out.mod");

Revision 3: 2011-12-13 - MichaelRiley


Description

Added:
This operation shrinks or prunes an n-gram language model in one of three ways:

  • count pruning: prunes based on count cutoffs for the various n-gram orders specified by count_pattern.
  • relative entropy: prunes based on a relative entropy criterion theta.
  • Seymore: prunes based on the Seymore-Rosenfeld criterion theta.

The C++ classes are all derived from the base class NGramShrink.
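As a rough illustration of how these three methods map onto the derived classes, the following sketch selects a shrinker by method name. The header names, the ngram namespace, the theta and count_pattern values, and the file names are assumptions for illustration; the constructors and the ShrinkModel()/GetFst() workflow are those documented under Usage and Examples below.

// Minimal sketch (assumed headers, values, and file names): selecting one of
// the three NGramShrink-derived classes by method name.
#include <string>
#include <fst/fstlib.h>
#include <ngram/ngram-count-prune.h>     // header names assumed
#include <ngram/ngram-relentropy.h>
#include <ngram/ngram-seymore-shrink.h>

// Shrink the model in "in.mod" with the chosen method and write "out.mod".
void ShrinkExample(const std::string &method) {
  fst::StdMutableFst *model = fst::StdMutableFst::Read("in.mod", true);
  if (!model) return;
  if (method == "count_prune") {
    ngram::NGramCountPrune ngram(model, "3+:2");      // placeholder pattern
    ngram.ShrinkModel();
    ngram.GetFst().Write("out.mod");
  } else if (method == "relative_entropy") {
    ngram::NGramRelativeEntropy ngram(model, 1.0e-7); // theta
    ngram.ShrinkModel();
    ngram.GetFst().Write("out.mod");
  } else if (method == "seymore") {
    ngram::NGramSeymoreShrink ngram(model, 1.0e-5);   // placeholder theta
    ngram.ShrinkModel();
    ngram.GetFst().Write("out.mod");
  }
}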

 

Usage

Added:
ngramshrink [--opts] [in.mod [out.mod]]
  --method: type = string, one of: count_prune (default) | relative_entropy | seymore
  --count_pattern: type = string, default = ""
  --theta: type = double, default = 0.0
 
 class NGramCountPrune(StdMutableFst *model, string count_pattern);
 
 class NGramRelativeEntropy(StdMutableFst *model, double theta);
 
 class NGramSeymoreShrink(StdMutableFst *model, double theta);
 

Examples

ngramshrink --method=relative_entropy --theta=1.0e-7 in.mod >out.mod


StdMutableFst *model = StdMutableFst::Read("in.mod");
NGramRelativeEntropy ngram(model, 1.0e-7);
ngram.ShrinkModel();
ngram.GetFst().Write("out.mod");
 

Caveats

Added:
The input n-gram model must be weight-normalized (the probabilities at each state must sum to 1).
 

References

Added:
K. Seymore and R. Rosenfeld. "Scalable Backoff Language Models", Proc. of International Conference on Speech and Language Processing. 1996.

A. Stolcke. "Entropy-based Pruning of Backoff Language Models", Proc. of DARPA Broadcast News Transcription and Understanding Workshop. 1998.

-- MichaelRiley - 09 Dec 2011

Revision 2: 2011-12-10 - MichaelRiley


Line: 6 to 6
 

Usage

Deleted:
< Complexity

Caveats

References

Revision 1: 2011-12-09 - MichaelRiley

Added:

NGramShrink

Description

Usage

Complexity

Caveats

References

-- MichaelRiley - 09 Dec 2011

 