edu.stanford.nlp.parser.lexparser
Class FrenchUnknownWordModel
java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.FrenchUnknownWordModel
- All Implemented Interfaces:
- UnknownWordModel, java.io.Serializable
public class FrenchUnknownWordModel
- extends BaseUnknownWordModel
- See Also:
- Serialized Form
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
NULL_ITW, nullTag, nullWord, tagHash, tagIndex, trainOptions, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE, wordIndex
Method Summary
java.lang.String
getSignature(java.lang.String word,
int loc)
TODO Can add various signatures, setting the signature via Options.
int
getSignatureIndex(int index,
int sentencePosition,
java.lang.String word)
Returns the index of the signature of the word numbered index, where
the signature is the String representation of unknown word features.
float
score(IntTaggedWord iTW,
int loc,
double c_Tseen,
double total,
double smooth,
java.lang.String word)
Currently we don't consider loc or the other parameters in determining
score in the default implementation; only English uses them.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
smartMutation
protected boolean smartMutation
unknownSuffixSize
protected int unknownSuffixSize
unknownPrefixSize
protected int unknownPrefixSize
FrenchUnknownWordModel
public FrenchUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex,
ClassicCounter<IntTaggedWord> unSeenCounter)
FrenchUnknownWordModel
public FrenchUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex)
- This constructor creates an unknown word model (UWM) with empty data
structures. Use it only when loading the data separately, for example
by reading in text lines containing the data.
score
public float score(IntTaggedWord iTW,
int loc,
double c_Tseen,
double total,
double smooth,
java.lang.String word)
- Description copied from class:
BaseUnknownWordModel
- Currently we don't consider loc or the other parameters in determining
score in the default implementation; only English uses them.
- Specified by:
score
in interface UnknownWordModel
- Overrides:
score
in class BaseUnknownWordModel
- Parameters:
iTW
- An IntTaggedWord pairing a word and POS tag
loc
- The position in the sentence. In the default implementation
this is used only for unknown words to change their
probability distribution when sentence initial. Now,
a negative value
c_Tseen
- Total count of this tag (on seen words) in training
total
- Total count of word tokens in training
smooth
- Weighting on prior P(T|U) in estimate
word
- The word itself; useful so we don't look it up in the index
- Returns:
- A float valued score, usually - log P(word|tag)
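The parameter list above names the ingredients of a smoothed estimate, so a small worked sketch may help. The code below is not the Stanford implementation: the class name, the helper, and the extra arguments cTagSig, cSig and priorTagGivenUnknown are hypothetical, and treating an unknown word as if it had been seen once is an assumption made only for illustration. It smooths P(tag|signature) toward the prior P(T|U) with weight smooth, applies Bayes' rule, and returns a negative log following the sign shown in the Returns note.

```java
class UnknownWordScoreSketch {

    /**
     * Hypothetical helper, not part of the Stanford API.
     *
     * @param cTagSig              count of (tag, signature) pairs (assumed input)
     * @param cSig                 count of the signature overall (assumed input)
     * @param priorTagGivenUnknown prior P(T|U) (assumed input)
     * @param cTseen               total count of this tag on seen words
     * @param total                total count of word tokens
     * @param smooth               weight placed on the prior
     */
    static float score(double cTagSig, double cSig, double priorTagGivenUnknown,
                       double cTseen, double total, double smooth) {
        // P(tag | signature), pulled toward the prior by `smooth` pseudo-counts.
        double pTagGivenSig = (cTagSig + smooth * priorTagGivenUnknown) / (cSig + smooth);
        // P(tag), estimated from seen words.
        double pTag = cTseen / total;
        // Assumption: an unknown word is treated as if it occurred once.
        double pWord = 1.0 / total;
        // Bayes' rule: P(word | tag) = P(tag | word) * P(word) / P(tag).
        double pWordGivenTag = pTagGivenSig * pWord / pTag;
        return (float) (-Math.log(pWordGivenTag));
    }

    public static void main(String[] args) {
        System.out.println(score(3, 10, 0.5, 100, 1000, 1.0));
    }
}
```

A larger smooth pulls the estimate toward the prior, which matters most for signatures with few training observations. Whether a given model returns the log probability or its negation should be checked against the implementation in use.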
getSignatureIndex
public int getSignatureIndex(int index,
int sentencePosition,
java.lang.String word)
- Returns the index of the signature of the word numbered index, where
the signature is the String representation of unknown word features.
- Specified by:
getSignatureIndex
in interface UnknownWordModel
- Overrides:
getSignatureIndex
in class BaseUnknownWordModel
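To illustrate the idea of a signature index, here is a minimal self-contained sketch. It does not use the Stanford Index class: SignatureIndexSketch, its indexOf and get methods, and the placeholder toySignature function are all hypothetical. It shows only the general pattern of computing a signature string and mapping it to a stable integer, adding the string on first sight.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SignatureIndexSketch {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> items = new ArrayList<>();

    /** Returns the id of s, optionally adding it; -1 if absent and add is false. */
    int indexOf(String s, boolean add) {
        Integer id = ids.get(s);
        if (id != null) {
            return id;
        }
        if (!add) {
            return -1;
        }
        items.add(s);
        int newId = items.size() - 1;
        ids.put(s, newId);
        return newId;
    }

    /** Returns the signature string stored under id. */
    String get(int id) {
        return items.get(id);
    }

    /** Placeholder signature (assumption): marks non-initial capitalized words. */
    static String toySignature(String word, int sentencePosition) {
        boolean cap = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        return (cap && sentencePosition > 0) ? "UNK-CAP" : "UNK";
    }

    /** Computes the signature of word and returns its index, adding it if new. */
    int getSignatureIndex(int sentencePosition, String word) {
        return indexOf(toySignature(word, sentencePosition), true);
    }
}
```

Words that share a signature share an index, so counts collected per signature can be stored in integer-keyed structures.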
getSignature
public java.lang.String getSignature(java.lang.String word,
int loc)
- TODO Can add various signatures, setting the signature via Options.
- Specified by:
getSignature
in interface UnknownWordModel
- Overrides:
getSignature
in class BaseUnknownWordModel
- Parameters:
word
- The word to make a signature for
loc
- Its position in the sentence (mainly so sentence-initial
capitalized words can be treated differently)
- Returns:
- A String that is its signature (equivalence class)
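As a rough illustration of what a signature (equivalence class) can look like, the sketch below builds one from capitalization, digits, and a word suffix. The feature set, the "UNK" prefix, the tag strings, and the suffixSize argument are assumptions chosen for the example; they are not taken from FrenchUnknownWordModel, whose actual features are not described on this page.

```java
import java.util.Locale;

class SignatureSketch {

    /**
     * Hypothetical signature function, not the library's.
     *
     * @param word       the word to make a signature for
     * @param loc        its position in the sentence (0 = sentence-initial)
     * @param suffixSize number of trailing characters to keep (0 disables)
     */
    static String signature(String word, int loc, int suffixSize) {
        StringBuilder sb = new StringBuilder("UNK");
        // Sentence-initial capitals are weaker evidence of a proper noun,
        // so they get their own marker.
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            sb.append(loc == 0 ? "-INITC" : "-CAP");
        }
        // Any digit in the word.
        if (word.chars().anyMatch(Character::isDigit)) {
            sb.append("-NUM");
        }
        // Lowercased suffix, only when the word is longer than the suffix.
        if (suffixSize > 0 && word.length() > suffixSize) {
            String suffix = word.substring(word.length() - suffixSize);
            sb.append('-').append(suffix.toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(signature("Parisienne", 0, 3));
        System.out.println(signature("mangeraient", 2, 3));
    }
}
```

Coarser signatures pool more training evidence per class, while finer ones separate words that behave differently; the suffix length is one knob for that trade-off.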
Stanford NLP Group