org.apache.lucene.analysis
Class CharTokenizer
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
- Direct Known Subclasses:
- LetterTokenizer, RussianLetterTokenizer, WhitespaceTokenizer
- public abstract class CharTokenizer
- extends Tokenizer
An abstract base class for simple, character-oriented tokenizers.
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Method Summary
protected abstract boolean isTokenChar(char c)
Returns true iff a character should be included in a token.
Token next()
Returns the next token in the stream, or null at EOS.
protected char normalize(char c)
Called on each token character to normalize it before it is added to the token.
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CharTokenizer
public CharTokenizer(Reader input)
isTokenChar
protected abstract boolean isTokenChar(char c)
- Returns true iff a character should be included in a token. This
tokenizer emits each maximal run of adjacent characters that satisfy this
predicate as one token. Characters for which it returns false mark
token boundaries and are not included in any token.
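The boundary rule described here can be illustrated with a small standalone sketch. It is not Lucene's implementation: the class `TokenCharSketch` and its `tokenize` helper are hypothetical names, the input is a plain String rather than a Reader, and the result is a list of strings rather than Token objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in, not part of Lucene: models only the
// "runs of accepted characters become tokens" rule.
class TokenCharSketch {
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                // Accepted character: extend the token being built.
                current.append(c);
            } else if (current.length() > 0) {
                // Rejected character: it ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a token that runs up to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With `Character::isLetter` as the predicate, the text "Hello, wide  world!" yields Hello, wide and world. With `ch -> !Character.isWhitespace(ch)` the punctuation stays attached, giving "Hello,", "wide" and "world!". The two predicates are chosen to resemble what the LetterTokenizer and WhitespaceTokenizer subclasses are named for.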
normalize
protected char normalize(char c)
- Called on each token character to normalize it before it is added to the
token. The default implementation returns the character unchanged.
Subclasses may override this to, e.g., lowercase tokens.
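As a rough illustration of how the two hooks cooperate, the sketch below models them in a standalone class. `CharSplitterSketch`, `split` and `LowerCaseLetterSplitter` are hypothetical names, not Lucene classes, and the sketch assumes the default normalize leaves each character as it is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the isTokenChar/normalize contract.
abstract class CharSplitterSketch {
    // Decides whether c belongs to a token.
    protected abstract boolean isTokenChar(char c);

    // Assumed default: pass the character through untouched.
    protected char normalize(char c) {
        return c;
    }

    final List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                // normalize is applied only to characters that were accepted.
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Letters form tokens; each letter is lowercased on the way in.
class LowerCaseLetterSplitter extends CharSplitterSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

Calling `new LowerCaseLetterSplitter().split("Quick BROWN fox")` produces quick, brown and fox. Normalizing per character while the token is being built avoids a second pass over the finished token text.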
next
public final Token next()
throws IOException
- Returns the next token in the stream, or null at the end of the stream (EOS).
- Specified by:
next
in class TokenStream
- Throws:
IOException
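A caller typically pulls tokens until next() returns null. The sketch below shows that loop against a simplified stand-in. `WhitespaceStreamSketch` and `drain` are hypothetical names, and this next() returns the token text as a String instead of a Token so that the example runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pull-style stand-in: next() returns one token's text,
// or null once the underlying Reader is exhausted.
class WhitespaceStreamSketch {
    private final Reader input;

    WhitespaceStreamSketch(Reader input) {
        this.input = input;
    }

    String next() throws IOException {
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                token.append((char) c);
            } else if (token.length() > 0) {
                // A delimiter after at least one token character ends the token.
                return token.toString();
            }
            // Delimiters seen before any token character are skipped.
        }
        // End of input: return a trailing token if there is one, else null.
        return token.length() > 0 ? token.toString() : null;
    }

    // Consumer loop: keep calling next() until it signals end of stream.
    static List<String> drain(String text) {
        WhitespaceStreamSketch stream = new WhitespaceStreamSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            String token;
            while ((token = stream.next()) != null) {
                out.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Here `drain("  to be  or not ")` collects to, be, or and not, and an empty input produces an empty list because the first call to next() already returns null.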
Copyright © 2000-2003 Apache Software Foundation. All Rights Reserved.