In Tech Tips: June 23, 1998, an example of string tokenization was presented, using the class java.util.StringTokenizer.
There's also another way to do tokenization, using java.io.StreamTokenizer. StreamTokenizer operates on input streams rather than strings, and each byte in the input stream is regarded as a character in the range '\u0000' through '\u00FF'.
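To make the difference concrete, here is a minimal sketch that runs both tokenizers over the same text with their default settings. The class name TokenizerContrast, its two helper methods, and the sample input "count = 42" are hypothetical names chosen for illustration only, and the code assumes Java 8 or later. It shows that StringTokenizer returns plain strings, while StreamTokenizer also reports a type for each token.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original example.
class TokenizerContrast {

    // StringTokenizer with its default delimiters (whitespace):
    // every token comes back as an untyped String.
    static List<String> stringTokens(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // StreamTokenizer with its default syntax table: nextToken()
    // returns a token type, and the value is left in sval or nval.
    static List<String> streamTokens(String text) {
        List<String> out = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (type == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    // an ordinary character is returned as its own type
                    out.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            // not expected when reading from a StringReader
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "count = 42";
        System.out.println(stringTokens(text)); // [count, =, 42]
        System.out.println(streamTokens(text)); // [WORD:count, CHAR:=, NUMBER:42.0]
    }
}
```

Note that the default syntax table parses numbers into a double, which is why 42 is reported as 42.0.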
StreamTokenizer is lower level than StringTokenizer, but offers more control over the tokenization process. The class uses an internal table to control how tokens are parsed, and this syntax table can be modified to change the parsing rules. Here's an example of how StreamTokenizer works:
import java.io.*;
import java.util.*;

public class streamtoken {
    public static void main(String args[])
    {
        if (args.length == 0) {
            System.err.println("missing input filename");
            System.exit(1);
        }

        Hashtable wordlist = new Hashtable();

        try {
            // tokenize the file named on the command line
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);
            StreamTokenizer st = new StreamTokenizer(br);

            // alternative: tokenize an in-memory string instead
            //StreamTokenizer st =
            //    new StreamTokenizer(new StringReader(
            //    "this is a test"));

            // clear the default syntax table, then treat
            // only letters as word characters
            st.resetSyntax();
            st.wordChars('A', 'Z');
            st.wordChars('a', 'z');

            int type;
