
Stream Tokenizing (Splitting Strings into Tokens)


In Tech Tips: June 23, 1998, an example of string tokenization was presented, using the class java.util.StringTokenizer.

There's also another way to do tokenization, using java.io.StreamTokenizer. StreamTokenizer operates on input streams rather than strings, and each byte in the input stream is regarded as a character in the range '\u0000' through '\u00FF'.

StreamTokenizer is lower level than StringTokenizer, but offers more control over the tokenization process. The class uses an internal table to control how tokens are parsed, and this syntax table can be modified to change the parsing rules. Here's an example of how StreamTokenizer works:

    import java.io.*;
    import java.util.*;

    public class streamtoken {
        public static void main(String args[])
        {
            if (args.length == 0) {
                System.err.println("missing input filename");
                System.exit(1);
            }

            // the keys of this table are the unique words seen so far
            Hashtable<String, Object> wordlist =
                new Hashtable<String, Object>();

            try {
                FileReader fr = new FileReader(args[0]);
                BufferedReader br = new BufferedReader(fr);

                StreamTokenizer st = new StreamTokenizer(br);
                //StreamTokenizer st =
                //    new StreamTokenizer(new StringReader(
                //    "this is a test"));

                // forget the default rules, then treat only letters
                // as word characters
                st.resetSyntax();
                st.wordChars('A', 'Z');
                st.wordChars('a', 'z');

                int type;
                Object dummy = new Object();
                while ((type = st.nextToken()) !=
                    StreamTokenizer.TT_EOF) {
                    if (type == StreamTokenizer.TT_WORD)
                        wordlist.put(st.sval, dummy);
                }
                br.close();
            }
            catch (IOException e) {
                System.err.println(e);
            }

            // "enum" is a reserved word in current Java, so the
            // Enumeration variable is named "words" here
            Enumeration<String> words = wordlist.keys();
            while (words.hasMoreElements())
                System.out.println(words.nextElement());
        }
    }

In this example, a StreamTokenizer is created on top of a FileReader / BufferedReader pair that represents a text file. Note that a StreamTokenizer can also be made to read from a String by using StringReader, as illustrated in the commented-out code shown above (StringBufferInputStream also works, although this class has been deprecated).
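
As a rough illustration of the StringReader variant, here is a minimal sketch that tokenizes a String instead of a file. The class name StringSourceDemo and the helper method words are made up for this example and do not come from the original tip; the tokenizer setup is the same letters-only configuration used in the program above.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the name is an assumption for illustration.
class StringSourceDemo {

    // Returns the letter-only words found in the text, in input order.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.wordChars('A', 'Z');
        st.wordChars('a', 'z');

        List<String> result = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Spaces and other characters come back as ordinary
                // single-character tokens and are skipped here.
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Reading from a String should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("this is a test")); // prints [this, is, a, test]
    }
}
```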

The method resetSyntax is used to clear the internal syntax table, so that StreamTokenizer forgets any rules that it knows about parsing tokens. Then wordChars is used to declare that only upper and lower case letters should be considered to form words. That is, the only tokens that StreamTokenizer recognizes are sequences of upper and lower case letters.
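
To make the effect of resetSyntax and wordChars more concrete, the following sketch runs the same input through a tokenizer with its default syntax table and through one restricted to letters. The class SyntaxTableDemo, the method tokens, and the sample input are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original tip.
class SyntaxTableDemo {

    // Lists every token in the text. Words appear as-is, numbers as
    // their double value, and ordinary characters in single quotes.
    static List<String> tokens(String text, boolean lettersOnly) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (lettersOnly) {
            st.resetSyntax();          // every character becomes "ordinary"
            st.wordChars('A', 'Z');    // then only letters form words
            st.wordChars('a', 'z');
        }

        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add("'" + (char) st.ttype + "'");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Default table: digits may continue a word, blanks are skipped.
        System.out.println(tokens("R2D2 rocks", false)); // [R2D2, rocks]
        // Letters only: digits and the blank become one-character tokens.
        System.out.println(tokens("R2D2 rocks", true));  // [R, '2', D, '2', ' ', rocks]
    }
}
```

With the restricted table, the program in the tip simply ignores the one-character tokens, which is why only runs of letters end up in the word list.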
nextToken is called repeatedly to retrieve words, and each resulting word is found in the public instance variable "st.sval". The words are inserted into a Hashtable, and at the end of processing the contents of the table are displayed, using an Enumeration as illustrated in Tech Tips: June 23, 1998. So the action of this program is to find all the unique words in a text file and display them.
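
The same unique-word idea can also be sketched with a Set from the collections framework in place of a Hashtable with a dummy value. This is a swapped-in alternative and not the approach of the original tip; the class name UniqueWords and the method uniqueWords are hypothetical. A TreeSet is used so that the words come out in sorted order.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical variant of the tip's program, using a Set for uniqueness.
class UniqueWords {

    // Collects the distinct letter-only words read from the given Reader.
    static SortedSet<String> uniqueWords(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();
        st.wordChars('A', 'Z');
        st.wordChars('a', 'z');

        SortedSet<String> words = new TreeSet<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    words.add(st.sval); // duplicates are ignored by the set
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("missing input filename");
            System.exit(1);
        }
        // try-with-resources closes the reader even if reading fails
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            for (String word : uniqueWords(in)) {
                System.out.println(word);
            }
        }
    }
}
```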

StreamTokenizer also has special facilities for parsing numbers, quoted strings, and comments. It's a useful alternative to StringTokenizer, and is especially applicable if you are tokenizing input streams, or wish to exercise finer control over the tokenization process.
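
As a small sketch of those facilities, the example below leaves the default syntax table in place (so numbers and double-quoted strings are recognized) and additionally enables C++-style comments with slashSlashComments. The class name DefaultSyntaxDemo, the method describe, the label prefixes, and the sample input are assumptions made up for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original tip.
class DefaultSyntaxDemo {

    // Describes each token found with the default syntax table.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.slashSlashComments(true); // skip "//" comments to end of line

        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD " + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER " + st.nval);   // nval is a double
                } else if (st.ttype == '"') {
                    out.add("STRING " + st.sval);   // text between the quotes
                } else {
                    out.add("CHAR " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "width = 42 \"hello world\" // trailing comment";
        for (String token : describe(input)) {
            System.out.println(token);
        }
        // prints:
        // WORD width
        // CHAR =
        // NUMBER 42.0
        // STRING hello world
    }
}
```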
 
