Are there any good examples or references for using setBuildParseTree(false)?
Problem description
I'm using ANTLR for a simple CSV parser. I'd like to run it on a 29 GB file, but it runs out of memory on the ANTLRInputStream call:
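// ANTLRInputStream reads the entire input into memory before lexing starts
CharStream cs = new ANTLRInputStream(new BufferedInputStream(input, 8192));
CSVLexer lexer = new CSVLexer(cs);
// CommonTokenStream keeps every token it has fetched
CommonTokenStream tokens = new CommonTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
// parse the whole file into one parse tree, which stays in memory
ParseTree tree = parser.file();
ParseTreeWalker walker = new ParseTreeWalker();
// walk the finished tree, firing myListener's enter/exit callbacks
walker.walk(myListener, tree);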
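A quick back-of-the-envelope check shows why this first version cannot work. ANTLRInputStream reads the whole input into a single char array before the lexer sees the first character. Assume the file is mostly single-byte (ASCII) text, so that bytes roughly equal characters. Under that assumption, a 29 GB file needs about twice that much heap just for the chars, because a Java char is 2 bytes. It also cannot fit in one Java array at all, because array indices are ints. The class and method names below are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for a rough size check (assumes 1 byte of input ~ 1 char).
final class InputSizeCheck {
    // Java arrays are int-indexed, so no char[] can hold more than Integer.MAX_VALUE
    // elements (in practice slightly fewer, depending on the JVM).
    static boolean fitsInOneCharArray(long fileBytes) {
        return fileBytes <= Integer.MAX_VALUE;
    }

    // Heap needed for the char[] alone (2 bytes per Java char), ignoring tokens and tree nodes.
    static long charArrayBytes(long fileBytes) {
        return fileBytes * Character.BYTES;
    }

    public static void main(String[] args) {
        long size = 29L * 1024 * 1024 * 1024; // 29 GiB
        System.out.println(fitsInOneCharArray(size));
        System.out.println(charArrayBytes(size) / (1024L * 1024 * 1024) + " GiB");
    }
}
```

Running main prints `false` followed by `58 GiB`.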
I tried changing it to use unbuffered streams:
CharStream cs = new UnbufferedCharStream(input);
CSVLexer lexer = new CSVLexer(cs);
// copy token text out of the char buffer, because an unbuffered char stream discards consumed chars
lexer.setTokenFactory(new CommonTokenFactory(true));
TokenStream tokens = new UnbufferedTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
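The unbuffered variants avoid the problem above by keeping only a sliding window of the input. The sketch below is a simplified, hypothetical illustration of that idea, not ANTLR's UnbufferedCharStream source. Characters are retained only from the oldest outstanding mark onward, so memory is bounded by how far the consumer needs to rewind, not by the file size. One consequence is that such a stream cannot report its total length without reading to the end, which is why `size()` throws here. The question quotes a similar "Unbuffered stream cannot know its size" error from the ANTLR runtime.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Simplified, hypothetical illustration of an unbuffered char stream (not ANTLR's code).
final class SlidingCharWindow {
    private final Reader in;
    private final StringBuilder buf = new StringBuilder(); // chars currently retained
    private long bufStart = 0;  // absolute input index of buf.charAt(0)
    private long pos = 0;       // absolute index of the current, not yet consumed char
    private int markDepth = 0;  // number of outstanding marks

    SlidingCharWindow(Reader in) { this.in = in; }

    /** Current char without consuming it, or -1 at end of input. */
    int la() {
        while (pos - bufStart >= buf.length()) {      // read more input only when needed
            int c = readRaw();
            if (c == -1) return -1;
            buf.append((char) c);
        }
        return buf.charAt((int) (pos - bufStart));
    }

    void consume() {
        if (la() == -1) throw new IllegalStateException("cannot consume EOF");
        pos++;
        if (markDepth == 0) dropBeforePos();          // nobody can seek back, so forget it
    }

    /** Starts a region that may be rewound; returns the index to seek back to. */
    long mark() {
        markDepth++;
        return pos;
    }

    void release() {
        if (markDepth == 0) throw new IllegalStateException("no outstanding mark");
        if (--markDepth == 0) dropBeforePos();
    }

    /** Moves to an index that is still inside the retained window. */
    void seek(long index) {
        if (index < bufStart || index > bufStart + buf.length())
            throw new IllegalArgumentException("index outside retained window: " + index);
        pos = index;
    }

    /** Number of characters currently held in memory. */
    int retained() { return buf.length(); }

    /** The total length is unknown until the end has been read. */
    long size() {
        throw new UnsupportedOperationException("Unbuffered stream cannot know its size");
    }

    private void dropBeforePos() {
        buf.delete(0, (int) (pos - bufStart));
        bufStart = pos;
    }

    private int readRaw() {
        try { return in.read(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        SlidingCharWindow w = new SlidingCharWindow(new StringReader("abc"));
        while (w.la() != -1) {                        // prints abc, retaining at most 1 char
            System.out.print((char) w.la());
            w.consume();
        }
        System.out.println();
    }
}
```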
When I run walker.walk(), it does not process any records. If I instead try something like
parser.setBuildParseTree(false);
parser.addParseListener(myListener);
it also fails. It seems I have to parse the file differently if I don't build a parse tree, so I'd like documentation or examples of how to do this.
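Not building a tree lets each record be processed and discarded as soon as it is recognized. The sketch below illustrates that streaming style in plain Java. It swaps in a hand-written parser; it is not ANTLR code and not the asker's grammar. It reads CSV from a Reader and passes each record to a callback, keeping only the current record in memory. It assumes RFC 4180-style quoting, where fields are wrapped in double quotes and `""` is an escaped quote. StreamingCsv and parse are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical streaming CSV reader (not ANTLR): each record goes to the callback
// as soon as it is complete; only the current record is held in memory.
final class StreamingCsv {
    /** Parses all records from in and returns how many were delivered. */
    static long parse(Reader in, Consumer<List<String>> onRecord) {
        try {
            BufferedReader r = new BufferedReader(in);
            List<String> fields = new ArrayList<>();
            StringBuilder field = new StringBuilder();
            boolean inQuotes = false;
            boolean recordStarted = false;           // has the current record seen any content?
            long records = 0;
            int c;
            while ((c = r.read()) != -1) {
                if (inQuotes) {
                    if (c != '"') {
                        field.append((char) c);      // commas and newlines are literal here
                    } else {
                        r.mark(1);
                        int next = r.read();
                        if (next == '"') {
                            field.append('"');       // "" is an escaped quote
                        } else {
                            inQuotes = false;        // closing quote
                            if (next != -1) r.reset(); // un-read the lookahead char
                        }
                    }
                } else if (c == '"') {
                    inQuotes = true;
                    recordStarted = true;
                } else if (c == ',') {
                    fields.add(field.toString());
                    field.setLength(0);
                    recordStarted = true;
                } else if (c == '\n') {
                    if (recordStarted) {             // blank lines are skipped
                        fields.add(field.toString());
                        field.setLength(0);
                        onRecord.accept(fields);
                        fields = new ArrayList<>();  // the callback may keep the old list
                        records++;
                        recordStarted = false;
                    }
                } else if (c != '\r') {              // drop '\r' so "\r\n" line endings work
                    field.append((char) c);
                    recordStarted = true;
                }
            }
            if (recordStarted) {                     // last record without a trailing newline
                fields.add(field.toString());
                onRecord.accept(fields);
                records++;
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints each record as soon as it has been parsed
        parse(new StringReader("id,name\n1,\"Smith, J.\"\n"), rec -> System.out.println(rec));
    }
}
```

In an ANTLR setup, the rough equivalent is doing this per-record work in grammar actions, or in a parse listener with tree building switched off, rather than walking a finished tree.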
If I don't use an unbuffered char stream but do use an unbuffered token stream, I get the error "Unbuffered stream cannot know its size". I tried different permutations, but I usually end up with a Java heap error or "GC overhead limit exceeded".
I'm using this CSV grammar.
Answer
I already answered a similar question here: https://stackoverflow.com/a/26120662/4094678
"It seems like I have to parse the file differently if I don't build a parse tree, so I would like documentation or examples of how to do this."
Look up grammar actions in the ANTLR book. As the linked answer says, forget listeners and visitors and don't build a parse tree; do the per-record work in actions embedded in the grammar instead.

If even that is not enough, split the file into a number of smaller files and parse each of them separately.
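Here is a minimal sketch of the splitting step. It assumes that no quoted field contains a line break, so every physical line is exactly one record. If that does not hold for your data, split at record boundaries found by a real CSV reader instead. The sketch also assumes UTF-8 input and does not copy a header row into each chunk. CsvSplitter, split, and the chunk-NNNNN.csv naming are hypothetical. Because it streams through the input line by line, it holds only one line in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: assumes one record per line (no newlines inside quoted fields).
final class CsvSplitter {
    /** Copies input into files of at most linesPerChunk lines each; returns them in order. */
    static List<Path> split(Path input, Path outDir, long linesPerChunk) {
        if (linesPerChunk <= 0) throw new IllegalArgumentException("linesPerChunk must be > 0");
        List<Path> chunks = new ArrayList<>();
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            long inChunk = 0;
            String line;
            while ((line = in.readLine()) != null) {      // one line in memory at a time
                if (out == null || inChunk == linesPerChunk) {
                    if (out != null) out.close();         // finish the previous chunk
                    Path chunk = outDir.resolve(String.format("chunk-%05d.csv", chunks.size()));
                    out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    chunks.add(chunk);
                    inChunk = 0;
                }
                out.write(line);
                out.write('\n');
                inChunk++;
            }
            if (out != null) out.close();                 // finish the last chunk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (out != null) {
                try { out.close(); } catch (IOException ignored) { } // no-op if already closed
            }
        }
        return chunks;
    }
}
```

Each chunk can then go through the original in-memory pipeline on its own, as long as a single chunk fits comfortably in the heap.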
And of course, as mentioned in the comments, increase the JVM's heap size.
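To confirm that a larger heap was actually applied, you can print the JVM's maximum heap. For example, launch with `java -Xmx8g HeapCheck`; the 8g value and the HeapCheck class name are only illustrative. Note that more heap cannot make a single ANTLRInputStream hold a file of this size, because one char array is limited to about 2^31 elements. Extra heap mainly helps the unbuffered setup and the split-file approach.

```java
// Illustrative check of the heap limit the JVM is running with.
final class HeapCheck {
    // Runtime.maxMemory() is the most memory the JVM will attempt to use
    // (it reflects -Xmx when given; Long.MAX_VALUE means no inherent limit).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```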