NegativeArraySizeException ANTLRv4

Problem description

I have a 10 GB file that I need to parse in Java, but the following error arises when I attempt to do so.

java.lang.NegativeArraySizeException
        at java.util.Arrays.copyOf(Arrays.java:2894)
        at org.antlr.v4.runtime.ANTLRInputStream.load(ANTLRInputStream.java:123)
        at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:86)
        at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:82)
        at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:90)

How can I solve this problem properly? How can I adjust such an input stream to handle this error?

Recommended answer

It looks like ANTLR v4 has a pervasive hard-wired limitation that the input stream size must be less than 2^31 characters. Removing this limitation would not be a small task.
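
To put numbers on that limit: 10 GiB is exactly five times 2^31, so such a file cannot fit in a single Java array. The following is a minimal pre-flight check you could run before handing a file to ANTLR. It is not part of ANTLR; the class and method names are hypothetical, and it assumes the file is UTF-8 or a single-byte encoding, so that its size in bytes is an upper bound on the number of `char`s it decodes to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class InputSizeCheck {

    // Upper bound on the length of a single Java array (the real JVM limit may be a
    // few elements lower). ANTLRInputStream keeps the whole input in one char[].
    static final long MAX_CHARS = Integer.MAX_VALUE;

    // ASSUMPTION: UTF-8 or a single-byte encoding, so byte count >= decoded char count.
    static boolean fitsInCharArray(long byteCount) {
        return byteCount <= MAX_CHARS;
    }

    // Convenience overload for a file on disk.
    static boolean fitsInCharArray(Path file) throws IOException {
        return fitsInCharArray(Files.size(file));
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;       // 10,737,418,240 bytes
        System.out.println(fitsInCharArray(tenGiB));  // false
        // Heap needed just for the char[] if it could exist (2 bytes per char).
        System.out.println(tenGiB * Character.BYTES); // 21474836480
    }
}
```

Even if the array limit were lifted, holding such a file as `char[]` would need roughly 20 GiB of heap for the characters alone.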

Take a look at the source code for the ANTLRInputStream class - here.

As you can see, it attempts to hold the entire stream contents in a single char[]. That ain't going to work ... for huge input files. But simply fixing that by buffering the data in a larger data structure isn't going to be the answer either. If you look further down the file, there are a number of other methods that use int as the type for indexing the stream. They would need to be changed to use long ... and the changes will ripple out.
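
The exception itself comes from `int` arithmetic. Assuming the buffer grows by doubling its length whenever it fills up (which is what the `Arrays.copyOf` frame in the stack trace suggests; check the ANTLR source for your runtime version), doubling a length of 2^30 or more wraps around to a negative `int`, and `Arrays.copyOf` then throws `NegativeArraySizeException`. The sketch below reproduces this with a tiny array; `grow` is a hypothetical stand-in, not ANTLR's actual code.

```java
import java.util.Arrays;

public final class DoublingOverflow {

    // Hypothetical growth step (not ANTLR's actual code): double the buffer when full.
    // data.length * 2 is int arithmetic, so it wraps to a negative value once the
    // current length reaches 2^30.
    static char[] grow(char[] data) {
        int newLength = data.length * 2;
        return Arrays.copyOf(data, newLength);
    }

    public static void main(String[] args) {
        System.out.println(grow(new char[8]).length); // 16: normal growth

        int length = 1 << 30;                  // 1,073,741,824 chars (~2 GiB of heap)
        int doubled = length * 2;              // 2^31 does not fit in an int
        System.out.println(doubled);           // -2147483648

        // Arrays.copyOf tries to allocate an array of the requested (negative) length,
        // so a tiny source array is enough to reproduce the exception.
        try {
            Arrays.copyOf(new char[4], doubled);
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }

        // Doing the size calculation in long exposes the overflow instead.
        long wanted = (long) length * 2;
        System.out.println(wanted > Integer.MAX_VALUE); // true
    }
}
```

This is also why a bigger buffer alone would not help: every index and length that is an `int` hits the same ceiling.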

How can I solve this problem properly? How can I adjust such an input stream to handle this error?

I can think of two approaches:

  • Create your own version of ANTLR that supports large input files. This is a non-trivial project. I expect that the 32-bit assumption reaches into the code that ANTLR generates, etc.

  • Split your input files into smaller files before you attempt to parse them. Whether this is viable depends on the input syntax.

My recommendation would be the 2nd alternative. The problem with "supporting" huge input files (by in-memory buffering) is that it is going to be inefficient and memory wasteful ... and it ultimately doesn't scale.
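
A minimal sketch of the splitting alternative, assuming a line-oriented format in which every line ends a complete top-level construct (if your grammar lets constructs span lines, you need a grammar-aware split point instead). `LineChunker` and `forEachChunk` are hypothetical names. The code streams the input and keeps only about one chunk in memory at a time, instead of the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public final class LineChunker {

    // Streams `in` line by line and passes the text to `sink` in chunks of at most
    // `maxChars` chars, always cutting at a line boundary. A single line longer than
    // `maxChars` becomes its own (oversized) chunk. readLine() drops the original
    // line terminators; each line is re-terminated with '\n'.
    // ASSUMPTION: every line ends a complete top-level construct of the grammar,
    // so each chunk can be lexed and parsed on its own.
    // The caller owns `in` and is responsible for closing it.
    static void forEachChunk(Reader in, int maxChars, Consumer<String> sink) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // Flush first if appending this line (plus '\n') would exceed the limit.
            if (chunk.length() > 0 && chunk.length() + line.length() + 1 > maxChars) {
                sink.accept(chunk.toString());
                chunk.setLength(0);
            }
            chunk.append(line).append('\n');
        }
        if (chunk.length() > 0) {
            sink.accept(chunk.toString()); // last, partially filled chunk
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> chunks = new ArrayList<>();
        forEachChunk(new StringReader("a=1\nb=2\nc=3\n"), 8, chunks::add);
        System.out.println(chunks.size()); // 2: "a=1\nb=2\n" and "c=3\n"
    }
}
```

In real use you would open the file with `Files.newBufferedReader(path, StandardCharsets.UTF_8)` and, inside the consumer, build a char stream with `CharStreams.fromString(chunk)` (ANTLR 4.7 and later; older runtimes accept `new ANTLRInputStream(chunk)`) before running the generated lexer and parser. Line numbers in parser error messages will then be relative to the chunk, so keep a running line offset if you need positions in the original file.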

You could also create an issue here, or ask on antlr-discussion.
