What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster?
Problem description
I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue, and I consistently get the following exception when doing so:
2013-12-02 02:21:38 executor [ERROR]
java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
... 10 more
I've debugged the data on the queue and everything looks good. I can't figure out why the API's XML response would be causing these problems. Any ideas?
Recommended answer
Answering my own question here for the ages.
There's currently a bug in how Oracle's and OpenJDK's Java handle the XML entity expansion limit: the expansion counter is shared across parses, so it eventually hits the default upper bound (64000, as shown in the exception message) when many XML documents are parsed.
- https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
- https://bugs.openjdk.java.net/browse/JDK-8028111
- https://github.com/aws/aws-sdk-java/issues/123
Although I thought that our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report confirmed that we were susceptible to the bug.
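For readers who want to check their own JVM, a reproduction along those lines can be sketched as below. This is not the code from the bug report. It is a minimal sketch that assumes the shared counter is advanced by ordinary StAX parsing through one factory, which is what the stack trace above suggests. The class name EntityLimitProbe, the method parseRepeatedly, the sample document, and the figure of 70000 iterations are all made up for illustration. Whether the failure actually appears depends on the JDK build.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical probe class; not taken from the OpenJDK bug report.
final class EntityLimitProbe {

    // Parses the same tiny document `count` times through one shared factory
    // and returns how many documents were parsed before the first failure.
    static int parseRepeatedly(int count) {
        // Illustrative document containing one predefined entity reference.
        byte[] doc = "<?xml version=\"1.0\"?><msg>a &amp; b</msg>"
                .getBytes(Charset.forName("UTF-8"));
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int parsed = 0;
        for (int i = 0; i < count; i++) {
            try {
                XMLEventReader reader =
                        factory.createXMLEventReader(new ByteArrayInputStream(doc));
                // Drain every event so the document is fully parsed.
                while (reader.hasNext()) {
                    reader.nextEvent();
                }
                reader.close();
                parsed++;
            } catch (XMLStreamException e) {
                // On an affected JVM this is where the JAXP00010001 error would surface.
                System.err.println("Failed at document " + i + ": " + e.getMessage());
                break;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        int total = 70000; // assumption: comfortably above the default limit of 64000
        int parsed = parseRepeatedly(total);
        System.out.println("Parsed " + parsed + " of " + total
                + " documents before the first failure");
    }
}
```

If the reported count equals the total, the JVM is probably not affected by this particular failure mode. A lower count, together with a JAXP00010001 message on stderr, points to the shared-counter bug.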
To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers as a JVM system property. By adding the following to storm.yaml across my cluster, I was able to mitigate this problem.
supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
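To confirm that the option actually reached a JVM, one could print the property from inside that process. The following is a minimal sketch. The class name XmlLimitCheck and the method describeLimit are hypothetical, and it only reports what System.getProperty returns. It does not prove that the XML parser honors the value.

```java
// Hypothetical diagnostic helper; the names are illustrative only.
final class XmlLimitCheck {

    static final String LIMIT_PROPERTY = "jdk.xml.entityExpansionLimit";

    // Describes the value of the property as seen by the current JVM.
    static String describeLimit() {
        String value = System.getProperty(LIMIT_PROPERTY);
        if (value == null) {
            return LIMIT_PROPERTY + " is not set (the JDK default applies)";
        }
        return LIMIT_PROPERTY + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(describeLimit());
    }
}
```

Running it as java -Djdk.xml.entityExpansionLimit=0 XmlLimitCheck should echo the value back. Inside a topology, the same describeLimit() call could be logged once when the spout starts.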
I should note that this technically opens you up to a denial-of-service attack, because a limit of 0 removes the cap on entity expansion altogether. Since our XML documents come only from SQS, though, I'm not worried about someone forging malicious XML to kill our workers.