What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster?

Problem description

I'm using Storm 0.8.1 to read incoming messages off an Amazon SQS queue and am getting consistent exceptions when doing so:

2013-12-02 02:21:38 executor [ERROR] 
java.lang.RuntimeException: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:219)
        at REDACTED.spouts.SqsQueueSpout.nextTuple(SqsQueueSpout.java:88)
        at backtype.storm.daemon.executor$fn__3976$fn__4017$fn__4018.invoke(executor.clj:447)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:701)
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:524)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:298)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:167)
        at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:812)
        at com.amazonaws.services.sqs.AmazonSQSClient.receiveMessage(AmazonSQSClient.java:575)
        at REDACTED.spouts.SqsQueueSpout.handleNextTuple(SqsQueueSpout.java:191)
        ... 5 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(XMLStreamReaderImpl.java:219)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.<init>(XMLStreamReaderImpl.java:189)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(XMLInputFactoryImpl.java:277)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(XMLInputFactoryImpl.java:129)
        at com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLEventReader(XMLInputFactoryImpl.java:78)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:85)
        at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:41)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:503)
        ... 10 more

I've debugged the data on the queue and everything looks good. I can't figure out why the API's XML response would be causing these problems. Any ideas?

Recommended answer

Answering my own question here for the ages.

There's currently an XML expansion limit processing bug in Oracle and OpenJDK's Java that results in a shared counter hitting the default upper bound when parsing multiple XML documents.

  1. https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
  2. https://bugs.openjdk.java.net/browse/JDK-8028111
  3. https://github.com/aws/aws-sdk-java/issues/123
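
To check whether a given JVM is affected, something along these lines can help. This is a minimal sketch written for illustration, not the sample code from the OpenJDK bug report: the class name EntityLimitCheck, the method parseMany, and the tiny test document are all assumptions. It parses many small, unrelated XML documents through one shared StAX factory, which is roughly what a long-running SQS client does with its responses.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not taken from the bug report or the AWS SDK.
public class EntityLimitCheck {

    // Parses `documents` small XML documents through a single shared factory.
    // Returns how many documents were parsed before the first parser error,
    // or `documents` if every one of them parsed cleanly.
    static int parseMany(int documents) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String xml = "<?xml version=\"1.0\"?><r>a &amp; b</r>";
        for (int i = 0; i < documents; i++) {
            try {
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new StringReader(xml));
                while (reader.hasNext()) {
                    reader.next();
                }
                reader.close();
            } catch (XMLStreamException e) {
                // On an affected JVM this is where the JAXP00010001 error
                // would be expected to surface.
                return i;
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        int total = 100000; // comfortably above the 64000 default limit
        int parsed = parseMany(total);
        if (parsed == total) {
            System.out.println("Parsed all " + total + " documents");
        } else {
            System.out.println("Parser failed after " + parsed + " documents");
        }
    }
}
```

If the counter is being shared across documents, the run should stop partway through even though no single document contains more than one entity reference. Running it again with -Djdk.xml.entityExpansionLimit=0 is a quick way to confirm that the workaround takes effect on that JVM.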

Although I thought that our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report did indeed verify that we were susceptible to the bug.

To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers. By adding the following to storm.yaml across my cluster, I was able to mitigate this problem.

supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
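
Since childopts only take effect for JVMs launched after the change, it can be worth confirming from inside a worker that the property actually arrived. The following is a rough sketch under that assumption; the class ExpansionLimitProperty and its describe method are hypothetical names, and where you call it from (for example a spout's open method) is up to you.

```java
// Hypothetical helper for logging the effective setting inside a worker JVM.
public class ExpansionLimitProperty {

    static final String KEY = "jdk.xml.entityExpansionLimit";

    // Returns a human-readable description of the current setting.
    static String describe() {
        String value = System.getProperty(KEY);
        if (value == null) {
            return "unset (JDK default applies)";
        }
        if ("0".equals(value.trim())) {
            // JAXP treats a value less than or equal to 0 as "no limit".
            return "0 (no limit)";
        }
        return value.trim();
    }

    public static void main(String[] args) {
        System.out.println(KEY + " = " + describe());
    }
}
```

Started with java -Djdk.xml.entityExpansionLimit=0 ExpansionLimitProperty, this prints the "0 (no limit)" form; without the flag it reports the property as unset.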

I should note that this technically opens you up to a Denial of Service attack, but since our XML documents are only coming from SQS, I'm not worried about someone forging malevolent XML to kill our workers.
