Cannot delete AWS SQS message from Spark application running in EMR

Problem description

I am running an Apache Spark application in an AWS EMR cluster. The application retrieves messages from AWS SQS, does some computation based on the message data, and then deletes each message.

I am running the EMR cluster in a VPC on a private subnet with a NAT instance.

The problem I am facing is that I cannot delete the messages. I am able to retrieve all messages and I am able to send messages, but I cannot delete them.

I am using the following security roles on the EMR cluster:
EC2 instance profile: EMR_EC2_DefaultRole
EMR role: EMR_DefaultRole

Each of these roles has the following policies attached: AmazonSQSFullAccess, AmazonElastiCacheFullAccess, AmazonElasticMapReduceFullAccess, AmazonVPCFullAccess

I thought that the problem was with permissions, but AmazonSQSFullAccess grants full permissions, so I was out of options.

This is the Java code that deletes the message:

import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.DeleteMessageRequest;

public class SQSMessageBroker
{
    private AmazonSQS _amazonSqs;

    public SQSMessageBroker()
    {
        // Create the SQS client
        createSQSClient();
    }

    public void deleteMessage(String queueUrl, String receiptHandle)
    {
        // Delete a single message by its receipt handle
        DeleteMessageRequest deleteMessageRequest = new DeleteMessageRequest(queueUrl, receiptHandle);

        _amazonSqs.deleteMessage(deleteMessageRequest);
    }

    private void createSQSClient()
    {
        // Uses the default credential provider chain
        // (instance profile on EMR, the .aws credentials locally)
        _amazonSqs = new AmazonSQSClient();
        _amazonSqs.setRegion(Region.getRegion(Regions.EU_WEST_1));
    }
}
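
For reference, here is a minimal sketch of how such a broker would typically be driven (receive, process, then delete by receipt handle). The class name SQSPollingExample, the queue URL and the processing step are placeholders, not part of the original application:

import java.util.List;

import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SQSPollingExample
{
    public static void main(String[] args)
    {
        String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"; // placeholder

        AmazonSQS sqs = new AmazonSQSClient();
        sqs.setRegion(Region.getRegion(Regions.EU_WEST_1));

        SQSMessageBroker broker = new SQSMessageBroker();

        // Long-poll for a batch of messages
        ReceiveMessageRequest receiveRequest = new ReceiveMessageRequest(queueUrl)
            .withMaxNumberOfMessages(10)
            .withWaitTimeSeconds(20);

        for (Message message : sqs.receiveMessage(receiveRequest).getMessages())
        {
            // ... computation based on message.getBody() goes here ...
            broker.deleteMessage(queueUrl, message.getReceiptHandle());
        }
    }
}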

The SQSMessageBroker is a singleton in my application. When I run the same code locally, everything works great. Locally I have created an AWS user and added its key and secret to a .aws file.

EDIT

After a lot of research and testing this is what I have found out:

  1. It appears that it is not a permission issue (at least not for the EC2 instance that is started by EMR). I connected to the instance, installed the aws cli, retrieved a message and deleted it successfully.
  2. The _amazonSqs.deleteMessage(deleteMessageRequest); call does not throw any exception. It looks as if the request times out, but no timeout exception is thrown, and no code after deleteMessage is executed (see the client-timeout sketch after this list).
  3. I am processing each message in a separate thread, so I added a Thread.UncaughtExceptionHandler to each thread, but no exception shows up there either.
  4. I suspected that the problem might be in the ReceiptHandle. Because I was running the Spark cluster on several machines, I thought that the machine IP, hostname or something similar was encoded in the ReceiptHandle, and that deleteMessage might have been executed from a different machine than the one that received the message, which would explain why it did not work. So I created a Spark cluster with only one machine. Sadly, I still could not delete the message.
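
Regarding point 2 above, one way to make a silently hanging SQS call surface as an exception is to give the client explicit connection/socket timeouts and a low retry count via ClientConfiguration. This is only a diagnostic sketch assuming the same SDK v1 client as in the question; the factory class name and the timeout values are arbitrary:

import com.amazonaws.ClientConfiguration;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;

public class TimeoutAwareSqsClientFactory
{
    public static AmazonSQS create()
    {
        ClientConfiguration clientConfig = new ClientConfiguration()
            .withConnectionTimeout(10000)  // ms to establish the TCP connection
            .withSocketTimeout(30000)      // ms to wait for data on an open connection
            .withMaxErrorRetry(3);         // fail fast instead of retrying for a long time

        AmazonSQS sqs = new AmazonSQSClient(clientConfig); // default credential chain
        sqs.setRegion(Region.getRegion(Regions.EU_WEST_1));
        return sqs;
    }
}

With a configuration like this, a request blocked by VPC routing or security group rules should eventually fail with an AmazonClientException instead of blocking the worker thread indefinitely.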

Answer

After A LOT of debugging and testing I finally managed to figure out what the problem is.

As expected, it was not a permission problem. The problem was that the EC2 instances started by EMR, on which the Spark application runs, contain a certain version of all the AWS packages for Java (including the SQS package), and the path containing those packages is added to the Hadoop, Yarn and Spark classpaths. So when my application started, it used the packages that were already on the machine, and I received an error. (The error was logged in the Yarn log; it took me some time to figure that out.)
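
A quick way to confirm which copy of the SDK is actually being picked up on the cluster (the EMR-provided jars vs. the one bundled in the uber jar) is to log, at application startup, where an SDK class was loaded from and which SDK version is first on the classpath. This is just a diagnostic sketch; SdkOriginCheck is not part of the original application, and VersionInfoUtils is an internal SDK v1 utility:

import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.util.VersionInfoUtils;

public class SdkOriginCheck
{
    public static void main(String[] args)
    {
        // Which jar did the SQS client class actually come from?
        System.out.println(AmazonSQSClient.class
            .getProtectionDomain().getCodeSource().getLocation());

        // Which AWS SDK version is first on the classpath?
        System.out.println(VersionInfoUtils.getVersion());
    }
}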

I am using the maven shade plugin to build the uber jar for my application, so I thought I could try to shade (relocate) the AWS packages. That would let me encapsulate the dependencies inside my application. Sadly this DID NOT work. It appears that Amazon uses reflection inside the packages and has hardcoded the names of some classes, which renders the shading useless. (The hardcoded class names could not be found in my shaded packages.)
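
The failure mode with relocation looks roughly like this: after shading, the class physically lives under the relocated package, so any lookup that still uses the original, hardcoded class name can no longer resolve it. A hypothetical illustration (the "shaded." prefix stands for whatever relocation prefix the shade plugin was configured with):

public class RelocationProblemDemo
{
    public static void main(String[] args) throws Exception
    {
        // A reflective lookup by the original name only works if the
        // un-relocated package is still on the classpath.
        Class.forName("com.amazonaws.services.sqs.AmazonSQSClient");

        // After relocation the class actually lives under the shaded package,
        // so hardcoded lookups like the one above throw ClassNotFoundException.
        Class.forName("shaded.com.amazonaws.services.sqs.AmazonSQSClient");
    }
}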

So after some more frustration I found the following solution:

  1. Create an EMR step that downloads my uber jar from S3 to the machine.
  2. Create a Spark application step with the following spark-submit options:

--driver-class-path /path_to_your_jar/myapp.jar --class com.myapp.startapp

Here the key is the --driver-class-path option. You can read more about it here. Basically I am adding my uber jar to the Spark driver classpath, allowing the application to use my dependencies.

So far this is the only acceptable solution that I have found. If you know of another or a better one, please write a comment or an answer.

I hope that this answer can be of use to some unfortunate soul. It would have saved me several excruciating days.
