将EMR配置为使用s3a而不是s3进行spark.sql调用 [英] Configure EMR to use s3a instead of s3 for spark.sql calls

查看:388
本文介绍了将EMR配置为使用s3a而不是s3进行spark.sql调用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对spark.sql(")的所有调用均因以下堆栈跟踪(1)中的错误而失败

All my calls to spark.sql("") fails with the error in the stacktrace (1) below

更新-2 我已经解决了这个问题,它是sts:AssumeRule的AccessDenied,感谢任何潜在客户

Update - 2 I have zeroed in on the problem, it is AccessDenied for sts:AssumeRule, any leads appreciated

User: arn:aws:sts::00000000000:assumed-role/EMR_EC2_XXXXX_XXXXXX_POLICY/i-3232131232131232 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::00000000000:role/EMR_XXXXXX_XXXXXX_POLICY

使用

spark.read.parquet("s3a://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

我能够读取记录.

但是,当使用 s3:而不是 s3a:方案进行访问时,相同的堆栈跟踪(1)重新浮出水面

But the same stacktrace (1) resurfaces when access with s3: instead of s3a: scheme

spark.read.parquet("s3://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

所以我该如何在EMR上配置Spark以使用 s3a:或运行 s3:而不会拒绝访问,这是假定的,因为它可能未使用适当的凭据链

So how can I configure Spark on EMR to use s3a: or have s3: running without the access denied which is presume because it may not be using the appropriate credential chain

(1)

Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Access denied (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxx-xxxx-xxxx-xxxx-xxxxxxxx)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1658)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1322)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1369)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1338)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1327)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:488)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:460)

更新-1 尝试设置机密和访问密钥不起作用

Update - 1 Tried setting secret and access key doesn't work

spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "")
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "")

推荐答案

此堆栈跟踪显示"amazon EMR S3客户端";不是Apache ASF,而是不同的设置和错误消息.

this stack trace says "amazon EMR S3 client"; not the Apache ASF one, so different settings, and error messages.

关于假定角色"的错误消息提示您正在EC2 VM中运行(是吗?),并且假定角色"实际上是EC2 VM部署时使用的IAM角色.在这种情况下(a)没有其他凭据被获取,并且(b)VM没有访问角色的权限.修复:确定设置以获取凭据,增加EC2 IAM角色权限或创建具有不同角色的VM

That error message about "assumed role" hints that you are running in an EC2 VM (yes?), and that "assumed role" is actually the IAM role the EC2 VM is deployed as. In which case (a) no other credentials are being picked up and (b) that VM doesn't have permissions to access the role. Fixes: work out the setting to get the credentials in, increase EC2 IAM role rights, or create VMs with a different role

这篇关于将EMR配置为使用s3a而不是s3进行spark.sql调用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆