AWS EKS Spark 3.0, Hadoop 3.2 Error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

Question

I'm running Jupyterhub on EKS and want to leverage EKS IRSA functionality to run Spark workloads on K8s. I have prior experience with Kube2IAM, but now I'm planning to move to IRSA.

This error is not caused by IRSA: the service account is attached to the driver and executor pods without issue, and I can access S3 via the CLI and SDK from both. The issue is specific to accessing S3 from Spark on Spark 3.0 / Hadoop 3.2.
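For context, here is a minimal sketch of the kind of SDK check that succeeds from inside both pods (boto3; the bucket name is a placeholder, not from the original post):

# Sanity check run inside the driver/executor pod (bucket name is a placeholder).
# boto3 picks up the IRSA web-identity credentials automatically from the
# AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN env vars injected by EKS.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="my-example-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])

Spark, by contrast, fails while constructing the SparkContext: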

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

I'm using the following versions -

  • APACHE_SPARK_VERSION=3.0.1
  • HADOOP_VERSION=3.2
  • aws-java-sdk-1.11.890
  • hadoop-aws-3.2.0
  • Python 3.7.3

I also tested with a different version:

  • aws-java-sdk-1.11.563.jar

Please help with a solution if anyone has come across this issue.

PS: This is not an IAM policy error either; the IAM policies are fine.

Answer

Finally, all the issues were solved with the jars below -
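As a general note, this particular NoClassDefFoundError usually means hadoop-aws was paired with the slim aws-java-sdk core jar, which does not contain the S3 model classes; hadoop-aws expects the full aws-java-sdk-bundle it was built against (1.11.375 for hadoop-aws 3.2.0). A sketch of pulling in a matching pair at session startup; the versions are illustrative, not necessarily the exact jars used here:

# Sketch: resolve a matching hadoop-aws / aws-java-sdk-bundle pair via Maven.
# hadoop-aws 3.2.0 declares aws-java-sdk-bundle 1.11.375 as its dependency.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("s3a-classpath-check") \
        .config("spark.jars.packages",
                "org.apache.hadoop:hadoop-aws:3.2.0,"
                "com.amazonaws:aws-java-sdk-bundle:1.11.375") \
        .getOrCreate()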

For anyone trying to run Spark on EKS using IRSA, this is the correct Spark config -

from pyspark.sql import SparkSession

# Kubernetes/IRSA and S3A configuration for Spark on EKS
spark = SparkSession.builder \
        .appName("pyspark-data-analysis-1") \
        .config("spark.kubernetes.driver.master", "k8s://https://xxxxxx.gr7.ap-southeast-1.eks.amazonaws.com:443") \
        .config("spark.kubernetes.namespace", "jupyter") \
        .config("spark.kubernetes.container.image", "xxxxxx.dkr.ecr.ap-southeast-1.amazonaws.com/spark-ubuntu-3.0.1") \
        .config("spark.kubernetes.container.image.pullPolicy", "Always") \
        .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark") \
        .config("spark.kubernetes.authenticate.executor.serviceAccountName", "spark") \
        .config("spark.kubernetes.executor.annotation.eks.amazonaws.com/role-arn", "arn:aws:iam::xxxxxx:role/spark-irsa") \
        .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.WebIdentityTokenCredentialsProvider") \
        .config("spark.kubernetes.authenticate.submission.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt") \
        .config("spark.kubernetes.authenticate.submission.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token") \
        .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false") \
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.hadoop.fs.s3a.fast.upload", "true") \
        .config("spark.executor.instances", "1") \
        .config("spark.executor.cores", "3") \
        .config("spark.executor.memory", "10g") \
        .getOrCreate()
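A quick smoke test that the session can actually reach S3 (the s3a path is a placeholder, not from the original post):

# Smoke test: read an object through the s3a connector
# (bucket/key below are placeholders).
df = spark.read.csv("s3a://my-example-bucket/some-prefix/data.csv")
df.show(5)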
