使用sc.textFile("s3a://bucket/filePath")Spark读取s3. java.lang.NoSuchMethodError:com.amazonaws.services.s3.transfer.TransferManager [英] Spark read s3 using sc.textFile("s3a://bucket/filePath"). java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager

Problem description

I have added the below JARs to the spark/jars path:

  • hadoop-aws-2.7.3.jar
  • aws-java-sdk-s3-1.11.126.jar
  • aws-java-sdk-core-1.11.126.jar
  • spark-2.1.0

spark-shell

scala> sc.hadoopConfiguration.set("fs.s3a.access.key", "***")

scala> sc.hadoopConfiguration.set("fs.s3a.secret.key", "***")

scala> val f = sc.textFile("s3a://bucket/README.md")

scala> f.count


java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) at org.apache.spark.rdd.RDD.count(RDD.scala:1157) ... 48 elided

  1. "java.lang.NoSuchMethodError:com.amazonaws.services.s3.transfer.TransferManager"是由不匹配的jar引发的吗? (hadoop-aws,aws-java-sdk)

  1. "java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager" is raised by mismatched jar? (hadoop-aws, aws-java-sdk)

To access data stored in Amazon S3 from Spark applications, one should use the Hadoop file APIs. So does hadoop-aws.jar contain the Hadoop file APIs, or must a Hadoop environment be running?

Recommended answer

Mismatched JARs; the AWS SDK is pretty brittle across versions.

The Hadoop S3A code is in the hadoop-aws JAR; it also needs hadoop-common. Hadoop 2.7 is built against AWS S3 SDK 1.10.6. (*Update: no, it's 1.7.4. The move to 1.10.6 went into Hadoop 2.8: HADOOP-12269)
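
For example, with the Spark 2.1.0 / Hadoop 2.7 layout described above, one way to keep the classpath consistent is to let Maven resolve the matching SDK rather than copying individual aws-java-sdk-* JARs by hand. This is only a minimal sketch, assuming hadoop-aws 2.7.3 (which declares a dependency on the monolithic aws-java-sdk 1.7.4):

# assumes Spark 2.1.0 with Hadoop 2.7.x; hadoop-aws:2.7.3 pulls in aws-java-sdk:1.7.4 transitively
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3

# or, with the JARs downloaded manually, pass the matched pair explicitly
spark-shell --jars hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar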

You must use that version. If you want to use the 1.11 JARs then you will need to check out the Hadoop source tree and build branch-2 yourself. The good news: that uses the shaded AWS SDK, so its versions of jackson and joda-time don't break things. Oh, and if you check out Spark master and build with the -Phadoop-cloud profile, it pulls the right stuff in to set Spark's dependencies up correctly.
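
A rough sketch of that checkout-and-build step, assuming a standard git/Maven toolchain (the module path and flags below are the usual ones for the Hadoop tree, but may differ between branches):

# clone the Hadoop source and switch to branch-2
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout branch-2
# build hadoop-aws plus the modules it depends on, skipping tests
mvn install -DskipTests -pl hadoop-tools/hadoop-aws -am

The resulting hadoop-aws JAR (under hadoop-tools/hadoop-aws/target) is what would then go on the Spark classpath.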

Update: Oct 1 2017: Hadoop 2.9.0-alpha and 3.0-beta-1 use 1.11.199; assume the shipping versions will be that or more recent.
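
If in doubt about which SDK a given Hadoop release expects, one quick check (assuming a binary Hadoop distribution with the standard layout) is to look at what ships alongside hadoop-aws in the tools directory:

# the aws-java-sdk JAR bundled next to hadoop-aws is the version S3A was built against
ls $HADOOP_HOME/share/hadoop/tools/lib/ | grep -i aws
# on Hadoop 2.7.x this should list hadoop-aws-2.7.x.jar and aws-java-sdk-1.7.4.jar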
