java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics

This article describes how to handle java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics.

Problem description

I'm trying to run a simple Spark-to-S3 app from a server, but I keep getting the error below because the server has Hadoop 2.7.3 installed, which doesn't appear to include the GlobalStorageStatistics class. I have Hadoop 2.8.x defined in my pom.xml file, but I'm trying to test it by running it locally.

How can I make it skip looking for that class, or what workaround options are there to include it if I have to stay on Hadoop 2.7.3?
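
For context, and judging only from the stack trace below, the failing call boils down to a parquet read over an s3a:// path, roughly like this sketch (the object name, bucket and path are placeholders, not the original jdbc2DF.scala code):

    import org.apache.spark.sql.SparkSession

    object S3aReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("s3a-read-sketch")
          .getOrCreate()

        // Resolving the s3a:// scheme makes Hadoop load org.apache.hadoop.fs.s3a.S3AFileSystem,
        // which in turn needs hadoop-common classes such as StorageStatistics (Hadoop 2.8+).
        val df = spark.read.parquet("s3a://some-bucket/some/prefix/")
        df.show(10)

        spark.stop()
      }
    }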

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
    at com.ibm.cos.jdbc2DF$.main(jdbc2DF.scala:153)
    at com.ibm.cos.jdbc2DF.main(jdbc2DF.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StorageStatistics
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 28 more

Recommended answer

You can't mix bits of Hadoop and expect things to work. It's not just the close coupling between internal classes in hadoop-common and hadoop-aws; it's also things like the specific version of the amazon-aws SDK that the hadoop-aws module was built with.

If you get ClassNotFoundException or MethodNotFoundException stack traces when trying to work with s3a:// URLs, a JAR version mismatch is the likely cause.
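
One quick way to confirm a mismatch is to ask the JVM where it loaded the key classes from. The sketch below is illustrative and not part of the original answer; it only uses Class.forName and each class's code source to print the JAR it came from:

    object WhichJar {
      // Print the JAR (code source) a class was loaded from, or flag it as missing.
      private def locate(className: String): Unit =
        try {
          val cls = Class.forName(className)
          val location = Option(cls.getProtectionDomain.getCodeSource)
            .map(_.getLocation.toString)
            .getOrElse("(unknown code source)")
          println(s"$className -> $location")
        } catch {
          case e: Throwable => println(s"$className -> NOT FOUND ($e)")
        }

      def main(args: Array[String]): Unit = {
        locate("org.apache.hadoop.fs.FileSystem")        // hadoop-common (any version)
        locate("org.apache.hadoop.fs.StorageStatistics") // hadoop-common 2.8.0+
        locate("org.apache.hadoop.fs.s3a.S3AFileSystem") // hadoop-aws
      }
    }

Run it with the same classpath as spark-submit uses; if the classes resolve to JARs from different Hadoop versions, or StorageStatistics is reported missing, that's the mismatch described above.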

Using the RFC 2119 MUST/SHOULD/MAY terminology, here are the rules for avoiding this situation:

  1. The s3a connector is in the hadoop-aws JAR; it depends on hadoop-common and the aws-sdk-shaded JARs.
  2. All of these JARs MUST be on the classpath.
  3. All of the hadoop-* JARs on your classpath MUST be exactly the same version, e.g. 3.3.1 everywhere, or 3.2.2 everywhere. Otherwise: stack trace. Always. (See the version-check sketch after this list.)
  4. And they MUST be exclusively of that version; there MUST NOT be multiple versions of hadoop-common, hadoop-aws etc. on the classpath. Otherwise: stack trace. Always. Usually this shows up as a ClassNotFoundException indicating a mismatch between hadoop-common and hadoop-aws.
  5. The exact missing class varies across Hadoop releases: it's the first class depended on by org.apache.hadoop.fs.s3a.S3AFileSystem which the classloader can't find; which class that is depends on which JARs are mismatched.
  6. The AWS SDK version SHOULD be the one the Hadoop release shipped with. Otherwise: maybe a stack trace, maybe not. Either way, you are in self-support mode or have opted to join a QE team for version testing.
  7. The specific version of the AWS SDK you need can be determined from the Maven Repository.
  8. Changing the AWS SDK version MAY work. You get to test, and if there are compatibility problems, you get to fix them. See "Qualifying an AWS SDK Update" for the least you should be doing.
  9. You SHOULD use the most recent version of Hadoop you can, and one your Spark build is tested with. Non-critical bug fixes do not get backported to old Hadoop releases, and the S3A and ABFS connectors are rapidly evolving. New releases will be better, stronger, faster. Generally.
  10. If none of this works, a bug report filed on the ASF JIRA server will get closed as WORKSFORME. Config issues aren't treated as code bugs.
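
As a rough companion to rules 3 and 4, here is a version-check sketch (again an illustration, not from the original answer). It prints the hadoop-common build version via org.apache.hadoop.util.VersionInfo and the Implementation-Version recorded in the manifests of the JARs providing other Hadoop classes; Apache Hadoop release JARs normally carry that manifest attribute, but if it is absent the check simply reports it as missing:

    import org.apache.hadoop.util.VersionInfo

    object HadoopVersionCheck {
      // Implementation-Version from the manifest of the JAR providing a class, if recorded.
      private def manifestVersion(className: String): String =
        try {
          Option(Class.forName(className).getPackage)
            .flatMap(p => Option(p.getImplementationVersion))
            .getOrElse("(no Implementation-Version in manifest)")
        } catch {
          case e: Throwable => s"NOT FOUND ($e)"
        }

      def main(args: Array[String]): Unit = {
        println("hadoop-common (VersionInfo): " + VersionInfo.getVersion)
        println("FileSystem JAR version:      " + manifestVersion("org.apache.hadoop.fs.FileSystem"))
        println("S3AFileSystem JAR version:   " + manifestVersion("org.apache.hadoop.fs.s3a.S3AFileSystem"))
      }
    }

All of the reported versions MUST agree exactly; if hadoop-aws reports 2.8.x while VersionInfo reports 2.7.3, that is precisely the situation the rules above warn about.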

Finally: see the ASF documentation for the S3A connector.
