java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
Problem description
I'm trying to run a simple Spark-to-S3 app from a server, but I keep getting the error below because the server has Hadoop 2.7.3 installed, which doesn't appear to include the GlobalStorageStatistics class. I have Hadoop 2.8.x defined in my pom.xml file, but I'm trying to test it by running it locally.
How can I make it skip looking for that class, or what workaround options are there to include it if I have to stay on Hadoop 2.7.3?
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
at com.ibm.cos.jdbc2DF$.main(jdbc2DF.scala:153)
at com.ibm.cos.jdbc2DF.main(jdbc2DF.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StorageStatistics
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 28 more
Recommended answer
You can't mix bits of Hadoop and expect things to work. It's not just the close coupling between internal classes in hadoop-common and hadoop-aws; it's also things like the specific version of the amazon-aws SDK the hadoop-aws module was built with.
If you get ClassNotFoundException or MethodNotFoundException stack traces when trying to work with s3a:// URLs, JAR version mismatch is the likely cause.
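One quick way to confirm which side of a version mismatch you are on is to probe the classpath reflectively. The sketch below (the class name HadoopClasspathCheck is made up for this example) checks whether the exact class from the stack trace, org.apache.hadoop.fs.StorageStatistics, is loadable; since it uses only reflection, it compiles and runs even without Hadoop JARs present:

```java
// Hedged sketch: a purely reflective probe, so it compiles and runs even
// without Hadoop on the classpath. Run it with the same classpath your
// Spark job uses; it reports whether the hadoop-common JAR that gets
// loaded is new enough (>= 2.8) to contain StorageStatistics.
public class HadoopClasspathCheck {

    /** Returns true if the named class is loadable from the current classpath. */
    static boolean present(String className) {
        try {
            // false = don't run static initializers; we only care about visibility
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class whose absence produces the NoClassDefFoundError above.
        String probe = "org.apache.hadoop.fs.StorageStatistics";
        if (present(probe)) {
            System.out.println("hadoop-common >= 2.8 is on the classpath");
        } else {
            System.out.println(probe + " missing: hadoop-common is pre-2.8 or absent");
        }
    }
}
```

If this prints "missing" when run under spark-submit, the cluster's Hadoop 2.7.x JARs are winning over whatever your pom.xml declares.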
Using the RFC 2119 MUST/SHOULD/MAY terminology, here are the rules to avoid this situation:
- The s3a connector is in the hadoop-aws JAR; it depends on hadoop-common and the aws-sdk-shaded JARs.
- All these JARs MUST be on the classpath.
- All versions of the hadoop-* JARs on your classpath MUST be exactly the same version, e.g. 3.3.1 everywhere, or 3.2.2 everywhere. Otherwise: stack trace. Always.
- And they MUST be exclusively of that version; there MUST NOT be multiple versions of hadoop-common, hadoop-aws, etc. on the classpath. Otherwise: stack trace. Always. Usually a ClassNotFoundException indicating a mismatch between hadoop-common and hadoop-aws. The exact missing class varies across Hadoop releases: it's the first class depended on by org.apache.hadoop.fs.s3a.S3AFileSystem which the classloader can't find; the exact class depends on the mismatched JARs.
- The AWS SDK version SHOULD be the one shipped with your Hadoop release. Otherwise: maybe stack trace, maybe not. Either way, you are in self-support mode or have opted to join a QE team for version testing.
- The specific version of the AWS SDK you need can be determined from the Maven Repository.
- Changing the AWS SDK version MAY work. You get to test, and if there are compatibility problems: you get to fix. See "Qualifying an AWS SDK Update" for the least you should be doing.
- You SHOULD use the most recent version of Hadoop you can, and one Spark is tested with. Non-critical bug fixes do not get backported to old Hadoop releases, and the S3A and ABFS connectors are rapidly evolving. New releases will generally be better, stronger, faster.
- If none of this works: a bug report filed on the ASF JIRA server will get closed as WORKSFORME. Config issues aren't treated as code bugs.
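A minimal way to enforce the "same version everywhere" rule in Maven is to drive every hadoop-* artifact from a single property. This is a sketch, not a definitive build file; the version number below is illustrative, and you would set it to whatever Hadoop release your cluster actually runs:

```xml
<properties>
  <!-- One property drives every hadoop-* artifact: change it in one place only. -->
  <hadoop.version>2.7.3</hadoop.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Running `mvn dependency:tree -Dincludes=org.apache.hadoop` afterwards shows whether any transitive dependency (Spark itself, for example) still drags in a second Hadoop version that needs excluding.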
Finally: see the ASF documentation for the S3A connector.