java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags

Problem description

I am a beginner in Spark Streaming and Scala. For a project requirement I was trying to run the TwitterPopularTags example from GitHub. SBT assembly was not working for me and I am not familiar with SBT, so I am trying to build with Maven instead. After a lot of initial hiccups I was able to create the jar file, but when I try to execute it I get the following error. Can anybody help me resolve this?

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
    at TwitterPopularTags$.main(TwitterPopularTags.scala:43)
    at TwitterPopularTags.main(TwitterPopularTags.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more

I have added the following dependencies:

spark-streaming_2.10:1.1.0
spark-core_2.10:1.1.0
spark-streaming-twitter_2.10:1.1.0

I even tried version 1.2.0 of spark-streaming-twitter, but that gave me the same error.

Thanks in advance for your help.

Regards, vpv

Recommended answer

Thank you for your suggestions. I was only able to resolve this issue by using SBT assembly. Here are the details of how I did it.

Spark - Already present in the Cloudera VM.
Scala - Not sure whether this is present in the Cloudera VM; if not, we can install it.
SBT - This also needs to be installed. I installed both Scala and SBT on my local machine and transferred the jar to the VM. To install them I used the following link:

https://gist.github.com/visenger/5496675

1) Once all of these are installed, create the parent folder for the project. I created a folder called Twitter.

2) Create the folder structure Twitter/src/main/scala and, inside it, a .scala file named TwitterPopularTags.scala. The code differs slightly from the version on GitHub: I had to change the import statements to the following.

import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf

3) After this, create another folder under the parent folder with the following name

Twitter/project

and create a file named assembly.sbt in it. This file tells SBT where to find the assembly plugin. The following is the full content of the file.

resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

4) Once the above two are created, create a file named build.sbt in the project's parent directory (Twitter). This is where we specify the name of the jar file to be created and its dependencies. Please note that even the blank lines between the settings in this file are important, because SBT versions before 0.13.7 use them to separate one setting from the next.

name := "TwitterPopularTags"

version := "1.0"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
   }
}

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0" 

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3" 

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
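
A note on how these lines relate to the Maven coordinates in the question (spark-streaming-twitter_2.10 and so on): the %% operator appends the Scala binary version to the artifact name, so both forms name the same artifact. The fragment below is only a sketch of that equivalence. The scalaVersion line is my assumption and is not part of the original build.sbt, which leaves the Scala version to the SBT default.

```scala
// Assumption (not in the original file): pin Scala so that %% resolves to the _2.10 artifacts.
scalaVersion := "2.10.4"

// With a 2.10.x scalaVersion, this line ...
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"

// ... names the same artifact as this one, which spells out the suffix with a single %.
// Use one form or the other, not both.
libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.2.0"
```

Because spark-streaming-twitter is not marked "provided", sbt assembly packs it into the jar, which is what makes TwitterUtils available at run time. spark-core and spark-streaming are marked "provided" because the Spark installation already supplies them.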

5) Finally, open a terminal and go to the project's parent folder (Twitter). From there, enter the following command:

sbt assembly

This downloads the dependencies and creates the jar file we need.
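
Before submitting the job, it can be worth checking that the assembled jar really contains the class named in the error. The following is a minimal sketch using only the JDK's zip support; the object name JarCheck and the method jarContainsClass are hypothetical names chosen for illustration, not part of Spark or SBT.

```scala
import java.util.zip.ZipFile

// Hypothetical helper: checks whether a jar (a zip archive) has an entry for a class.
object JarCheck {
  // className uses dots, e.g. "org.apache.spark.streaming.twitter.TwitterUtils$".
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    // Class files are stored under their package path with a .class suffix.
    val entryName = className.replace('.', '/') + ".class"
    val zip = new ZipFile(jarPath)
    try zip.getEntry(entryName) != null // getEntry returns null when the entry is absent
    finally zip.close()
  }

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: JarCheck <path to jar> <fully qualified class name>")
      sys.exit(1)
    }
    val found = jarContainsClass(args(0), args(1))
    println(if (found) "class found in jar" else "class missing from jar")
  }
}
```

Run it against the jar that sbt assembly reports writing, with org.apache.spark.streaming.twitter.TwitterUtils$ as the class name. If the class is missing, spark-submit will fail with the same NoClassDefFoundError as in the question.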

6) To run the program we need a Twitter app created under our own ID, which supplies the auth token and other credentials. The detailed steps for creating one are at the following link.

http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

7) Once all of the above is done, we can use the spark-submit command from the VM to run the job. An example command is

./bin/spark-submit \
  --class TwitterPopularTags \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret

8) This prints the output to the console, so to make the output easier to monitor it is better to reduce how often it is printed by adjusting the code.
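
The answer does not say which change was made, so the following is only one plausible adjustment, written as a sketch. It assumes a longer batch interval (10 seconds) and a 60-second window that is recomputed and printed every 30 seconds. The object name TwitterPopularTagsSlow and the interval values are illustrative assumptions. It needs the Spark and spark-streaming-twitter dependencies from build.sbt on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter._

// Sketch only: object name and interval values are assumptions, not the author's code.
object TwitterPopularTagsSlow {
  def main(args: Array[String]): Unit = {
    if (args.length < 4) {
      System.err.println("Usage: TwitterPopularTagsSlow <consumer key> <consumer secret> " +
        "<access token> <access token secret> [<filters>]")
      sys.exit(1)
    }
    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)
    val filters = args.drop(4)

    // twitter4j reads the OAuth credentials from these system properties.
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    val sparkConf = new SparkConf().setAppName("TwitterPopularTagsSlow")
    // Batch interval: tweets are grouped into 10-second batches.
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    val stream = TwitterUtils.createStream(ssc, None, filters)

    val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

    // Count over a 60-second window, sliding (and therefore printing) every 30 seconds.
    // Window and slide durations must be multiples of the batch interval.
    val topCounts = hashTags.map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(30))
      .map { case (topic, count) => (count, topic) }
      .transform(_.sortByKey(false))

    topCounts.foreachRDD { rdd =>
      println("\nPopular topics in last 60 seconds:")
      rdd.take(10).foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The slide duration is what controls how often the windowed counts are produced, so raising it is a direct way to print less often without changing the window length.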

Please let me know if any more details are required.

Thanks & Regards,

VPV
