How to setup Intellij 14 Scala Worksheet to run Spark

Problem description

I'm trying to create a SparkContext in an Intellij 14 Scala Worksheet.

Here are my dependencies:

name := "LearnSpark"
version := "1.0"
scalaVersion := "2.11.7"
// for working with Spark API
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"

Here is the code I run in the worksheet:

import org.apache.spark.{SparkContext, SparkConf}
val conf = new SparkConf().setMaster("local").setAppName("spark-play")
val sc = new SparkContext(conf)

The error:

15/08/24 14:01:59 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: rg.apache.spark.rpc.akka.AkkaRpcEnvFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)

When I run Spark as a standalone app it works fine. For example:

import org.apache.spark.{SparkContext, SparkConf}

// stops verbose logs
import org.apache.log4j.{Level, Logger}

object TestMain {

  Logger.getLogger("org").setLevel(Level.OFF)

  def main(args: Array[String]): Unit = {

    //Create SparkContext
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("mySparkApp")
      .set("spark.executor.memory", "1g")
      .set("spark.rdd.compress", "true")
      .set("spark.storage.memoryFraction", "1")

    val sc = new SparkContext(conf)

    val data = sc.parallelize(1 to 10000000).collect().filter(_ < 1000)
    data.foreach(println)
  }
}

Can someone provide some guidance on where I should look to resolve this exception?

Thanks.

Solution

Since there are still quite a few doubts about whether it is possible at all to run an IntelliJ IDEA Scala Worksheet with Spark, and since this question is the most direct one, I want to share my screenshot and a cookbook-style recipe for getting Spark code evaluated in the Worksheet.

I am using Spark 2.1.0 with a Scala Worksheet in IntelliJ IDEA (CE 2016.3.4).

The first step is to have a build.sbt file when importing dependencies in IntelliJ. I have used the same simple.sbt from the Spark Quick Start:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

The second step is to uncheck the 'Run worksheet in the compiler process' checkbox in Settings -> Languages and Frameworks -> Scala -> Worksheet. I have also tested the other Worksheet settings, and they had no effect on the warning about duplicate Spark context creation.
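
If the duplicate-context warning does appear when the sheet is re-evaluated, one possible workaround (my own sketch, not something the recipe below requires) is to ask Spark for an already running context instead of always constructing a new one:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: reuse a SparkContext that already exists in this JVM, or create
// one if none does, so re-running the Worksheet does not try to build a
// second context. Master and appName mirror the recipe below.
val conf = new SparkConf().setMaster("local[*]").setAppName("Simple Application")
val sc = SparkContext.getOrCreate(conf)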

Here is the version of the code from the SimpleApp.scala example in the same guide, modified to work in the Worksheet. The master and appName parameters have to be set directly in the Worksheet:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("Simple Application")

val sc = new SparkContext(conf)

val logFile = "/opt/spark-latest/README.md"
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()

println(s"Lines with a: $numAs, Lines with b: $numBs")

Here is a screenshot of the functioning Scala Worksheet with Spark:

UPDATE for IntelliJ CE 2017.1 (Worksheet in REPL mode)

In 2017.1 IntelliJ introduced a REPL mode for the Worksheet. I have tested the same code with the 'Use REPL' option checked. For this mode to run, you need to leave the 'Run worksheet in the compiler process' checkbox in the Worksheet Settings described above checked (it is checked by default).

The code runs fine in Worksheet REPL mode.
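
For a quick sanity check in REPL mode you can also evaluate expressions incrementally against the values defined above; a small hedged example, reusing the sc and logFile from the recipe (same assumed README location):

// Sketch: incremental evaluation in REPL mode, reusing sc and logFile
// defined earlier in the Worksheet.
val firstLines = sc.textFile(logFile).take(5)
firstLines.foreach(println)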

Here is the screenshot:
