How to run Scala script using spark-submit (similarly to Python script)?

Question

I am trying to execute a simple Scala script using Spark, as described in the Spark Quick Start Tutorial. I have no trouble executing the following Python code:

"""SimpleApp.py"""
from pyspark import SparkContext

logFile = "tmp.txt"  # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

I execute this code using the following command:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.py

However, when I try to do the same in Scala, I run into problems. In more detail, the code I am trying to execute is:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "tmp.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I try to execute it in the following way:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.scala

As a result, I get the following error message:

Error: Cannot load main class from JAR file

Does anybody know what I am doing wrong?

Answer

I want to add an alternative to @JacekLaskowski's solution, one I sometimes use for POC or testing purposes.

It is to run script.scala from inside the spark-shell with :load:

:load /path/to/script.scala

You won't need to define a SparkContext/SparkSession as the script will use the variables defined in the scope of the REPL.

You also don't need to wrap the code in a Scala object.
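
For example, a minimal sketch of the original program adapted for :load might look like this (assuming tmp.txt exists in the current working directory, and relying on the sc already provided by the spark-shell REPL):

/* script.scala -- load inside spark-shell with :load /path/to/script.scala */
// No object wrapper and no SparkContext setup: `sc` is already defined by the REPL.
val logFile = "tmp.txt" // Should be some file on your system
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, Lines with b: $numBs")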

PS: I consider this more of a hack, not something to use for production purposes.
