Spark build.sbt file versioning

Question

I am having a hard time understanding the multiple version numbers that go into the build.sbt file for Spark programs:

1. version
2. scalaVersion
3. spark version?
4. revision number.

There are also compatibility constraints between these versions. Can you please explain how to decide on these versions for my project?

Answer

I hope the following SBT lines and their comments are sufficient to answer your question.

// The version of your project itself.
// You can change this value whenever you want,
// e.g. every time you make a production release.
version := "0.1.0"

// The Scala version your project compiles with.
// For Spark 2.3.x (as used below), you can only use a 2.11.x version.
// Also, because Spark ships with its own Scala at runtime,
// I recommend you use the same one;
// you can check which one your Spark instance uses in the spark-shell.
scalaVersion := "2.11.12"

// The Spark version the project compiles against.
// You won't generate an uber jar with Spark included,
// but deploy your jar to a Spark cluster instead,
// so this version must match the remote one, unless you want weird bugs...
val SparkVersion = "2.3.1"
// Note: I use a val for the Spark version
// to make it easier to include several Spark modules in my project.
// This way, if I want/have to change the Spark version,
// I only have to modify one line,
// and I avoid strange errors caused by changing some versions but not others.
// Also note the 'Provided' modifier at the end:
// it tells SBT not to include the Spark bits in the generated jar,
// neither in the package task nor in the assembly task.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % SparkVersion % Provided,
  "org.apache.spark" %% "spark-sql" % SparkVersion % Provided,
)

// Exclude Scala from the assembly jar, because Spark already includes it.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
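
As a quick sanity check of the runtime versions mentioned in the comments above, both values can be read directly from a spark-shell on the target cluster; a minimal sketch (the printed values are only examples):

// Run these inside spark-shell on the cluster you will deploy to:
spark.version                          // e.g. "2.3.1"          -> use as SparkVersion
scala.util.Properties.versionString    // e.g. "version 2.11.12" -> pick a matching 2.11.x scalaVersion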

You should also take care of the SBT version, that is, the version of SBT your project is built with. You set it in the project/build.properties file:

sbt.version=1.2.3

Note: I use the sbt-assembly plugin to generate a jar with all dependencies included except Spark and Scala. This is useful if you use other libraries, for example the MongoSparkConnector.
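
For completeness, a sketch of how that fits together; the plugin and connector versions below are assumptions, pick the releases that match your sbt and Spark versions:

// project/plugins.sbt -- enables the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")

// build.sbt -- a non-Spark library declared WITHOUT the Provided modifier,
// so it is bundled into the assembled jar
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.1"

Running sbt assembly then produces the fat jar under target/scala-2.11/, which you deploy to the cluster with spark-submit as usual.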
