Uber jar with custom folder structure with IntelliJ and SBT

Question

I am fairly new to the cloud and to SBT/IntelliJ, so I am trying my luck with the IntelliJ & SBT build environment to deploy my jar on a Dataproc cluster.

Here's a screen shot of my project structure:

The code is quite simple: main is defined in 'mytestmain', which calls another method defined in 'ReadYamlConfiguration'; that method needs a moultingyaml dependency, which I have included as shown in my build.sbt.
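For context, here is a minimal sketch of what a moultingyaml-based reader such as 'ReadYamlConfiguration' could look like. The case class fields and the file handling are assumptions for illustration, since the actual YAML schema is not shown here:

import scala.io.Source

import net.jcazevedo.moultingyaml._
import net.jcazevedo.moultingyaml.DefaultYamlProtocol._

// Hypothetical configuration shape; adjust the fields to match the real YAML file.
case class AppConfig(inputPath: String, outputPath: String)

object ReadYamlConfiguration {
  // yamlFormat2 derives a YAML reader/writer for the two-field case class.
  implicit val appConfigFormat: YamlFormat[AppConfig] = yamlFormat2(AppConfig)

  def load(path: String): AppConfig = {
    val source = Source.fromFile(path)
    try source.mkString.parseYaml.convertTo[AppConfig]
    finally source.close()
  }
}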

Here are my build.sbt & assembly.sbt files:

lazy val root = (project in file(".")).
  settings(
    name := "MyTestProjectNew",
    version := "0.0.1-SNAPSHOT",
    scalaVersion := "2.11.12",
    mainClass in Compile := Some("com.test.processing.jobs.mytestmain.scala")
  )

libraryDependencies ++= Seq(
  "net.jcazevedo" %% "moultingyaml" % "0.4.2"
)

scalaSource in Compile := baseDirectory.value / "src" 

assembly.sbt file:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

I created assembly.sbt to build an uber jar that includes the required dependencies, and ran 'sbt assembly' from the terminal. It created an assembly jar file successfully, which I was able to deploy and run on the Dataproc cluster.

gcloud dataproc jobs submit spark \
--cluster my-dataproc-cluster \
--region europe-north1 --class com.test.processing.jobs.mytestmain \
--jars gs://my-test-bucket/spark-jobs/MyTestProjectNew-assembly-0.0.1-SNAPSHOT.jar

The code works as expected with no issues.

Now I would like to have my own custom directory structure as shown below:

For example, I would like to have a folder named 'spark-job' with a subdirectory named 'SparkDataProcessing', and then the src/main/scala folder with the packages and the respective Scala classes, objects, etc.

My main method is defined in the 'job' package within the 'com.test.processing' package.

What changes do I need to make in build.sbt? I am looking for a detailed explanation, with a sample build.sbt matching my project structure. Please also suggest what needs to be included in the .gitignore file.

I am using IntelliJ IDEA 2020 Community Edition and SBT 1.3.3. I tried a few things here and there but always ended up with issues in the structure, the jar, or build.sbt. I was expecting an answer similar to the one in the post below.

Why does my sourceDirectories setting have no effect in sbt?

As you can see in the pic below, the source directory has been changed.

spark-jobs/SparkDataProcessing/src/main/Scala

and when I build it with the path below, it does not work:

scalaSource in Compile := baseDirectory.value / "src" 

It works when I keep the default structure, like src/main/scala.
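For what it's worth, a minimal sketch of how the Scala source setting could be pointed at the nested layout, assuming the sources really live under spark-jobs/SparkDataProcessing relative to the build root (the folder names are taken from the structure described above):

// Point sbt at the nested source tree instead of the default src/main/scala.
scalaSource in Compile := baseDirectory.value / "spark-jobs" / "SparkDataProcessing" / "src" / "main" / "scala"

With the plain "src" value, sbt looks for sources directly under the project's base directory, which is why the nested structure is not picked up.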

Solution

You also need to change the package name after the package keyword at the top of the affected files. However, if you refactor using IntelliJ (by creating the packages and then dragging the files into them in the UI), IntelliJ will do this for you.
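For example, if the 'mytestmain' file is moved into the new 'job' package, the declaration at the top of that file changes along these lines (a sketch using the package names mentioned in the question):

package com.test.processing.jobs.job   // was: package com.test.processing.jobs

object mytestmain {
  def main(args: Array[String]): Unit = {
    // ... existing application logic, unchanged ...
  }
}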

Nothing else needs to be changed (build.sbt and related files can stay the same).

Finally, remember to change the --class argument to reflect the change in the entry point's location; you would pass --class com.test.processing.jobs.job.mytestmain instead of --class com.test.processing.jobs.mytestmain.

As for .gitignore: please take a look at an example gitignore file which includes:

  • output directories containing "target"
  • IntelliJ directories such as ".idea"

Another gitignore example ignores all the .class files generated by the compiler, which is another approach. Your .gitignore should include all files that are generated dynamically and whose changes do not matter to other developers.
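Putting those suggestions together, a starting-point .gitignore for an sbt/IntelliJ project like this one could look as follows; treat it as a sketch to adapt rather than a definitive list:

# sbt build output
target/
project/target/
project/project/

# IntelliJ project files
.idea/
*.iml

# compiled classes and logs
*.class
*.log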
