Add jars to a Spark Job - spark-submit


Problem description

True ... it has been discussed quite a lot.

However there is a lot of ambiguity and some of the answers provided ... including duplicating jar references in the jars/executor/driver configuration or options.

The following ambiguous, unclear, and/or omitted details should be clarified for each option:


  • How the ClassPath is affected
    • Driver
    • Executor (for tasks running)
    • Both
    • not at all
  • Whether the provided files are automatically distributed
    • to the tasks (to each executor)
    • to the remote Driver (if run in cluster mode)


    1. --jars
    2. SparkContext.addJar(...) method
    3. SparkContext.addFile(...) method
    4. --conf spark.driver.extraClassPath=... or --driver-class-path ...
    5. --conf spark.driver.extraLibraryPath=..., or --driver-library-path ...
    6. --conf spark.executor.extraClassPath=...
    7. --conf spark.executor.extraLibraryPath=...
    8. not to forget, the last parameter of spark-submit is also a .jar file.

    I am aware of where to find the main Spark documentation, specifically about how to submit, the options available, and also the JavaDoc. However, that still left quite a few holes for me, although it answered parts of the question too.

    I hope that it is not all that complex, and that someone can give me a clear and concise answer.

    If I were to guess from documentation, it seems that --jars, and the SparkContext addJar and addFile methods are the ones that will automatically distribute files, while the other options merely modify the ClassPath.
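    For reference, here is a minimal sketch of the two SparkContext methods mentioned above, assuming a plain Scala application and hypothetical paths:

    import org.apache.spark.{SparkConf, SparkContext}

    object DistributionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("distribution-sketch"))

        // Ships the jar to the cluster and adds it to the classpath used by tasks.
        sc.addJar("/opt/libs/additional1.jar")   // hypothetical path

        // Ships an arbitrary file; it is distributed but not put on any classpath.
        sc.addFile("/opt/data/lookup.csv")       // hypothetical path

        sc.stop()
      }
    }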

    Would it be safe to assume that for simplicity, I can add additional application jar files using the 3 main options at the same time:

    spark-submit --jars additional1.jar,additional2.jar \
      --driver-library-path additional1.jar:additional2.jar \
      --conf spark.executor.extraLibraryPath=additional1.jar:additional2.jar \
      --class MyClass main-application.jar
    

    Found a nice article in an answer to another posting. However, nothing new was learned. The poster does make a good remark about the difference between a local driver (yarn-client) and a remote driver (yarn-cluster). Definitely important to keep in mind.

    Recommended answer

    ClassPath:

    ClassPath is affected depending on what you provide. There are a couple of ways to set something on the classpath:


    • spark.driver.extraClassPath or its alias --driver-class-path to set extra classpaths on the Master node.
    • spark.executor.extraClassPath to set extra class path on the Worker nodes.

    If you want a certain JAR to take effect on both the Master and the Worker, you have to specify it separately in BOTH flags.
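    As a hedged illustration only (the paths and class name below are hypothetical), the programmatic SparkLauncher API exposes both settings as constants, which makes the "specify it twice" rule explicit:

    import org.apache.spark.launcher.SparkLauncher

    object LaunchWithBothClassPaths {
      def main(args: Array[String]): Unit = {
        // The same entries go on BOTH the driver and the executor classpaths.
        val extraCp = "/opt/libs/additional1.jar:/opt/libs/additional2.jar"

        val process = new SparkLauncher()
          .setAppResource("/opt/apps/main-application.jar")           // hypothetical
          .setMainClass("MyClass")
          .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, extraCp)     // driver side
          .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, extraCp)   // executor side
          .launch()

        process.waitFor()
      }
    }

    With plain spark-submit, the equivalent is simply passing both --conf spark.driver.extraClassPath=... and --conf spark.executor.extraClassPath=... flags.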

    The separator character follows the same rules as the JVM classpath:

    • Linux: a colon, :

      • e.g.: --conf "spark.driver.extraClassPath=/opt/prog/hadoop-aws-2.7.1.jar:/opt/prog/aws-java-sdk-1.10.50.jar"

    • Windows: a semicolon, ;

      • e.g.: --conf "spark.driver.extraClassPath=/opt/prog/hadoop-aws-2.7.1.jar;/opt/prog/aws-java-sdk-1.10.50.jar"
      File distribution:

      This depends on the mode under which you're running your job:


      • Client mode - Spark fires up a Netty HTTP server which distributes the files on startup to each of the worker nodes. You can see that when you start your Spark job:

      16/05/08 17:29:12 INFO HttpFileServer: HTTP File server directory is /tmp/spark-48911afa-db63-4ffc-a298-015e8b96bc55/httpd-84ae312b-5863-4f4c-a1ea-537bfca2bc2b
      16/05/08 17:29:12 INFO HttpServer: Starting HTTP Server
      16/05/08 17:29:12 INFO Utils: Successfully started service 'HTTP file server' on port 58922.
      16/05/08 17:29:12 INFO SparkContext: Added JAR /opt/foo.jar at http://***:58922/jars/com.clicktale.ai.pageview-creator_0.0.3.0.jar with timestamp 1462728552732
      16/05/08 17:29:12 INFO SparkContext: Added JAR /opt/aws-java-sdk-1.10.50.jar at http://***:58922/jars/aws-java-sdk-1.10.50.jar with timestamp 1462728552767
      


    • Cluster mode - In cluster mode, Spark selects a leader Worker node to execute the Driver process on. This means the job isn't running directly from the Master node. Here, Spark will not set up an HTTP server. You have to manually make your JARs available to all the worker nodes via HDFS/S3/other sources which are accessible to all nodes.
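      As a hedged sketch of that point, a jar that already sits on shared storage can be referenced by URI from the application itself, so every node fetches it from the same place (the HDFS path is hypothetical):

      import org.apache.spark.{SparkConf, SparkContext}

      object ClusterModeJarSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("cluster-mode-jar-sketch"))

          // There is no driver-side HTTP file server in cluster mode, so point the
          // executors at a location every worker can reach (hypothetical HDFS path).
          sc.addJar("hdfs:///shared/libs/additional1.jar")

          sc.stop()
        }
      }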

      Accepted URIs for files

      In "Submitting Applications", the Spark documentation does a good job of explaining the accepted prefixes for files:

      When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. Spark uses the following URL scheme to allow different strategies for disseminating jars:


          
      • file: - Absolute paths and file:/ URIs are served by the driver's HTTP file server, and every executor pulls the file from the driver HTTP server.

      • hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected.

      • local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.

      Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.

      As noted, JARs are copied to the working directory for each Worker node. Where exactly is that? It is usually under /var/run/spark/work, you'll see them like this:

      drwxr-xr-x    3 spark spark   4096 May 15 06:16 app-20160515061614-0027
      drwxr-xr-x    3 spark spark   4096 May 15 07:04 app-20160515070442-0028
      drwxr-xr-x    3 spark spark   4096 May 15 07:18 app-20160515071819-0029
      drwxr-xr-x    3 spark spark   4096 May 15 07:38 app-20160515073852-0030
      drwxr-xr-x    3 spark spark   4096 May 15 08:13 app-20160515081350-0031
      drwxr-xr-x    3 spark spark   4096 May 18 17:20 app-20160518172020-0032
      drwxr-xr-x    3 spark spark   4096 May 18 17:20 app-20160518172045-0033
      

      And when you look inside, you'll see all the JARs you deployed along:

      [*@*]$ cd /var/run/spark/work/app-20160508173423-0014/1/
      [*@*]$ ll
      total 89988
      -rwxr-xr-x 1 spark spark   801117 May  8 17:34 awscala_2.10-0.5.5.jar
      -rwxr-xr-x 1 spark spark 29558264 May  8 17:34 aws-java-sdk-1.10.50.jar
      -rwxr-xr-x 1 spark spark 59466931 May  8 17:34 com.mycode.code.jar
      -rwxr-xr-x 1 spark spark  2308517 May  8 17:34 guava-19.0.jar
      -rw-r--r-- 1 spark spark      457 May  8 17:34 stderr
      -rw-r--r-- 1 spark spark        0 May  8 17:34 stdout
      

      Affected options:

      The most important thing to understand is priority. If you pass any property via code, it will take precedence over any option you specify via spark-submit. This is mentioned in the Spark documentation:

      Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file

      So make sure you set those values in the proper places, so you won't be surprised when one takes priority over the other.
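      For example, under the precedence rule quoted above, a property hard-coded on the SparkConf silently wins over the same key passed to spark-submit. A minimal sketch, with a hypothetical path:

      import org.apache.spark.{SparkConf, SparkContext}

      object PrecedenceSketch {
        def main(args: Array[String]): Unit = {
          // This value beats any `--conf spark.executor.extraClassPath=...` flag,
          // because properties set directly on SparkConf take highest precedence.
          val conf = new SparkConf()
            .setAppName("precedence-sketch")
            .set("spark.executor.extraClassPath", "/opt/libs/additional1.jar") // hypothetical

          val sc = new SparkContext(conf)
          // ... job code ...
          sc.stop()
        }
      }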

      Let's analyze each option in question:


      • --jars vs SparkContext.addJar: These are identical; only one is set through spark-submit and one via code. Choose the one which suits you better.
      • SparkContext.addJar vs SparkContext.addFile: Use the former when you have a dependency that needs to be used with your code. Use the latter when you simply want to pass an arbitrary file around to your worker nodes, which isn't a run-time dependency in your code (a retrieval sketch follows this list).
      • --conf spark.driver.extraClassPath=... or --driver-class-path: These are aliases; it doesn't matter which one you choose.
      • --conf spark.driver.extraLibraryPath=..., or --driver-library-path ...: Same as above, aliases.
      • --conf spark.executor.extraClassPath=...: Use this when you have a dependency which can't be included in an uber JAR (for example, because there are compile-time conflicts between library versions) and which you need to load at runtime.
      • --conf spark.executor.extraLibraryPath=...: This is passed as the java.library.path option for the JVM. Use this when you need a library path visible to the JVM.
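      To make the addJar/addFile distinction above concrete, here is a minimal sketch of the addFile side: the shipped file ends up on every node but on no classpath, and tasks resolve their local copy through SparkFiles.get (the path is hypothetical):

      import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

      object AddFileSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("add-file-sketch"))

          // Distribute an arbitrary data file to every node (hypothetical path).
          sc.addFile("/opt/data/lookup.csv")

          // Inside a task, resolve the node-local copy by file name.
          val firstLines = sc.parallelize(Seq(1, 2)).map { _ =>
            scala.io.Source.fromFile(SparkFiles.get("lookup.csv")).getLines().next()
          }.collect()

          firstLines.foreach(println)
          sc.stop()
        }
      }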

      Would it be safe to assume that for simplicity, I can add additional application jar files using the 3 main options at the same time:

      You can safely assume this only for Client mode, not Cluster mode, as I've previously said. Also, the example you gave has some redundant arguments. For example, passing JARs to --driver-library-path is useless; you need to pass them to extraClassPath if you want them to be on your classpath. Ultimately, what you want to do when you deploy external JARs on both the driver and the workers is:

      spark-submit --jars additional1.jar,additional2.jar \
        --driver-class-path additional1.jar:additional2.jar \
        --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
        --class MyClass main-application.jar
      

