SparkContext.addFile vs spark-submit --files


Question

I am using Spark 1.6.0. I want to pass some properties files, such as log4j.properties and some other custom properties files. I see that we can use --files, but I also saw that SparkContext has an addFile method. I would prefer to use --files instead of adding the files programmatically, assuming both options do the same thing.

I did not find much documentation about --files, so are --files and SparkContext.addFile equivalent?
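For reference, the two options being compared look roughly like this (a minimal sketch; the file paths, class name, and jar name are assumptions):

```scala
// Option 1: ship the file at submit time (spark-submit command shown as a comment):
//   spark-submit --files /local/path/log4j.properties --class com.example.MyApp my-app.jar

// Option 2: ship the file programmatically from the driver:
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("addFile-example"))
sc.addFile("/local/path/log4j.properties") // downloads the file to every node that runs tasks
```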

Answer

It depends on whether your Spark application is running in client mode or cluster mode.

In client mode, the driver (application master) runs locally and can access those files from your project, because they are available on the local file system. SparkContext.addFile should find your local files and work as expected.
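A minimal sketch of that in Scala (the properties file name and path are assumptions), using SparkFiles.get to resolve the distributed copy on whichever node the code runs:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

val sc = new SparkContext(new SparkConf().setAppName("client-mode-addFile"))

// In client mode the driver runs locally, so this local path is visible to it.
sc.addFile("/local/path/customer.properties")

// Resolve the node-local copy (works on the driver and inside tasks on executors).
val localCopy = SparkFiles.get("customer.properties")
println(s"customer.properties is available at: $localCopy")
```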

If your application runs in cluster mode, the application is submitted via spark-submit. This means your whole application is transferred to the Spark master or YARN, which starts the driver (application master) on a specific node of the cluster, in a separate environment. That environment has no access to your local project directory, so all necessary files have to be transferred as well. This can be achieved with the --files option. The same concept applies to jar files (the dependencies of your Spark application): in cluster mode they need to be added with the --jars option so they are available on the classpath of the application master. If you use PySpark, there is a --py-files option.
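A sketch of a cluster-mode submission on YARN under these assumptions (the master, file paths, jar name, and main class are all hypothetical): --files places the listed files in the working directory of the driver and of each executor, so the application can open them by bare file name.

```scala
// Hypothetical submission command (shown as a comment):
//   spark-submit --master yarn --deploy-mode cluster \
//     --files /local/path/log4j.properties,/local/path/customer.properties \
//     --jars /local/path/some-dependency.jar \
//     --class com.example.MyApp my-app.jar

import java.io.FileInputStream
import java.util.Properties

// Files listed in --files end up in the container's working directory on the
// driver (application master) and on the executors, so a relative name resolves.
val props = new Properties()
props.load(new FileInputStream("customer.properties"))
```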

