Upload a zip file using the --archives option of spark-submit on YARN

Question

I have a directory with some model files, and for some reason my application has to access these model files on the local file system.

Of course, I know that the --files option of spark-submit can upload files to the working directory of each executor, and it does work.

However, I want to keep the directory structure of my files, so I turned to the --archives option, which is documented as:

YARN-only:
......
--archives ARCHIVES         Comma separated list of archives to be extracted into the working directory of each executor.
......

But when I actually used it to upload models.zip, I found that YARN simply placed the file there without extracting it, as it does with --files. Have I misunderstood "to be extracted", or have I misused this option?

Answer

I found the answer myself.

YARN does extract the archive, but it extracts it into an extra folder with the same name as the archive, inside each container's working directory. To make it clear: if I put models/model1 and models/model2 in models.zip, then I have to access my models as models.zip/models/model1 and models.zip/models/model2.
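
The resulting layout can be reproduced locally. The sketch below assumes a source tree containing models/model1 and models/model2; it builds models.zip with shutil.make_archive (keeping the top-level models/ prefix in every entry) and then imitates YARN's localization step by unpacking the archive into a directory named after it. build_models_zip, simulate_yarn_localization and demo_layout are hypothetical helpers for illustration only; this is not how YARN is implemented, it only mirrors the directory structure described above.

```python
import os
import shutil
import tempfile
import zipfile
from typing import List


def build_models_zip(src_root: str, out_dir: str) -> str:
    """Zip <src_root>/models so every entry keeps the 'models/' prefix."""
    # base_dir="models" makes entries look like 'models/model1',
    # which is what produces the extra 'models' level after extraction.
    base_name = os.path.join(out_dir, "models")
    return shutil.make_archive(base_name, "zip", root_dir=src_root, base_dir="models")


def simulate_yarn_localization(archive_path: str, work_dir: str, link_name: str) -> str:
    """Local imitation (not YARN code): unpack the archive into <work_dir>/<link_name>/."""
    target = os.path.join(work_dir, link_name)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target


def demo_layout() -> List[str]:
    """Build models.zip, 'localize' it, and list files relative to the container dir."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "models"))
        for name in ("model1", "model2"):
            with open(os.path.join(src, "models", name), "w") as f:
                f.write(name)
        archive = build_models_zip(src, tmp)
        work = os.path.join(tmp, "container")
        # Without a '#' suffix, the link name is the archive's own file name.
        simulate_yarn_localization(archive, work, os.path.basename(archive))
        found = []
        for dirpath, _, files in os.walk(work):
            for fn in files:
                rel = os.path.relpath(os.path.join(dirpath, fn), work)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)
```

Calling demo_layout() should list models.zip/models/model1 and models.zip/models/model2, matching the paths above. If the zip were built from inside models/ instead (so its entries have no models/ prefix), the extra level would disappear and the files would end up at models.zip/model1 and models.zip/model2.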

Moreover, we can make this nicer using the # syntax. As the Spark documentation for running on YARN describes it:

The --files and --archives options support specifying file names with #, similar to Hadoop. For example, you can specify --files localtest.txt#appSees.txt, and this will upload the file you have locally named localtest.txt into HDFS, but it will be linked to by the name appSees.txt, and your application should use the name appSees.txt to reference it when running on YARN.
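
The # rule can be made concrete with a small helper. Assuming the behaviour described above (the part after # becomes the link name, otherwise the file's base name is used), the hypothetical functions below split a --files/--archives entry, compute where a file inside an extracted archive should appear relative to the container working directory, and assemble (without running) a spark-submit argument list. The names split_resource_spec, path_in_container, build_submit_command and my_app.py are illustrative assumptions, not part of Spark.

```python
import os
from typing import List, Tuple


def split_resource_spec(spec: str) -> Tuple[str, str]:
    """Split an entry such as 'localtest.txt#appSees.txt' into (local_path, link_name)."""
    path, sep, fragment = spec.partition("#")
    # Without a '#' fragment, fall back to the file's base name,
    # e.g. 'models.zip' keeps the name 'models.zip' in the container.
    link_name = fragment if sep and fragment else os.path.basename(path)
    return path, link_name


def path_in_container(spec: str, *parts: str) -> str:
    """Path, relative to the container working dir, of a file inside an extracted archive."""
    _, link_name = split_resource_spec(spec)
    return os.path.join(link_name, *parts)


def build_submit_command(archive_specs: List[str], app: str) -> List[str]:
    """Assemble a spark-submit argument list; nothing is executed here."""
    return [
        "spark-submit",
        "--master", "yarn",
        # Several archives are passed as one comma-separated value.
        "--archives", ",".join(archive_specs),
        app,
    ]
```

For example, on a POSIX system path_in_container("models.zip#models", "models", "model1") gives models/models/model1, while path_in_container("models.zip", "models", "model1") gives models.zip/models/model1, the path the answer had to use without the # suffix.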

This answer was tested on Spark 2.0.0; I'm not sure about the behavior in other versions.