Are getCacheFiles() and getLocalCacheFiles() the same?


Question

Since getLocalCacheFiles() is deprecated, I'm trying to find an alternative. getCacheFiles() looks like one, but I doubt the two are equivalent.

When you call addCacheFile(), the file in HDFS is downloaded to every node. getLocalCacheFiles() then gives you the localized path, so you can read the file from the local file system. getCacheFiles(), however, returns the URI of the file in HDFS. If you read the file through that URI, I suspect you are still reading from HDFS rather than from the local file system.

That is my understanding, and I don't know whether it's correct. If it is, what is the alternative to getLocalCacheFiles()? And why did Hadoop deprecate it in the first place?

Answer

It's open source, so you can always use git blame to find the commit that introduced the @Deprecated annotation: 735b50e8bd23f7fbeff3a08cf8f3fff8cbff7449, which is for MAPREDUCE-4493. At the tail of the JIRA you'll find this discussion:

Omkar Vinit Joshi added a comment - 13/Jul/13 00:18
Robert Joseph Evans if we are deprecating getLocalCacheFiles and getCacheFiles in jobContext() then how the user is going to get local cached files in map task? YARN-916 is the related issue.. Thanks.

Robert Joseph Evans added a comment - 19/Jul/13 15:27
Omkar Vinit Joshi By opening the symbolic link in the current working directory. Prior to YARN the default behavior was to not create symlinks in the current working directory pointing to the items in the distributed cache. If you wanted links you had to specifically turn that option on and provide the name of the symlink you wanted. The only way to get to files without symlinks was to call getLocalCacheFiles and getCacheFiles. In YARN all files will have a symlink created. The name of the file/directory will be the name of the symlink. However, it is possible to have a name collision where I wanted hdfs://foo/bar.zip and hdfs://bar/bar.zip. In 1.0 both of these would have been downloaded and accessible through the deprecated APIs, but in YARN a warning will be output and only one of them will be downloaded. Also because of the way these APIs were written the mapper code may not know that only one of them was downloaded and will not be able to find the missing one and blow up. That is why I deprecated them in favor of nudging people to always use the symlinks so the behavior is always consistent.

Omkar Vinit Joshi added a comment - 19/Jul/13 16:56
Robert Joseph Evans sounds good.. however by this we will be putting limitation based on file name ..but that sounds reasonable considering the fact that this will stop potential bugs in map code and users can definitely version them to avoid it... Thanks...

So you're supposed to just open the file by its symlink name in the task's current working directory; it will be there. There is no dedicated API.
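
To make the naming rule from the JIRA comment concrete, here is a minimal plain-Java sketch (no Hadoop dependency). It derives the symlink name a cache URI would get and flags the collision described above. The class CacheSymlinks and its methods are hypothetical helpers, not part of Hadoop. Using the URI "#fragment" as the link name is an assumption based on the distributed cache's symlink convention and is not stated in the discussion above.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Hadoop class.
final class CacheSymlinks {

    // Link name for a hierarchical cache URI such as hdfs://foo/bar.zip.
    // Assumption: a "#fragment" overrides the name; otherwise the last
    // path component (the file name) is used, as the JIRA comment describes.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Where the symlink is expected inside the task's working directory.
    static Path localPath(Path workDir, URI cacheUri) {
        return workDir.resolve(linkName(cacheUri));
    }

    // Link names that more than one cache URI would map to.
    static List<String> collisions(List<URI> cacheUris) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (URI uri : cacheUris) {
            counts.merge(linkName(uri), 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        URI first = URI.create("hdfs://foo/bar.zip");
        URI second = URI.create("hdfs://bar/bar.zip");
        // Both URIs map to the same link name, so they collide.
        System.out.println(collisions(Arrays.asList(first, second)));   // prints [bar.zip]

        // Giving one of them a distinct fragment avoids the clash.
        URI renamed = URI.create("hdfs://bar/bar.zip#bar-v2.zip");
        System.out.println(collisions(Arrays.asList(first, renamed)));  // prints []

        // Relative to the working directory, the file is just its link name.
        System.out.println(localPath(Paths.get(""), renamed));          // prints bar-v2.zip
    }
}
```

Inside a task, the path returned by localPath(Paths.get(""), uri) can be opened with ordinary java.nio.file or java.io calls, which matches the "just open the file" advice above.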
