Confusion about distributed cache in Hadoop

Question

What does the distributed cache actually mean? Does having a file in the distributed cache mean that it is available on every datanode, so that there is no inter-node communication for that data, or does it mean that the file is held in memory on every node? If not, how can I keep a file in memory for the entire job? Can this be done both for map-reduce and for a UDF?

(In particular, there is some comparatively small configuration data that I would like to keep in memory while a UDF is applied in a Hive query.)

Thanks and regards, Dhruv Kapur.

Recommended answer

DistributedCache is a facility provided by the Map-Reduce framework to cache files needed by applications. Once you cache a file for your job, the Hadoop framework makes it available on every data node where your map/reduce tasks are running (on the local file system, not in memory). You can then access the cached file as a local file in your Mapper or Reducer. From there, you can read the cached file and populate a collection (e.g. an array or a HashMap) in your code.
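
As a rough illustration of the "read the local file and populate a collection" step, here is a minimal sketch in plain Java. The class name CacheFileLookup and the tab-separated key/value file format are assumptions made for this example, not something the answer specifies, and the sketch deliberately uses only the JDK so it can be run without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: loads a small "key<TAB>value" file into an in-memory map. */
final class CacheFileLookup {

    private CacheFileLookup() {
    }

    /**
     * Reads the local file line by line and returns its entries as a map.
     * Lines without a tab (including blank lines) are skipped.
     */
    static Map<String, String> load(Path localPath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(localPath, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // not a key/value line
                }
                // Everything before the first tab is the key, the rest is the value.
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }
}
```

In a real job, one would probably call such a loader once per task from the Mapper's or Reducer's setup method, keep the returned map in a field, and consult it from map or reduce. That way the file is read from local disk once and stays in memory for the life of the task. Registering the file with the job (for example through Job.addCacheFile in the Hadoop 2.x API) happens in the driver and is not shown here.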

Reference: https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/filecache/DistributedCache.html

Let me know if you still have questions.

You can read the cached file as a local file in your UDF code. After reading the file with the Java APIs, populate any in-memory collection with its contents.
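
For the UDF case, the same idea can be sketched with lazy loading: read the file on the first call and answer later calls from memory. The class name LazyConfigSet and the one-value-per-line file format are hypothetical, chosen only for illustration, and the sketch again sticks to the JDK rather than the Hive UDF classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical UDF-style helper: keeps a small config file in memory after first use. */
final class LazyConfigSet {
    private final String localFileName;
    private Set<String> values; // null until the first lookup

    LazyConfigSet(String localFileName) {
        this.localFileName = localFileName;
    }

    /** Returns true if the value appears as a line in the file; loads the file on the first call. */
    synchronized boolean contains(String value) {
        if (values == null) {
            try {
                List<String> lines =
                        Files.readAllLines(Paths.get(localFileName), StandardCharsets.UTF_8);
                values = new HashSet<>(lines);
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot read " + localFileName, e);
            }
        }
        return values.contains(value);
    }
}
```

A Hive UDF could hold an instance of such a helper in a field and call contains from its evaluate method, assuming the file has been shipped to the task's working directory (for example with Hive's ADD FILE command). Only the first row processed by a task pays the cost of reading the file.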

Reference: http://www.lichun.cc/blog/2013/06/use-a-lookup-hashmap-in-hive-script/

-Ashish