DistributedCache in Hadoop 2.x


Problem description



I have a problem with the new DistributedCache API in Hadoop 2.x. I found some people working around this issue, but their example (the Stack Overflow question "hadoop-2-distributedcache-deprecated-and-doesnt-work-is-there-a-replacement", answer 20480460) does not solve my problem.

That solution does not work for me, because I get a NullPointerException when trying to retrieve the data from the DistributedCache.

My Configuration is as follows:

Driver

    public int run(String[] arg) throws Exception {
        Configuration conf = this.getConf();
        Job job = new Job(conf, "job Name");
        ...
        job.addCacheFile(new URI(arg[1]));

Setup

    protected void setup(Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        URI[] cacheFiles = context.getCacheFiles();
        BufferedReader dtardr = new BufferedReader(new FileReader(cacheFiles[0].toString()));

Here, when it starts creating the buffered reader, it throws the NullPointerException. This happens because context.getCacheFiles() always returns NULL. How can I solve this problem, and where are the cache files stored (HDFS, or the local file system)?

Solution

If you use the local JobRunner in Hadoop (non-distributed mode, as a single Java process), then no local data directory is created; the getLocalCacheFiles() or getCacheFiles() call will return an empty set of results. Make sure that you are running your job in distributed or pseudo-distributed mode.
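As a defensive illustration, here is a minimal JDK-only sketch of the kind of guard you could put at the top of setup() to turn the bare NullPointerException into an actionable error; the helper name requireCacheFiles is our own, not part of Hadoop's API:

```java
import java.net.URI;

public class CacheGuard {
    // Under the local JobRunner, context.getCacheFiles() can return null.
    // Failing fast with a clear message beats the bare NullPointerException
    // from the question. (Helper name is illustrative, not a Hadoop API.)
    static URI[] requireCacheFiles(URI[] cacheFiles) {
        if (cacheFiles == null || cacheFiles.length == 0) {
            throw new IllegalStateException(
                "No cache files visible; run the job in pseudo-distributed or distributed mode");
        }
        return cacheFiles;
    }

    public static void main(String[] args) {
        URI[] files = { URI.create("hdfs://namenode:8020/data/lookup.txt") };
        System.out.println(requireCacheFiles(files).length); // prints 1
    }
}
```

In the mapper's setup() you would call this helper on the result of context.getCacheFiles() before constructing any reader.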

The Hadoop framework will copy the files set in the distributed cache to the local working directory of each task in the job. Copies of all cached files are placed on the local file system of each worker machine (in a subdirectory of mapred.local.dir).
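Because each cached file is localized under the task's working directory, by its base name or by the URI fragment if one was supplied (e.g. hdfs://nn/data/lookup.txt#dict), the mapper can open it by that short local name rather than by the full HDFS URI. A JDK-only sketch of that name derivation, assuming the fragment convention just described; this mimics the lookup logic but is not Hadoop's actual implementation:

```java
import java.net.URI;

public class CacheName {
    // Derive the local (symlinked) name a cached file gets in the task's
    // working directory: the URI fragment if present, else the base name.
    static String localName(URI cacheUri) {
        if (cacheUri.getFragment() != null) {
            return cacheUri.getFragment();                     // explicit symlink name
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);      // base name
    }

    public static void main(String[] args) {
        System.out.println(localName(URI.create("hdfs://nn:8020/data/lookup.txt")));      // prints lookup.txt
        System.out.println(localName(URI.create("hdfs://nn:8020/data/lookup.txt#dict"))); // prints dict
    }
}
```

In setup() you could then open `new FileReader(localName(cacheFiles[0]))` instead of passing the raw URI string to FileReader.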

You can refer to this link to learn more about DistributedCache.
