Run a Local file system directory as input of a Mapper in cluster


Problem description

I gave an input to the mapper from the local file system. It runs successfully from Eclipse, but not from the cluster, where it is unable to find the local input path and fails with: input path does not exist. Can anybody please help me with how to give a local file path to a mapper so that it can run in the cluster and I can get the output in HDFS?

Recommended answer

This is a very old question, but I recently faced the same issue. I am not sure how correct this solution is, but it worked for me; please point out any drawbacks. Here's what I did.

Reading a solution from the mail archives, I realised that if I modify fs.default.name from hdfs://localhost:8020/ to file:///, the job can access the local file system. However, I didn't want this for all my MapReduce jobs, so I made a copy of core-site.xml in a local system folder (the same folder from which I submit my MR jar via hadoop jar).
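For reference, the local copy of core-site.xml would contain something along these lines. This is a minimal sketch, assuming fs.default.name is the only property that needs to change; any other properties from the cluster's original file would stay as they are:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>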

In my Driver class for the MR job, I added:

Configuration conf = new Configuration();
// Local copy of core-site.xml with fs.default.name = file:/// so input paths resolve against the local file system
conf.addResource(new Path("/my/local/system/path/to/core-site.xml"));
// The cluster's hdfs-site.xml so the output can still be written to HDFS
conf.addResource(new Path("/usr/lib/hadoop-0.20-mapreduce/conf/hdfs-site.xml"));

The MR job then takes its input from the local file system and writes the output to HDFS.
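Putting it together, a map-only driver along the following lines exercises the same idea. This is a minimal sketch rather than the original poster's code: the class name LocalInputDriver, the identity Mapper, and the file:/// input and hdfs:// output paths are placeholders, while the two addResource paths are the ones shown above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LocalInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Local copy of core-site.xml (fs.default.name = file:///) plus the cluster's hdfs-site.xml
        conf.addResource(new Path("/my/local/system/path/to/core-site.xml"));
        conf.addResource(new Path("/usr/lib/hadoop-0.20-mapreduce/conf/hdfs-site.xml"));

        Job job = new Job(conf, "local-input-to-hdfs");
        job.setJarByClass(LocalInputDriver.class);
        job.setMapperClass(Mapper.class);   // identity mapper, just to show the wiring
        job.setNumReduceTasks(0);           // map-only job
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input is read from the local file system, output is written to HDFS
        FileInputFormat.addInputPath(job, new Path("file:///my/local/input/dir"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:8020/user/me/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Submitting this with hadoop jar from the folder that holds the local core-site.xml copy matches the setup described above.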

