multiple input into a Mapper in hadoop
Question
I am trying to send two files to a hadoop reducer. I tried DistributedCache, but anything I put using addCacheFile in main doesn't seem to be returned by getLocalCacheFiles in the mapper.
Right now I am using FileSystem to read the file, but I am running locally, so I am able to just send the name of the file. I am wondering how to do this if I were running on a real hadoop system.
Is there any way to send values to the mapper other than through the file that it's reading?
I also had a lot of problems with the distributed cache and with sending parameters. The options that worked for me are below:
For distributed cache usage: for me it was a nightmare to get the URL/path to a file on HDFS inside Map or Reduce, but with a symlink it worked. In the run() method of the job:
DistributedCache.addCacheFile(new URI(file+"#rules.dat"), conf);
DistributedCache.createSymlink(conf);
Then declare it in the Map or Reduce class, before the methods:
public static FileSystem hdfs;
and then in the setup() method of Map or Reduce:
hdfs = FileSystem.get(new Configuration());
FSDataInputStream in = hdfs.open(new Path("rules.dat")); // open() returns a stream, not a FileSystem
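For illustration, here is a minimal sketch of a mapper that consumes the symlinked file, assuming the org.apache.hadoop.mapreduce API (RulesMapper and the in-memory rules list are hypothetical names; because createSymlink was called, the cached file also appears as rules.dat in the task's working directory, so plain local file I/O works too):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RulesMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final List<String> rules = new ArrayList<String>();

    @Override
    protected void setup(Context context) throws IOException {
        // "rules.dat" is the fragment name registered with addCacheFile(...#rules.dat);
        // createSymlink(conf) makes it visible in the task's working directory.
        BufferedReader reader = new BufferedReader(new FileReader("rules.dat"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                rules.add(line); // keep each rule in memory for use in map()
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... apply 'rules' to each input record ...
    }
}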
For parameters: send some values to Map or Reduce (it could be a filename to open from HDFS). In the driver:
public int run(String[] args) throws Exception {
Configuration conf = new Configuration();
...
conf.set("level", otherArgs[2]); //sets variable level from command line, it could be a filename
...
}
Then in the Map or Reduce class, just:
int level = Integer.parseInt(conf.get("level")); // this is an int, but you can also read strings, etc.
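Putting both halves together, here is a minimal end-to-end sketch under the same assumptions (LevelJob and LevelMapper are hypothetical names; the mapper gets the Configuration from the context in setup() and reads the value back):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LevelJob extends Configured implements Tool {

    public static class LevelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private int level;

        @Override
        protected void setup(Context context) {
            // read back the value the driver stored in the job configuration
            level = Integer.parseInt(context.getConfiguration().get("level"));
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit each input line tagged with the configured level
            context.write(value, new IntWritable(level));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.set("level", args[2]); // pass a command-line value down to the tasks
        Job job = Job.getInstance(conf, "level job");
        job.setJarByClass(LevelJob.class);
        job.setMapperClass(LevelMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new LevelJob(), args));
    }
}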