Multiple input into a Mapper in Hadoop

Problem description

I am trying to send two files to a Hadoop reducer. I tried DistributedCache, but anything I add with addCacheFile in main doesn't seem to be returned by getLocalCacheFiles in the mapper.

Right now I am using FileSystem to read the file, but since I am running locally I can simply pass the file's name. I am wondering how to do this if I were running on a real Hadoop cluster.

Is there any way to send values to the mapper other than the file it is reading?

Solution

I also had a lot of problems with the distributed cache and with passing parameters. The options that worked for me are below.

For distributed cache usage: for me it was a nightmare to get the URL/path to a file on HDFS inside Map or Reduce, but with a symlink it works when set up in the job's run() method:

DistributedCache.addCacheFile(new URI(file+"#rules.dat"), conf);
DistributedCache.createSymlink(conf);
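
To put those two calls in context, below is a minimal sketch of a run() method that wires them up; the argument layout, job name, mapper class, and input/output setup are assumptions of mine, not part of the original answer.

public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // assumed layout: args[0] = input dir, args[1] = output dir, args[2] = HDFS path of the rules file
    DistributedCache.addCacheFile(new URI(args[2] + "#rules.dat"), conf);
    DistributedCache.createSymlink(conf);

    Job job = new Job(conf, "rules job");
    job.setJarByClass(getClass());
    job.setMapperClass(RulesMapper.class); // hypothetical mapper, sketched further below
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}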

and then, to read it in Map or Reduce, declare this in the class header, before the methods:

public static FSDataInputStream hdfs; // FileSystem.open() returns an FSDataInputStream

and then in the setup() method of Map or Reduce:

hdfs = FileSystem.get(new Configuration()).open(new Path("rules.dat"));
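
As a fuller (and still hedged) illustration, a mapper could read the symlinked rules.dat once in setup() and keep it in memory; the class name, the plain java.io read of the local symlink, and the in-memory list are my own choices, not the answer's exact code.

public static class RulesMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private List<String> rules = new ArrayList<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // createSymlink() exposes the cached file as "rules.dat" in the task's working directory,
        // so it can also be read like any local file
        BufferedReader reader = new BufferedReader(new FileReader("rules.dat"));
        String line;
        while ((line = reader.readLine()) != null) {
            rules.add(line); // keep each rule line in memory for map()
        }
        reader.close();
    }
}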

For parameters: to send some values to Map or Reduce (it could be a filename to open from HDFS), set them on the Configuration in run():

public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ...
    conf.set("level", otherArgs[2]); //sets variable level from command line, it could be a filename
    ...
}
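
The run(String[] args) signature suggests the job implements the Tool interface; if so, it is usually launched through ToolRunner, roughly like this (MyJob is a placeholder name):

public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options (-D, -files, ...) before passing the rest to run()
    System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
}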

then in the Map or Reduce class:

int level = Integer.parseInt(conf.get("level")); // this is an int, but you can also read strings, etc.
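
The snippet above does not show where conf comes from inside the task; in the mapreduce API it is reached through the context, so a minimal sketch (class and field names are mine) would be:

public static class LevelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private int level;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // the same Configuration that run() populated with conf.set("level", ...)
        level = Integer.parseInt(context.getConfiguration().get("level"));
    }
}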
