Passing different parameters to each mapper


Problem description

I have a job that uses multiple mappers and one reducer. The mappers are almost identical, except they differ in the value of a String that they use to produce the result.

Currently I have several classes, one for each value of the String I mentioned. It feels like there should be a better way, one that doesn't require so much code duplication. Is there a way to pass these String values as parameters to the mappers?

My job looks like this:

Input File A  ---->  Mapper A using
                       String "Foo"  ----+
                                         |--->  Reducer
                     Mapper B using  ----+
Input File B  ---->    String "Bar" 

I want to turn it into something like this:

Input File A  ---->  GenericMapper parameterized
                               with String "Foo" ----+
                                                     |--->  Reducer
                     GenericMapper parameterized ----+ 
Input File B  ---->            with String "Bar"

Edit: Here are two simplified mapper classes that I currently have. They accurately represent my actual situation.

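import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class MapperA extends Mapper<Text, Text, Text, Text> {
    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Appends the hard-coded suffix "Foo" to every value.
        context.write(key, new Text(value.toString() + "Foo"));
    }
}

class MapperB extends Mapper<Text, Text, Text, Text> {
    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Identical to MapperA except for the suffix "Bar".
        context.write(key, new Text(value.toString() + "Bar"));
    }
}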

Edit: What string each mapper should use depends only on which file the data comes from. There is no way to differentiate between the files, except through the file name.

Solution

Assuming you use file input formats, you can get your current input file name in the mapper like this:

if (context.getInputSplit() instanceof FileSplit) {
    FileSplit fileSplit = (FileSplit) context.getInputSplit();
    Path inputPath = fileSplit.getPath();
    String fileId = ... //parse inputPath into a file id
    ...
}

You can parse inputPath however you want (for example, use only the file name or only the partition id) to generate a unique id identifying the input file. For example:

/some/path/A -> A
/some/path/B -> B

Configure your properties for each possible file "id" in your driver:

conf.set("my.property.A", "foo");
conf.set("my.property.B", "bar"); 

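In the mapper, compute the file "id" as described above and look up the value (conf here is the job Configuration, which the mapper can obtain via context.getConfiguration()):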

conf.get("my.property." + fileId);
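
The driver-side and mapper-side halves of this lookup can be sketched together as follows. To keep the example runnable without Hadoop, java.util.Properties stands in for the job Configuration; the class SuffixLookup and its methods are hypothetical names, and failing on a missing property is an assumed design choice rather than something the answer prescribes.

```java
import java.util.Properties;

// Stand-in sketch: Properties plays the role of the Hadoop job Configuration.
class SuffixLookup {
    static final String PREFIX = "my.property.";

    // Driver side: register one value per possible file id.
    static Properties driverConfig() {
        Properties conf = new Properties();
        conf.setProperty(PREFIX + "A", "foo");
        conf.setProperty(PREFIX + "B", "bar");
        return conf;
    }

    // Mapper side: resolve the value for a file id, failing loudly if the
    // driver never configured it (an assumption; a default would also work).
    static String suffixFor(Properties conf, String fileId) {
        String value = conf.getProperty(PREFIX + fileId);
        if (value == null) {
            throw new IllegalStateException("No property configured for file id " + fileId);
        }
        return value;
    }

    // What a single generic map() call would emit as its output value.
    static String mapValue(Properties conf, String fileId, String value) {
        return value + suffixFor(conf, fileId);
    }

    public static void main(String[] args) {
        Properties conf = driverConfig();
        System.out.println(mapValue(conf, "A", "record-"));
        System.out.println(mapValue(conf, "B", "record-"));
    }
}
```

In an actual mapper, the file id and its value would typically be resolved once in setup() and stored in a field, since every record of a given input split comes from the same file.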
