Control the MultipleOutputFormat files sub-path

Problem description

I need to control the sub-path of the different files managed by MultipleOutputFormat, based on the reducer key.

Basically, I want to set the sub-path of each file based on the key given to the reducer.

I can change the file name by overriding the generateFileNameForKeyValue method of MultipleOutputFormat, but how can I also change the sub-path of these files?

I mean that by overriding only generateFileNameForKeyValue, I get

mySetJobConfigOutputPath/fileNameBasedKey1.dat
                        /fileNameBasedKey2.dat
                        /fileNameBasedKey3.dat
                        ...

but I want the files to be organized like this:

 mySetJobConfigOutputPath/path0ConfiguredInsideReducerBasedOnKey/fileNameBasedKey1.dat

                         /path1ConfiguredInsideReducerBasedOnKey/fileNameBasedKey2.dat
                                                                /fileNameBasedKey3.dat

                         /path2ConfiguredInsideReducerBasedOnKey/fileNameBasedKey8.dat

As shown, both the sub-path and the file name are derived from the key inside the reducer.

I know how to configure the file name, but can I also configure the sub-path of each file under the mySetJobConfigOutputPath folder?

Solution

I found out that I can also override the getInputFileBasedOutputFileName method and add the sub-path there.

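@Override
protected String getInputFileBasedOutputFileName(JobConf conf, String name)
{
    // Your logic goes here: work out the sub-path, prepend it to the name, and return the result.
    String subPath = "subPath"; // placeholder value, not from the original answer
    return subPath + "/" + name;
}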

You should still implement generateFileNameForKeyValue to turn the leaf file name into a name based on the key.
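To make the two steps concrete, here is a minimal sketch of the string logic only, kept free of Hadoop imports so it runs on its own. Everything in it is an assumption for illustration: the class name KeyPathLayout, the helper names, and the rule that a key looks like "path1_Key2" (sub-path, underscore, file part) are hypothetical and not taken from the question. In a real job, generateFileNameForKeyValue(key, value, name) would presumably return fileNameForKey(key.toString()), and getInputFileBasedOutputFileName(conf, name) would return withSubPath(name).

```java
// Pure-string sketch of the naming logic behind the two overridden hooks.
// Hypothetical key format: "<subPath>_<filePart>", e.g. "path1_Key2".
final class KeyPathLayout {

    private KeyPathLayout() {}

    // Logic for generateFileNameForKeyValue: derive the file name from the key.
    static String fileNameForKey(String key) {
        return key + ".dat";
    }

    // Logic for getInputFileBasedOutputFileName: it only receives the name,
    // so the sub-path has to be recoverable from the name itself.
    static String withSubPath(String name) {
        int sep = name.indexOf('_');
        if (sep < 0) {
            // No sub-path encoded in the name: leave the file at the top level.
            return name;
        }
        String subPath = name.substring(0, sep);
        String fileName = name.substring(sep + 1);
        return subPath + "/" + fileName;
    }

    public static void main(String[] args) {
        // Show the relative path for a few sample keys.
        for (String key : new String[] {"path0_Key1", "path1_Key2", "Key9"}) {
            System.out.println(key + " -> " + withSubPath(fileNameForKey(key)));
        }
    }
}
```

The returned string is relative to the job output path, so "path1/Key2.dat" would end up under mySetJobConfigOutputPath/path1/.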

UPDATE: This article explains it all: http://www.infoq.com/articles/HadoopOutputFormat
