How to get the input file name in the mapper in a Hadoop program?
Problem description
How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file a given mapper is reading.
Recommended answer
First, get the input split. With the newer mapreduce API (the org.apache.hadoop.mapreduce package), this is done as follows:
context.getInputSplit();
To get the file path and the file name, however, you first need to cast the result to FileSplit. The cast only succeeds when the split really is a FileSplit, which is the case for the standard file-based input formats.
To get the input file path, you can do the following:
Path filePath = ((FileSplit) context.getInputSplit()).getPath();
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
Similarly, to get just the file name, call getName() on the path:
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
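Putting the pieces together, here is a minimal sketch of a mapper that looks up the file name once in setup() rather than on every call to map(). The class name FileNameMapper, the key/value types (which assume a line-oriented input such as TextInputFormat), and the choice to emit (file name, line) pairs are illustrative assumptions, not part of the original answer. It requires the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: tags every input line with the name of the file it came from.
public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Reused output key holding the name of the file behind this mapper's split.
    private final Text fileName = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        // Guard the cast: the split is only a FileSplit for file-based input formats.
        if (!(split instanceof FileSplit)) {
            throw new IOException("Expected a FileSplit but got " + split.getClass().getName());
        }
        fileName.set(((FileSplit) split).getPath().getName());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (file name, line); the output types are an assumption for illustration.
        context.write(fileName, value);
    }
}
```

Since one split belongs to one file, the name does not change during a mapper task, so reading it in setup() avoids repeating the cast for every record.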