Get input file name in streaming Hadoop program
I am able to find the name of the input file in a mapper class using FileSplit when writing the program in Java.
Is there a corresponding way to do this when I write a program in Python (using Streaming)?
I found the following in the Hadoop Streaming documentation on Apache:
See Configured Parameters. During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ). For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. In your code, use the parameter names with the underscores.
But I still can't understand how to make use of this inside my mapper.
Any help is highly appreciated.
Thanks
According to "Hadoop: The Definitive Guide":
Hadoop sets job configuration parameters as environment variables for Streaming programs. However, it replaces non-alphanumeric characters with underscores to make sure they are valid names. The following Python expression illustrates how you can retrieve the value of the mapred.job.id property from within a Python Streaming script:
os.environ["mapred_job_id"]
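The same lookup can be applied to the property that holds the input file name. The sketch below assumes the path of the current split is exposed as map.input.file on older releases and mapreduce.map.input.file on newer ones (so map_input_file or mapreduce_map_input_file in the environment). Neither name appears in the quoted text, so check which one your cluster actually sets. The helper name input_file_name is made up for illustration.

```python
#!/usr/bin/env python
import os
import sys


def input_file_name(env=None):
    """Return the path of the current input split, or None if it is not exposed."""
    env = os.environ if env is None else env
    # Assumption: the property is map.input.file (older releases) or
    # mapreduce.map.input.file (newer ones); dots arrive as underscores.
    for key in ("mapreduce_map_input_file", "map_input_file"):
        value = env.get(key)
        if value:
            return value
    return None


def main():
    name = input_file_name() or "unknown"
    for line in sys.stdin:
        # Emit the file name as the key and the original line as the value.
        sys.stdout.write("%s\t%s\n" % (name, line.rstrip("\n")))


if __name__ == "__main__":
    main()
```

If neither variable is set, the mapper falls back to the key "unknown" rather than failing, which makes it easier to test locally by piping a file into the script.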
You can also set environment variables for the Streaming process launched by MapReduce by applying the -cmdenv option to the Streaming launcher program (once for each variable you wish to set). For example, the following sets the MAGIC_PARAMETER environment variable:
-cmdenv MAGIC_PARAMETER=abracadabra
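On the script side, a variable passed with -cmdenv can be read like any other environment variable. A minimal sketch, where the helper name magic_parameter and the empty-string default are illustrative choices rather than anything from the book:

```python
import os


def magic_parameter(env=None, default=""):
    """Return the MAGIC_PARAMETER value passed via -cmdenv, or a default."""
    env = os.environ if env is None else env
    # Using .get() avoids a KeyError when the script is run outside Hadoop
    # and the variable has not been set.
    return env.get("MAGIC_PARAMETER", default)
```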