How does one specify the input file for a runner from Python?
Question
I am writing an external script to run a MapReduce job via the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).
I read in the mrjob documentation that I should use MRJob.make_runner() to run a MapReduce job from a separate Python script, as follows:
mr_job = MRYourJob(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
...
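The body elided by "..." above usually runs the job and then reads its output. A minimal sketch, assuming mrjob 0.6.0 or later (where runners provide cat_output() and jobs provide parse_output()); collect_output is a hypothetical helper name, not part of mrjob:

```python
def collect_output(job):
    """Run an already-constructed MRJob instance and return its output pairs.

    Assumes mrjob >= 0.6.0, where runners have cat_output() and jobs have
    parse_output(). collect_output itself is a hypothetical helper.
    """
    with job.make_runner() as runner:
        runner.run()  # blocks until the job has finished
        # cat_output() yields raw byte chunks of the final output;
        # parse_output() decodes them with the job's OUTPUT_PROTOCOL
        # into (key, value) pairs. Read them before the with-block exits,
        # because the runner may clean up its temporary files on exit.
        return list(job.parse_output(runner.cat_output()))
```

Note that the '-r emr' example from the documentation targets Amazon EMR. On a laptop you would pass '-r inline' (mrjob's default runner) or '-r local' instead, e.g. collect_output(MRYourJob(args=['-r', 'local'])).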
However, how do I specify which input file to use? I want to use a file "datalines.txt" in the same directory as my MapReduce script and the other Python script that runs the MapReduce job. Furthermore, how do I specify the output?
I could not find a function in the mrjob documentation that allows me to specify these parameters.
Recommended answer
The Getting Started guide says that input is read from stdin or from files supplied on the command line, so pass the input file as a positional argument in args:
mr_job = MRYourJob(args=["datalines.txt"])
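The same args list can also choose the runner and the output location. A sketch under stated assumptions: it relies on mrjob's --output-dir switch and on the default JSONProtocol output format (one record per line, JSON-encoded key and value separated by a tab, written to part-* files). build_job_args and read_output_dir are hypothetical helper names, not mrjob API:

```python
import glob
import json
import os


def build_job_args(input_path, runner="inline", output_dir=None):
    """Return an args list that makes an MRJob read input_path.

    Positional arguments are treated as input files, exactly as on the
    command line. --output-dir is optional; without it mrjob chooses a
    temporary output directory.
    """
    args = ["-r", runner]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    args.append(input_path)  # input file goes last, as a positional argument
    return args


def read_output_dir(output_dir):
    """Load (key, value) pairs from the part-* files a job left in output_dir.

    Assumes the job kept the default JSONProtocol for its output:
    JSON-encoded key, a tab, then JSON-encoded value on each line.
    """
    pairs = []
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                pairs.append((json.loads(key), json.loads(value)))
    return pairs
```

For example, constructing MRYourJob(args=build_job_args("datalines.txt", output_dir="out")) and calling runner.run() inside make_runner() should leave the results in out/, which read_output_dir("out") then loads. Keep in mind that a relative path such as "datalines.txt" is resolved against the current working directory, not the script's directory; build it from os.path.dirname(os.path.abspath(__file__)) if the script may be launched from elsewhere.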