How does one specify the input file for a runner from Python?


Question

I am writing an external script to run a mapreduce job via the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).

I read in the mrjob documentation that I should use MRJob.make_runner() to run a mapreduce job from a separate Python script, as follows.

mr_job = MRYourJob(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    ...

However, how do I specify which input file to use? I want to use a file "datalines.txt" that sits in the same directory as my mapreduce script and the other Python script that runs the job. Furthermore, how do I specify the output?

I could not find a function in the mrjob documentation that allows me to specify these parameters.

Recommended answer

The mrjob Getting Started guide suggests that input is read from stdin or from files supplied on the command line, so the input file can be passed as a positional entry in args:

mr_job = MRYourJob(args=["datalines.txt"])
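The question also asks about output, which the line above does not cover. A minimal sketch of one way to handle both follows. It assumes mrjob 0.6 or later (where runner.cat_output() and job.parse_output() are available), the inline runner for a laptop run, and the --output-dir option for keeping the results on disk. The helper names build_job_args and run_job are hypothetical and not part of mrjob.

```python
import os


def build_job_args(input_name, script_dir, output_dir=None, runner="inline"):
    """Build the argument list handed to an MRJob subclass.

    mrjob treats positional arguments as input paths; -r picks the runner
    and --output-dir names the directory that receives the job's output.
    """
    args = ["-r", runner, os.path.join(script_dir, input_name)]
    if output_dir is not None:
        args += ["--output-dir", os.path.join(script_dir, output_dir)]
    return args


def run_job(job_class, args):
    """Run the job and return its (key, value) output pairs as a list."""
    job = job_class(args=args)
    with job.make_runner() as runner:
        runner.run()
        # Assumes mrjob 0.6+. Older releases used runner.stream_output()
        # together with job.parse_output_line(line) instead.
        return list(job.parse_output(runner.cat_output()))


# Typical use from the driver script (module and class names are
# placeholders for your own job, which must live in a separate file):
#
#   from mr_your_job import MRYourJob
#   here = os.path.dirname(os.path.abspath(__file__))
#   pairs = run_job(MRYourJob, build_job_args("datalines.txt", here, "output"))
```

Resolving "datalines.txt" against the script's own directory keeps the job independent of the current working directory. If the output only needs to be consumed in Python, output_dir can be left as None and the returned pairs used directly.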
