How does one specify the input file for a runner from Python?

Views: 104 Published: 2020/5/5 15:40:16 Tags: python mapreduce mrjob

Question

I am writing an external script to run a mapreduce job via the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).

I read from the mrjob documentation that I should use MRJob.make_runner() to run a mapreduce job from a separate python script as follows.

mr_job = MRYourJob(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    ...

However, how do I specify which input file to use? I want to use a file "datalines.txt" that sits in the same directory as my mapreduce script and the separate Python script that launches the job. Furthermore, how do I specify the output?

I could not find a function in the mrjob documentation that allows me to specify these parameters.

Recommended answer

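The mrjob getting started guide says that input is read from stdin or from files supplied on the command line. When the job is constructed from Python, those same command-line arguments go in the args list, so the input file name can be passed there: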

mr_job = MRYourJob(args=["datalines.txt"])
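To connect this with the make_runner() snippet from the question, here is a minimal sketch of a driver script. It assumes mrjob 0.6 or later, where runner.cat_output() and job.parse_output() are available (older releases used different method names). The helper names build_mrjob_args and run_job are hypothetical and not part of mrjob; "-r inline" and "--output-dir" are mrjob command-line options, and passing "-r inline" keeps the job on the local machine rather than the "emr" runner shown in the question.

```python
def build_mrjob_args(input_path, runner="inline", output_dir=None):
    """Build the args list for an MRJob subclass (hypothetical helper).

    The list mirrors what would be typed on the command line:
    options first, then the input file.
    """
    args = ["-r", runner]
    if output_dir is not None:
        # Ask mrjob to write the job output into this directory.
        args += ["--output-dir", output_dir]
    args.append(input_path)
    return args


def run_job(job_class, args):
    """Run job_class with args and return its parsed (key, value) pairs.

    Assumes job_class is an MRJob subclass and mrjob >= 0.6.
    """
    job = job_class(args=args)
    with job.make_runner() as runner:
        runner.run()
        # Read the output inside the with block, before the runner cleans up.
        return list(job.parse_output(runner.cat_output()))


# Example use from the driver script (MRYourJob is the job class from the
# question, defined in its own module, assumed here to be mr_your_job.py):
#
#   import os
#   from mr_your_job import MRYourJob
#
#   here = os.path.dirname(os.path.abspath(__file__))
#   args = build_mrjob_args(os.path.join(here, "datalines.txt"))
#   for key, value in run_job(MRYourJob, args):
#       print(key, value)
```

Resolving "datalines.txt" against the script's own directory, as in the commented usage, avoids depending on the directory the driver happens to be started from. If the results are only needed in Python, output_dir can be left out and the pairs returned by run_job used directly.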
