subprocess popen to run commands (HDFS/hadoop)

Views: 411 | Published: 2018/5/31 20:03:44 | Tags: python, hadoop, subprocess, popen

Problem description

I am trying to use subprocess.Popen to run commands on my machine.

This is what I have so far:

import subprocess

cmdvec = ['/usr/bin/hdfs', 'dfs', '-text', '/data/ds_abc/clickstream/{d_20151221-2300}/*', '|', 'wc', '-l']
subproc = subprocess.Popen(cmdvec, stdout=subprocess.PIPE, stdin=None, stderr=subprocess.STDOUT)

If I run the command in my terminal, I get this output:

15/12/21 16:09:31 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
15/12/21 16:09:31 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 9cd4009fb896ac12418449e4678e16eaaa3d5e0a]
15/12/21 16:09:31 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
15305

The number 15305 is the value I want.
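The terminal output mixes Hadoop's INFO log lines with the count. If those log lines end up in the captured text (for example because of stderr=subprocess.STDOUT), a minimal sketch for pulling out just the count, assuming it is the last line consisting only of digits, might look like this. `last_int_line` is a hypothetical helper name, not part of subprocess.

```python
def last_int_line(text):
    """Return the last line of `text` made up only of digits, as an int (None if absent)."""
    # Walk backwards so trailing output (the wc -l count) is found first.
    for line in reversed(text.splitlines()):
        line = line.strip()  # wc -l may pad the number with spaces on some systems
        if line.isdigit():
            return int(line)
    return None


# Sample copied from the terminal output in the question.
sample = """15/12/21 16:09:31 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
15/12/21 16:09:31 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 9cd4009fb896ac12418449e4678e16eaaa3d5e0a]
15/12/21 16:09:31 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
15305
"""
print(last_int_line(sample))  # -> 15305
```

This only helps once the command actually produces the count; it does not fix the pipe problem itself.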

When I run the command split into a list like this, I read the output lines with:

for i in subproc.stdout:
    print(i)

However, this prints all the data from the files, as if only this command had been run:

/usr/bin/hdfs dfs -text /data/ds_abc/clickstream/{d_20151221-2300}/*

It seems the pipe | is not being used to count the number of lines in all the files.

Solution

In your example, passing the pipe | character as an argument to subprocess.Popen does not create a pipeline of processes the way it would in a shell like Bash. Instead, the | character is passed as an ordinary argument to a single process (hdfs), together with wc and -l.
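To see this concretely, here is a minimal sketch that substitutes a tiny Python program for /usr/bin/hdfs (an assumption, so it runs without Hadoop). The stand-in just prints the arguments it receives, using the same list shape as the question's cmdvec.

```python
import subprocess
import sys

# Hypothetical stand-in for /usr/bin/hdfs: prints the arguments it was given.
echo_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# Same shape as the question's cmdvec: '|', 'wc' and '-l' are plain list items.
cmdvec = echo_args + ["dfs", "-text", "/some/path/*", "|", "wc", "-l"]

proc = subprocess.Popen(cmdvec, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        universal_newlines=True)
out, _ = proc.communicate()

# No shell is involved: '|' arrives as a literal argument, nothing is piped,
# and '*' is not glob-expanded either.
print(out.strip())  # -> ['dfs', '-text', '/some/path/*', '|', 'wc', '-l']
```

The real hdfs receives the same extra arguments, so wc never runs.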

To get a real pipeline, chain two separate subprocess.Popen calls, connecting the first process's stdout to the second process's stdin, as a Bash pipeline does. The subprocess module documentation covers this in more detail:

https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline
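A minimal sketch of that two-process chain, following the pattern in the linked documentation. `pipe_two` is a hypothetical helper name. hdfs's stderr is deliberately left unredirected: in the terminal run the INFO lines were displayed rather than counted by wc, which indicates hdfs writes them to stderr, so merging stderr into the pipe (as stderr=subprocess.STDOUT did) would add them to the count. The runnable demo uses a small Python program as a stand-in for hdfs and assumes a Unix-like system with wc on the PATH.

```python
import subprocess
import sys


def pipe_two(first_cmd, second_cmd):
    """Run the equivalent of `first_cmd | second_cmd` without a shell.

    Returns second_cmd's stdout as text. first_cmd's stderr is left alone,
    so its log lines go to the terminal instead of into the pipe.
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE,
                          universal_newlines=True)
    # Close the parent's copy of p1's stdout so p1 gets SIGPIPE if p2 exits first.
    p1.stdout.close()
    out, _ = p2.communicate()
    p1.wait()
    return out


# With a Hadoop client installed, the question's command would be:
#   hdfs_cmd = ['/usr/bin/hdfs', 'dfs', '-text',
#               '/data/ds_abc/clickstream/{d_20151221-2300}/*']
#   count = int(pipe_two(hdfs_cmd, ['wc', '-l']).strip())

# Runnable demo: a hypothetical stand-in producer that prints three lines.
producer = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
demo = pipe_two(producer, ["wc", "-l"])
print(int(demo.strip()))  # -> 3
```

The same documentation also notes that, for trusted input, the shell's own pipeline support can be used by passing the whole command string with shell=True; the explicit chain above avoids starting a shell.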