hadoop distcp using subprocess.Popen


Problem description

I am trying to run a hadoop distcp command using subprocess.Popen in Python and get the error "Invalid input". The same command runs fine when I run it directly as a Hadoop command.

Hadoop command:

hadoop distcp -log /user/name/distcp_log -skipcrccheck -update hdfs://xxxxx:8020/sourceDir hdfs://xxxxx:8020/destDir

In Python:

from subprocess import Popen, PIPE
proc1 = Popen(['hadoop', 'distcp', '-log /user/name/distcp_log -skipcrccheck -update', 'hdfs://xxxxx:8020/sourceDir', 'hdfs://xxxxx:8020/destDir'], stdout=PIPE)

Log messages:

INFO tools.OptionsParser: parseChunkSize: blocksperchunk false
INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[-log /user/name/distcp_log -skipcrccheck -update, hdfs://xxxxx:8020/sourceDir], targetPath=hdfs://xxxxx:8020/destDir, targetPathExists=true, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192}
ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: -log /user/name/distcp_log -skipcrccheck -update doesn't exist

It is treating the options as a source directory. How do I tell subprocess that these are options and should not be treated as a source path (sourcePaths=[-log /user/name/distcp_log -skipcrccheck -update, hdfs://xxxxx:8020/sourceDir])?

I am using Python 2.7, do not have access to pip install, and this is a Kerberos cluster. I want to run this command for intra-cluster transfer, but before that I wanted to try this simple command within the cluster.

Thanks

Answer

Split all arguments of your command line into separate elements of Popen's first argument list:

from subprocess import Popen, PIPE
proc1 = Popen(['hadoop', 'distcp', '-log', '/user/name/distcp_log', '-skipcrccheck', '-update', 'hdfs://xxxxx:8020/sourceDir', 'hdfs://xxxxx:8020/destDir'], stdout=PIPE)
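After launching the process you will usually want to wait for it to finish and inspect its output and exit status via communicate(). A minimal sketch of that pattern, using echo as a stand-in command so it runs without a Hadoop installation (the hadoop distcp argument list above would take its place in practice):

```python
from subprocess import Popen, PIPE

# 'echo' is a stand-in for the real hadoop distcp argument list,
# used only to illustrate the communicate()/returncode pattern.
proc = Popen(['echo', 'copy complete'], stdout=PIPE, stderr=PIPE)
out, err = proc.communicate()  # waits for exit and reads both streams
print(out.decode().strip())    # → copy complete
print(proc.returncode)         # → 0 on success
```

Checking proc.returncode is especially useful here, since distcp reports failures through a non-zero exit status as well as through its log output.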

The Popen documentation explains that args should be a sequence in which each command-line argument is a separate element, i.e. the result of splitting the full command line on whitespace.
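Rather than splitting by hand, shlex.split() builds that list the way a POSIX shell would tokenize the command line, which also keeps quoted paths intact. A small sketch using the command from the question:

```python
import shlex

# The full command line as one string ('xxxxx' is the placeholder
# host from the question).
cmd = ('hadoop distcp -log /user/name/distcp_log -skipcrccheck -update '
       'hdfs://xxxxx:8020/sourceDir hdfs://xxxxx:8020/destDir')

# Tokenize shell-style: each option and path becomes its own element.
args = shlex.split(cmd)
# ['hadoop', 'distcp', '-log', '/user/name/distcp_log', '-skipcrccheck',
#  '-update', 'hdfs://xxxxx:8020/sourceDir', 'hdfs://xxxxx:8020/destDir']
```

The resulting list can be passed directly as the first argument to Popen.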

