Python subprocess with oozie
I'm trying to use subprocess in a Python script that I call within an Oozie shell action. The subprocess is supposed to read a file stored in Hadoop's HDFS.
I'm using hadoop-1.2.1 in pseudo-distributed mode and oozie-3.3.2.
Here is the Python script, named connected_subprocess.py:
#!/usr/bin/python
import subprocess
import networkx as nx
liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
G=nx.DiGraph()
f=open("/home/rlk/liste_strongly_connected.txt","wb")
for item in liste:
    try:
        app1,app2=item.split('\t')
        G.add_edge(app1,app2)
    except:
        pass
liste_connected=nx.strongly_connected_components(G)
for item in liste_connected:
    if len(item)>1:
        f.write('{}\n'.format('\t'.join(item)))
f.close()
The corresponding shell action in Oozie's workflow.xml is the following:
<action name="final">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>connected_subprocess.py</exec>
        <file>connected_subprocess.py</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
When I run the Oozie job, the tasktracker log shows these errors:
Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "./connected_subprocess.py", line 6, in <module>
    liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'hadoop fs -cat /user/root/output-data/calcul-proba/final.txt' returned non-zero exit status 1
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Everything works fine when I run the Python script from my interactive shell, so it seems that I cannot run a shell command from within the script when it is executed by an Oozie action.
Is there any way I can bypass this limitation?
I wonder if your script simply doesn't have access to your PATH environment variable when it is executed through Oozie, and is therefore having trouble locating the "hadoop" command. Can you try modifying the subprocess.check_output call in your Python script so that it uses the full path to the hadoop command?
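To make that suggestion concrete, here is a minimal sketch of what calling hadoop by its absolute path might look like. The install location /usr/local/hadoop/bin/hadoop is only an assumption and has to be replaced with wherever Hadoop actually lives on the node that runs the action; the helper names (build_cat_command, read_hdfs_lines, describe_environment) are made up for illustration. Since the first line of the log looks like a message from the JVM rather than from the shell, it may also be worth printing the environment the script sees under Oozie and comparing it with your interactive shell.

```python
import os
import subprocess

# Assumption: hypothetical install location, adjust to your own cluster.
HADOOP_BIN = "/usr/local/hadoop/bin/hadoop"


def build_cat_command(hdfs_path, hadoop_bin=HADOOP_BIN):
    """Build the argument list for `hadoop fs -cat`, using an absolute path
    to the hadoop launcher so the call does not depend on PATH."""
    return [hadoop_bin, "fs", "-cat", hdfs_path]


def read_hdfs_lines(hdfs_path, hadoop_bin=HADOOP_BIN, run=subprocess.check_output):
    """Run `hadoop fs -cat` without a shell and return the file's lines.

    `run` defaults to subprocess.check_output; it is a parameter only so the
    function can be exercised without a Hadoop installation.
    """
    output = run(build_cat_command(hdfs_path, hadoop_bin))
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    return output.splitlines()


def describe_environment(names=("PATH", "HADOOP_HOME", "JAVA_HOME")):
    """Return the selected environment variables as seen by this process,
    so they can be written to the task log and compared with a login shell."""
    return {name: os.environ.get(name, "<unset>") for name in names}


# Possible use inside connected_subprocess.py:
#   print(describe_environment())
#   liste = read_hdfs_lines("/user/root/output-data/calcul-proba/final.txt")
```

Passing the command as a list avoids shell=True, so the lookup of the hadoop launcher no longer depends on how the shell inside the Oozie launcher is configured. If the environment printed under Oozie differs from your interactive shell, that difference is a likely place to look next.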