Python subprocess with oozie


Problem Description

I'm trying to use subprocess in a python script which I call within an oozie shell action. Subprocess is supposed to read a file which is stored in Hadoop's HDFS.

I'm using hadoop-1.2.1 in pseudo-distributed mode and oozie-3.3.2.

Here is the python script, named connected_subprocess.py:

#!/usr/bin/python

import subprocess
import networkx as nx

liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
G=nx.DiGraph()
f=open("/home/rlk/liste_strongly_connected.txt","wb")
for item in liste:
    try:
        app1,app2=item.split('\t')
        G.add_edge(app1,app2)
    except:
        pass
liste_connected=nx.strongly_connected_components(G)
for item in liste_connected:
    if len(item)>1:
        f.write('{}\n'.format('\t'.join(item)))
f.close()

The corresponding shell action in Oozie's workflow.xml is the following:

<action name="final">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>connected_subprocess.py</exec>
        <file>connected_subprocess.py</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>

When I run the oozie job, the tasktracker log reads these errors:

Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "./connected_subprocess.py", line 6, in <module>
    liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'hadoop fs -cat /user/root/output-data/calcul-proba/final.txt' returned non-zero exit status 1
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

It seems that I cannot run a shell command line from my python script when the script is embedded in an oozie action, since everything works fine when I run the same script from my interactive shell.

Is there any way I can bypass this limitation?

Solution

I wonder if your script just doesn't have access to your PATH environment variable (when executed through Oozie) and is having trouble locating the "hadoop" command. Can you try modifying your python script's subprocess.check_output call and adding the full path to the hadoop fs command?
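For illustration, a minimal sketch of that change; the /usr/local/hadoop location below is only an assumption, so substitute whatever path the hadoop binary actually has on your task nodes:

import subprocess

# Assumed install location of hadoop-1.2.1 on the node; adjust to your cluster.
HADOOP_BIN = "/usr/local/hadoop/bin/hadoop"

# Calling the hadoop CLI by its absolute path avoids relying on the
# launcher's PATH when the script runs inside the Oozie shell action.
liste = subprocess.check_output(
    "{0} fs -cat /user/root/output-data/calcul-proba/final.txt".format(HADOOP_BIN),
    shell=True).split('\n')

If that resolves the error, the rest of connected_subprocess.py can stay exactly as it is, since only the check_output call changes.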
