Reading / Writing Files from hdfs using python with subprocess, Pipe, Popen gives error


Problem Description

I am trying to read (open) and write files in HDFS inside a Python script, but I am getting an error. Can someone tell me what is wrong here?

Code (full): sample.py

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()

When I execute:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

It executes properly, but I couldn't find the file that was supposed to be created in HDFS (modifiedfile.txt).

And when I execute:

 hadoop fs -getmerge ./fileRead/ file.txt

Inside file.txt, I got:

Before Loop 
Before Loop 
After Loop 1    
After Loop 1    
After Loop 2    
After Loop 2

Can someone please tell me what I am doing wrong here? I don't think it reads from sample.txt at all.

Solution

Try changing your put subprocess to take cat's stdout directly, by changing this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

into this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)

Full script:

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
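
If you still need to modify each line before writing (the piped version above drops the "Blah" suffix from the question), here is a minimal sketch that keeps stdin=PIPE but closes the pipes in the right order; it assumes Python 2, as in the original script, and the same ./sample.txt and ./modifiedfile.txt paths. Note that line += "Blah" in the question appends after the trailing newline, so the added text lands at the start of the next line; stripping the newline first avoids that.

#!/usr/bin/python
# Sketch: stream from "hadoop fs -cat", modify each line, and pipe the
# result into "hadoop fs -put -". Assumes Python 2 and the paths above.
from subprocess import Popen, PIPE

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"], stdout=PIPE)
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"], stdin=PIPE)

for line in cat.stdout:
    # Strip the newline before appending, then add it back, so "Blah"
    # stays on the same line instead of starting the next one.
    put.stdin.write(line.rstrip("\n") + "Blah\n")

put.stdin.close()  # send EOF so "hadoop fs -put -" can finish the file
put.wait()
cat.stdout.close()
cat.wait()

Also worth noting: under Hadoop Streaming the mapper's stdout is collected as the job output, which is why the debug print lines ("Before Loop", "After Loop 1", ...) are what ended up in file.txt.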
