Reading / Writing Files from HDFS using Python with subprocess, Pipe, Popen gives error
Problem Description
I am trying to read (open) and write files in HDFS from inside a Python script, but I am getting an error. Can someone tell me what is wrong here?
Code (full): sample.py
#!/usr/bin/python
from subprocess import Popen, PIPE

print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)
cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()
When I execute:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead
It executes properly, but I couldn't find modifiedfile.txt, which was supposed to be created in HDFS.
And when I execute:
hadoop fs -getmerge ./fileRead/ file.txt
Inside file.txt, I got:
Before Loop
Before Loop
After Loop 1
After Loop 1
After Loop 2
After Loop 2
Can someone please tell me what I am doing wrong here? I don't think it reads from sample.txt at all.
Solution

Try to change your put subprocess to take cat's stdout directly, by changing this:

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

into this:

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
Full script:
#!/usr/bin/python
from subprocess import Popen, PIPE

print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
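As a side note, the same process-chaining pattern can be sketched without a Hadoop cluster, using ordinary shell commands as stand-ins: here `echo` plays the role of `hadoop fs -cat` (a producer writing to stdout) and `tr` plays the role of `hadoop fs -put -` (a consumer reading from stdin). This is a minimal Python 3 sketch of the fix above, not part of the original answer:

```python
from subprocess import Popen, PIPE

# Producer process, standing in for: hadoop fs -cat ./sample.txt
cat = Popen(["echo", "hello world"], stdout=PIPE)

# Consumer process wired directly to the producer's stdout,
# standing in for: hadoop fs -put - ./modifiedfile.txt  with stdin=cat.stdout
tr = Popen(["tr", "a-z", "A-Z"], stdin=cat.stdout, stdout=PIPE)

# Close our copy of the pipe so the producer gets SIGPIPE
# if the consumer exits early (recommended by the subprocess docs).
cat.stdout.close()

out, _ = tr.communicate()
print(out.decode().strip())
```

Connecting `stdin=cat.stdout` lets the kernel move the data between the two processes directly, so the Python script never has to loop over the lines itself, which also avoids the deadlock risk of manually pumping data through two `PIPE` ends.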