Reading / Writing Files from hdfs using python with subprocess, Pipe, Popen gives error


Problem Description

I am trying to read (open) and write files in HDFS inside a Python script, but I am getting an error. Can someone tell me what is wrong here?

Code (full): sample.py

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()

When I execute:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead

It executes properly, but I couldn't find the file that was supposed to be created in HDFS (modifiedfile.txt).

And when I execute:

 hadoop fs -getmerge ./fileRead/ file.txt

Inside file.txt, I got:

Before Loop 
Before Loop 
After Loop 1    
After Loop 1    
After Loop 2    
After Loop 2

Can someone please tell me what I am doing wrong here? I don't think it reads from sample.txt at all.

Solution

Try changing your put subprocess to take cat's stdout directly, by changing this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

into this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)

Full script:

#!/usr/bin/python

from subprocess import Popen, PIPE

print "Before Loop"

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
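
If you still need to modify each line before writing (the piped version above drops the "Blah" suffix from the question), here is a minimal sketch that keeps stdin=PIPE but closes the pipes in the right order; it assumes Python 2, as in the original script, and the same ./sample.txt and ./modifiedfile.txt paths. Note that line += "Blah" in the question appends after the trailing newline, so the added text lands at the start of the next line; stripping the newline first avoids that.

#!/usr/bin/python
# Sketch: stream from "hadoop fs -cat", modify each line, and pipe the
# result into "hadoop fs -put -". Assumes Python 2 and the paths above.
from subprocess import Popen, PIPE

cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"], stdout=PIPE)
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"], stdin=PIPE)

for line in cat.stdout:
    # Strip the newline before appending, then add it back, so "Blah"
    # stays on the same line instead of starting the next one.
    put.stdin.write(line.rstrip("\n") + "Blah\n")

put.stdin.close()  # send EOF so "hadoop fs -put -" can finish the file
put.wait()
cat.stdout.close()
cat.wait()

Also worth noting: under Hadoop Streaming the mapper's stdout is collected as the job output, which is why the debug print lines ("Before Loop", "After Loop 1", ...) are what ended up in file.txt.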
