I want to call the HDFS REST API to upload a file
I want to call the HDFS REST API to upload a file using httplib.
My program creates the file, but the file has no content.
=====================================================
Here is my code:
import httplib
conn=httplib.HTTPConnection("localhost:50070")
conn.request("PUT","/webhdfs/v1/levi/4?op=CREATE")
res=conn.getresponse()
print res.status,res.reason
conn.close()
conn=httplib.HTTPConnection("localhost:50075")
conn.connect()
conn.putrequest("PUT","/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
conn.endheaders()
a_file=open("/home/levi/4","rb")
a_file.seek(0)
data=a_file.read()
conn.send(data)
res=conn.getresponse()
print res.status,res.reason
conn.close()
==================================================
Here is the return:
307 TEMPORARY_REDIRECT
201 Created
=========================================================
OK, the file was created, but no content was sent.
When I comment out conn.send(data), the result is the same: still no content.
Maybe the file read or the send is wrong, not sure.
Do you know how this happened?
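It looks like your second PUT request is not using the "location" header returned with the 307 response. In WebHDFS, the first PUT to the NameNode only returns a redirect to a DataNode; the file data has to be sent in a second PUT to the URL given in that header.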
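As a rough sketch of that two-step flow, here is how the code from the question might be restructured. The function name `upload` and its `connect` parameter are made up for illustration, the NameNode address is the one from the question, and the error handling is deliberately minimal. Treat it as a starting point under those assumptions, not a tested client.

```python
try:  # Python 2, as used in the question
    import httplib
    from urlparse import urlsplit
except ImportError:  # Python 3 renamed these modules
    import http.client as httplib
    from urllib.parse import urlsplit


def upload(local_path, hdfs_path, user, namenode="localhost:50070",
           connect=httplib.HTTPConnection):
    """Two-step WebHDFS CREATE: ask the NameNode, then send data to the DataNode.

    `connect` is a hypothetical hook so the connection class can be swapped out.
    """
    # Step 1: PUT without a body. The NameNode should reply 307 with a
    # "location" header that names the DataNode to write to.
    conn = connect(namenode)
    conn.request("PUT", "/webhdfs/v1%s?op=CREATE&user.name=%s" % (hdfs_path, user))
    res = conn.getresponse()
    location = res.getheader("location")
    res.read()
    conn.close()
    if res.status != 307 or not location:
        raise RuntimeError("expected a 307 redirect, got %s" % res.status)

    # Step 2: PUT the file content to the exact URL from the "location" header,
    # instead of guessing the DataNode host, port and query string.
    target = urlsplit(location)
    with open(local_path, "rb") as f:
        data = f.read()
    conn = connect(target.netloc)
    conn.request("PUT", target.path + "?" + target.query, data)
    res = conn.getresponse()
    status = res.status
    res.read()
    conn.close()
    return status  # 201 is the status the question reports for a created file
```

Calling `upload("/home/levi/4", "/levi/4", "levi")` would mirror the paths in the question. Passing the body to `request()` also lets httplib set the Content-Length header, which the `putrequest()`/`endheaders()` sequence in the question does not do.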
I've been working on a fork of a Python WebHDFS wrapper that may be of use; you can see the full code here: https://github.com/carlosmarin/webhdfs-py/blob/master/webhdfs/webhdfs.py
The method you'd be interested in is:
def copyfromlocal(self, source_path, target_path, replication=1, overwrite=True):
    # Parenthesize the conditional so it selects only the overwrite value;
    # without the parentheses the whole URL collapses to 'false' when overwrite is False
    url_path = WEBHDFS_CONTEXT_ROOT + target_path + '?op=CREATE&overwrite=' + ('true' if overwrite else 'false')

    # Step 1: ask the NameNode where to write; it answers 307 with a "location" header
    with _NameNodeHTTPClient('PUT', url_path, self.namenode_host, self.namenode_port, self.username) as response:
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        redirect_location = response.msg["location"]
        logger.debug("HTTP Location: %s" % redirect_location)
        (redirect_host, redirect_port, redirect_path, query) = self.parse_url(redirect_location)

        # Bug in WebHDFS 0.20.205 => requires param otherwise a NullPointerException is thrown
        redirect_path = redirect_path + "?" + query + "&replication=" + str(replication)
        logger.debug("Redirect: host: %s, port: %s, path: %s " % (redirect_host, redirect_port, redirect_path))

        # Step 2: send the file content to the DataNode named in the redirect
        fileUploadClient = HTTPConnection(redirect_host, redirect_port, timeout=600)
        # This requires currently Python 2.6 or higher
        # Read in binary mode so the bytes are uploaded unchanged
        with open(source_path, "rb") as source_file:
            fileUploadClient.request('PUT', redirect_path, source_file.read(), headers={})
        response = fileUploadClient.getresponse()
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        fileUploadClient.close()
        return json.loads(response.read())
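The method above calls `self.parse_url`, which is not shown here. A minimal stand-in might look like the following. It is only a guess at the helper's behavior based on how its four return values are used; the version in the linked repository may differ, and the fallback to port 80 is an assumption.

```python
try:
    from urlparse import urlsplit  # Python 2
except ImportError:
    from urllib.parse import urlsplit  # Python 3


def parse_url(url):
    """Split a redirect URL into (host, port, path, query).

    Hypothetical stand-in for the wrapper's own parse_url method.
    """
    parts = urlsplit(url)
    # Assumption: fall back to port 80 when the URL does not name a port.
    return parts.hostname, parts.port or 80, parts.path, parts.query
```

For example, with a made-up DataNode host, `parse_url("http://datanode1:50075/webhdfs/v1/levi/4?op=CREATE&overwrite=true")` returns `('datanode1', 50075, '/webhdfs/v1/levi/4', 'op=CREATE&overwrite=true')`.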