I want to call the HDFS REST API to upload a file


Problem description


I want to call the HDFS REST API to upload a file using httplib.

My program creates the file, but the file has no content.

=====================================================

Here is my code:

import httplib

conn=httplib.HTTPConnection("localhost:50070")
conn.request("PUT","/webhdfs/v1/levi/4?op=CREATE")
res=conn.getresponse()
print res.status,res.reason
conn.close()

conn=httplib.HTTPConnection("localhost:50075")
conn.connect()
conn.putrequest("PUT","/webhdfs/v1/levi/4?op=CREATE&user.name=levi")
conn.endheaders()
a_file=open("/home/levi/4","rb")
a_file.seek(0)
data=a_file.read()
conn.send(data)
res=conn.getresponse()
print res.status,res.reason
conn.close()

==================================================

Here is the return:

307 TEMPORARY_REDIRECT
201 Created

=========================================================

OK, the file was created, but no content was sent.

When I comment out conn.send(data), the result is the same: still no content.

Maybe the file read or the send is wrong; I'm not sure.

Do you know how this happened?

Solution

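It looks like your second PUT request is not using the "Location" header from the 307 response. The NameNode answers the first CREATE request with a redirect to a DataNode URL, and the file data has to be sent to that exact URL rather than to a hand-built one.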

I've been working on a fork of a Python WebHDFS wrapper that may be of use. You can see the full code here: https://github.com/carlosmarin/webhdfs-py/blob/master/webhdfs/webhdfs.py

The method you'd be interested in is:

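# Excerpt from a class in webhdfs.py (Python 2). WEBHDFS_CONTEXT_ROOT,
# _NameNodeHTTPClient, logger and self.parse_url are defined elsewhere in that module.
# Imports this excerpt relies on:
import json
from httplib import HTTPConnection

def copyfromlocal(self, source_path, target_path, replication=1, overwrite=True):
    # Parenthesize the conditional; otherwise it applies to the whole concatenation
    # and url_path becomes just 'false' when overwrite is False.
    url_path = WEBHDFS_CONTEXT_ROOT + target_path + '?op=CREATE&overwrite=' + ('true' if overwrite else 'false')

    # Step 1: ask the NameNode where to write; it replies with a 307 redirect
    with _NameNodeHTTPClient('PUT', url_path, self.namenode_host, self.namenode_port, self.username) as response:
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        redirect_location = response.msg["location"]
        logger.debug("HTTP Location: %s" % redirect_location)
        (redirect_host, redirect_port, redirect_path, query) = self.parse_url(redirect_location)

        # Bug in WebHDFS 0.20.205 => requires param otherwise a NullPointerException is thrown
        redirect_path = redirect_path + "?" + query + "&replication=" + str(replication)

        logger.debug("Redirect: host: %s, port: %s, path: %s " % (redirect_host, redirect_port, redirect_path))
        fileUploadClient = HTTPConnection(redirect_host, redirect_port, timeout=600)

        # Step 2: send the file bytes to the DataNode named in the redirect.
        # This currently requires Python 2.6 or higher.
        # Open in binary mode so the bytes are sent unchanged.
        with open(source_path, "rb") as source_file:
            fileUploadClient.request('PUT', redirect_path, source_file.read(), headers={})
        response = fileUploadClient.getresponse()
        logger.debug("HTTP Response: %d, %s" % (response.status, response.reason))
        # Read the body before closing the connection; a 201 Created reply may have none
        body = response.read()
        fileUploadClient.close()

        return json.loads(body) if body else None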
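
Applied to the code from the question, the same two-step flow could look roughly like the sketch below. It is written for Python 3, where httplib is named http.client, and it only illustrates the idea under a few assumptions: split_location and upload are hypothetical helper names (not part of httplib or of the wrapper linked above), and the NameNode is assumed to answer the first request with a 307 and a Location header. Passing the data through request() also makes the client send a Content-Length header, which the putrequest()/endheaders() sequence in the question does not add by itself; that may matter as well.

```python
import http.client
from urllib.parse import urlsplit


def split_location(location):
    """Split a redirect URL into (host, port, path including the query string)."""
    parts = urlsplit(location)
    target = parts.path
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, parts.port, target


def upload(namenode_host, namenode_port, hdfs_path, local_path, user):
    """Hypothetical helper: create hdfs_path from local_path via WebHDFS."""
    # Step 1: ask the NameNode where to write. No file data is sent here.
    create_url = "/webhdfs/v1%s?op=CREATE&user.name=%s" % (hdfs_path, user)
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", create_url)
    res = conn.getresponse()
    location = res.getheader("Location")
    res.read()
    conn.close()
    if res.status != 307 or location is None:
        raise RuntimeError("expected a 307 redirect, got %d %s"
                           % (res.status, res.reason))

    # Step 2: PUT the file bytes to the exact URL taken from the Location header.
    host, port, target = split_location(location)
    with open(local_path, "rb") as source_file:
        data = source_file.read()
    conn = http.client.HTTPConnection(host, port)
    conn.request("PUT", target, body=data)  # also sets Content-Length
    res = conn.getresponse()
    res.read()
    conn.close()
    return res.status, res.reason
```

With the values from the question, the call would be upload("localhost", 50070, "/levi/4", "/home/levi/4", "levi"). On Python 2 the equivalent calls live in httplib and urlparse.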
