Is it possible to loop over an httplib.HTTPResponse's data?

Problem description

I'm trying to develop a very simple proof-of-concept to retrieve and process data in a streaming manner. The server I'm requesting from will send data in chunks, which is good, but I'm having issues using httplib to iterate through the chunks.

Here's what I'm trying:

import httplib

def getData(src):
    d = src.read(1024)
    while d and len(d) > 0:
        yield d
        d = src.read(1024)

if __name__ == "__main__":
    con = httplib.HTTPSConnection('example.com', port='8443', cert_file='...', key_file='...')
    con.putrequest('GET', '/path/to/resource')
    response = con.getresponse()

    for s in getData(response):
        print s
        raw_input() # Just to give me a moment to examine each packet

Pretty simple. Just open an HTTPS connection to the server, request a resource, and grab the result 1024 bytes at a time. I'm definitely making the HTTPS connection successfully, so that's not a problem at all.

However, what I'm finding is that the call to src.read(1024) returns the same thing every time. It only ever returns the first 1024 bytes of the response, apparently never keeping track of a cursor within the file.

So how am I supposed to receive 1024 bytes at a time? The documentation on read() is pretty sparse. I've thought about using urllib or urllib2, but neither seems to be able to make an HTTPS connection.

HTTPS is required, and I am working in a rather restricted corporate environment where packages like Requests are a bit tough to get my hands on. If possible, I'd like to find a solution within Python's standard lib.

// Big Old Fat Edit

Turns out in my original code I had simply forgotten to update the d variable. I initialized it with a read before the yield loop and never reassigned it inside the loop, so every iteration yielded the same first chunk. Once I added the second read back in there, it worked perfectly.

So, in short, I'm just a big idiot.
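
For what it's worth, the loop can be written so there is no variable left to forget about. This is only a sketch: iter_chunks is a name I made up, and io.BytesIO stands in for the HTTPResponse here on the assumption that both expose the same file-like read(n) that advances through the data on each call.

```python
import io
from functools import partial


def iter_chunks(src, chunk_size=1024):
    """Yield successive chunks from any object with a file-like read(n)."""
    # iter(callable, sentinel) keeps calling src.read(chunk_size) until it
    # returns the empty bytes object, which signals the end of the data.
    return iter(partial(src.read, chunk_size), b"")


# Stand-in for a response body (assumption: read(n) behaves the same way).
fake_response = io.BytesIO(b"abcdefghij")
chunks = list(iter_chunks(fake_response, 4))
```

Each call to read() picks up where the previous one stopped, so the ten bytes come back as three chunks of 4, 4 and 2 bytes.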

Solution

Is your con.putrequest() actually working? Doing a request with that method requires you to also call a bunch of other methods as you can see in the official httplib documentation:

http://docs.python.org/2/library/httplib.html

As an alternative to using the request() method described above, you can also send your request step by step, by using the four functions below.

putrequest()
putheader()
endheaders()
send()
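
As a rough sketch of what that step-by-step sequence looks like when it is complete, the snippet below talks to a throwaway server on localhost so it can run without outside network access. HelloHandler and fetch_step_by_step are hypothetical names of mine, plain HTTP is used instead of HTTPS to keep it short, and the try/except import is only there so the same code runs on Python 3, where httplib became http.client. As far as I know, calling getresponse() before endheaders() raises httplib.ResponseNotReady, because the request has not actually been sent yet.

```python
import threading

try:  # Python 2 module names, as used in this thread
    import httplib
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3 renamed the modules
    import http.client as httplib
    from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello from the test server"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_step_by_step(host, port, path):
    con = httplib.HTTPConnection(host, port)
    con.putrequest("GET", path)     # buffers the request line only
    con.putheader("Accept", "*/*")  # zero or more headers
    con.endheaders()                # blank line: the request is sent here
    response = con.getresponse()
    data = response.read()
    con.close()
    return response.status, data


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

status, data = fetch_step_by_step("127.0.0.1", server.server_port, "/")
server.shutdown()
server.server_close()
```

For a GET with no body, send() is not needed; request() does all of these steps in one call.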

Is there any reason why you're not using the default HTTPConnection.request() function?

Here's a working version for me, using request() instead:

import httplib

def getData(src, chunk_size=1024):
    d = src.read(chunk_size)
    while d:
        yield d
        d = src.read(chunk_size)

if __name__ == "__main__":
    con = httplib.HTTPSConnection('google.com')
    con.request('GET', '/')
    response = con.getresponse()

    for s in getData(response, 8):
        print s
        raw_input() # Just to give me a moment to examine each packet
