python,没有得到完整的响应 [英] python,not getting full response

查看:31
本文介绍了python,没有得到完整的响应的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我想使用 urllib2 获取页面时,我没有获取完整页面.

when I want to get the page using urllib2, I don't get the full page.

这是python中的代码:

here is the code in python:

import urllib2
import urllib
import socket
from bs4 import BeautifulSoup
# define the frequency for http requests
socket.setdefaulttimeout(5)

    # getting the page
def get_page(url):
    """ loads a webpage into a string """
    src = ''

    req = urllib2.Request(url)

    try:
        response = urllib2.urlopen(req)
        src = response.read()
        response.close()
    except IOError:
        print 'can\'t open',url 
        return src

    return src

def write_to_file(soup):
    ''' i know that I should use try and catch'''
    # writing to file, you can check if you got the full page
    file = open('output','w')
    file.write(str(soup))
    file.close()



if __name__ == "__main__":
            # this is the page that I'm trying to get
    url = 'http://www.imdb.com/title/tt0118799/'
    src = get_page(url)

    soup = BeautifulSoup(src)

    write_to_file(soup)    # open the file and see what you get
    print "end"

我整个星期都在努力寻找问题!!为什么我没有得到完整的页面?

I have struggling to find the problem the whole week !! why I don't get the full page?

感谢帮助

推荐答案

你可能需要多次调用 read,只要它不返回一个表示 EOF 的空字符串:

You might have to call read multiple times, as long as it does not return an empty string indicating EOF:

def get_page(url):
    """ loads a webpage into a string """
    src = ''

    req = urllib2.Request(url)

    try:
        response = urllib2.urlopen(req)
        chunk = True
        while chunk:
            chunk = response.read(1024)
            src += chunk
        response.close()
    except IOError:
        print 'can\'t open',url 
        return src

    return src

这篇关于python,没有得到完整的响应的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆