How to handle IncompleteRead: in Python
Problem Description
I am trying to fetch some data from a website. However, it returns an incomplete read. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be due to a server error (a chunked transfer encoding finishing before reaching the expected size). I also found a workaround for this at http://bobrochel.blogspot.in/2010/11/bad-servers-chunked-encoding-and.html?showComment=1358777800048
However, I am not sure how to use this for my case. Following is the code I am working on:
import urllib2
import urlparse
import mechanize
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)')]
urls = "http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands"
page = urllib2.urlopen(urls).read()
soup = BeautifulSoup(page)
links = soup.findAll('img', url=True)
for tag in links:
    name = tag['alt']
    tag['url'] = urlparse.urljoin(urls, tag['url'])
    r = br.open(tag['url'])
    page_child = br.response().read()
    soup_child = BeautifulSoup(page_child)
    contracts = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "tariff-duration"})]
    data_usage = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "allowance"})]
    print contracts
    print data_usage
Please help me with this. Thanks.
Recommended Answer
The link you included in your question (http://bobrochel.blogspot.in/2010/11/bad-servers-chunked-encoding-and.html?showComment=1358777800048) is simply a wrapper that executes urllib's read() function and catches any incomplete read exceptions for you. If you don't want to implement this entire patch, you could always just throw in a try/except where you read your links. For example:
import httplib

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, e:
    # e.partial holds whatever bytes arrived before the transfer was cut off
    page = e.partial
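The answer above is Python 2 (urllib2, httplib). In Python 3 the exception moved to http.client and urllib2 became urllib.request, but the same try/except idea applies. Below is a minimal sketch of a small helper for Python 3; the read_all name is made up for illustration and is not part of the original answer:

```python
from http.client import IncompleteRead

def read_all(response):
    """Read a response body; if the server ends a chunked
    transfer early, keep whatever bytes did arrive."""
    try:
        return response.read()
    except IncompleteRead as e:
        # e.partial holds the bytes received before the connection dropped
        return e.partial
```

With urllib.request this would be used as page = read_all(urllib.request.urlopen(urls)).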