BeautifulSoup fails to parse long view state

Views: 227 Published: 2016/8/5 19:01:24 Tags: python html-parsing beautifulsoup

Problem description

I am trying to use BeautifulSoup4 to parse the HTML retrieved from http://exporter.nih.gov/ExPORTER_Catalog.aspx?index=0. If I print the resulting soup, it ends like this:

kZXI9IjAi"/></form></body></html>

Searching the raw HTML for those last characters (9IjAi), I found that they sit in the middle of a huge view state. BeautifulSoup seems to have a problem with this. Any hint as to what I might be doing wrong, or how to parse such a page?
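
The check described above (finding where the parsed output stops inside the raw HTML) can be sketched with the standard library alone. This is only an illustration: `find_tail` and the sample markup are hypothetical names and data, not part of the original question.

```python
def find_tail(raw_html, tail):
    """Locate the first occurrence of `tail` in raw_html.

    Returns (offset, remaining), where offset is the index at which the
    tail starts and remaining is the number of raw characters that follow
    it. Returns None if the tail does not occur in raw_html.
    """
    offset = raw_html.find(tail)
    if offset == -1:
        return None
    return offset, len(raw_html) - (offset + len(tail))


if __name__ == "__main__":
    # Hypothetical stand-in for the downloaded page.
    raw = ('<form><input name="__VIEWSTATE" value="abckZXI9IjAiXYZ"/></form>'
           '<p>more content</p>')
    print(find_tail(raw, "kZXI9IjAi"))
```

A large `remaining` value suggests the parser stopped well before the end of the document, which points to a parser problem rather than an incomplete download.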

Recommended answer

BeautifulSoup uses a pluggable HTML parser to build the 'soup'. Try out different parsers, as each one treats a broken page differently.

That said, I had no problems parsing that page with any of the parsers:

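>>> # The package is installed as beautifulsoup4 but imported as bs4.
>>> from bs4 import BeautifulSoup
>>> import requests
>>> r = requests.get('http://exporter.nih.gov/ExPORTER_Catalog.aspx?index=0')
>>> # Show the last 60 characters of the serialized soup for each parser.
>>> for parser in ('html.parser', 'lxml', 'html5lib'):
...     print(repr(str(BeautifulSoup(r.text, parser))[-60:]))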
... 
';\r\npageTracker._trackPageview();\r\n</script>\n</body>\n</html>\n'
'();\r\npageTracker._trackPageview();\r\n</script>\n</body></html>'
'();\npageTracker._trackPageview();\n</script>\n\n\n</body></html>'
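
To see how the parser choice changes the result on broken markup, a small sketch can run the same deliberately invalid fragment through each parser. The fragment and the helper name `parse_with_each` are assumptions made for illustration. `lxml` and `html5lib` are optional third-party packages, so the sketch skips whichever is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Illustrative broken input, not taken from the page in the question.
BROKEN_FRAGMENT = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized soup}, or None for missing parsers."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # Raised by bs4 when the requested parser is not installed.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, rendered in parse_with_each(BROKEN_FRAGMENT).items():
        print(name, "->", rendered if rendered is not None else "not installed")
```

If one parser cuts a real page short, comparing the tails produced by the others in this way is a quick test of whether the markup or the parser is at fault.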

Make sure you have the latest BeautifulSoup4 package installed; I have seen consistent problems in the 4.1 series that were solved in 4.2.
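
A quick way to check the installed version is sketched below. It assumes Python 3.8 or later (for `importlib.metadata`). The helper names are hypothetical, and the 4.2 threshold is simply the version mentioned in the answer.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a version string such as '4.12.3' into (4, 12, 3)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def installed_bs4_version():
    """Return the installed beautifulsoup4 version string, or None if absent."""
    try:
        return version("beautifulsoup4")
    except PackageNotFoundError:
        return None


def is_at_least_4_2(text):
    """True if the version string is 4.2 or newer."""
    return version_tuple(text) >= (4, 2)


if __name__ == "__main__":
    found = installed_bs4_version()
    if found is None:
        print("beautifulsoup4 is not installed")
    else:
        print(found, "OK" if is_at_least_4_2(found) else "older than 4.2")
```

If the reported version is older than 4.2, `pip install --upgrade beautifulsoup4` is the usual way to update.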
