Beautiful Soup failing to parse this HTML

Views: 214 · Published: 2016/8/5 19:17:44 · Tags: python, html-parsing, beautifulsoup


Problem description

We're using Beautiful Soup to parse many websites successfully, but a few are giving us problems. An example is this page:

http://www.designsponge.com/2013/04/biz-ladies-how-to-use-networking-to-improve-your-search-engine-rankings.html

We're feeding the exact page source to Beautiful Soup, but it returns a truncated HTML string and raises no errors.

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(site_html)  # site_html holds the page source; no parser is named
print(str(soup.html))

Result:

<html class="no-js" lang="en"> <!--&lt;![endif]--> </html>

I'm trying to determine what's tripping it up, but nothing jumps out at me when I look at the HTML source. Does anyone have any insight?

Recommended answer

Try a different parser. The page parses fine with the html5lib parser:

>>> soup = BeautifulSoup(r.content, 'html5')
>>> len(soup.find_all('li'))
97

Not all parsers handle broken HTML the same way.
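One way to see this difference is to run the same markup through each installed parser and compare what comes back. The following is a minimal sketch, not the answerer's method: the `BROKEN_HTML` string and the `count_tags` helper are hypothetical names made up for illustration, and the sample markup is not the page from the question. It assumes `bs4` is installed. The `lxml` and `html5lib` parsers are optional, so they are skipped when missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical broken markup (not the page from the question):
# unclosed <li> elements and a stray closing </p>.
BROKEN_HTML = "<ul><li>one<li>two</ul></p><li>three"


def count_tags(html, tag, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <tag> elements found}.

    Parsers whose backing library is not installed are skipped.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            continue  # this parser library is not installed
        counts[name] = len(soup.find_all(tag))
    return counts


if __name__ == "__main__":
    for name, n in count_tags(BROKEN_HTML, "li").items():
        print(name, n)
```

If the counts (or the output of `soup.prettify()`) differ between parsers for a real page, that is a hint that the markup is malformed and that naming the parser explicitly in the `BeautifulSoup(...)` call is worth trying.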
