Missing parts on Beautiful Soup results

Views: 139 | Published: 2016/8/5 18:54:36 | Tags: python, beautifulsoup

Question

I am trying to retrieve a few <p> tags from the following HTML code. Only part of it is shown here:

<td class="eelantext">
    <a class="fBlackLink"></a>
    <center></center>
    <span> … </span><br></br>
    <table width="402" vspace="5" cellspacing="0" cellpadding="3" 
        border="0" bgcolor="#ffffff" align="Left">
    <tbody> … </tbody></table>
      <!--edstart-->
    <p> … </p>
    <p> … </p>
    <p> … </p>
    <p> … </p>
    <p> … </p>
</td>

You can find the web page here.

My Python code is the following:

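from bs4 import BeautifulSoup

# `page` holds the HTML of the downloaded web page
soup = BeautifulSoup(page)  # no parser specified, so the default is used
div = soup.find('td', attrs={'class': 'eelantext'})
print(div)
text = div.find_all('p')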

But the text variable is empty, and if I print the div variable, I get exactly the same HTML as above except for the <p> tags.

Recommended answer

BeautifulSoup can use different parsers to handle HTML input. The HTML input here is slightly broken, and the default HTMLParser parser does not handle it very well.

Use the html5lib parser instead:

>>> len(BeautifulSoup(r.text, 'html').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'lxml').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'html5lib').find('td', attrs={'class': 'eelantext'}).find_all('p'))
22
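The comparison above can be wrapped in a small helper when it is unclear which parser copes best with a given page. The following is only a sketch: the function name `count_tags_by_parser` and the sample markup are assumptions made for illustration and are not part of the original answer. It skips any parser that is not installed instead of failing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tags_by_parser(html, tag, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of `tag` elements found in `html`}.

    Hypothetical helper: parsers that are not installed are left out
    of the result rather than raising an error.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing
            continue
        counts[parser] = len(soup.find_all(tag))
    return counts


if __name__ == "__main__":
    # Deliberately malformed markup (a stray closing </p>), used here
    # only as a stand-in for the broken page from the question.
    broken_html = "<a></p>"
    for name, count in count_tags_by_parser(broken_html, "p").items():
        print(name, count)
```

If the counts differ between parsers, the markup is probably malformed, and the parser that reports the elements you can see in the browser (here most likely html5lib, which follows the browsers' HTML5 parsing rules) is the one to pass as the second argument to BeautifulSoup.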
