BeautifulSoup not extracting div properly

Problem description

BeautifulSoup is not extracting the div I want properly. I am not sure what I am doing wrong. Here is the html:

                <div id='display'>
                      <div class='result'>
                           <div>text0 </p></div>
                           <div>text1</div>
                           <div>text2</div>
                       </div>
                  </div>

Here is my code:

div = soup.find("div", {"class": "result"})
print(div)

This is what I see:

<div class="result">
<div>text0 </div></div>

This is what I expect:

<div class="result">
<div>text0</div>
<div>text1</div>
<div>text2</div>
</div>

This works as expected if I remove the </p> tag. In other words, the </p> tag seems to be throwing the parser off.

This works as expected on Python 2.7.12 with beautifulsoup4 4.5.1, but not on Python 3.6.4 with beautifulsoup4 4.7.1. I am not sure whether the culprit is the Python version or (more likely) the bs4 version.

Can anyone help?

Recommended answer

I see no problem using select

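from bs4 import BeautifulSoup as bs

html = '''
<div id='display'>
                      <div class='result'>
                           <div>text0 </p></div>
                           <div>text1</div>
                           <div>text2</div>
                       </div>
                  </div>
                  '''
# Name the parser explicitly: bs(html) alone picks the best parser installed
# and warns about it. 'lxml' is an assumption here and must be installed.
soup = bs(html, 'lxml')
# select() returns a list of every element matching the CSS selector
print(soup.select('.result'))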
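
Because the markup contains a stray </p>, the tree that BeautifulSoup builds can depend on which underlying parser it uses, and that may explain why the question and the answer see different results. The following is a minimal sketch for checking this on your own installation, not a diagnosis of the original bug. The helper name count_result_children is hypothetical, and lxml and html5lib are third-party packages that may not be installed, in which case they are reported as missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

HTML = """
<div id='display'>
  <div class='result'>
    <div>text0 </p></div>
    <div>text1</div>
    <div>text2</div>
  </div>
</div>
"""


def count_result_children(html, parser):
    """Count the direct <div> children of the .result div under `parser`.

    Returns None when the requested parser is not installed.
    (Hypothetical helper, written for this illustration only.)
    """
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        return None
    result = soup.find("div", {"class": "result"})
    if result is None:
        return 0
    # recursive=False restricts the search to direct children
    return len(result.find_all("div", recursive=False))


# Run the same malformed markup through each parser and compare
counts = {
    parser: count_result_children(HTML, parser)
    for parser in ("html.parser", "lxml", "html5lib")
}
for parser, n in counts.items():
    print(parser, "->", "not installed" if n is None else n)
```

If the counts differ between parsers, passing the parser name explicitly to BeautifulSoup (rather than letting it pick one) makes the result reproducible across machines.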
