BeautifulSoup: findAll doesn't find the tags


Question

I'm sorry about the many questions I post, but I have no idea what to do about this bug: when testing this page with a simple search for p tags

ab = soup.find("article", {"itemprop": "articleBody"})
p = ab.findAll("p")
print(len(p))  # gives 1

There are many p tags, but I get only the first. I tried copy-pasting the whole <article itemprop="articleBody"> html text into a string and passing it to a new BeautifulSoup object. Searching that object for p gave all the desired tags (14).

Why doesn't the usual approach work? Are the p tags loaded dynamically here (though the html code looks pretty normal)?

Answer

The problem is the parser:

In [21]: req = requests.get("http://www.wired.com/2016/08/cape-watch-99/")

In [22]: soup = BeautifulSoup(req.content, "lxml")

In [23]: len(soup.select("article[itemprop=articleBody] p"))
Out[23]: 26

In [24]: soup = BeautifulSoup(req.content, "html.parser")

In [25]: len(soup.select("article[itemprop=articleBody] p"))
Out[25]: 1

In [26]: soup = BeautifulSoup(req.content, "html5lib")

In [27]: len(soup.select("article[itemprop=articleBody] p"))
Out[27]: 26

You can see that html5lib and lxml get all the p tags, but the standard html.parser does not handle the broken html as well. Running the article html through validator.w3 you get a lot of output confirming that the page's markup is invalid.
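The fix, then, is to pass a more lenient parser to BeautifulSoup explicitly. A minimal sketch, using a made-up, well-formed stand-in for the article markup (on the real broken page you would swap in "lxml" or "html5lib", each installed separately with pip):

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed stand-in for the article markup.
html = """
<article itemprop="articleBody">
  <p>first paragraph</p>
  <p>second paragraph</p>
</article>
"""

# Pass the parser name explicitly as the second argument; use
# "lxml" or "html5lib" for broken real-world pages
# (pip install lxml html5lib).
soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.select("article[itemprop=articleBody] p")
print(len(paragraphs))  # 2 on this well-formed snippet
```

On clean markup like this, all three parsers agree; the divergence in the session above only appears because the Wired page's html is broken, and html.parser recovers from the errors less gracefully than lxml or html5lib.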

