Why does bs4 return tags and then an empty list to this find_all() method?

Published: 2021/12/17 14:10:08 · Tags: python, html, web-scraping, beautifulsoup


Question

Looking at US Census QFD I'm trying to grab the race % by county. The loop I'm building is outside the scope of my question, which concerns this code:

import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead
from bs4 import BeautifulSoup

url = 'http://quickfacts.census.gov/qfd/states/48/48507.html'
# last county in TX; for some reason QFD numbers the counties with odd numbers only
page = urllib2.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

c_black_alone = soup.find_all("td", attrs={'headers':'rp9'})[0] #c = county %
s_black_alone = soup.find_all("td", attrs={'headers':'rp9'})[1] #s = state %

This grabs the HTML elements, including their tags, not just the text within them:

c_black_alone, s_black_alone

(<td align="right" headers="rp9 p1" valign="bottom">96.9%<sup></sup></td>,
 <td align="right" headers="rp9 p2" valign="bottom">80.3%<sup></sup></td>)

Above, I only want the percentages inside the elements.

Furthermore, why does

test_black = soup.find_all("td", text = "Black")

not return the same element as above (or its text), but instead returns an empty bs4 ResultSet object? (Edit: I have been following along with the documentation, so I hope this question doesn't seem too vague...)

Solution

To get the text from those matches, use .text to get all contained text:

>>> soup.find_all("td", attrs={'headers':'rp9'})[0].text
u'96.9%'
>>> soup.find_all("td", attrs={'headers':'rp9'})[1].text
u'80.3%'
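A minimal sketch of the same idea, run against a local copy of the two cells printed in the question so it does not depend on the live Census page. It searches once, uses `get_text(strip=True)` (equivalent to `.text` for these cells, but it also trims surrounding whitespace) and converts the result to a number. The helper name `black_alone_percentages` is hypothetical.

```python
from bs4 import BeautifulSoup

# Local copy of the two cells printed in the question; this is sample
# markup, not a live fetch of the Census page.
HTML = """
<table><tr>
<td align="right" headers="rp9 p1" valign="bottom">96.9%<sup></sup></td>
<td align="right" headers="rp9 p2" valign="bottom">80.3%<sup></sup></td>
</tr></table>
"""


def black_alone_percentages(soup):
    """Hypothetical helper: return (county, state) percentages as floats."""
    # Beautiful Soup treats `headers` on <td> as a multi-valued attribute,
    # so 'rp9' matches cells whose headers are "rp9 p1" or "rp9 p2".
    cells = soup.find_all("td", attrs={"headers": "rp9"})
    # get_text(strip=True) drops the tags (including the empty <sup>) and
    # surrounding whitespace, leaving e.g. "96.9%".
    county, state = (float(c.get_text(strip=True).rstrip("%")) for c in cells[:2])
    return county, state


soup = BeautifulSoup(HTML, "html.parser")
print(black_alone_percentages(soup))  # (96.9, 80.3)
```

Calling `find_all()` once and indexing the result avoids searching the whole page twice, which matters once this runs inside a loop over counties.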

Your text search doesn't match anything for two reasons:

1. A literal string only matches the whole contained text, not part of it. It only works for an element such as <td>Black</td>, where that text is the sole content.
2. The match is made against the .string property, and that property is only set when the text is the only child of the element. If other children are present (such as a comment), .string is None and the search fails entirely.
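Both failure modes can be reproduced on a few hypothetical cells. This sketch uses `string=`, the name Beautiful Soup 4.4.0 and later prefer for the `text=` argument used in the question; the cell ids and contents are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: the bare word, a longer label, and the same label
# followed by an HTML comment (the shape of the Census page cells).
HTML = """
<table><tr>
<td id="only">Black</td>
<td id="label">Black or African American alone</td>
<td id="commented">Black or African American alone  <!-- RHI225213 --> </td>
</tr></table>
"""
soup = BeautifulSoup(HTML, "html.parser")

for td in soup.find_all("td"):
    # .string is only set when the cell has exactly one child.
    print(td["id"], len(td.contents), td.string is not None)

# Reason 1: the match is against the whole text, so "Black" does not
# match the longer label.
exact = soup.find_all("td", string="Black")
print([td["id"] for td in exact])  # ['only']

# Reason 2: even the full label fails on the commented cell, because its
# .string is None (three children: text, comment, text).
full = soup.find_all("td", string="Black or African American alone")
print([td["id"] for td in full])  # ['label']
```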

The way around this is to use a function, such as a lambda, instead; it is passed each whole element, so you can validate each one yourself:

soup.find_all(lambda e: e.name == 'td' and 'Black' in e.text)

Demo:

>>> soup.find_all(lambda e: e.name == 'td' and 'Black' in e.text)
[<td id="rp10" valign="top">Black or African American alone, percent, 2013 (a)  <!-- RHI225213 --> </td>, <td id="re6" valign="top">Black-owned firms, percent, 2007  <!-- SBO315207 --> </td>]

Both of these matches have a comment in the <td> element, making a search with a text match ineffective.
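Here is a sketch of the lambda approach against markup shaped like the demo output (the third, non-matching row is hypothetical), next to one alternative that is not from the original answer: search the text nodes themselves with `string=re.compile(...)` and walk up to the enclosing `<td>` with `find_parent()`. Each text node is matched on its own, so the neighboring comment does not block the match.

```python
import re

from bs4 import BeautifulSoup

# Markup mirroring the two matches in the demo; the "rp11" row is a
# hypothetical non-matching row added for contrast.
HTML = """
<table>
<tr><td id="rp10" valign="top">Black or African American alone, percent, 2013 (a)  <!-- RHI225213 --> </td></tr>
<tr><td id="re6" valign="top">Black-owned firms, percent, 2007  <!-- SBO315207 --> </td></tr>
<tr><td id="rp11" valign="top">White alone, percent, 2013 (a)  <!-- RHI125213 --> </td></tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

# The answer's approach: the function receives every Tag and decides.
by_lambda = soup.find_all(lambda e: e.name == "td" and "Black" in e.text)

# Alternative: match individual text nodes with a regex, then climb to
# the <td> that contains each one.
by_string = [s.find_parent("td") for s in soup.find_all(string=re.compile("Black"))]

print([td["id"] for td in by_lambda])  # ['rp10', 're6']
print([td["id"] for td in by_string])  # ['rp10', 're6']
```

One design difference: the string search returns one hit per matching text node, so a cell containing "Black" in two separate text nodes would appear twice, while the lambda returns each `<td>` once.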
