beautifulsoup: find_all on bs4.element.ResultSet object or list?

Problem description

Hi, I applied find_all to a BeautifulSoup object and got back something that is a bs4.element.ResultSet object or a list.

I want to run find_all again on that result, but that is not allowed on a bs4.element.ResultSet object. I can loop through each element of the ResultSet and call find_all on it. But can I avoid looping and just convert it back to a BeautifulSoup object?

See the code for details. Thanks.

from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')

type(soup)  # bs4.BeautifulSoup

# call find_all on the BeautifulSoup object
th_all = soup.find_all('th')

# the result is a bs4.element.ResultSet; slicing it gives a plain list
type(th_all)  # bs4.element.ResultSet
type(th_all[0:1])  # list

# now I want to call find_all again on the result
th_all.find_all(text='A')  # does not work: raises AttributeError

# can I avoid the need for this loop?
for th in th_all:
    th.find_all(text='A')  # works

Solution

ResultSet is a subclass of list, not of Tag, and Tag is where the find* methods are defined. Looping over the results of find_all() is the most common approach:

th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
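
The same loop can also be written as a nested list comprehension. The following is only a sketch, reusing the html_1 markup from the question. It passes string= rather than text=, on the assumption that Beautiful Soup 4.4.0 or later is installed, where string is the current name for that argument.

```python
from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, "html.parser")

# Flatten the per-<th> searches into one plain list.
result = [
    match
    for th in soup.find_all("th")
    for match in th.find_all(string="A")
]
print(result)
```

This still loops over every th element; it only moves the loop into a single expression.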

CSS selectors can often solve this in one go, although not everything you can do with find_all() is possible with the select() method. For instance, plain CSS selectors have no "text" search. But if, for example, you had to find all b elements inside th elements, you could do:

soup.select("th b")
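
As a rough illustration of the selector approach, here is a small sketch. The html_2 markup is hypothetical (the question's table has no b elements), so it is a variant of that table with b tags added inside two of the headers.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's table with <b> added inside some headers.
html_2 = """
<table>
    <thead>
        <tr>
            <th><b>A</b></th>
            <th>B</th>
            <th><b>C</b></th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_2, "html.parser")

# One descendant selector replaces the nested find_all() loop.
bold_in_headers = soup.select("th b")
texts = [b.get_text() for b in bold_in_headers]
print(texts)
```

The selector "th b" matches every b element that is a descendant of a th element, so no explicit loop over the th results is needed.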
