beautifulsoup: find_all on bs4.element.ResultSet object or list?
Problem description
Hi, so I apply find_all on a beautifulsoup object and find something, which is a bs4.element.ResultSet object or a list.

I want to call find_all again on that result, but it's not allowed on a bs4.element.ResultSet object. I can loop through each element of the bs4.element.ResultSet object to do find_all. But can I avoid looping and just convert it back to a beautifulsoup object?

See the code for details, please. Thanks
from bs4 import BeautifulSoup

html_1 = """
<table>
<thead>
<tr class="myClass">
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
</tr>
</thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')
type(soup) #bs4.BeautifulSoup
# do find_all on beautifulsoup object
th_all = soup.find_all('th')
# the result is a bs4.element.ResultSet, which behaves like a list
type(th_all) #bs4.element.ResultSet
type(th_all[0:1]) #list
# now I want to further do find_all
th_all.find_all(text='A') #does not work: raises AttributeError
# can I avoid the need for this loop?
for th in th_all:
    th.find_all(text='A') #works
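A minimal sketch that checks the types described above and shows what the failing call raises. It assumes Beautiful Soup 4 is installed; the two-cell table is a made-up stand-in for `html_1`.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

# Made-up markup standing in for html_1.
html = "<table><tr><th>A</th><th>B</th></tr></table>"
soup = BeautifulSoup(html, "html.parser")

th_all = soup.find_all("th")
print(type(th_all).__name__)       # ResultSet
print(isinstance(th_all, list))    # True: ResultSet subclasses list
print(type(th_all[0:1]).__name__)  # list: slicing a list subclass yields a plain list

try:
    # ResultSet has no find_all() method, so this raises before any search runs.
    th_all.find_all(string="A")
except AttributeError as exc:
    print("AttributeError:", exc)
```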
The ResultSet class is a subclass of list, not of Tag, which is the class that defines the find* methods. Looping through the results of find_all() is the most common approach:
th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
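If the goal is mainly to avoid writing the loop out, the same search can be flattened into a list comprehension, which still iterates but hides the loop. The question's literal idea, turning the matches back into a BeautifulSoup object, is also possible by re-parsing their HTML, with the caveat noted in the comments. A sketch of both, assuming bs4 4.4 or later (where `string=` is accepted as the newer name for `text=`); `sub_soup` is a hypothetical name:

```python
from bs4 import BeautifulSoup

html = """
<table><thead><tr class="myClass">
<th>A</th><th>B</th><th>C</th><th>D</th>
</tr></thead></table>
"""
soup = BeautifulSoup(html, "html.parser")
th_all = soup.find_all("th")

# 1. The loop above, written as a flattening list comprehension.
result = [s for th in th_all for s in th.find_all(string="A")]
print(result)

# 2. "Converting back": serialise the matched tags and parse them again.
#    This builds a NEW tree of copies, detached from `soup`, so changes
#    made through `sub_soup` do not affect the original document.
sub_soup = BeautifulSoup("".join(str(th) for th in th_all), "html.parser")
print(sub_soup.find_all(string="A"))
```

Re-parsing costs an extra serialise/parse round trip, so for a simple search the comprehension is usually the cheaper choice.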
Usually, CSS selectors can help you solve it in one go, except that not everything you can do with find_all() is possible with the select() method. For instance, there is no exact "text" search available in bs4 CSS selectors (newer versions, which use the soupsieve library, offer only the :-soup-contains() pseudo-class, which matches substrings of an element's text). But if, for example, you had to find all b elements inside th elements, you could do:
soup.select("th b")
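A runnable sketch of the descendant-selector approach. The markup is hypothetical, chosen so that one `<b>` sits inside a `<th>` and another does not:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <b> inside a <th>, one inside a <td>.
html = """
<table><tr>
<th><b>A</b></th><th>B</th><td><b>not in a th</b></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <b> anywhere inside a <th>, in one call, no loop.
bold_in_th = soup.select("th b")
print([b.get_text() for b in bold_in_th])  # ['A']
```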