Is there a strict findAll function in BeautifulSoup?
Problem description
I am using Python 2.7 and BeautifulSoup.
Apologies if I can't explain exactly what I want.
There is an HTML page in which the data is embedded in a specific structure. I want to pull the data while ignoring the first block.
The problem is that when I do:
self.tab = soup.findAll("div","listing-row")
it also gives me the first block, which is an unwanted HTML block:
("div","listing-row wide-featured-listing")
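The reason the extra block comes back is that BeautifulSoup treats class as a multi-valued attribute: it stores the value as a list of class names, and a plain string filter matches a tag if any one of those names equals it. The minimal sketch below illustrates this. It assumes Python 3, the bs4 package and the built-in "html.parser"; the sample markup is hypothetical, since the real page isn't shown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the page described in the question.
html = """
<div>
  <div class="listing-row">result1</div>
  <div class="listing-row wide-featured-listing">result2</div>
  <div class="listing-row">result3</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A string class filter matches when ANY entry in the tag's class list
# equals it, so the featured row is included as well.
rows = soup.find_all("div", "listing-row")
texts = [row.get_text() for row in rows]

# "class" is parsed into a list of individual class names.
classes = [row["class"] for row in rows]

print(texts)    # ['result1', 'result2', 'result3']
print(classes)  # [['listing-row'], ['listing-row', 'wide-featured-listing'], ['listing-row']]
```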
I am not using soup.find("div", "listing-row"), which returns only the first match, because I want every element whose class is "listing-row" (and only that class) in the entire page.
How can I ignore the element whose class is "listing-row wide-featured-listing"?
Any help or guidance is appreciated. Thanks a lot!
Answer
Alternatively, you can use a CSS selector that matches the class attribute exactly against listing-row:
soup.select("div[class=listing-row]")
Demo:
>>> from bs4 import BeautifulSoup
>>>
>>> data = """
... <div>
... <div class="listing-row">result1</div>
... <div class="listing-row wide-featured-listing">result2</div>
... <div class="listing-row">result3</div>
... </div>
... """
>>>
>>> soup = BeautifulSoup(data, "html.parser")
>>> print [row.text for row in soup.select("div[class=listing-row]")]
[u'result1', u'result3']
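If you would rather stay with find_all (findAll is its older camelCase alias in bs4), a "strict" match can be sketched by passing a function as the filter: BeautifulSoup calls it with each tag and keeps the tags for which it returns True. The helper name exact_class below is hypothetical, the markup is the same sample as in the demo, and the code assumes Python 3. A second variant keeps the loose match and simply drops rows carrying the unwanted extra class.

```python
from bs4 import BeautifulSoup

html = """
<div>
  <div class="listing-row">result1</div>
  <div class="listing-row wide-featured-listing">result2</div>
  <div class="listing-row">result3</div>
</div>
"""


def exact_class(tag_name, class_list):
    """Hypothetical helper: build a find_all() filter that accepts a tag
    only if its name is tag_name and its class list equals class_list."""
    def match(tag):
        # tag.get("class") is None when the tag has no class attribute,
        # which never equals a list, so such tags are rejected.
        return tag.name == tag_name and tag.get("class") == class_list
    return match


soup = BeautifulSoup(html, "html.parser")

# Variant 1: exact match on the whole class list.
strict_rows = soup.find_all(exact_class("div", ["listing-row"]))
strict_texts = [row.get_text() for row in strict_rows]

# Variant 2: loose match, then exclude rows that have the featured class.
loose_rows = soup.find_all("div", "listing-row")
filtered_texts = [
    row.get_text()
    for row in loose_rows
    if "wide-featured-listing" not in row.get("class", [])
]

print(strict_texts)    # ['result1', 'result3']
print(filtered_texts)  # ['result1', 'result3']
```

Comparing lists makes the exact match sensitive to class order; compare set(tag.get("class") or []) instead if order should not matter. The exclusion variant is handy when other rows may carry additional, harmless classes that you still want to keep.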