Is there any strict findAll function in BeautifulSoup?


Question

I am using Python 2.7 and BeautifulSoup.

Apologies if I am unable to explain exactly what I want.

There is an HTML page in which data is embedded in a specific structure. I want to pull the data while ignoring the first block.

But the problem is that when I do this:

self.tab = soup.findAll("div","listing-row") 

it also gives me the first block, which is actually an unwanted HTML block:

("div","listing-row wide-featured-listing")

I am not using

soup.find("div","listing-row")

since I want every element in the entire page whose class is exactly "listing-row".

How can I ignore the class named "listing-row wide-featured-listing"?

Help or guidance in any form is appreciated. Thanks a lot!

Recommended answer

Alternatively, you can use a CSS attribute selector that matches the class attribute exactly against listing-row:

soup.select("div[class=listing-row]")

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> data = """
... <div>
...     <div class="listing-row">result1</div>
...     <div class="listing-row wide-featured-listing">result2</div>
...     <div class="listing-row">result3</div>
... </div>
... """
>>> 
>>> soup = BeautifulSoup(data, "html.parser")
>>> print [row.text for row in soup.select("div[class=listing-row]")]
[u'result1', u'result3']
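
The same strict match can also be sketched with find_all and a filter function, which may be convenient when CSS selectors are not an option. This is a minimal sketch written for Python 3, not the answer's own method; the helper name has_only_listing_row is hypothetical. It relies on bs4 storing the class attribute as a list of tokens, so comparing against ["listing-row"] rejects any tag that carries additional classes.

```python
from bs4 import BeautifulSoup

html = """
<div>
    <div class="listing-row">result1</div>
    <div class="listing-row wide-featured-listing">result2</div>
    <div class="listing-row">result3</div>
</div>
"""


def has_only_listing_row(tag):
    # Hypothetical helper: bs4 exposes the multi-valued class attribute
    # as a list, so an exact list comparison excludes tags with extra classes.
    # Tags without a class attribute return None from get() and are skipped.
    return tag.name == "div" and tag.get("class") == ["listing-row"]


soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all(has_only_listing_row)
texts = [row.get_text() for row in rows]
print(texts)
```

The filter function is called once per tag in the document, so any other condition (tag name, further attributes) can be added to the same function.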
