Python BeautifulSoup level 1 only text

Question

I've looked at the other BeautifulSoup "get same level" questions, but mine seems to be slightly different.

Here is the website: http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31

I'm trying to get the table on the right. Notice how the first row of the table expands into a detailed breakdown of that data. I don't want that breakdown; I only want the top-level data. The other rows can be expanded too, although they are not expanded in this case, so simply looping and skipping tr[2] might not work. I've tried this:

import re
import requests
from bs4 import BeautifulSoup

r = requests.get(page)  # page holds the URL above
r.encoding = 'gb2312'
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('div', class_='right1').findAll('tr', {"class": re.compile('list.*')})

But this still returns nested list* rows from other levels. How do I get only the first level?

Recommended answer

Limit your search to direct children of the table element only by setting the recursive argument to False:

table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class" : re.compile('list.*')}, recursive=False)
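
To see the effect of recursive=False without fetching the live page, here is a minimal sketch on made-up markup. The HTML string, the "detail" class, and the helper name top_level_rows are assumptions for illustration only; the real page's structure may differ.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: a top-level table whose second row holds a nested table.
HTML = """
<div class="right1">
<table>
<tr class="list1"><td>top A</td></tr>
<tr class="detail"><td>
<table><tr class="list2"><td>nested A1</td></tr></table>
</td></tr>
<tr class="list1"><td>top B</td></tr>
</table>
</div>
"""

def top_level_rows(html):
    """Return only the list* rows that are direct children of the outer table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('div', class_='right1').table
    # recursive=False restricts the search to direct children of <table>.
    return table.find_all('tr', class_=re.compile('list.*'), recursive=False)

# Default (recursive) search for comparison: it also finds the nested row.
all_rows = BeautifulSoup(HTML, 'html.parser').find(
    'div', class_='right1').find_all('tr', class_=re.compile('list.*'))

rows = top_level_rows(HTML)
texts = [row.get_text(strip=True) for row in rows]
print(len(all_rows), len(rows), texts)
```

If the page's markup (or a parser such as html5lib) wraps the rows in a tbody element, the rows are no longer direct children of the table; in that case the same non-recursive search would need to start from table.tbody instead.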
