Python beautifulsoup level 1 only text
Question
I've looked at the other BeautifulSoup questions about getting elements at the same level. Mine seems to be slightly different.
Here is the website: http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31
I'm trying to get the table on the right. Notice how the first row of the table expands into a detailed breakdown of that data. I don't want that data; I only want the top-level data. The other rows can also be expanded, although they are not expanded in this case, so simply looping and skipping tr[2] might not work. I've tried this:
import re
import requests
from bs4 import BeautifulSoup

r = requests.get(page)
r.encoding = 'gb2312'
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('div', class_='right1').findAll('tr', {"class": re.compile('list.*')})
but this still returns nested list* rows from other levels. How do I get only the first level?
Recommended answer
Limit your search to direct children of the table element only, by setting the recursive argument to False:
table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class" : re.compile('list.*')}, recursive=False)
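To see the effect of recursive=False without fetching the live page, here is a minimal sketch run against a small hand-written table. The markup, cell text, and variable names are made up for illustration and only approximate the structure described in the question: one top-level row contains a nested table whose rows also carry a list* class.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not
# the site's actual HTML): the first top-level row holds a nested table
# whose row also has a class starting with "list".
html = """
<div class="right1">
<table>
<tr class="list1"><td>top A</td><td>
  <table>
    <tr class="list1"><td>detail A1</td></tr>
  </table>
</td></tr>
<tr class="list2"><td>top B</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", class_="right1").table

pattern = re.compile("list.*")

# Default (recursive) search: matches rows at every depth under the table.
all_rows = table.find_all("tr", {"class": pattern})

# recursive=False: matches only rows that are direct children of the table.
top_rows = table.find_all("tr", {"class": pattern}, recursive=False)

# First cell of each top-level row.
labels = [row.td.get_text(strip=True) for row in top_rows]

print(len(all_rows), len(top_rows), labels)
```

One caveat: this relies on the tr elements being direct children of the table. If the page's own markup wraps rows in a tbody, or if a parser that inserts one (html5lib does) is used, the direct-child search would probably need to start from the tbody rather than from the table.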