Matching a specific table within HTML, BeautifulSoup
Question
I have this problem: there are several similar tables on the page I'm trying to scrape.
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
The only difference between them is the text within the h2 tags, here: Points
How can I specify which table I need to search in?
I have this code and need to adjust it to take the h2 tag into account:
my_tab = soup.find('table', {'class':'tabelle_grafik lh'})
I'd appreciate some help.
Recommended answer
This works for me. Walk backwards through each table's previous siblings: if the first h2 you reach has the text "Points" rather than some other heading text, you've found the right table.
from bs4 import BeautifulSoup

t = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
"""
soup = BeautifulSoup(t, "html.parser")
for ta in soup.find_all('table'):
    # Previous siblings are returned nearest-first, so the first h2
    # encountered is the heading that belongs to this table.
    for s in ta.find_previous_siblings():
        if s.name == 'h2':
            if s.get_text(strip=True) == 'Points':
                print(ta)
            break  # only the nearest h2 counts; stop at it either way
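In the fragment from the question, the table sits inside a `<div class="fl">` that follows the h2, so there the h2 is a sibling of the div and not of the table itself. The sketch below is one possible way to handle that case with BeautifulSoup 4: `find_previous` searches backwards in document order regardless of nesting. The helper name `tables_under_heading`, the second heading ("Goals"), and the cell contents are assumptions made up for illustration, and it assumes each table's heading is the nearest h2 before it.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the fragment in the question; the cell contents
# and the second heading ("Goals") are invented for this illustration.
HTML = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>points table</td></tr></table>
</div>
<h2 class="tabellen_ueberschrift al">Goals</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>goals table</td></tr></table>
</div>
"""


def tables_under_heading(soup, heading_text):
    """Return tables whose nearest preceding <h2> has the given text.

    Hypothetical helper, not part of BeautifulSoup.
    """
    matches = []
    for table in soup.find_all("table", class_="tabelle_grafik"):
        # find_previous looks backwards through the whole document,
        # so it finds the h2 even though the table is nested in a div.
        heading = table.find_previous("h2")
        if heading is not None and heading.get_text(strip=True) == heading_text:
            matches.append(table)
    return matches


soup = BeautifulSoup(HTML, "html.parser")
points_tables = tables_under_heading(soup, "Points")
for table in points_tables:
    print(table.get_text(strip=True))
```

If the page always places the table right after its heading, starting from the h2 and calling `find_next("table")` on it may be a simpler alternative.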