Matching specific table within HTML, BeautifulSoup

Question

I have this problem: there are several similar tables on the page I'm trying to scrape.

<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">     
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">

The only difference between them is the text within the h2 tags, here: Points

How can I specify which table I need to search in?

I have this code and need to adjust it to take the h2 tag into account:

my_tab = soup.find('table', {'class':'tabelle_grafik lh'})

I could use some help, guys.

Recommended answer

This works for me. Walk back through each table's previous siblings: if the first h2 you reach has the text "Points" rather than some other text, you've found the right table.

from bs4 import BeautifulSoup

t = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
"""

soup = BeautifulSoup(t, 'html.parser')

for ta in soup.find_all('table'):
    # Previous siblings come back nearest-first, so the first h2 is this table's heading.
    for s in ta.find_previous_siblings():
        if s.name == 'h2':
            if s.get_text(strip=True) == 'Points':
                print(ta)
            # Stop at the nearest h2 either way; earlier headings belong to other tables.
            break
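
The sibling walk above relies on the h2 and the table sharing a parent. In the question's snippet the table sits inside a div, so the heading would not show up among the table's siblings. A minimal sketch of one way around that, assuming BeautifulSoup 4 and sample markup modeled on the question (the helper name table_after_heading is hypothetical, not part of the library): start from the heading and search forward with find_next.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question: each table is wrapped in a <div>,
# so the <h2> is not a direct sibling of the <table>.
html = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
</div>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
</div>
"""


def table_after_heading(soup, heading_text):
    """Return the first table following the h2 whose text equals heading_text, or None."""
    for h2 in soup.find_all('h2'):
        if h2.get_text(strip=True) == heading_text:
            # find_next searches forward in document order, regardless of nesting.
            return h2.find_next('table')
    return None


soup = BeautifulSoup(html, 'html.parser')
my_tab = table_after_heading(soup, 'Points')
cell_text = my_tab.get_text(strip=True) if my_tab else None
print(cell_text)
```

One caveat with this approach: if a heading has no table of its own, find_next will return the table belonging to the next section, so it fits pages where every h2 is followed by its table.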
