How to get tbody from a table with Python Beautiful Soup?


Question

I'm trying to scrape the Year and Winners (first and second columns) from the "List of finals matches" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I'm using the code below:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

With the above code, I was able to get the first and third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element of the table, even though I can see the tbody when I inspect the element in the browser.

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

Here's the error I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'


Recommended answer

If you are inspecting the page through the browser's inspect tool, the browser inserts the tbody tags itself when it builds the DOM, so they show up there even if the HTML that was sent does not contain them.

The source code may or may not contain them. I suggest looking at the page's source view if you really want to know.

Either way, you do not need to traverse to the tbody. Simply calling

soup.findAll('table')[0].findAll('tr')

should work, because findAll searches all descendants of the table, whether or not a tbody sits in between.
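As a rough illustration of the answer, here is a minimal sketch. It assumes the newer bs4 package (with find_all in place of findAll) rather than the BeautifulSoup 3 module used in the question, and it parses a small made-up HTML snippet instead of fetching the real page. The snippet, and the names HTML, has_tbody, and rows, are hypothetical and only stand in for a page whose raw source has no tbody.

```python
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module from the question

# Hypothetical stand-in for a page whose raw source contains no <tbody>.
HTML = """
<table>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
</table>
"""

# Python's built-in html.parser keeps the markup as written and adds no <tbody>.
soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table")[0]

# table.tbody is None here, which is why chaining .findAll on it raised AttributeError.
has_tbody = table.tbody is not None

rows = []
# Search for <tr> directly under the table; this works with or without a <tbody>.
for row in table.find_all("tr"):
    header_cells = row.find_all("th")
    data_cells = row.find_all("td")
    if not header_cells or not data_cells:
        continue  # skip rows that lack the cells we want
    rows.append((header_cells[0].get_text(strip=True),
                 data_cells[0].get_text(strip=True)))

print(has_tbody, rows)
```

On a real page the table index and the cell positions would need to be checked against the actual markup; the guard on empty rows is there because header-only or spacer rows are common in large tables.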
