Why is BeautifulSoup not finding a specific table class?


Question

I am using Beautiful Soup to scrape the Commodities table from Oil-Price.net. I can find the first div, the table, the table body, and the rows of the table body. But there is a column in one of the rows that I can't find using Beautiful Soup. When I tell Python to print all the tables in that particular row, it doesn't show the one I want. This is my code:

# Python 3: urlopen lives in urllib.request (it was urllib2 in Python 2)
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://oil-price.net').read()
soup = BeautifulSoup(html)  # no parser named, so bs4 picks one of the installed parsers

div = soup.find("div", {"id": "cntPos"})
table1 = div.find("table", {"class": "cntTb"})
tb1_body = table1.find("tbody")
tb1_rows = tb1_body.find_all("tr")
tb1_row = tb1_rows[1]  # second row of the table body
td = tb1_row.find("td", {"class": "cntBoxGreyLnk"})
print(td)

All it prints is None. I even tried printing each of the rows to see whether I could find the column manually, but found nothing. It shows the other columns, but not the one I want.

Recommended answer

The page uses broken HTML, and different parsers try to repair it differently. Install the lxml parser, which parses that page better:

>>> BeautifulSoup(html, 'html.parser').find("div",{"id":"cntPos"}).find("table",{"class":"cntTb"}).tbody.find_all("tr")[1].find("td",{"class":"cntBoxGreyLnk"}) is None
True
>>> BeautifulSoup(html, 'lxml').find("div",{"id":"cntPos"}).find("table",{"class":"cntTb"}).tbody.find_all("tr")[1].find("td",{"class":"cntBoxGreyLnk"}) is None
False
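
To check which parser copes with a given page, the same lookup can be run against every parser that happens to be installed. The following is only a minimal sketch: the helper names `find_target` and `compare_parsers` and the `BROKEN_HTML` fragment are made up for illustration (the fragment is not the real oil-price.net markup), and the results for malformed input may differ between parsers and library versions, so no output is shown.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical malformed fragment, NOT the real oil-price.net markup:
# the first <td> in the second row is never closed.
BROKEN_HTML = """
<div id="cntPos">
  <table class="cntTb">
    <tr><td>header</td></tr>
    <tr><td class="a">first<td class="cntBoxGreyLnk">target</td></tr>
  </table>
</div>
"""


def find_target(html, parser, css_class="cntBoxGreyLnk"):
    """Return the text of the first <td> with css_class, None if the
    parser's tree has no such cell, or a marker string if the parser
    is not installed."""
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is unavailable.
        return "parser not installed"
    td = soup.find("td", {"class": css_class})
    return td.get_text(strip=True) if td is not None else None


def compare_parsers(html, parsers=("html.parser", "lxml", "html5lib")):
    """Run the same lookup with each parser and collect the results."""
    return {parser: find_target(html, parser) for parser in parsers}


if __name__ == "__main__":
    for parser, result in compare_parsers(BROKEN_HTML).items():
        print(f"{parser:12} -> {result!r}")
```

If one parser reports None where another finds the cell, passing that parser's name explicitly as the second argument to `BeautifulSoup` keeps the behaviour from depending on which parsers are installed on the machine.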
