Why is BeautifulSoup not finding a specific table class?
Question
I am using Beautiful Soup to try to scrape the Commodities table from Oil-Price.net. I can find the first div, the table, the table body, and the rows of the table body. But there is a column in one of the rows that I can't find using Beautiful Soup. When I tell Python to print all the tables in that particular row, it doesn't show the one I want. This is my code:
from urllib2 import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://oil-price.net').read()
soup = BeautifulSoup(html)
div = soup.find("div",{"id":"cntPos"})
table1 = div.find("table",{"class":"cntTb"})
tb1_body = table1.find("tbody")
tb1_rows = tb1_body.find_all("tr")
tb1_row = tb1_rows[1]
td = tb1_row.find("td",{"class":"cntBoxGreyLnk"})
print td
All it prints is None. I even tried printing each of the rows to see if I could find the column manually, and found nothing. It shows other columns, but not the one I want.
Recommended answer
The page's HTML is broken, and different parsers repair broken markup differently. Install the lxml
parser, which handles this page better:
>>> BeautifulSoup(html, 'html.parser').find("div",{"id":"cntPos"}).find("table",{"class":"cntTb"}).tbody.find_all("tr")[1].find("td",{"class":"cntBoxGreyLnk"}) is None
True
>>> BeautifulSoup(html, 'lxml').find("div",{"id":"cntPos"}).find("table",{"class":"cntTb"}).tbody.find_all("tr")[1].find("td",{"class":"cntBoxGreyLnk"}) is None
False
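To check which parser copes with a given page, the same lookup can be run under every parser that happens to be installed. The following is a minimal sketch, not a definitive diagnostic: the helper name cell_found_by_parser and the sample markup are hypothetical stand-ins (the sample is well-formed, so every installed parser should agree on it), and on the real, broken page the results may differ per parser as shown above.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def cell_found_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: bool} saying whether the target cell was found.

    Parsers that are not installed are skipped rather than treated as errors.
    The function name and the default parser list are illustrative assumptions.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser is missing.
            continue
        cell = soup.find("td", {"class": "cntBoxGreyLnk"})
        results[name] = cell is not None
    return results


# Hypothetical, well-formed stand-in for the real page's markup.
sample = (
    '<div id="cntPos"><table class="cntTb"><tbody>'
    '<tr><td class="other">Header</td></tr>'
    '<tr><td class="cntBoxGreyLnk">Gold</td></tr>'
    "</tbody></table></div>"
)

results = cell_found_by_parser(sample)
for parser_name, found in results.items():
    print(parser_name, "found the cell:", found)
```

Passing the downloaded page source instead of sample shows at a glance which parsers recover the cell. Naming the parser explicitly in every BeautifulSoup(...) call also keeps the result from depending on which parsers are installed on a given machine.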