Cannot find table using Python BeautifulSoup
Problem description
I am trying to scrape the data from the table id=AWS from the following NOAA site, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using '.find' my result comes up as none. I am able to return the parent div, but can't seem to access the table. Below is my code.
from bs4 import BeautifulSoup
from urllib2 import urlopen
# Get soup set up
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()
soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find("table", {"id":"AWS"})
print soup
When I try to just find the parent div, "obDataDiv", it returns the following.
<div id="obDataDiv">&nbsp;</div>
I'm pretty new to BeautifulSoup, is this an error? Any help is appreciated, thank you!
Recommended answer
urlopen only gives you the HTML as it was downloaded from the server, not the DOM that results after the page's client-side scripts have run. On your example site, the table is generated by JavaScript after the page loads, which is why the parent div comes back empty. You'll need a tool that runs the necessary client-side JS first, such as Selenium or PhantomJS.
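As a rough illustration of the Selenium route, here is a minimal sketch. It assumes Selenium 4 with a local Chrome install, and that the page still creates a table with id "AWS" once its scripts run. The helper names find_table and fetch_rendered_html are made up for this example and are not part of either library.

```python
from bs4 import BeautifulSoup


def find_table(html, table_id="AWS"):
    """Return the <table> with the given id, or None if it is absent."""
    return BeautifulSoup(html, "html.parser").find("table", {"id": table_id})


def fetch_rendered_html(url, table_id="AWS", timeout=30):
    """Load url in headless Chrome and return the HTML after the table appears.

    Hypothetical helper: assumes Chrome and a matching driver are available.
    """
    # Imported lazily so find_table() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted the table,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, table_id))
        )
        return driver.page_source
    finally:
        driver.quit()
```

With that in place, something like table = find_table(fetch_rendered_html('https://www.weather.gov/afc/alaskaObs')) should return the table instead of None, provided the page still behaves as described in the question. Parsing the raw urlopen result with find_table would still give None, since the table is not in the downloaded HTML.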