Cannot find table using Python BeautifulSoup

Problem description

I am trying to scrape the data from the table with id="AWS" on the following NOAA page, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using .find(), the result is None. I can return the parent div, but can't seem to access the table. Below is my code.

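from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; the original used urllib2 (Python 2)

# Get soup set up
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()
soup = BeautifulSoup(html, 'lxml')

# The parent div is found, but the table inside it is not
div = soup.find("div", {"id": "obDataDiv"})
table = div.find("table", {"id": "AWS"})

print(table)  # the result is None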

When I try to find just the parent div, "obDataDiv", it returns the following:

<div id="obDataDiv"> </div>

I'm pretty new to BeautifulSoup; is this a bug? Any help is appreciated, thank you!

Recommended answer

urlopen only gives you the HTML exactly as the server sent it, not the DOM as it ends up after the page's client-side scripts have run. On your example site, the table is generated by JavaScript after the page loads, which is why the downloaded HTML contains only the empty obDataDiv placeholder. So you'll need something that executes JavaScript, such as PhantomJS or Selenium, to let the necessary client-side JS run first.
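A minimal sketch of the Selenium route, assuming Selenium 4 with a local Chrome install, and assuming the rendered table still carries id="AWS" (the page may have changed since this answer was written). The helper names fetch_rendered_html and parse_aws_table are hypothetical. The browser loads the page, an explicit wait blocks until the script has inserted the table, and the rendered page_source is then handed to BeautifulSoup (using the built-in html.parser here instead of lxml).

```python
from bs4 import BeautifulSoup


def parse_aws_table(html):
    """Return the rows of <table id="AWS"> as lists of cell text, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="AWS")  # id="AWS" is assumed from the question
    if table is None:
        return None  # e.g. the raw HTML from urlopen, before any JavaScript runs
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


def fetch_rendered_html(url, timeout=20):
    """Load url in headless Chrome and return the DOM once the AWS table exists."""
    # Imported here so parse_aws_table() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a reasonably recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the client-side script has inserted the table (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "AWS"))
        )
        return driver.page_source  # the DOM after scripts ran, not the raw HTML
    finally:
        driver.quit()
```

Usage would then be `rows = parse_aws_table(fetch_rendered_html("https://www.weather.gov/afc/alaskaObs"))`. Waiting for the element explicitly, rather than sleeping for a fixed time, returns as soon as the table exists and raises a TimeoutException if it never appears.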
