即使元素存在,BeautifulSoup仍返回“无" [英] BeautifulSoup returns None even though the element exists

查看:92
本文介绍了即使元素存在,BeautifulSoup仍返回“无"的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经通过了大多数解决类似问题的解决方案,但是还没有找到一种可行的方法,更重要的是,没有找到一种解释为什么这种情况发生在被抓取网站上的Javascript或其他内容之外时

I have gone through most of the solutions for similar issues but haven't found one that works and more importantly haven't found an explanation of why this occurs outside of when Javascript or something else is being called on the site being scraped.

我正在尝试从网站上刮掉"Officials"游戏的桌子: http://www.pro-football-reference.com/boxscores/201309050den.htm

I am trying to scrape the table for game "Officials" from the site: http://www.pro-football-reference.com/boxscores/201309050den.htm

我的代码是:

url = "http://www.pro-football-reference.com/boxscores/201309050den.htm"
html = urlopen(url)    
bsObj = BeautifulSoup(html, "lxml")
officials = bsObj.findAll("table",{"id":"officials"})

for entry in officials:
    print(str(entry))

我现在只是打印到控制台,但是使用findAll或使用find的None会得到一个空列表. 我也尝试过使用基本的html.parser进行此操作,但是没有运气.

I am just printing to the console for now, but I get an empty list with findAll or None with find. I have also tried this with the basic html.parser with no luck.

能更好地了解html的人可以就此网页有什么特别的方面教育我吗?预先感谢!

Can someone with a better understanding of html educate me on what is different about this webpage specifically? Thanks in advance!

推荐答案

尝试以下代码:

from selenium import webdriver
import time
from bs4 import BeautifulSoup


driver = webdriver.Chrome()
url= "http://www.pro-football-reference.com/boxscores/201309050den.htm"
driver.maximize_window()
driver.get(url)

time.sleep(5)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
officials = soup.findAll("table",{"id":"officials"})

for entry in officials:
    print(str(entry))


driver.quit()

它将打印:

<table class="suppress_all sortable stats_table now_sortable" data-cols-to-freeze="0" id="officials"><thead><tr class="thead onecell"><td class=" center" colspan="2" data-stat="onecell">Officials</td></tr></thead><caption>Officials Table</caption><tbody>
<tr data-row="0"><th class=" " data-stat="ref_pos" scope="row">Referee</th><td class=" " data-stat="name"><a href="/officials/ColeWa0r.htm">Walt Coleman</a></td></tr>
<tr data-row="1"><th class=" " data-stat="ref_pos" scope="row">Umpire</th><td class=" " data-stat="name"><a href="/officials/ElliRo0r.htm">Roy Ellison</a></td></tr>
<tr data-row="2"><th class=" " data-stat="ref_pos" scope="row">Head Linesman</th><td class=" " data-stat="name"><a href="/officials/BergJe1r.htm">Jerry Bergman</a></td></tr>
<tr data-row="3"><th class=" " data-stat="ref_pos" scope="row">Field Judge</th><td class=" " data-stat="name"><a href="/officials/GautGr0r.htm">Greg Gautreaux</a></td></tr>
<tr data-row="4"><th class=" " data-stat="ref_pos" scope="row">Back Judge</th><td class=" " data-stat="name"><a href="/officials/YettGr0r.htm">Greg Yette</a></td></tr>
<tr data-row="5"><th class=" " data-stat="ref_pos" scope="row">Side Judge</th><td class=" " data-stat="name"><a href="/officials/PattRi0r.htm">Rick Patterson</a></td></tr>
<tr data-row="6"><th class=" " data-stat="ref_pos" scope="row">Line Judge</th><td class=" " data-stat="name"><a href="/officials/BaynRu0r.htm">Rusty Baynes</a></td></tr>
</tbody></table>

这篇关于即使元素存在,BeautifulSoup仍返回“无"的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆