Beautiful Soup: fetching dynamic table data


Question

I have the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2017_standings.html#all_expanded_standings'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')

print(len(soup.find_all('table')))
print(soup.find_all('table'))

There are 6 tables on the webpage, but this returns only 4. I also tried 'html.parser' and 'html5lib' as parsers, but neither worked.

Any idea how I can get the "expanded standings" table from the webpage?

Thanks!

Recommended answer

A plain HTTP fetch (urlopen or requests) returns only the raw HTML and does not run JavaScript, so it can't see data that is loaded by JS. To get that data you need a real browser, for example via selenium. First install selenium with pip (pip install selenium), then download the Chrome driver and put the file in your working directory. Then try the following code.

from bs4 import BeautifulSoup
import time
from selenium import webdriver

url = "https://www.basketball-reference.com/leagues/NBA_2017_standings.html"
browser = webdriver.Chrome()

try:
    browser.get(url)
    # Give the page's JavaScript a few seconds to finish building the tables.
    time.sleep(3)
    html = browser.page_source
    soup = BeautifulSoup(html, "lxml")

    print(len(soup.find_all("table")))
    print(soup.find("table", {"id": "expanded_standings"}))
finally:
    # quit() closes every window and ends the driver process,
    # so a separate close() call is not needed.
    browser.quit()
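Once you have the table element, you will probably want its rows as plain Python data rather than printed HTML. Here is a minimal sketch of one way to do that. The helper name table_to_rows and the sample HTML (including its column names and values) are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Return the table's rows as lists of cell text (hypothetical helper)."""
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up stand-in for the rendered page source; not real standings data.
SAMPLE_HTML = """
<table id="expanded_standings">
  <tr><th>Rk</th><th>Team</th><th>Overall</th></tr>
  <tr><td>1</td><td>Team A</td><td>2-0</td></tr>
  <tr><td>2</td><td>Team B</td><td>1-1</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"id": "expanded_standings"})
rows = table_to_rows(table)
for row in rows:
    print(row)
```

With the selenium code above, you would pass the result of soup.find("table", {"id": "expanded_standings"}) to the helper instead of the sample table. The real table may have multi-row headers that need extra handling.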

See the selenium documentation.

If you are on Linux and get the error "Chromedriver executable needs to be in the PATH", try the approaches described here: link-1, link-2
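If running a browser is not an option, it may be worth checking whether the missing tables are already in the raw HTML but wrapped in HTML comments, which some sites do and then uncomment with JavaScript. This is an assumption about the page, not something confirmed above. The sketch below shows the general technique on a made-up HTML snippet; the helper name find_tables_including_comments and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def find_tables_including_comments(html, parser="html.parser"):
    """Return tables in the document plus any found inside HTML comments."""
    soup = BeautifulSoup(html, parser)
    tables = list(soup.find_all("table"))
    # Comment nodes are strings, so they can be re-parsed as HTML.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" in comment:
            inner = BeautifulSoup(comment, parser)
            tables.extend(inner.find_all("table"))
    return tables


# Made-up stand-in for a raw page; the second table is hidden in a comment.
SAMPLE_HTML = """
<div id="all_visible"><table id="visible"><tr><td>1</td></tr></table></div>
<div id="all_expanded_standings">
<!--
<table id="expanded_standings"><tr><td>Team A</td><td>1-0</td></tr></table>
-->
</div>
"""

tables = find_tables_including_comments(SAMPLE_HTML)
table_ids = [t.get("id") for t in tables]
print(table_ids)
```

If the real page does work this way, passing the output of urlopen(url).read() to the helper would return the commented-out tables as well. If it does not, the selenium approach is still needed.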
