Beautiful Soup fetch dynamic table data
Question
I have the following code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2017_standings.html#all_expanded_standings'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
print(len(soup.findAll('table')))
print(soup.findAll('table'))
There are 6 tables on the webpage, but this returns only 4. I also tried 'html.parser' and 'html5lib' as parsers, but neither worked.
Any idea how I can get the "Expanded Standings" table from the webpage?
Thanks!
Recommended answer
requests can't fetch data that is loaded by JS, so you have to use selenium. First install selenium via pip (pip install selenium), then download the Chrome driver and put the file in your working directory. Then try the following code.
from bs4 import BeautifulSoup
import time
from selenium import webdriver

url = "https://www.basketball-reference.com/leagues/NBA_2017_standings.html"
browser = webdriver.Chrome()  # starts Chrome through the Chrome driver
try:
    browser.get(url)
    time.sleep(3)  # crude wait so the page's JS can finish adding the tables
    html = browser.page_source  # HTML after the browser has run the scripts
finally:
    browser.quit()  # closes every window and stops the driver process

soup = BeautifulSoup(html, "lxml")
print(len(soup.find_all("table")))
print(soup.find("table", {"id": "expanded_standings"}))
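Once the table is in the parsed soup, its rows can be pulled into plain Python lists instead of printing the raw tag. The following is only a minimal sketch: the helper name table_to_rows is hypothetical, the sample HTML is made up to stand in for browser.page_source (it is not the real page content), and it assumes the table uses ordinary tr/th/td markup. It uses 'html.parser' so that no extra parser package is required.

```python
from bs4 import BeautifulSoup


def table_to_rows(html, table_id):
    """Return the cell text of every row in the table with the given id.

    Hypothetical helper for illustration; returns [] if no such table exists.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (th) and data cells (td) are treated alike.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample standing in for browser.page_source
sample_html = """
<table id="expanded_standings">
  <tr><th>Rk</th><th>Team</th><th>Overall</th></tr>
  <tr><td>1</td><td>Team A</td><td>10-2</td></tr>
  <tr><td>2</td><td>Team B</td><td>8-4</td></tr>
</table>
"""

for row in table_to_rows(sample_html, "expanded_standings"):
    print(row)
```

With the real page, passing html from the code above in place of sample_html should work the same way, provided the table is present in the rendered source.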
See the selenium documentation.
If you are on Linux and get the error "Chromedriver executable needs to be in the PATH", try the approaches described in link-1 and link-2.
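The error message says the driver has to be discoverable through PATH. One way to check this from Python, and to add the working directory to PATH for the current process if the driver is not found, might look like the sketch below. The helper name ensure_on_path is hypothetical, and the sketch assumes selenium locates chromedriver by searching PATH, which is what the error message suggests.

```python
import os
import shutil


def ensure_on_path(directory, executable="chromedriver"):
    """Return the full path of `executable`, adding `directory` to PATH if needed.

    Hypothetical helper for illustration; returns None if the executable
    cannot be found even after `directory` is added.
    """
    found = shutil.which(executable)
    if found:
        return found
    # Prepend for this process only; the shell's own PATH is not changed.
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(executable)


# Look in the current working directory, where the answer suggests placing the driver
print(ensure_on_path(os.getcwd()))
```

Call this before webdriver.Chrome(). On Linux the file also needs execute permission (chmod +x chromedriver); otherwise shutil.which will not report it and the function returns None.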