Parsing a site with BeautifulSoup
Question
I'm trying to learn how to parse HTML with Python, and I'm currently stuck: soup.findAll returns an empty list even though the page contains elements that should match. Here is my code:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {"User-Agent": 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
url = 'https://www.oddsportal.com/matches/tennis/20191114/'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
# Expected: the matching table rows; actual: an empty list
info = soup.find_all('tr', {'class': 'odd deactivate'})
print(info)
I'd appreciate any help. Thanks in advance.
Recommended answer
Apparently, the page only loads the "odds" rows once it is opened in a browser, most likely because they are filled in by JavaScript after the initial HTML arrives, so requests never sees them. You could use Selenium with the Chrome driver instead.
Note that you need to download the Chrome driver and place it in your .../python/ directory. Make sure the driver version matches the version of the Chrome browser you have installed.
from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--log-level=3')  # reduce Chrome's console logging
# Newer Selenium releases no longer accept chrome_options=; pass options= instead
browser = webdriver.Chrome(options=options)
url = 'https://www.oddsportal.com/matches/tennis/20191114/'
try:
    browser.get(url)
    # page_source reflects the page as rendered by the browser
    soup = BeautifulSoup(browser.page_source, "html.parser")
finally:
    browser.quit()  # close the browser even if loading fails
info = soup.find_all('tr', {'class': 'odd deactivate'})
print(info)
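To check the diagnosis above before reaching for Selenium, it can help to count the target rows in the raw HTML and in the browser-rendered HTML. The sketch below is only an illustration: it uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra packages, and `RowClassCounter`, `count_rows`, and the two HTML snippets are hypothetical stand-ins, not the real content of the site.

```python
from html.parser import HTMLParser


class RowClassCounter(HTMLParser):
    """Count <tr> tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # attrs is a list of (name, value) pairs; the value may be None
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_rows(html, wanted_classes=("odd", "deactivate")):
    """Return how many <tr> elements in html carry every wanted class."""
    parser = RowClassCounter(wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins: what a plain HTTP client might receive versus
# what the browser holds after the page's scripts have run.
static_html = "<table></table><script>/* rows are added later */</script>"
rendered_html = (
    '<table><tr class="odd deactivate"><td>A - B</td></tr>'
    '<tr class="deactivate"><td>C - D</td></tr></table>'
)

print(count_rows(static_html))    # 0: no matching rows in the raw HTML
print(count_rows(rendered_html))  # 1: only the first row has both classes
```

In practice one would pass the `.text` of the requests response and `browser.page_source` to `count_rows`. A count of zero for the former and a positive count for the latter would suggest the rows are added by scripts after the initial load.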