Parsing a site with BeautifulSoup
Question
I'm trying to learn how to parse HTML with Python, and I'm currently stuck: soup.findAll returns an empty array, even though the elements it should match do exist on the page. Here is my code:
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
url = 'https://www.oddsportal.com/matches/tennis/20191114/'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
info = soup.findAll('tr', {'class': 'odd deactivate'})
print(info)
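The empty list means the rows are simply not present in the HTML that requests receives: the odds table is filled in by JavaScript after the page loads in a browser. A minimal sketch of the effect, using two invented inline documents (not the real oddsportal.com markup) to stand in for the served versus the browser-rendered HTML:

```python
from bs4 import BeautifulSoup

# What an HTTP client receives: the table body is empty because
# the rows are only added later by JavaScript running in a browser.
served_html = '<table><tbody></tbody></table>'
# What the browser's DOM looks like after the scripts have run.
rendered_html = '<table><tbody><tr class="odd deactivate"><td>match</td></tr></tbody></table>'

for label, html in [('served', served_html), ('rendered', rendered_html)]:
    soup = BeautifulSoup(html, 'html.parser')
    rows = soup.findAll('tr', {'class': 'odd deactivate'})
    print(label, len(rows))
# served 0
# rendered 1
```

The same findAll call finds the row only in the rendered document, which is exactly why it returns [] against response.text.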
Thanks for your help.
Answer
Apparently, the page only loads the "odds" part once it is opened in a browser. So you could use Selenium with the Chrome driver.
Note that you need to download the Chrome driver and place it in your .../python/ directory. Make sure you choose a matching driver version, i.e. a Chrome driver version that matches the version of the Chrome browser you have installed.
from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('log-level=3')          # suppress noisy Chrome logging
browser = webdriver.Chrome(options=options)  # 'chrome_options=' is deprecated; use 'options='

url = 'https://www.oddsportal.com/matches/tennis/20191114/'
browser.get(url)

# Parse the fully rendered page source instead of the bare HTTP response
soup = BeautifulSoup(browser.page_source, "html.parser")
info = soup.findAll('tr', {'class': 'odd deactivate'})
print(info)
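Once the rows are in the soup, you can pull the text out of each one. The snippet below runs the same extraction against a small static table whose structure is invented for illustration (the real oddsportal.com markup may differ):

```python
from bs4 import BeautifulSoup

# A tiny static snippet imitating the shape of an odds table row;
# the cell layout is an assumption, not the site's actual markup.
html = '''
<table>
  <tr class="odd deactivate"><td>Player A - Player B</td><td>1.50</td><td>2.50</td></tr>
  <tr class="odd deactivate"><td>Player C - Player D</td><td>1.90</td><td>1.90</td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')
for row in soup.findAll('tr', {'class': 'odd deactivate'}):
    cells = [td.get_text(strip=True) for td in row.findAll('td')]
    print(cells)
# ['Player A - Player B', '1.50', '2.50']
# ['Player C - Player D', '1.90', '1.90']
```

With browser.page_source in place of the static string, the same loop would iterate over the live rows.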