Parsing a site with BeautifulSoup


Question

I'm trying to learn how to parse HTML with Python, and I'm currently stuck: soup.findAll returns an empty list even though the page contains elements that should match. Here is my code:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
url = 'https://www.oddsportal.com/matches/tennis/20191114/'

# Fetch the page with a browser-like User-Agent header
response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.text, 'html.parser')

# Look for table rows whose class attribute is "odd deactivate"
info = soup.findAll('tr', {'class':'odd deactivate'})

print(info)

I'd appreciate any help. Thanks in advance.

Recommended answer

Apparently, the page only loads the "odds" parts once it is opened in a browser, so the HTML that requests receives does not contain those rows. You could use Selenium with the Chrome driver instead.
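To see why the search can come back empty, here is a minimal sketch that needs no network access. The HTML string is a hypothetical stand-in for what requests might receive from a script-driven page, not the real markup of the site: the table is empty, and a script would fill it in only inside a browser. It uses find_all, the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a response body: the table starts out empty and
# the script would insert the row only when a browser executes it.
STATIC_HTML = """
<html><body>
  <table id="matches"></table>
  <script>
    document.getElementById("matches").innerHTML =
      '<tr class="odd deactivate"><td>row added by JavaScript</td></tr>';
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# BeautifulSoup does not run JavaScript, so the row inside the script
# is just text and never becomes a <tr> element in the parsed tree.
rows = soup.find_all("tr", {"class": "odd deactivate"})
scripts = soup.find_all("script")

print(len(rows), len(scripts))
```

If the same pattern applies to the real page, the rows exist only after a browser has executed the scripts, which is the reason for driving a browser below.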

Note that you need to download the Chrome driver and place it in your .../python/ directory. Make sure the driver version matches the version of the Chrome browser you have installed.

from bs4 import BeautifulSoup

# Webdriver
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('log-level=3')  # reduce Chrome's console logging
# 'options=' replaces the deprecated 'chrome_options=' keyword
browser = webdriver.Chrome(options=options)

url = 'https://www.oddsportal.com/matches/tennis/20191114/'
browser.get(url)
# Parse the HTML as currently rendered in the browser
soup = BeautifulSoup(browser.page_source, "html.parser")
info = soup.findAll('tr', {'class':'odd deactivate'})
print(info)

# Close the browser when done
browser.quit()
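The code above reads browser.page_source right after browser.get(url). If the scripts have not finished inserting the rows by then, the result may still be empty. One possible workaround is to poll until the rows show up. The sketch below is a plain polling loop, not part of the original answer: the helper name wait_for_rows and its timeout and interval parameters are hypothetical. Selenium's own WebDriverWait is the more usual tool for this.

```python
import time

from bs4 import BeautifulSoup


def wait_for_rows(get_html, timeout=10.0, interval=0.5):
    """Re-parse get_html() until matching rows appear or the timeout elapses.

    get_html is any zero-argument callable returning the current HTML,
    for example: lambda: browser.page_source
    """
    deadline = time.monotonic() + timeout
    while True:
        soup = BeautifulSoup(get_html(), "html.parser")
        rows = soup.find_all("tr", {"class": "odd deactivate"})
        # Stop as soon as rows are found, or give up once time has run out
        if rows or time.monotonic() >= deadline:
            return rows
        time.sleep(interval)
```

With the browser from the answer still open, this could be called as info = wait_for_rows(lambda: browser.page_source). It returns an empty list if nothing matches within the timeout.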
