All elements from HTML not being extracted by Requests and BeautifulSoup in Python

Published: 2020/11/24 21:15:01 Tags: python web-scraping beautifulsoup html-parsing

Question

I am trying to scrape odds from a site that displays current odds from different agencies, for an assignment on the effects of market competition. I am using Requests and BeautifulSoup to extract the relevant data. However, after running:

import requests
from bs4 import BeautifulSoup

url = "https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/"

r = requests.get(url)
print(r.text)

It does not print any odds, yet if I inspect the elements on the page I can see them in the HTML. How do I get Requests to retrieve them so that I can extract them in Python?

Recommended answer

Requests is not well suited to this case: the site is quite dynamic and builds the page with multiple XHR requests and JavaScript, so the HTML that Requests downloads does not contain the odds you see in the inspector. A quicker and much less painful way to get to the desired information would be to use a real browser automated via Selenium.
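
One way to check this diagnosis is to count how often the class seen in the inspector occurs in a given piece of HTML. Below is a minimal sketch using only the standard library. The `ClassFinder` and `count_class` names and both sample HTML strings are made up for illustration; only the `odds-comparison` class comes from the page discussed here.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.count


# Hypothetical samples: what a plain HTTP fetch might return versus
# what the browser holds after its scripts have run.
raw_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered_html = (
    '<html><body><div id="app">'
    '<div class="odds-comparison"><span class="description">MATCH ODDS</span></div>'
    '</div></body></html>'
)

print(count_class(raw_html, "odds-comparison"))       # 0
print(count_class(rendered_html, "odds-comparison"))  # 1
```

In practice you would pass `r.text` from the question's code as the first argument. A count of 0 there, while the inspector shows the elements, suggests the content is added by JavaScript after the initial response.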

Here is some example code to get you started, using the headless PhantomJS browser:

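from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# PhantomJS is no longer maintained and Selenium 4 dropped support for it;
# on a current install, substitute a headless Chrome or Firefox driver here.
driver = webdriver.PhantomJS()
driver.get("https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/")

# wait up to 10 seconds for the JavaScript-rendered odds tables to appear
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".odds-comparison")))

# find_elements(By.CSS_SELECTOR, ...) works across Selenium versions; the
# find_elements_by_css_selector helpers were removed in newer releases
for comparison in driver.find_elements(By.CSS_SELECTOR, ".odds-comparison"):
    description = comparison.find_element(By.CSS_SELECTOR, ".description").text
    print(description)

# quit() closes the window and also ends the browser process
driver.quit()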

It prints all the odds table descriptions on the page:

MATCH ODDS
MOST SIXES
TOP SRI LANKA BATSMAN
TOP AFGHANISTAN BATSMAN
