Webscraping with BeautifulSoup, getting empty list

Problem Description

I'm practicing web scraping by getting basic weather data, like the daily high/low temperature, from https://www.wunderground.com/ (searching a random zip code).

I've tried various variations of my code, but it keeps returning an empty list where the temperature should be. I honestly just don't know enough to pinpoint where I'm going wrong. Can anyone point me in the right direction?

import requests
from bs4 import BeautifulSoup

# Fetch the forecast page and try to pull out the daily high temperature.
response = requests.get('https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502')
response_data = BeautifulSoup(response.content, 'html.parser')
results = response_data.select("strong.high")  # always comes back as an empty list

I've also tried the following, along with various other variations:

results = response_data.find_all('strong', class_='high')
results = response_data.select('div.small_6 columns > strong.high')

Recommended Answer

The data you want to parse is created dynamically by JavaScript, which requests can't handle. You should use selenium together with PhantomJS or any other driver. Below is an example using selenium and Chromedriver:

from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502'

# Load the page in a real browser so the JavaScript runs, then hand the
# rendered HTML to BeautifulSoup.
driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')
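
Depending on connection speed, the dynamic content may not have finished rendering by the time page_source is read, so the parse can still come up empty. Below is a minimal sketch using Selenium's explicit waits; the strong.high selector and the 15-second timeout are assumptions added for illustration, and it also closes the browser when done:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502'
driver = webdriver.Chrome()
driver.get(url)

# Wait (up to an assumed 15 seconds) for the dynamically inserted temperature
# element to exist, then take the fully rendered HTML and close the browser.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'strong.high'))
)
html = driver.page_source
driver.quit()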

Inspecting the elements, the lowest, the highest and the current temperature can be found using:

high = soup.find('strong', {'class':'high'}).text
low = soup.find('strong', {'class':'low'}).text
now = soup.find('span', {'data-variable':'temperature'}).find('span').text


>>> low, high, now
('25', '37', '36.5')
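
To confirm that the empty list from the original requests code is caused by JavaScript rendering rather than by the selectors, you can check whether the markup exists in the raw HTML at all. A minimal check, assuming the same URL and class name as above:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.wunderground.com/cgi-bin/findweather/getForecast?query=76502')
soup = BeautifulSoup(response.content, 'html.parser')

# If the temperatures are injected client-side, the element is simply absent
# from the static HTML, so every selector variant returns an empty list.
print(soup.select('strong.high'))       # expected: []
print('class="high"' in response.text)  # expected: False when the markup is rendered by JavaScript

If that check ever starts returning matches, the page is serving the values in the static HTML and the plain requests approach would work again.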
