Web scraping Google Flights prices


Question

I am trying to learn the Python library BeautifulSoup. As an exercise, I would like to scrape the price of a flight from Google Flights. So I connected to Google Flights, for example at this link, and I want to get the cheapest flight price.

To do that, I want to read the value inside the div with the class "gws-flights-results__itinerary-price" (as in the figure).

This is the simple code I wrote:

from bs4 import BeautifulSoup
import urllib.request

url = 'https://www.google.com/flights?hl=it#flt=/m/07_pf./m/05qtj.2019-04-27;c:EUR;e:1;sd:1;t:f;tt:o'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
div = soup.find('div', attrs={'class': 'gws-flights-results__itinerary-price'})

But the resulting div is None (its type is NoneType), meaning find() did not match any element.

I also tried

find_all('div')

but none of the divs found this way was the one I was interested in. Can someone help me?

Recommended answer

It looks like the prices are rendered by JavaScript, which urllib does not execute, so use a browser-automation tool such as Selenium:

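from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.google.com/flights?hl=it#flt=/m/07_pf./m/05qtj.2019-04-27;c:EUR;e:1;sd:1;t:f;tt:o'
driver = webdriver.Chrome()  # opens a real Chrome window that runs the page's JavaScript
driver.get(url)
# The find_element_by_* helpers were removed in newer Selenium 4 releases;
# find_element(By.CSS_SELECTOR, ...) is the current equivalent.
# The class name comes from the page markup at the time of the answer and may have changed.
print(driver.find_element(By.CSS_SELECTOR, '.gws-flights-results__cheapest-price').text)
driver.quit()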
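
Part of the reason the urllib attempt in the question comes back empty can be checked offline. The sketch below is an illustration added here, not part of the original answer. It splits the question's URL with the standard library to show that the whole flight query sits in the fragment (the part after `#`). HTTP clients do not send the fragment to the server, so only script running inside a browser can read it and load the results. The variable names `parts` and `request_target` are made up for this example.

```python
from urllib.parse import urlsplit

# URL copied from the question.
url = 'https://www.google.com/flights?hl=it#flt=/m/07_pf./m/05qtj.2019-04-27;c:EUR;e:1;sd:1;t:f;tt:o'

parts = urlsplit(url)

# Path plus query string: this is all an HTTP client such as
# urllib.request asks the server for.
request_target = parts.path + ('?' + parts.query if parts.query else '')
print(request_target)

# The fragment holds the route, date and currency. It stays on the
# client side and is never part of the HTTP request.
print(parts.fragment)
```

Because the server never sees the route or the date, the HTML that urllib receives cannot already contain the price for that particular flight, which is consistent with find() returning None.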
