BeautifulSoup does not see an element, even though it is present on the page


Problem description

I am trying to scrape listings from Airbnb. Every listing has its own ID. However, the code below prints None:

import requests, bs4

response = requests.get('https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC')  
soup = bs4.BeautifulSoup(response.text, "html.parser")

element = soup.find(id="listing-18354577")
print(element)

Why does the soup not see this element, even though it is already loaded on the page?

Is it in a container of some type that I need to scrape differently?

Recommended answer

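requests does not execute JavaScript. It only downloads the HTML that the server returns, so any element the page creates in the browser afterwards is missing from response.text. You can use Selenium to load the whole page in a real browser and then hand the rendered source to bs4. For example, this works: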

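import bs4
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Put the path to chromedriver here (Selenium 4 takes it through a Service object)
driver = webdriver.Chrome(service=Service('path/to/chromedriver'))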
website = "https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC"
driver.get(website) 
html = driver.page_source
soup = bs4.BeautifulSoup(html, "html.parser")

element = soup.find(id="listing-18354577")
print(element)

Output

<div class="_1wq3lj" id="listing-18354577"> ...  #and many other data
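
To see why the first attempt prints None, here is a minimal offline sketch. Both HTML strings are made up for illustration and are not Airbnb's real markup: raw_html stands in for what requests would receive (an empty container plus a script), and rendered_html stands in for what driver.page_source would hold after a browser has run that script. BeautifulSoup only parses the text it is given and never runs scripts, so the element exists only in the second case.

```python
import bs4

# Hypothetical stand-in for response.text: the server sends an empty
# container plus a script that would create the listing in a browser.
raw_html = """
<html><body>
  <div id="results"></div>
  <script>
    const div = document.createElement("div");
    div.id = "listing-18354577";
    document.getElementById("results").appendChild(div);
  </script>
</body></html>
"""

# Hypothetical stand-in for driver.page_source: the DOM after the script ran.
rendered_html = """
<html><body>
  <div id="results"><div id="listing-18354577"></div></div>
</body></html>
"""

raw_soup = bs4.BeautifulSoup(raw_html, "html.parser")
rendered_soup = bs4.BeautifulSoup(rendered_html, "html.parser")

# The script body is plain text to the parser, so no tag carries the id yet.
raw_element = raw_soup.find(id="listing-18354577")
# In the rendered markup the tag is really there.
rendered_element = rendered_soup.find(id="listing-18354577")

print(raw_element)       # None
print(rendered_element)  # <div id="listing-18354577"></div>
```

A quick way to run the same check on a real page is to search response.text for the ID string: if it is not there as an id attribute, the element is being added by JavaScript and a browser-driven tool is needed.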
