BeautifulSoup does not see element, even though it is present on a page

Question

I am trying to scrape listings from Airbnb. Every listing has its own ID. However, the output of the code below is None:

import requests, bs4

response = requests.get('https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC')  
soup = bs4.BeautifulSoup(response.text, "html.parser")

element = soup.find(id="listing-18354577")
print(element)

Why doesn't the soup see this element, even though it is already loaded on the page?

Is it inside some kind of container that I need to scrape differently?

Solution

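requests doesn't wait for (or run) JavaScript; it only returns the HTML the server sends. If the listings are inserted into the page by JavaScript in the browser, they are not in response.text, so BeautifulSoup cannot find them. You can use Selenium to load the whole page in a real browser and then parse the rendered HTML with bs4. For example, this works: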

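import bs4
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# put the path to chromedriver (Selenium 4 expects it wrapped in a Service;
# on Selenium 4.6+ plain webdriver.Chrome() can also locate the driver itself)
driver = webdriver.Chrome(service=Service('path/to/chromedriver'))
website = "https://www.airbnb.pl/s/Girona--Hiszpania/homes?refinement_paths%5B%5D=%2Fhomes&query=Girona%2C%20Hiszpania&checkin=2018-07-04&checkout=2018-07-25&allow_override%5B%5D=&ne_lat=42.40450221314142&ne_lng=3.3245690859736214&sw_lat=41.97668610374056&sw_lng=1.7960961855829964&zoom=10&search_by_map=true&s_tag=nrGiXgWC"
try:
    driver.get(website)
    # wait up to 10 seconds for the JavaScript-rendered listing to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "listing-18354577")))
    html = driver.page_source  # HTML after the browser has run the scripts
finally:
    driver.quit()  # always close the browser, even if the wait times out

soup = bs4.BeautifulSoup(html, "html.parser")
element = soup.find(id="listing-18354577")
print(element)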

Output

<div class="_1wq3lj" id="listing-18354577"> ... (and much more data)
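The difference between the two cases can be reproduced offline. The sketch below uses two hypothetical HTML strings (not real Airbnb markup): one roughly as a server might send it before any JavaScript runs, and one roughly as a browser might render it. The helper name listing_ids is made up for illustration. It also shows one way to collect every listing id instead of a single hard-coded one, assuming the ids keep the listing-<digits> pattern seen in the output above.

```python
import re
from bs4 import BeautifulSoup

# Assumption: listing elements use ids of the form "listing-<digits>".
LISTING_ID = re.compile(r"^listing-\d+$")

def listing_ids(html):
    """Hypothetical helper: return the numeric part of every 'listing-<digits>' id."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all accepts a compiled regex as an attribute filter
    return [tag["id"].split("-", 1)[1] for tag in soup.find_all(id=LISTING_ID)]

# Hypothetical markup as a server might send it: an empty mount point plus
# a script that would insert the listing only when run in a browser.
served_html = """
<div id="root"></div>
<script>
  document.getElementById("root").innerHTML =
    '<div id="listing-18354577">...</div>';
</script>
"""

# Hypothetical markup after a browser has run the script
# (roughly what driver.page_source would contain).
rendered_html = """
<div id="root">
  <div class="_1wq3lj" id="listing-18354577">...</div>
  <div class="_1wq3lj" id="listing-12345">...</div>
</div>
"""

# BeautifulSoup never executes <script>, so the listing is not a tag here,
# even though its id appears as text inside the script.
print(listing_ids(served_html))    # []
print(listing_ids(rendered_html))  # ['18354577', '12345']
```

The same listing_ids call could be applied to driver.page_source from the Selenium example instead of the hard-coded strings.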
