Beautiful Soup not waiting until page is fully loaded


Question

With the code below I want to open an apartment listing URL and scrape the page. The only issue is that Beautiful Soup isn't waiting until the entire page is rendered. The apartments don't appear in the HTML until they have been loaded on the page, which takes a few seconds. How do I fix this?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://xxxxx.com/properties/?sort=latest'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.find_all("div", {"class": "grid-item"})
# len(containers) is 0 because the listings haven't been loaded yet!

Recommended answer

Beautiful Soup only parses the HTML it is given, and urlopen returns the initial response without running the page's JavaScript. If you want to wait for the page to fully load its data, you should think about using Selenium. In your case it could look like this:

from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

url = "<URL>"

chrome_options = Options()
chrome_options.add_argument("--headless")  # Run Chrome without opening a visible window

with Chrome(options=chrome_options) as browser:
    browser.get(url)
    html = browser.page_source

page_soup = BeautifulSoup(html, 'html.parser')
containers = page_soup.find_all("div", {"class": "grid-item"})
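
One caveat with the snippet above: browser.get(url) returns once the browser reports the page as loaded, but listings that the page's scripts fetch afterwards may still be missing when page_source is read. Selenium ships WebDriverWait and expected_conditions for this case. The sketch below is not that API; it hand-rolls the same polling idea with only find_elements so the logic is easy to see. The helper name wait_for_rendered_html, its parameters, and the "div.grid-item" selector (taken from the question's class name) are assumptions for illustration.

```python
import time


def wait_for_rendered_html(browser, css_selector, timeout=10.0, poll_interval=0.5):
    """Poll until css_selector matches at least one element, then return the page HTML.

    `browser` is assumed to be a Selenium WebDriver on which browser.get(url)
    has already been called. Raises TimeoutError if nothing matches in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        if browser.find_elements("css selector", css_selector):
            # At least one matching element exists, so snapshot the DOM now.
            return browser.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {css_selector!r} within {timeout} s"
            )
        time.sleep(poll_interval)
```

Inside the with Chrome(...) block, after browser.get(url), a call such as html = wait_for_rendered_html(browser, "div.grid-item") would take the place of reading browser.page_source directly. The returned HTML can then be passed to BeautifulSoup as before.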
