BeautifulSoup identifying only a few elements on the page

Problem description

I scraped a website, but my code only gets the first 20 elements on the page. The remaining elements are loaded only when you scroll down. How can I scrape those elements too? Is there a different method for doing this?

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/")
c = r.content

soup = BeautifulSoup(c, "html5lib")

# One div per listing card in the search results
cards = soup.find_all("div", {"class": "property-card-primary-info"})
print(len(cards))

It only returns 20, not all of them. How can I scrape the hidden elements as well?

Recommended answer

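The extra listings are loaded by JavaScript as you scroll, so they are not in the HTML that requests downloads. Use Selenium to drive a real browser, scroll down, and then scrape the rendered contents.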

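import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

link = "https://www.century21.com/real-estate/rock-spring-ga/LCGAROCKSPRING/"
# Selenium 4 takes the chromedriver path through a Service object
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), "chromedriver")))
browser.get(link)
body = browser.find_element(By.TAG_NAME, "body")

no_of_pagedowns = 2  # Enter the number of pages you would like to scroll here
while no_of_pagedowns:
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(1)  # give the page time to append the next batch of listings
    no_of_pagedowns -= 1

# Parse the rendered page, which now includes the lazily loaded cards
soup = BeautifulSoup(browser.page_source, "html5lib")
cards = soup.find_all("div", {"class": "property-card-primary-info"})
print(len(cards))
browser.quit()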
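A fixed number of PAGE_DOWN presses may stop before the end of a long result list. A common alternative is to jump to the bottom repeatedly until `document.body.scrollHeight` stops growing. The sketch below assumes the site appends new results when you reach the bottom of the page. The function name `scroll_until_stable` and its `pause` and `max_rounds` parameters are hypothetical; `driver` is expected to be a Selenium WebDriver, although only its `execute_script` method is used.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom until the page height stops changing.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (any object with an execute_script method works).
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so the site's lazy loader fires
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # wait for the next batch to be appended
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded; assume the end of the list
        last_height = new_height
    return rounds
```

Call it after `browser.get(link)` and before reading `browser.page_source`. The `max_rounds` cap keeps the loop from running forever on a truly endless feed; if the site loads slowly, raise `pause`, because a height that has not changed yet is treated as the end of the list.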
