Scraping a website that requires you to scroll down


Problem description

I am trying to scrape this website here:

However, it requires that I scroll down in order to collect additional data. I have no idea how to scroll down using Beautiful soup or python. Does anybody here know how?

The code is a bit of a mess but here it is.

import scrapy
import datetime
import re

from selenium import webdriver
from bs4 import BeautifulSoup
from html.parser import HTMLParser  # Python 3; was `from HTMLParser import HTMLParser` in Python 2
from testtest.items import TesttestItem


class MLStripper(HTMLParser):
    # Standard tag-stripping recipe: collect text nodes, ignore markup.
    def __init__(self):
        super().__init__()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()


class MySpider(scrapy.Spider):
    name = "A1Locker"
    allowed_domains = ['www.a1lockerrental.com']
    start_urls = ['http://www.a1lockerrental.com/self-storage/mo/st-louis/'
                  '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all']

    def parse(self, response):
        url = ('http://www.a1lockerrental.com/self-storage/mo/st-louis/'
               '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Small')
        driver = webdriver.Firefox()
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        url2 = ('http://www.a1lockerrental.com/self-storage/mo/st-louis/'
                '4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Medium')
        driver2 = webdriver.Firefox()
        driver2.get(url2)
        soup2 = BeautifulSoup(driver2.page_source, 'html.parser')  # bug fix: was driver.page_source

        inside = "Indoor"
        outside = "Outdoor"
        inside_units = ["5 x 5", "5 x 10"]
        outside_units = ["10 x 15", "5 x 15", "8 x 10", "10 x 10",
                         "10 x 20", "10 x 25", "10 x 30"]

        sizeTagz = soup.findAll('span', {"class": "sss-unit-size"})
        sizeTagz2 = soup2.findAll('span', {"class": "sss-unit-size"})

        rateTagz = soup.findAll('p', {"class": "unit-special-offer"})
        specialTagz = soup.findAll('span', {"class": "unit-special-offer"})
        typesTagz = soup.findAll('div', {"class": "unit-info"})

        rateTagz2 = soup2.findAll('p', {"class": "unit-special-offer"})
        specialTagz2 = soup2.findAll('span', {"class": "unit-special-offer"})
        typesTagz2 = soup2.findAll('div', {"class": "unit-info"})

        yield {'date': datetime.datetime.now().strftime("%m-%d-%y"),
               'name': "A1Locker"}

        size = []
        for n in range(len(sizeTagz)):
            print(len(rateTagz))
            print(len(typesTagz))
            if "Outside" in typesTagz[n].get_text():
                size.append(re.findall(r'\d+', sizeTagz[n].get_text()))
                size.append(re.findall(r'\d+', sizeTagz2[n].get_text()))
                print("logic hit")

        for i in range(len(size)):
            yield {'size': size[i]}

        driver.close()
        driver2.close()

The desired output of the code is to have it display the data collected from this webpage: http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all

To do so would require being able to scroll down to view the rest of the data. At least that is how it would be done in my mind.

Thanks, DM123

Answer

There is a webdriver function that provides this capability. BeautifulSoup doesn't do anything besides parse the site.

Check this out: http://webdriver.io/api/utility/scroll.html
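That link documents WebdriverIO (the JavaScript bindings), but the question's code already drives the page with Selenium in Python, where the same effect is usually achieved by executing a scroll script until the page height stops growing. A minimal sketch of that pattern (the helper name, pause length, and round limit are illustrative assumptions, not part of the original answer):

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Repeatedly scroll to the bottom of the page until the document
    height stops growing, so lazily loaded content is present before
    the HTML is handed to BeautifulSoup."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we are at the real bottom
        last_height = new_height
```

In the spider above, this would be called between `driver.get(url)` and reading `driver.page_source`, so the parsed HTML includes the units that only appear after scrolling.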
