用 Scrapy 抓取 ajax 页面? [英] Scraping ajax page with Scrapy?

查看:26
本文介绍了用 Scrapy 抓取 ajax 页面?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Scrapy 从这个页面抓取数据

I'm using Scrapy for scrape data from this page

https://www.bricoetloisirs.ch/magasins/gardena

产品列表动态显示.查找 url 以获取产品

Product list appears dynamically. Find url to get products

https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&ui=2&;carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272

但是当我用 Scrapy 抓取它时,它给了我一个空白页面

But when i scrape it by Scrapy it give me empty page

<span class="pageSizeInformation" id="page0" data-page="0" data-pagesize="12">Page: 0 / Size: 12</span>

这是我的代码

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
            'https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
        ]

    def parse(self, response):
        print response.body

推荐答案

我解决了这个问题.

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
            'https://www.bricoetloisirs.ch/magasins/gardena'
        ]

    def parse(self, response):
        for page in xrange(1, 50):
            url = response.url + '/.do?page=%s&_=1473841539272' % page
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        print response.body

这篇关于用 Scrapy 抓取 ajax 页面?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆