How to use Scrapy to crawl multiple pages?

Views: 50 Published: 2021/7/16 21:47:27 Tags: python scrapy


Question

All the Scrapy examples I have found show how to crawl a single page, pages that share the same URL schema, or every page of a website. I need to crawl a series of pages A, B, C, where A contains the link to B, B contains the link to C, and so on. For example, the website structure is:

A
----> B
---------> C
D
E

I need to crawl all the C pages, but to get the link to a C page I first have to crawl A and then B. Any hints?

Recommended answer

See the Scrapy Request structure. To crawl such a chain, you have to use the callback parameter, as in the following:

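from scrapy import Request, Spider


class MySpider(Spider):
    ...
    # The spider starts here: parse() is the default callback for the start URLs.
    def parse(self, response):
        ...
        # A, D and E are requested in parallel; A -> B -> C are crawled serially,
        # because each link is only known once the previous page has been parsed.
        yield Request(url=<A url>,
                      ...
                      callback=self.parseA)
        yield Request(url=<D url>,
                      ...
                      callback=self.parseD)
        yield Request(url=<E url>,
                      ...
                      callback=self.parseE)

    def parseA(self, response):
        ...
        # Extract the link to B from page A, then hand B's response to parseB.
        yield Request(url=<B url>,
                      ...
                      callback=self.parseB)

    def parseB(self, response):
        ...
        # Extract the link to C from page B, then hand C's response to parseC.
        yield Request(url=<C url>,
                      ...
                      callback=self.parseC)

    def parseC(self, response):
        # Scrape the data you actually need from page C here.
        ...

    def parseD(self, response):
        ...

    def parseE(self, response):
        ...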
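To make the ordering concrete, here is a minimal sketch of the same callback chain that runs without Scrapy or a network connection. Everything in it is an assumption made for illustration: `SITE` is a made-up in-memory website, `Request` and `Response` are toy stand-ins for the Scrapy classes, `crawl` is a simplified FIFO loop rather than Scrapy's real scheduler, and the method names (`parse_a`, `parse_leaf`, and so on) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory "website": each page maps to the links found on it.
SITE = {
    "start": ["A", "D", "E"],
    "A": ["B"],
    "B": ["C"],
    "C": [],
    "D": [],
    "E": [],
}


@dataclass
class Request:
    """Toy stand-in for scrapy.Request: a URL plus the callback that handles it."""
    url: str
    callback: Callable


@dataclass
class Response:
    """Toy stand-in for a Scrapy response: the URL and the links on that page."""
    url: str
    links: list


class ChainSpider:
    def parse(self, response):
        # Entry point: A, D and E are all known here, so they are queued together.
        yield Request("A", callback=self.parse_a)
        yield Request("D", callback=self.parse_leaf)
        yield Request("E", callback=self.parse_leaf)

    def parse_a(self, response):
        # The link to B only becomes known once A has been fetched.
        for link in response.links:
            yield Request(link, callback=self.parse_b)

    def parse_b(self, response):
        # Likewise, the link to C only becomes known once B has been fetched.
        for link in response.links:
            yield Request(link, callback=self.parse_c)

    def parse_c(self, response):
        yield {"page": response.url}

    def parse_leaf(self, response):
        yield {"page": response.url}


def crawl(spider, start_url="start"):
    """Simplified FIFO engine: fetch a page, run its callback, queue what it yields."""
    queue = deque([Request(start_url, callback=spider.parse)])
    visited, items = [], []
    while queue:
        request = queue.popleft()
        visited.append(request.url)
        response = Response(request.url, SITE[request.url])
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)   # follow-up request: schedule it
            else:
                items.append(result)   # anything else is treated as a scraped item
    return visited, items


if __name__ == "__main__":
    visited, items = crawl(ChainSpider())
    print(visited)
    print(items)
```

In this toy loop B is fetched only after the callback for A has run, and C only after the callback for B, which is the serial part of the chain. Real Scrapy downloads requests concurrently and does not promise this exact visiting order across A, D and E; the dependency A -> B -> C is the only ordering the callbacks enforce.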
