Is it possible to crawl multiple start_urls lists simultaneously?


Problem description

I have 3 URL files, all of which have the same structure, so the same spider can be used for all of the lists. A special requirement is that all three need to be crawled simultaneously.

Is it possible to crawl them simultaneously without creating multiple spiders?

I believe the answer from Scrap multiple urls with scrapy,

start_urls = ["http://example.com/category/top/page-%d/" % i for i in xrange(4)] + \
             ["http://example.com/superurl/top/page-%d/" % i for i in xrange(55)]

only joins the two lists together, but does not run them at the same time.

Many thanks.

Answer

Use start_requests instead of start_urls ... this will work for you:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        # Each yielded request is handed to Scrapy's scheduler and fetched
        # asynchronously, so the pages are crawled concurrently.
        for page in range(1, 20):
            yield self.make_requests_from_url('https://www.example.com/page-%s' % page)
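Applied to the original question, start_requests can read all three URL files and yield a request for every line, so the URLs from all three lists go into the same scheduler queue and are crawled together by a single spider. Below is a minimal sketch, assuming the three files are plain-text URL lists with hypothetical names urls1.txt, urls2.txt and urls3.txt (spider name, file paths and the logging-only parse callback are placeholders, not part of the original answer):

import scrapy

class FileListSpider(scrapy.Spider):
    name = 'filelistspider'
    # Hypothetical file names; replace with the real paths to the 3 URL files.
    url_files = ['urls1.txt', 'urls2.txt', 'urls3.txt']

    def start_requests(self):
        # Read every file and yield one request per URL. All requests share
        # the same scheduler, so the three lists are crawled together rather
        # than one file after another.
        for path in self.url_files:
            with open(path) as f:
                for line in f:
                    url = line.strip()
                    if url:
                        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # The same parsing logic works for all three lists, since the pages
        # share the same structure.
        self.logger.info('Crawled %s', response.url)

Scrapy fetches these requests asynchronously; how many are downloaded in parallel is governed by settings such as CONCURRENT_REQUESTS, not by how the start URLs are grouped.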

