Pass Scrapy Spider a list of URLs to crawl via .txt file


Question

I'm a little new to Python and very new to Scrapy.

I've set up a spider to crawl and extract all the information I need. However, I need to pass the spider a .txt file of URLs to use as the start_urls variable.

例如:

class LinkChecker(BaseSpider):
    name = 'linkchecker'
    start_urls = []  # I want to populate this list from a text file passed via the command line.

I've done a little bit of research and keep coming up empty-handed. I've seen this type of example (How to pass a user defined argument in scrapy spider), but I don't think that will work for passing a text file.

Answer

Run your spider with the -a option, like:

scrapy crawl myspider -a filename=text.txt
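
The spider expects the file to contain one URL per line. A hypothetical text.txt might look like:

```text
http://example.com/page1
http://example.com/page2
http://example.org/
```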

Then read the file in the spider's __init__ method and define start_urls:

class MySpider(BaseSpider):
    name = 'myspider'

    def __init__(self, filename=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        if filename:
            with open(filename) as f:
                # Strip trailing newlines and skip blank lines;
                # a raw readlines() would leave "\n" on every URL.
                self.start_urls = [line.strip() for line in f if line.strip()]
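
The newline-stripping step matters: f.readlines() keeps the trailing "\n" on each line, which produces malformed request URLs. The loading logic can be checked on its own, outside Scrapy; this is a minimal sketch using a throwaway temp file (load_start_urls is a hypothetical helper name, not part of Scrapy):

```python
import os
import tempfile

def load_start_urls(filename):
    # One URL per line; strip surrounding whitespace and skip blank lines.
    with open(filename) as f:
        return [line.strip() for line in f if line.strip()]

# Quick demonstration with a throwaway file:
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("http://example.com/\n\nhttp://example.org/page\n")
    path = tmp.name

urls = load_start_urls(path)
os.unlink(path)
print(urls)  # → ['http://example.com/', 'http://example.org/page']
```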

Hope that helps.
