How to run a Scrapy spider programmatically like a simple script?


Question

I created a Scrapy spider, but I want to run it as a script. How can I do this? Right now I can run it with this command in the terminal:

$ scrapy crawl book -o book.json

But I want to run it like a simple Python script.

Recommended answer

You can run a spider directly from a Python script, without a project.

You have to use scrapy.crawler.CrawlerProcess or scrapy.crawler.CrawlerRunner,
but I'm not sure whether it gives you all the functionality you get in a project.

See more in the Scrapy documentation: Common Practices
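
On the question of project functionality: one way to keep the project's own settings, pipelines and middlewares is to load them with get_project_settings() and pass them to CrawlerProcess, which is the approach the Common Practices page describes. The sketch below assumes it is run from inside the project directory (where scrapy.cfg lives) and that the project has a spider named book, as in the question. feed_settings and run_project_spider are hypothetical helper names, not part of Scrapy.

```python
from pathlib import Path


def feed_settings(output_path):
    """Build a FEEDS setting (Scrapy 2.1+) for one output file.

    The export format is taken from the file extension, falling back
    to 'json' when there is none. This helper is illustrative only.
    """
    fmt = Path(output_path).suffix.lstrip(".") or "json"
    return {output_path: {"format": fmt}}


def run_project_spider(spider_name, output_path):
    """Run a spider from the current Scrapy project, like `scrapy crawl NAME -o FILE`."""
    # Imported here so this module can be imported without starting Scrapy.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()  # reads the project's settings.py
    settings.set("FEEDS", feed_settings(output_path))

    process = CrawlerProcess(settings)
    process.crawl(spider_name)  # a spider name is resolved through the project
    process.start()  # blocks until the crawl has finished
```

Adding a final line such as run_project_spider("book", "book.json") and running the file with python from the project directory should then behave much like the command in the question.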

Or you can put your command in a bash script on Linux or in a .bat file on Windows.

BTW: on Linux you can add a shebang as the first line (#!/bin/bash) and make the file executable,
i.e. chmod +x your_script, and it will then run like a normal program.
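
If you would rather keep the wrapper in Python than in bash or .bat, the same command can be launched with the standard subprocess module. This is only a sketch: build_crawl_command, run_crawl and the project_dir parameter are hypothetical names, and it assumes the scrapy executable is on your PATH.

```python
import subprocess


def build_crawl_command(spider_name, output_path):
    """Return the argument list for `scrapy crawl <spider> -o <output>`."""
    return ["scrapy", "crawl", spider_name, "-o", output_path]


def run_crawl(spider_name, output_path, project_dir="."):
    """Run the crawl as a child process and return its exit code.

    project_dir (hypothetical parameter) should be the folder that
    contains scrapy.cfg, because `scrapy crawl` needs a project.
    """
    command = build_crawl_command(spider_name, output_path)
    result = subprocess.run(command, cwd=project_dir)
    return result.returncode
```

Calling run_crawl("book", "book.json") from a script is then equivalent to typing the command in the terminal. Unlike CrawlerProcess, the crawl runs in a separate process, so the script only sees the exit code and the output file.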

Working example

#!/usr/bin/env python3

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

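    # allowed_domains takes bare domain names, not URLs, so the scheme is dropped.
    # Assumption: 'quotes.toqoute.com' was a typo for Scrapy's demo site.
    allowed_domains = ['quotes.toscrape.com']

    # Assumption: a start URL is added so the spider has something to fetch.
    start_urls = ['https://quotes.toscrape.com/']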

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)
        # Yield an item so the feed export has something to write to the CSV.
        yield {'url': response.url}

# --- it runs without project and saves in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # Scrapy 2.1+ uses FEEDS; older versions used FEED_FORMAT and FEED_URI.
    'FEEDS': {'output.csv': {'format': 'csv'}},
})
c.crawl(MySpider)
c.start()
