Django custom management command running Scrapy: How to include Scrapy's options?

Question

I want to be able to run the Scrapy web crawling framework from within Django. Scrapy itself only provides a command-line tool, scrapy, to execute its commands; that is, the tool was not designed to be called from an external program.

The user Mikhail Korobov came up with a nice solution, namely to call Scrapy from a Django custom management command. For convenience, I repeat his solution here:

# -*- coding: utf-8 -*-
# myapp/management/commands/scrapy.py 

from __future__ import absolute_import
from django.core.management.base import BaseCommand

class Command(BaseCommand):

    def run_from_argv(self, argv):
        self._argv = argv
        return super(Command, self).run_from_argv(argv)

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        execute(self._argv[1:])

Instead of calling e.g. scrapy crawl domain.com I can now do python manage.py scrapy crawl domain.com from within a Django project. However, the options of a Scrapy command are not parsed at all. If I do python manage.py scrapy crawl domain.com -o scraped_data.json -t json, I only get the following response:

Usage: manage.py scrapy [options] 

manage.py: error: no such option: -o
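
The wording of this error suggests it is raised by the option parser that manage.py runs before handle() is ever called, not by Scrapy. The following is a minimal sketch that reproduces the same message with the standard-library optparse module alone. The function name parse_like_manage_py and the --settings option are illustrative assumptions, not Django's actual code.

```python
import contextlib
import io
from optparse import OptionParser


def parse_like_manage_py(argv):
    """Parse argv with a parser that knows nothing about Scrapy's options.

    Returns (exit_code, stderr_text). This is a hypothetical stand-in for
    the parsing step of a management command, not Django's implementation.
    """
    parser = OptionParser(prog="manage.py", usage="%prog scrapy [options]")
    parser.add_option("--settings")  # illustrative: one option the parser does know
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            parser.parse_args(argv)
    except SystemExit as exc:
        # optparse prints the usage and error to stderr, then exits.
        return exc.code, captured.getvalue()
    return 0, captured.getvalue()


if __name__ == "__main__":
    code, message = parse_like_manage_py(
        ["crawl", "domain.com", "-o", "scraped_data.json", "-t", "json"]
    )
    print(code)
    print(message)
```

Because the parser stops at the first option it does not recognise, Scrapy never sees the arguments at all.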

So my question is, how to extend the custom management command to adopt Scrapy's command line options?

Unfortunately, Django's documentation on this part is not very extensive. I have also read the documentation for Python's optparse module, but it did not make things any clearer to me. Can anyone help me with this? Thanks a lot in advance!

Solution

Okay, I have found a solution to my problem. It's a bit ugly but it works. Since the Django project's manage.py command does not accept Scrapy's command line options, I split each option string into two arguments, both of which manage.py accepts. After successful parsing, I rejoin the two arguments and pass them to Scrapy.

That is, instead of writing

python manage.py scrapy crawl domain.com -o scraped_data.json -t json

I put spaces in between the options like this

python manage.py scrapy crawl domain.com - o scraped_data.json - t json

My handle function looks like this:

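def handle(self, *args, **options):
    # Rejoin options that were split to get past manage.py's parser,
    # e.g. ['-', 'o', 'out.json'] becomes ['-o', 'out.json'].
    raw = self._argv[1:]
    arguments = []
    i = 0
    while i < len(raw):
        # Build a new list instead of mutating the one being iterated.
        if raw[i] in ('-', '--') and i + 1 < len(raw):
            arguments.append(raw[i] + raw[i + 1])
            i += 2
        else:
            arguments.append(raw[i])
            i += 1

    from scrapy.cmdline import execute
    execute(arguments)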
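
The rejoining step can be tried on its own, without Django or Scrapy installed. Below is a small sketch of that step as a pure function. The name rejoin_split_options is hypothetical and does not appear in the original answer.

```python
def rejoin_split_options(argv):
    """Merge a lone '-' or '--' with the argument that follows it.

    Hypothetical helper illustrating the workaround: it returns a new list
    and leaves the input untouched. A dash with nothing after it is kept.
    """
    result = []
    i = 0
    while i < len(argv):
        if argv[i] in ("-", "--") and i + 1 < len(argv):
            result.append(argv[i] + argv[i + 1])
            i += 2
        else:
            result.append(argv[i])
            i += 1
    return result


if __name__ == "__main__":
    split_args = ["scrapy", "crawl", "domain.com",
                  "-", "o", "scraped_data.json", "-", "t", "json"]
    print(rejoin_split_options(split_args))
```

Returning a new list avoids modifying a list while looping over it, which is what makes the in-place version fragile.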


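Meanwhile, Mikhail Korobov has provided the optimal solution, which overrides run_from_argv so that manage.py's option parsing is skipped and the raw arguments are passed straight to Scrapy. See here: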

# -*- coding: utf-8 -*- 
# myapp/management/commands/scrapy.py 

from __future__ import absolute_import
from django.core.management.base import BaseCommand

class Command(BaseCommand):

    def run_from_argv(self, argv):
        self._argv = argv
        self.execute()

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        execute(self._argv[1:])
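
To see why skipping the parser is enough, the control flow can be modelled without Django. This is only a sketch under stated assumptions: ToyBaseCommand is a deliberately simplified stand-in for Django's BaseCommand rather than the real class, and the runner argument stands in for scrapy.cmdline.execute.

```python
from optparse import OptionParser


class ToyBaseCommand:
    """Simplified stand-in for a management command base class (assumption)."""

    def run_from_argv(self, argv):
        # Default path: parse everything after the subcommand name, which
        # rejects any option the parser has not been told about.
        parser = OptionParser(prog=argv[0])
        options, args = parser.parse_args(argv[2:])
        self.execute(*args, **vars(options))

    def execute(self, *args, **options):
        return self.handle(*args, **options)

    def handle(self, *args, **options):
        raise NotImplementedError


class PassThroughCommand(ToyBaseCommand):
    """Mirrors the shape of the solution: keep argv, skip the parser."""

    def __init__(self, runner):
        self.runner = runner  # hypothetical stand-in for scrapy.cmdline.execute

    def run_from_argv(self, argv):
        self._argv = argv
        self.execute()  # no parse_args call, so no option is rejected here

    def handle(self, *args, **options):
        # Drop the script name so the list starts with the subcommand.
        self.runner(self._argv[1:])


if __name__ == "__main__":
    received = []
    command = PassThroughCommand(received.append)
    command.run_from_argv(
        ["manage.py", "scrapy", "crawl", "domain.com", "-o", "scraped_data.json"]
    )
    print(received)
```

In this toy model the options reach the runner unchanged because the overridden run_from_argv never calls the parser. Whether calling execute() with no options works on a given Django version is not something this sketch can confirm.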
