Using scrapy command "crawl" from django


Problem description



I am trying to crawl a spider (of scrapy) from django, and the problem is that the spider can only be crawled when we are in the top-level directory (the directory with scrapy.cfg). So how can that be achieved?

.../polls/managements/commands/mycommand.py

from django.core.management.base import BaseCommand
from scrapy.cmdline import execute
import os

class Command(BaseCommand):

    def run_from_argv(self, argv):
        print('In run_from_argv')
        self._argv = argv
        return self.execute()

    def handle(self, *args, **options):
        #os.environ['SCRAPY_SETTINGS_MODULE'] = '/home/nabin/scraptut/newscrawler'
        execute(self._argv[1:])

And if I try

python manage.py mycommand crawl myspider

then it won't work, because to use crawl I need to be in the top-level directory with the scrapy.cfg file. So I want to know: how is that possible?

Solution

OK, I have found the solution myself.

In settings.py I defined:

CRAWLER_PATH = os.path.join(os.path.dirname(BASE_DIR), 'required path')

And did the following in the management command:

import os
from django.conf import settings

# Switch into the scrapy project root (the directory that contains
# scrapy.cfg) before scrapy's command-line machinery runs.
os.chdir(settings.CRAWLER_PATH)
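Note that `os.chdir` changes the working directory of the whole Django process, so it can be worth restoring it once the crawl finishes. A minimal stdlib-only sketch (the `working_directory` helper is my own name, not part of the original answer):

```python
import os
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)  # e.g. settings.CRAWLER_PATH, the directory with scrapy.cfg
    try:
        yield
    finally:
        os.chdir(previous)  # restore even if the crawl raises
```

Inside `handle()` this would wrap the scrapy call: `with working_directory(settings.CRAWLER_PATH): execute(self._argv[1:])`.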
