Using scrapy command "crawl" from django
Problem description
I am trying to crawl a spider (of Scrapy) from Django, and the problem is that the spider can only be crawled from the top-level directory (the directory with scrapy.cfg). So how can that be achieved?
.../polls/management/commands/mycommand.py

from django.core.management.base import BaseCommand
from scrapy.cmdline import execute
import os

class Command(BaseCommand):
    def run_from_argv(self, argv):
        print('In run_from_argv')
        self._argv = argv
        return self.execute()

    def handle(self, *args, **options):
        # os.environ['SCRAPY_SETTINGS_MODULE'] = '/home/nabin/scraptut/newscrawler'
        execute(self._argv[1:])
And if I try

python manage.py mycommand crawl myspider

then it won't work, because to use crawl I need to be in the top-level directory with the scrapy.cfg file. So I want to know: how is that possible?
Solution
OK, I have found the solution myself.
In settings.py I defined:
CRAWLER_PATH = os.path.join(os.path.dirname(BASE_DIR), 'required path')
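For context, BASE_DIR in a standard Django settings.py resolves to the project root, so os.path.dirname(BASE_DIR) is its parent directory. A small sketch of how the path is composed (the directory names here are hypothetical, not from the original post):

```python
import os

# Hypothetical layout: suppose the Django project lives at
# /home/user/mysite and the Scrapy project is a sibling directory.
BASE_DIR = '/home/user/mysite'
CRAWLER_PATH = os.path.join(os.path.dirname(BASE_DIR), 'newscrawler')
# os.path.dirname(BASE_DIR) is '/home/user', so CRAWLER_PATH points at
# '/home/user/newscrawler', i.e. a sibling of the Django project.
```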
And did the following:
import os
from django.conf import settings

os.chdir(settings.CRAWLER_PATH)