Use Django's models in a Scrapy project (in the pipeline)


Question

This has been asked before, but the answer that always comes up is to use DjangoItem. However, its GitHub page states that it is:

often not a good choice for a write intensive applications (such as a web crawler) ... may not scale well

This is the crux of my problem: I'd like to use and interact with my Django models in the same way I can when I run python manage.py shell and do from myapp.models import Model1, using queries like the ones seen here.

I have tried relative imports and moving my whole Scrapy project inside my Django app, both to no avail.

Where should I move my Scrapy project for this to work? How can I use, inside a Scrapy pipeline, all the methods that are available to me in the shell?

Thanks in advance.

Recommended answer

I have created a sample project that uses Scrapy inside Django and uses Django models and the ORM in one of the pipelines:

https://github.com/bipul21/scrapy_django

The directory structure starts with your Django project; in this case the project name is django_project. Inside that base project you create your Scrapy project, here named scrapy_project.

In your Scrapy project's settings, add the following lines to initialize Django:

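import os
import sys
import django

# Add the parent of the Scrapy project directory (two levels above this
# settings file, then "..") to sys.path so django_project can be imported.
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".."))
# Tell Django which settings module to load.
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'

# Load the settings and populate the app registry so models can be imported.
django.setup()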
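
To make the sys.path line less opaque, here is a small sketch of what that expression resolves to. The /srv/... location and the helper name django_root_for are hypothetical and used only for illustration; the layout assumes the Scrapy project sits directly inside the Django project as described above.

```python
import os


def django_root_for(settings_file):
    """Return the directory that the settings snippet appends to sys.path."""
    # Two dirname() calls climb from settings.py to the outer Scrapy project dir.
    scrapy_project_dir = os.path.dirname(os.path.dirname(os.path.abspath(settings_file)))
    # ".." then climbs one more level; normpath collapses it for readability.
    return os.path.normpath(os.path.join(scrapy_project_dir, ".."))


# Hypothetical location of the Scrapy settings module inside the Django project.
example = os.path.join(
    os.sep, "srv", "django_project", "scrapy_project", "scrapy_project", "settings.py"
)
resolved = django_root_for(example)
print(resolved)
```

If the resolved directory is the one that contains the django_project package, then the 'django_project.settings' import performed by django.setup() should succeed.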

In the pipeline, I make a simple query against the Questions model:

from questions.models import Questions

class ScrapyProjectPipeline(object):
    def process_item(self, item, spider):
        # Skip items that are already stored, matched by identifier.
        try:
            question = Questions.objects.get(identifier=item["identifier"])
            print("Question already exists")
            return item
        except Questions.DoesNotExist:
            pass

        # Not found: create and save a new row from the scraped item.
        question = Questions()
        question.identifier = item["identifier"]
        question.title = item["title"]
        question.url = item["url"]
        question.save()
        return item
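
As a possible variation, the look-up-then-save logic can be expressed with the Django ORM's get_or_create() method. This is only a sketch and not code from the sample project: the class name GetOrCreatePipeline and the model class attribute are hypothetical, and the field names are taken from the pipeline above.

```python
class GetOrCreatePipeline:
    """Sketch of a pipeline that stores each item at most once."""

    # Hypothetical hook: a subclass sets this to a Django model class,
    # for example the Questions model used above.
    model = None

    def process_item(self, item, spider):
        # get_or_create() returns (instance, created); the values in
        # `defaults` are only used when a new row has to be inserted.
        _, created = self.model.objects.get_or_create(
            identifier=item["identifier"],
            defaults={"title": item["title"], "url": item["url"]},
        )
        if not created:
            spider.logger.info("Question already exists: %s", item["identifier"])
        return item
```

In pipelines.py this could be used by importing Questions and declaring a subclass whose body is just model = Questions. The model is kept as a class attribute because Scrapy instantiates pipeline classes itself, without constructor arguments.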

You can check the project for further details, such as the model schema.
