Scrapy: how to use items in a spider and how to send items to pipelines?

Published: 2020/7/6 6:49:54 · Tags: python, scrapy, scrapy-spider, scrapy-pipeline

Question

I am new to Scrapy, and my task is simple:

For a given e-commerce website:

- Crawl all website pages
- Look for product pages
- If the URL points to a product page:
  - Create an item
  - Process the item to store it in a database

I created the spider, but the products are just printed to a simple file.

My question is about the project structure: how do I use items in the spider, and how do I send them to pipelines?

I can't find a simple example of a project using items and pipelines.

Accepted answer

How to use items in your spider?

Well, the main purpose of items is to store the data you crawl. scrapy.Item objects behave much like dictionaries. To declare your items, create a class that subclasses scrapy.Item and add a scrapy.Field for each field:

import scrapy

class Product(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()

You can now use it in your spider by importing your Product.

For more advanced usage, see the Scrapy documentation on Items.

How to send items to a pipeline?

First, you need to tell Scrapy to use your custom pipeline by enabling it in the ITEM_PIPELINES setting.

In the settings.py file:

ITEM_PIPELINES = {
    'myproject.pipelines.CustomPipeline': 300,
}
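
The integer assigned to each pipeline controls the order in which enabled pipelines run: items pass from lower values to higher values, and values are customarily kept in the 0-1000 range. As a sketch, a project with a second, hypothetical ValidationPipeline that should check items before CustomPipeline stores them could be configured like this:

```python
# settings.py (sketch): ValidationPipeline is a hypothetical second pipeline
ITEM_PIPELINES = {
    'myproject.pipelines.ValidationPipeline': 100,  # lower value: runs first
    'myproject.pipelines.CustomPipeline': 300,      # runs after validation
}
```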

You can now write your pipeline and process your items.

In the pipelines.py file (the module referenced by 'myproject.pipelines.CustomPipeline'):

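from scrapy.exceptions import DropItem


class CustomPipeline:
    def __init__(self):
        # Create your database connection here
        pass

    def process_item(self, item, spider):
        # Here you can validate, clean and index/store your item.
        # Raising DropItem discards the item, e.g. when a required field is missing.
        if not item.get('url'):
            raise DropItem("Missing url in %s" % item)
        # Return the item so any later pipelines still receive it
        return item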
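
Since the goal is to store products in a database, here is one possible way to fill in that skeleton, sketched with Python's built-in sqlite3 module. The class name SQLitePipeline, the products table and the products.db file name are assumptions for illustration. open_spider and close_spider are optional pipeline hooks that Scrapy calls when the spider opens and closes, which makes them a natural place to open and close the connection.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline that stores Product items in a SQLite table."""

    def __init__(self, db_path='products.db'):
        # Path to the database file (an assumption; ':memory:' also works)
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS products (url TEXT PRIMARY KEY, title TEXT)'
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and release the connection
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Items support .get() like dicts; INSERT OR REPLACE keeps one row per URL
        self.conn.execute(
            'INSERT OR REPLACE INTO products (url, title) VALUES (?, ?)',
            (item.get('url'), item.get('title')),
        )
        # Return the item so any later pipelines still receive it
        return item
```

To use it, register 'myproject.pipelines.SQLitePipeline' in ITEM_PIPELINES the same way as CustomPipeline.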

Finally, in your spider, you need to yield your item once it is filled.

Example spider.py:

import scrapy
from myproject.items import Product


class MySpider(scrapy.Spider):
    name = "test"
    start_urls = [
        'http://www.example.com',
    ]

    def parse(self, response):
        doc = Product()
        doc['url'] = response.url
        # .get() returns the first matching text node as a string (or None)
        doc['title'] = response.xpath('//div/p/text()').get()
        yield doc  # Will go to your pipeline

Hope this helps. Here is the documentation for pipelines: Item Pipeline