Python Scrapy parse() function: where are the return values returned to?


Question

I am new to Scrapy, so I apologize if this question is trivial. I have read the Scrapy documentation on the official website, and while going through it I came across this example:

import scrapy
from myproject.items import MyItem

class MySpider(scrapy.Spider):
  name = 'example.com'
  allowed_domains = ['example.com']
  start_urls = [
    'http://www.example.com/1.html',
    'http://www.example.com/2.html',
    'http://www.example.com/3.html',
  ]

  def parse(self, response):
    # Yield one item per <h3> element found on the page.
    for h3 in response.xpath('//h3').extract():
      yield MyItem(title=h3)
    # Yield one follow-up request per link; its response comes back to parse().
    for url in response.xpath('//a/@href').extract():
      yield scrapy.Request(url, callback=self.parse)

I know that the parse method must return items and/or requests, but where are these return values returned to?

One is an item and the other is a request, so I assume the two types are handled differently. A CrawlSpider also has Rules with callbacks. What happens to the return value of such a callback? Where does it go? Is it handled the same way as the return value of parse()?

I am still very confused about how Scrapy processes all this, even after reading the documentation.

Recommended answer

According to the documentation:

The parse() method is in charge of processing the response and returning scraped data (as Item objects) and more URLs to follow (as Request objects).

In other words, returned or yielded items and requests are handled differently. Items are handed to the item pipelines and item exporters. Requests are put into the Scheduler, from which they are passed on to the Downloader, which fetches the page and returns a response. The engine then receives the response and gives it to the spider for processing by calling the request's callback method. This applies to any callback, not only parse(): whatever a CrawlSpider Rule callback returns is routed the same way.
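The routing described above can be illustrated with a toy model. This is only a sketch and not Scrapy's actual implementation: the Request and Item classes, the PAGES dictionary, and the run_engine function are hypothetical stand-ins invented for this example, and the queue is a plain FIFO with a simple duplicate filter. It runs without Scrapy installed.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request: a URL plus its callback."""
    url: str
    callback: Callable


@dataclass
class Item:
    """Hypothetical stand-in for a scraped item."""
    title: str


# Fake "web" (an assumption for this sketch): URL -> (titles, links) on that page.
PAGES = {
    "/1.html": (["A"], ["/2.html"]),
    "/2.html": (["B", "C"], ["/1.html"]),
}


def parse(response):
    """Spider callback: yields items and follow-up requests, like MySpider.parse."""
    titles, links = response
    for title in titles:
        yield Item(title=title)
    for link in links:
        yield Request(link, callback=parse)


def run_engine(start_urls):
    """Toy engine loop: consumes whatever the callbacks yield and routes it."""
    scheduler = deque(Request(url, callback=parse) for url in start_urls)
    seen = set(start_urls)   # duplicate filter, so the crawl terminates
    pipeline_output = []     # stands in for item pipelines and exporters

    while scheduler:
        request = scheduler.popleft()
        response = PAGES[request.url]               # the "downloader" step
        for result in request.callback(response):   # the engine iterates the generator
            if isinstance(result, Request):
                if result.url not in seen:          # requests go back to the scheduler
                    seen.add(result.url)
                    scheduler.append(result)
            else:
                pipeline_output.append(result)      # items go to the pipeline
    return pipeline_output


items = run_engine(["/1.html"])
print([item.title for item in items])
```

The point of the sketch is that the callback never receives its own return values back. The caller (the engine, in Scrapy's case) iterates over what the callback yields and decides where each object goes based on its type.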

The whole data-flow process is described in detail on the Architecture Overview page of the Scrapy documentation.

Hope that helps.
