How to import Scrapy item keys in the correct order?

Problem description

I am importing the Scrapy item keys from items.py into pipelines.py. The problem is that the order of the imported keys is different from the order in which they were defined in items.py.

My items.py file:

from scrapy.item import Item, Field

class NewAdsItem(Item):
    AdId        = Field()
    DateR       = Field()
    AdURL       = Field()

In my pipelines.py:

from adbot.items import NewAdsItem
...
def open_spider(self, spider):
    self.ikeys = NewAdsItem.fields.keys()
    print("Keys in pipelines: \t%s" % ",".join(self.ikeys))
    #self.createDbTable(ikeys)

The output is:

Keys in pipelines:  AdId,AdURL,DateR

instead of the expected: AdId,DateR,AdURL.

How can I ensure that the order of the imported keys stays the same?

Note: This might be related to How to get order of fields in Scrapy item, but it is not at all clear what is going on, since the Python 3 docs state that lists and dicts should retain their order. Also note that when using process_item() and item.keys(), the order is retained! But I need to access the keys in order before any items are scraped.
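
One possible explanation for the alphabetical output, offered here as an assumption rather than something stated in the post: if the field mapping is built by walking dir() of the class, the names come back sorted, because dir() always returns an alphabetized list. The minimal sketch below shows the difference between dir() order and definition order. Field is a stand-in class so the snippet runs without Scrapy, and Demo is a hypothetical name.

```python
class Field(dict):
    """Stand-in for scrapy.item.Field so the demo runs without Scrapy."""


class Demo:
    AdId = Field()
    DateR = Field()
    AdURL = Field()


# vars() exposes the class namespace, which keeps definition order (Python 3.6+).
definition_order = [k for k, v in vars(Demo).items() if isinstance(v, Field)]

# dir() returns attribute names sorted alphabetically.
dir_order = [k for k in dir(Demo) if isinstance(getattr(Demo, k), Field)]

print(definition_order)  # ['AdId', 'DateR', 'AdURL']
print(dir_order)         # ['AdId', 'AdURL', 'DateR']
```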

Recommended answer

The only way I could get this to work was to use this solution in the following manner.

My items.py file:

from scrapy.item import Item, Field
from collections import OrderedDict
from types import FunctionType

class StaticOrderHelper(type):
    # Requires Python 3: __prepare__ supplies the namespace for the class body.
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        # Record attribute names in definition order, skipping dunder names
        # and functions/methods.
        namespace['_field_order'] = [
                k
                for k, v in namespace.items()
                if not k.startswith('__') and not k.endswith('__')
                    and not isinstance(v, (FunctionType, classmethod, staticmethod))
        ]
        return type.__new__(mcls, name, bases, namespace, **kwargs)

class NewAdsItem(metaclass=StaticOrderHelper):
    AdId        = Field()
    DateR       = Field()
    AdURL       = Field()
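
On Python 3.7 and later, the default class namespace is already an insertion-ordered dict, so the __prepare__ override may not be needed there. The following is a minimal sketch of that reduced variant, not the answer's exact code: OrderRecorder and helper are hypothetical names, and Field is a stand-in so the snippet runs without Scrapy.

```python
from types import FunctionType


class Field(dict):
    """Stand-in for scrapy.item.Field so the sketch runs without Scrapy."""


class OrderRecorder(type):
    """Hypothetical metaclass: records attribute names in definition order."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        # The class namespace is iterated in definition order; dunder names
        # and functions/methods are filtered out, as in the answer's code.
        namespace['_field_order'] = [
            k
            for k, v in namespace.items()
            if not k.startswith('__') and not k.endswith('__')
            and not isinstance(v, (FunctionType, classmethod, staticmethod))
        ]
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class NewAdsItem(metaclass=OrderRecorder):
    AdId = Field()
    DateR = Field()
    AdURL = Field()

    def helper(self):
        # A method, so it is not recorded as a field.
        return None


print(NewAdsItem._field_order)  # ['AdId', 'DateR', 'AdURL']
```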

Then import the _field_order list into your pipelines.py with:

...
from adbot.items import NewAdsItem
...
class DbPipeline(object):
    ikeys = NewAdsItem._field_order
    ...
    def createDbTable(self):
        print("Creating new table: %s" % self.dbtable )
        print("Keys in createDbTable: \t%s" % ",".join(self.ikeys))
        ...

I can now create new DB tables with the columns in the order the fields were defined, without worrying about the keys being reordered in unexpected ways.
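
As a rough illustration of how an ordered key list could drive table creation, here is a sketch using the standard-library sqlite3 module. The post does not say which database is used, so SQLite, the table name new_ads, the TEXT column type, and the create_table helper are all assumptions made for the example.

```python
import sqlite3

# In the pipeline this would be NewAdsItem._field_order; hard-coded here
# so the sketch is self-contained.
ikeys = ["AdId", "DateR", "AdURL"]


def create_table(conn, table, keys):
    """Create a table whose columns follow the order of `keys`.

    Identifiers cannot be bound as SQL parameters, so they are quoted and
    interpolated; this assumes the names come from code, not user input.
    """
    columns = ", ".join('"%s" TEXT' % k for k in keys)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, columns))


conn = sqlite3.connect(":memory:")
create_table(conn, "new_ads", ikeys)

# PRAGMA table_info lists columns in creation order; index 1 is the name.
column_names = [row[1] for row in conn.execute('PRAGMA table_info("new_ads")')]
print(column_names)  # ['AdId', 'DateR', 'AdURL']
```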
