How to get the order of fields in a Scrapy item

Question

I'm interested in keeping a reference to the order of the field names in a Scrapy item. Where is this stored?

>>> dir(item)
Out[7]: 
['_MutableMapping__marker',
 '__abstractmethods__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__doc__',
 '__eq__',
 '__format__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__iter__',
 '__len__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_class',
 '_values',
 'clear',
 'copy',
 'fields',
 'get',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

I tried item.keys(), but that returns the keys in no particular order.

Solution

The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict:

from collections import OrderedDict

import six  # provides six.iteritems, used below
from scrapy import Item


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Store values in an OrderedDict so keys keep their assignment order
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

The item then preserves the order in which the values were assigned:

In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...: 
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...: 
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
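
To try the idea without installing Scrapy, here is a minimal sketch of the same technique. MiniItem is a hypothetical, simplified stand-in written for this illustration; it is not Scrapy's Item class, and its plain tuple of field names is an assumption that replaces the Field() declarations. The only point is that keeping the values in an OrderedDict makes keys() and items() follow assignment order.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniItem(MutableMapping):
    """Hypothetical, simplified stand-in for scrapy.Item (not Scrapy's code)."""

    # Names a subclass allows; Scrapy uses Field() declarations instead.
    fields = ()

    def __init__(self, *args, **kwargs):
        # The ordered container is what makes assignment order observable.
        self._values = OrderedDict()
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Reject undeclared names, loosely mirroring an item's fixed field set.
        if key not in self.fields:
            raise KeyError("%s does not support field: %s"
                           % (type(self).__name__, key))
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class SomeItem(MiniItem):
    fields = ("a", "b", "c", "d")


item = SomeItem()
item["b"] = "bbb"
item["a"] = "aaa"
item["d"] = "ddd"
item["c"] = "ccc"

# keys() comes from MutableMapping and iterates the OrderedDict.
assignment_order = list(item.keys())
print(assignment_order)
```

Note that on Python 3.7 and later a plain dict also preserves insertion order, so the OrderedDict mainly matters on older interpreters such as the Python 2 session shown above.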
