How to get the order of fields in a Scrapy item
I'm interested in keeping a reference to the order of the field names in a Scrapy item. Where is this stored?
>>> dir(item)
Out[7]:
['_MutableMapping__marker',
'__abstractmethods__',
'__class__',
'__contains__',
'__delattr__',
'__delitem__',
'__dict__',
'__doc__',
'__eq__',
'__format__',
'__getattr__',
'__getattribute__',
'__getitem__',
'__hash__',
'__init__',
'__iter__',
'__len__',
'__metaclass__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__slots__',
'__str__',
'__subclasshook__',
'__weakref__',
'_abc_cache',
'_abc_negative_cache',
'_abc_negative_cache_version',
'_abc_registry',
'_class',
'_values',
'clear',
'copy',
'fields',
'get',
'items',
'iteritems',
'iterkeys',
'itervalues',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values']
I tried item.keys(), but it returns the keys in arbitrary order.
The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict:
from collections import OrderedDict

import six
from scrapy import Item


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Back the item with an OrderedDict so assignment order is kept.
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v
The item then preserves the order in which the values were assigned:
In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...:
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...:
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
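To try the same idea without Scrapy installed, here is a minimal sketch using only the standard library. MiniItem is a hypothetical stand-in written for illustration, not Scrapy's Item class. It only assumes what the answer describes: a dict-like object whose values live in a _values container. Swapping that container for an OrderedDict is what makes iteration follow assignment order.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniItem(MutableMapping):
    """Hypothetical stand-in for a dict-like item; NOT Scrapy's Item."""

    def __init__(self, *args, **kwargs):
        # The backing store is an OrderedDict, so keys keep assignment order.
        self._values = OrderedDict()
        if args or kwargs:
            for key, value in dict(*args, **kwargs).items():
                self[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        # keys(), values() and items() are all derived from this iterator.
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = MiniItem()
item["b"] = "bbb"
item["a"] = "aaa"
item["d"] = "ddd"
item["c"] = "ccc"

ordered_pairs = list(item.items())
print(ordered_pairs)
```

Note that this records the order in which values were assigned, not the order in which fields were declared on the class. Reassigning an existing key keeps its original position. On Python 3.7 and later a plain dict also preserves insertion order, so the OrderedDict mainly matters on older interpreters such as the Python 2 setup shown in the question.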