Check if a huge list in Python has changed

Problem description

In short: what's the fastest way to check if a huge list in Python has changed? hashlib needs a buffer, and building a string representation of that list is infeasible.
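For reference, hashlib hash objects do not need the whole buffer at once, because they accept data incrementally through `update()`. The following is a minimal sketch of a content fingerprint built that way. It assumes every dict is JSON-serializable, and `fingerprint` is a hypothetical helper name. It still visits every element on every check, so each call is O(n). That cost is what the proxy approach in the answer avoids.

```python
import hashlib
import json


def fingerprint(records):
    """Return a SHA-256 hex digest of a list of dicts, fed item by item.

    Assumes every dict is JSON-serializable; sort_keys=True makes the
    digest independent of key insertion order.
    """
    h = hashlib.sha256()
    for record in records:
        # Only one record is serialized at a time, so no string
        # representation of the whole list is ever built.
        h.update(json.dumps(record, sort_keys=True).encode("utf-8"))
        # A newline separator keeps record boundaries explicit.
        h.update(b"\n")
    return h.hexdigest()


data = [{"subject": "s1", "value": 1}, {"subject": "s2", "value": 2}]
before = fingerprint(data)
data.append({"subject": "s3", "value": 3})
print(before != fingerprint(data))  # True
```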

In long: I've got a HUGE list of dictionaries representing data. I run a number of analyses on this data, but there are a few meta-data aspects that are required by all of the analyses, i.e. the set of subjects (each dict in the list has a subject key, and at times I just need a list of all subjects who have data present in the data set). So I'd like to implement the following:

class Data:
    def __init__(self, ...):
        self.data = [{...}, {...}, ...] # long ass list of dicts
        self.subjects = set()
        self.hash = 0

    def get_subjects(self):
        # recalculate set of subjects only if necessary
        if self.has_changed():
            self.subjects = set(datum['subject'] for datum in self.data)

        return self.subjects

    def has_changed(self):
        # calculate hash of self.data
        new_hash = self.data.get_hash() # HOW TO DO THIS?
        changed = self.hash != new_hash
        self.hash = new_hash # remember the latest hash
        return changed

The question is how to implement the has_changed method, or more specifically, get_hash. Objects inherit a default __hash__ that is derived from the object's id, and the id doesn't change when we e.g. append an element to a list. Lists are not even hashable, since list.__hash__ is None.
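Both points about default hashes are easy to check. In this minimal sketch, `Box` is a throwaway hypothetical class used only to show `object`'s identity-based hash.

```python
data = [{"subject": "s1"}]
identity_before = id(data)

# Appending mutates the list in place: same object, same id.
data.append({"subject": "s2"})
print(id(data) == identity_before)  # True

# Lists are unhashable (list.__hash__ is None), so hash() raises.
try:
    hash(data)
except TypeError as exc:
    print("TypeError:", exc)


class Box:
    """Plain class that inherits object's identity-based __hash__."""


box = Box()
box_hash = hash(box)
box.payload = [1, 2, 3]  # changing state does not change the hash
print(hash(box) == box_hash)  # True
```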

Solution

A more sophisticated approach would be to work with proxy data elements instead of native lists and dictionaries, which could flag any change to their contents. To make it more flexible, you could even write a callback to be called whenever a change happens.

So, assuming you only have to deal with lists and dictionaries in your data structure, we can work with classes inheriting from dict and list that fire a callback whenever any data-changing method on the object is called. The special methods are all documented at http://docs.python.org/reference/datamodel.html

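# -*- coding: utf-8 -*-
# Docstring holding the doctests and usage example:
"""
            >>> a = NotifierList()
            >>> flag.has_changed
            False
            >>> a.append(NotifierDict())
            >>> flag.has_changed
            True
            >>> flag.clear()
            >>> flag.has_changed
            False
            >>> a[0]["status"] = "new"
            >>> flag.has_changed
            True

"""


# Methods that mutate a list or dict in place. Names a type does not
# define are skipped by proxy_class_factory, so Python 2-only names
# (__setslice__, __delslice__) and newer ones (list.clear, dict.__ior__)
# can share one set.
changer_methods = set("__setitem__ __setslice__ __delitem__ __delslice__ update append extend add insert pop popitem remove setdefault clear sort reverse __iadd__ __imul__ __ior__".split())


def callback_getter(obj):
    """Return a callback that marks obj (a Flag) as changed."""
    def callback(name):
        # name is the mutating method being called; unused here.
        obj.has_changed = True
    return callback

def proxy_decorator(func, callback):
    """Wrap func so that callback fires before every call."""
    def wrapper(*args, **kw):
        callback(func.__name__)
        return func(*args, **kw)
    wrapper.__name__ = func.__name__
    return wrapper

def proxy_class_factory(cls, obj):
    """Build a subclass of cls whose mutating methods notify obj."""
    new_dct = cls.__dict__.copy()
    for key, value in new_dct.items():
        if key in changer_methods:
            new_dct[key] = proxy_decorator(value, callback_getter(obj))
    return type("proxy_" + cls.__name__, (cls,), new_dct)


class Flag(object):
    """A single shared "dirty" flag."""
    def __init__(self):
        self.clear()

    def clear(self):
        self.has_changed = False

flag = Flag()

# Every instance of both classes reports to the same flag object.
NotifierList = proxy_class_factory(list, flag)
NotifierDict = proxy_class_factory(dict, flag)


if __name__ == "__main__":
    import doctest
    doctest.testmod()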

2017 update

One does live and learn: a list subclass can still be modified by native (C-level) code that calls the built-in list operations directly, bypassing the overridden magic methods. The foolproof system is the same approach, but inheriting from collections.abc.MutableSequence instead, and keeping a native list as an internal attribute of your proxy object.
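The update describes the foolproof variant only in words. The following is a minimal sketch of it, assuming Python 3. The names `NotifierSequence` and `NotifierMapping`, and the way they are wired into the question's `Data` class, are hypothetical. The `MutableMapping` half extrapolates "the same approach" to the dicts. Each `Data` instance gets its own flag here instead of the answer's module-level `flag`.

```python
from collections.abc import MutableMapping, MutableSequence


class Flag:
    """Dirty flag, mirroring the answer's Flag class."""

    def __init__(self):
        self.clear()

    def clear(self):
        self.has_changed = False


class NotifierSequence(MutableSequence):
    """List-like proxy that keeps a native list as a private attribute.

    MutableSequence derives append, extend, pop, remove, clear, reverse
    and += from the five methods below, so only the three mutating ones
    have to raise the flag.
    """

    def __init__(self, flag, iterable=()):
        self._flag = flag
        self._items = list(iterable)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __setitem__(self, index, value):
        self._items[index] = value
        self._flag.has_changed = True

    def __delitem__(self, index):
        del self._items[index]
        self._flag.has_changed = True

    def insert(self, index, value):
        self._items.insert(index, value)
        self._flag.has_changed = True


class NotifierMapping(MutableMapping):
    """Dict counterpart built on MutableMapping (an extrapolation)."""

    def __init__(self, flag, *args, **kw):
        self._flag = flag
        self._items = dict(*args, **kw)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __setitem__(self, key, value):
        self._items[key] = value
        self._flag.has_changed = True

    def __delitem__(self, key):
        del self._items[key]
        self._flag.has_changed = True


class Data:
    """The question's Data class, recomputing subjects only when flagged."""

    def __init__(self, records):
        self.flag = Flag()
        # Wrap the outer list and every record so nested edits are seen.
        self.data = NotifierSequence(
            self.flag, (NotifierMapping(self.flag, r) for r in records))
        self.subjects = set()
        self.flag.has_changed = True  # force the first computation

    def get_subjects(self):
        if self.flag.has_changed:
            self.subjects = {datum["subject"] for datum in self.data}
            self.flag.clear()
        return self.subjects


d = Data([{"subject": "s1"}, {"subject": "s2"}])
print(sorted(d.get_subjects()))  # ['s1', 's2']
d.data[0]["subject"] = "s9"      # nested edit raises the shared flag
print(sorted(d.get_subjects()))  # ['s2', 's9']
```

Records appended later should be wrapped in `NotifierMapping` as well. A plain dict appended to `data` raises the flag once, but later edits inside it go unnoticed. Because the proxies are not `list` or `dict` subclasses, code that insists on a real list or dict tends to reject them rather than silently bypass the flag.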
