Find duplicates for mixed type values in dictionaries

Published: 2017/5/24 21:20:26 · Tags: python dictionary hash pickle

Question

I would like to recognize and group duplicate values in a dictionary. To do this, I build a pseudo-hash (better read: signature) of my data set as follows:

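from collections import defaultdict
from pickle import dumps

# ds is the dataset being deduplicated: a dict of key -> value
taxonomy = {}               # signature -> one representative value
binder = defaultdict(list)  # signature -> keys whose values share that signature
for key, value in ds.items():
    signature = dumps(value)  # pickled bytes used as a pseudo-hash
    taxonomy[signature] = value
    binder[signature].append(key)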

For a concrete use-case see this question.

Unfortunately I realized that if the following statement is True:

>>> ds['key1'] == ds['key2']
True

then the following is not necessarily True:

>>> dumps(ds['key1']) == dumps(ds['key2'])
False

I noticed that the key order in the dumped output differs between the two dicts. If I copy/paste the output of ds['key1'] and ds['key2'] into new dictionaries, the comparison succeeds.
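A minimal, self-contained illustration of this effect (the dicts `a` and `b` are made-up examples, not data from the question): two dicts holding the same items inserted in different orders compare equal, but pickle writes a dict's items in insertion order, so the serialized bytes differ.

```python
import pickle

# Same items, different insertion order (hypothetical sample data)
a = {'x': 1, 'y': 2}
b = {'y': 2, 'x': 1}

print(a == b)                              # True: dict equality ignores order
print(pickle.dumps(a) == pickle.dumps(b))  # False: bytes follow insertion order
```

This is why a pickle-based signature can split values that `==` considers identical.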

As an overkill alternative, I could traverse my dataset recursively and replace dict instances with OrderedDict:

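import collections
import collections.abc
import copy

def faithfulrepr(od):
    # Work on a copy so the caller's data is not mutated
    od = copy.deepcopy(od)
    if isinstance(od, collections.abc.Mapping):
        # Rebuild mappings with keys in sorted order so equal dicts give equal reprs
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)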

>>> faithfulrepr(ds['key1']) == faithfulrepr(ds['key2'])
True

I am worried about this naive approach because I do not know whether it covers all possible situations.

What other (generic) alternatives can I use?

Answer

The first thing to do is remove the call to deepcopy, which is your bottleneck here:

import collections
import collections.abc

def faithfulrepr(ds):
    if isinstance(ds, collections.abc.Mapping):
        # Sort items by key so equal mappings give identical reprs
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)

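However, repr has its drawbacks:

You cannot use mappings whose keys have different types, because sorted() cannot order, for example, int and str keys in Python 3 and raises TypeError instead.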
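A quick check of this limitation, as a sketch: `keys_sortable` is a hypothetical helper (not part of the question or the answer) that makes the same `sorted(mapping.items())` call that `faithfulrepr` makes and reports whether it succeeds.

```python
def keys_sortable(mapping):
    """Hypothetical helper: True if sorted() can order the mapping's items."""
    try:
        sorted(mapping.items())  # same call faithfulrepr makes on each mapping
    except TypeError:
        return False
    return True

print(keys_sortable({'a': 1, 'b': 2}))      # True
print(keys_sortable({1: 'one', 'two': 2}))  # False: int and str keys cannot be compared
```

Any mapping for which this returns False would make `faithfulrepr` raise TypeError rather than produce a signature.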

So the second thing is to get rid of faithfulrepr and compare objects with __eq__:

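binder, values = [], []  # binder[i] holds the keys whose value equals values[i]
for key, value in ds.items():
    try:
        # list.index compares with ==, so no hashing or pickling is needed
        index = values.index(value)
    except ValueError:
        # First occurrence of this value: start a new group
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))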
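The loop above can be wrapped in a reusable function. This is a sketch: `group_duplicates` and the sample `ds` are hypothetical names chosen for illustration. Because it relies only on `==`, it also handles the mixed-type keys that break the sorted-repr approach.

```python
def group_duplicates(ds):
    """Group the keys of ds whose values compare equal with ==.

    Returns a dict mapping a tuple of keys to their shared value.
    Values never need to be hashable, sortable, or picklable.
    """
    binder, values = [], []
    for key, value in ds.items():
        try:
            index = values.index(value)  # first stored value equal to this one
        except ValueError:
            values.append(value)         # new distinct value: new group
            binder.append([key])
        else:
            binder[index].append(key)
    return dict(zip(map(tuple, binder), values))

# Hypothetical sample data: key1/key2 differ only in insertion order,
# key3 has mixed-type keys that sorted() could not order
ds = {
    'key1': {'a': 1, 'b': [1, 2]},
    'key2': {'b': [1, 2], 'a': 1},
    'key3': {1: 'one', 'two': 2},
    'key4': [1, 2, 3],
}
print(group_duplicates(ds))
# {('key1', 'key2'): {'a': 1, 'b': [1, 2]}, ('key3',): {1: 'one', 'two': 2}, ('key4',): [1, 2, 3]}
```

The trade-off is speed: each value is compared against every distinct value seen so far, so the scan grows quadratically in the worst case, which matters only for large datasets with many distinct values.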
