Compare Dictionaries with unhashable or uncomparable values? (e.g. Lists or Dataframes)

Question

TL;DR: How can you compare two Python dictionaries if some of their values are unhashable/mutable (e.g. lists or pandas DataFrames)?

I have to compare pairs of dictionaries for equality. In that sense, this question is similar to the following two, but their solutions only seem to work for immutable objects:

  • Is there a better way to compare dictionary values
  • Comparing two dictionaries in Python

My problem is that I'm dealing with pairs of highly nested dictionaries, where the unhashable objects can appear in different places depending on which pair of dictionaries I'm comparing. My thinking is that I need to iterate over the deepest values contained in the dictionary and can't just rely on dict.iteritems(), which only unrolls the top-level key-value pairs. I'm not sure how to iterate over all the key-value pairs contained in the dictionary and compare them, using sets/== for the hashable objects and df1.equals(df2) for pandas DataFrames. (Note for pandas DataFrames: df1 == df2 does an element-wise comparison and handles NAs poorly; df1.equals(df2) gets around that.)

So, for example:

a = {'x': 1, 'y': {'z': "George", 'w': df1}}
b = {'x': 1, 'y': {'z': "George", 'w': df1}}
c = {'x': 1, 'y': {'z': "George", 'w': df2}}

At a minimum (and this would be pretty awesome already), the solution would return True/False depending on whether the values are the same, and would work for pandas DataFrames.

def dict_compare(d1, d2):
    if ...:
        return True
    elif ...:
        return False

>>> dict_compare(a, b)
True
>>> dict_compare(a, c)
False

Moderately better: the solution would point out which keys/values differ between the dictionaries.

In the ideal case: the solution could separate the values into 4 groupings:


  • added,
  • removed,
  • modified,
  • same

Recommended answer

Well, there's a way to make any type comparable: simply wrap it in a class that compares the way you need:

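class DataFrameWrapper:
    """Wrap a DataFrame so that == is answered by DataFrame.equals."""
    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        if not isinstance(other, DataFrameWrapper):
            return NotImplemented  # let Python handle comparisons with other types
        return self.df.equals(other.df)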

So once you wrap your "uncomparable" values, you can simply use ==:

>>> import pandas as pd

>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})

>>> a = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> b = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> c = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df2)}}
>>> a == b
True
>>> a == c
False

Of course, wrapping your values has its disadvantages, but if you only need to compare them, it is a very easy approach. All that may be needed is a recursive wrapping before the comparison and a recursive unwrapping afterwards:

def recursivewrap(dict_):
    for key, value in dict_.items():
        wrapper = wrappers.get(type(value), lambda x: x)  # for other types don't wrap
        dict_[key] = wrapper(value)
    return dict_  # return dict_ so this function can be used for recursion

def recursiveunwrap(dict_):
    for key, value in dict_.items():
        unwrapper = unwrappers.get(type(value), lambda x: x)
        dict_[key] = unwrapper(value)
    return dict_

wrappers = {pd.DataFrame: DataFrameWrapper,
            dict: recursivewrap}
unwrappers = {DataFrameWrapper: lambda x: x.df,
              dict: recursiveunwrap}

An example:

>>> recursivewrap(a)
{'x': 1,
 'y': {'w': <__main__.DataFrameWrapper at 0x2affddcc048>, 'z': 'George'}}
>>> recursiveunwrap(recursivewrap(a))
{'x': 1, 'y': {'w':    a
  0  1
  1  2
  2  3, 'z': 'George'}}
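
Note that recursivewrap and recursiveunwrap above modify the dictionary they are given. If the original dictionaries should stay untouched, a copying variant is one option. The following is only a minimal sketch and not part of the original answer: wrapped_copy and CloseFloat are hypothetical names, and a float wrapper stands in for DataFrameWrapper so that the example runs without pandas.

```python
import math


class CloseFloat:
    """Stand-in wrapper (assumption): floats compare equal if they are close."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, CloseFloat):
            return NotImplemented
        return math.isclose(self.value, other.value)


def wrapped_copy(obj, wrappers):
    """Return a copy of obj with values of registered types wrapped.

    Nested dicts are rebuilt recursively; the input is never modified.
    """
    if isinstance(obj, dict):
        return {key: wrapped_copy(value, wrappers) for key, value in obj.items()}
    wrapper = wrappers.get(type(obj))
    return wrapper(obj) if wrapper is not None else obj


wrappers = {float: CloseFloat}
a = {'x': 1, 'y': {'z': "George", 'w': 0.1 + 0.2}}
b = {'x': 1, 'y': {'z': "George", 'w': 0.3}}

print(a == b)                                                  # False
print(wrapped_copy(a, wrappers) == wrapped_copy(b, wrappers))  # True
print(type(a['y']['w']))                                       # <class 'float'>
```

With pandas, the mapping would presumably be {pd.DataFrame: DataFrameWrapper}. Because the inputs are left alone, no unwrapping step is needed afterwards.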

If you feel really adventurous, you could use wrapper classes that, depending on the comparison result, update some variable recording what wasn't equal.
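
One way this idea might look is sketched below. It is an illustration under assumptions, not the answer's own code: RecordingWrapper, its label argument and the shared mismatches list are hypothetical, and a list comparison stands in for pd.DataFrame.equals so the example has no third-party imports.

```python
class RecordingWrapper:
    """Hypothetical wrapper that compares with `equal` and logs mismatches."""

    def __init__(self, value, equal, label, mismatches):
        self.value = value
        self.equal = equal            # comparison function, e.g. pd.DataFrame.equals
        self.label = label            # name used to report this value
        self.mismatches = mismatches  # shared list collecting unequal labels

    def __eq__(self, other):
        if not isinstance(other, RecordingWrapper):
            return NotImplemented
        result = bool(self.equal(self.value, other.value))
        if not result:
            self.mismatches.append(self.label)
        return result


def same_items(l1, l2):
    """Stand-in comparison (assumption): equal if the sorted items match."""
    return sorted(l1) == sorted(l2)


mismatches = []
a = {'x': 1, 'y': {'w': RecordingWrapper([1, 2, 3], same_items, 'y.w', mismatches)}}
b = {'x': 1, 'y': {'w': RecordingWrapper([3, 2, 1], same_items, 'y.w', mismatches)}}
c = {'x': 1, 'y': {'w': RecordingWrapper([1, 2, 4], same_items, 'y.w', mismatches)}}

print(a == b, mismatches)  # True []
print(a == c, mismatches)  # False ['y.w']
```

Keep in mind that a dictionary comparison stops at the first unequal value, so the log will typically contain only the first difference rather than all of them.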

This part of the answer was based on the original question, which didn't include nesting:

You can separate the unhashable values from the hashable ones, then do a set comparison for the hashable values and an "order-independent" list comparison for the unhashable ones:

def split_hashable_unhashable(vals):
    """Separate hashable values from unhashable ones and return a set (hashable
    values) and a list (unhashable values)."""
    set_ = set()
    list_ = []
    for val in vals:
        try:
            set_.add(val)
        except TypeError:  # unhashable
            list_.append(val)
    return set_, list_


def compare_lists_arbitary_order(l1, l2, cmp=pd.DataFrame.equals):
    """Compare two lists using a custom comparison function, the order of the
    elements is ignored."""
    # need to have equal lengths otherwise they can't be equal
    if len(l1) != len(l2):  
        return False

    remaining_indices = set(range(len(l2)))
    for item in l1:
        for cmpidx in remaining_indices:
            if cmp(item, l2[cmpidx]):
                remaining_indices.remove(cmpidx)
                break
        else:
            # Run through the loop without finding a match
            return False
    return True

def dict_compare(d1, d2):
    if set(d1) != set(d2):  # compare the dictionary keys
        return False
    set1, list1 = split_hashable_unhashable(d1.values())
    set2, list2 = split_hashable_unhashable(d2.values())
    if set1 != set2:  # set comparison is easy
        return False

    return compare_lists_arbitary_order(list1, list2)

It got a bit longer than expected. For your test cases it definitely works:

>>> import pandas as pd

>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})

>>> a = {'x': 1, 'y': df1}
>>> b = {'y': 1, 'x': df1}
>>> c = {'y': 1, 'x': df2}
>>> dict_compare(a, b)
True
>>> dict_compare(a, c)
False
>>> dict_compare(b, c)
False
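
The default cmp=pd.DataFrame.equals is aimed at DataFrames. If the unhashable values may also be lists or dicts, a comparison function that dispatches on type could be passed as the cmp argument instead. The following is a minimal sketch under that assumption; make_cmp and comparators are hypothetical names and not part of the answer.

```python
def make_cmp(comparators):
    """Build a two-argument comparison function that dispatches on type.

    comparators maps a type to a function (a, b) -> bool; any type that is
    not registered falls back to ==.
    """
    def cmp(a, b):
        if type(a) is not type(b):
            return False
        compare = comparators.get(type(a))
        if compare is None:
            return bool(a == b)
        return bool(compare(a, b))
    return cmp


# With pandas one would register pd.DataFrame: pd.DataFrame.equals here;
# a list comparator that ignores order is used so the sketch needs no imports.
cmp = make_cmp({list: lambda a, b: sorted(a) == sorted(b)})

print(cmp([1, 2, 3], [3, 2, 1]))    # True
print(cmp({'k': [1]}, {'k': [1]}))  # True
print(cmp([1, 2], (1, 2)))          # False
```

Treating values of different types as unequal is a design choice of this sketch; whether that suits your data is for you to decide.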

The set operations can also be used to find differences (see set.difference). It's a bit more complicated with the lists, but not impossible: one could add the items for which no match was found to a separate list instead of immediately returning False.
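
As a rough illustration of finding differences with set operations, the sketch below groups the keys of two dictionaries into the four categories asked for in the question. Note that it compares values key by key, which differs from the value-set approach of dict_compare above. dict_diff and its equal parameter are hypothetical names; for DataFrame values, equal would need to be a function that calls DataFrame.equals.

```python
import operator


def dict_diff(d1, d2, equal=operator.eq):
    """Group the keys of two dicts into added, removed, modified and same.

    equal is the function used to compare the values of shared keys.
    """
    keys1, keys2 = set(d1), set(d2)
    shared = keys1 & keys2
    modified = {key for key in shared if not equal(d1[key], d2[key])}
    return {
        'added': keys2 - keys1,
        'removed': keys1 - keys2,
        'modified': modified,
        'same': shared - modified,
    }


old = {'x': 1, 'y': [1, 2, 3], 'z': "George"}
new = {'x': 1, 'y': [3, 2, 1], 'w': "Fred"}

result = dict_diff(old, new)
print(result['added'])     # {'w'}
print(result['removed'])   # {'z'}
print(result['modified'])  # {'y'}
print(result['same'])      # {'x'}
```

This only inspects the top level; nested dictionaries are compared as whole values by equal, so a recursive version would be needed to report differences deeper down.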
