如何比较两个集合的“等价性"基于来自不同 Java 类的字段? [英] How to compare two Collections for "equivalence" based on fields from different Java classes?

查看:30
本文介绍了如何比较两个集合的“等价性"基于来自不同 Java 类的字段?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定任意两个类,例如ClassAClassB 如下:

Given any two classes, e.g. ClassA and ClassB below:

class ClassA {
    private int intA;
    private String strA;
    private boolean boolA;
    // Constructor
    public ClassA (int intA, String strA, boolean boolA) {
        this.intA = intA; this.strA = strA; this.boolA = boolA;
    } // Getters and setters etc. below...
}

class ClassB {
    private int intB;
    private String strB;
    private boolean boolB;
    // Constructor
    public ClassB (int intB, String strB, boolean boolB) {
        this.intB = intB; this.strB = strB; this.boolB = boolB;
    } // Getters and setters etc. below...
}

以及任意两种不同的 Collection 类型,一种带有 ClassA 元素,另一种带有 ClassB 元素,例如:

And any two different Collection types, one with ClassA elements and the other with ClassB elements, e.g:

List<Object> myList = Arrays.asList(new ClassA(1, "A", true),
                                    new ClassA(2, "B", true));
Set<Object> mySet = new HashSet<Object>(
                      Arrays.asList(new ClassB(1, "A", false),
                                    new ClassB(2, "B", false)));

根据指定的字段子集,判断两个 Collection 是否等效"(*) 的最简单方法是什么?

What's the simplest way of telling whether the two Collections are "equivalent"(*) in terms of a specified subset of fields?

(*) 使用等效"一词而不是相等",因为这是上下文 - 即这种等效"在其他上下文中可能有不同的定义.

(*) The word "equivalent" is used rather then "equal" since this is contextual - i.e. such "equivalence" may be defined differently in another context.

上面的例子:假设我们指定 intAstrA 应该分别与 intBstrB 匹配(但 boolA/boolB 值可以忽略).这将使上面定义的两个集合对象被认为是等效的 - 但如果一个元素被添加到其中一个集合中或从其中一个集合中删除,那么它们将不再是.

Worked example from above: Suppose we specify that intA and strA should match with intB and strB respectively (but the boolA / boolB values can be ignored). This would make the two collection objects defined above be considered equivalent - but if an element were added to or removed from one of the collections then they no longer would be.

首选解决方案:所使用的方法对于任何Collection 类型都应该是通用的.理想情况下,Java 7 仅限于使用它(但其他人可能会对 Java 8 感兴趣).乐于使用 Guava 或 Apache Commons,但不想使用更晦涩的外部库.

Preferred solution: The method used should be generic for any Collection type. Ideally Java 7 as am confined to using this (but Java 8 may be of additional interest to others). Happy to use Guava or Apache Commons but would prefer not to use more obscure external libraries.

推荐答案

这是一个使用 lambda 和高阶函数的 Java 8 版本.可能可以使用匿名内部类而不是 lambda 将其转换为 Java 7.(我相信大多数 IDE 都有执行此操作的重构操作.)我将把它留给感兴趣的读者作为练习.

Here's a Java 8 version using lambdas and higher-order functions. It's probably possible to convert this to Java 7 using anonymous inner classes instead of lambdas. (I believe most IDEs have a refactoring operation that does this.) I'll leave that as an exercise for interested readers.

这里实际上有两个不同的问题:

There are actually two distinct problems here:

  1. 给定两个不同类型的对象,通过检查每个对象的相应字段来评估它们.这与 JDK 库 API 已经定义的等于"和比较"操作不同,因此我将使用术语等效"代替.

  1. Given two objects of different types, evaluate them by examining respective fields of each. This differs from "equals" and "compare" operations which are already defined by the JDK library APIs, so I'll use the term "equivalent" instead.

给定两个包含这些类型元素的集合,确定它们对于该术语的某些定义是否相等".这实际上非常微妙;请参阅下面的讨论.

Given two collections containing elements of those types, determine if they are "equals" for some definition of that term. This is actually quite subtle; see the discussion below.

1.等价

给定两个类型为 TU 的对象,我们要确定它们是否等价.结果是一个布尔值.这可以由 BiPredicate 类型的函数表示.但我们不一定能直接检查对象;相反,我们需要从每个对象中提取各自的字段,并相互评估提取结果.如果从T中提取的字段是TR类型,而从U中提取的字段是UR类型,则提取器由函数类型表示

Given two objects of types T and U we want to determine whether they're equivalent. The result is a boolean. This can be represented by a function of type BiPredicate<T,U>. But we can't necessarily examine the objects directly; instead, we need to extract respective fields from each object and evaluate the results of extraction against each other. If the field extracted from T is of type TR and the field extracted from U is of type UR, then the extractors are represented by the function types

Function<T, TR>
Function<U, UR>

现在我们已经提取了 TRUR 类型的结果.我们可以对它们调用 equals() ,但这是不必要的限制.相反,我们可以提供另一个等价函数,该函数将被调用以相互评估这两个结果.这是一个 BiPredicate.

Now we have extracted results of type TR and UR. We could just call equals() on them, but that's unnecessarily restrictive. Instead, we can provide another equivalence function that will be called to evaluate these two results against each other. That's a BiPredicate<TR,UR>.

考虑到所有这些,我们可以编写一个高阶函数,该函数接受所有这些函数并为我们生成等价函数(为了完整性而包含通配符):

Given all this, we can write a higher-order function that takes all of these functions and produces and equivalence function for us (wildcards included for completeness):

static <T,U,TR,UR> BiPredicate<T,U> equiv(Function<? super T, TR> tf,
                                          Function<? super U, UR> uf,
                                          BiPredicate<? super TR, ? super UR> pred) {
    return (t, u) -> pred.test(tf.apply(t), uf.apply(u));
}

使用 equals() 评估字段提取的结果可能是一种常见情况,因此我们可以为此提供一个重载:

It's probably a common case for the results of field extraction to be evaluated using equals(), so we can provide an overload for that:

static <T,U> BiPredicate<T,U> equiv(Function<? super T, ?> tf,
                                    Function<? super U, ?> uf) {
    return (t, u) -> equiv(tf, uf, Object::equals).test(t, u);
}

我可以提供另一个类型变量 R 作为两个函数的结果类型,以确保它们是相同的类型,但事实证明这不是必需的.由于 equals() 是在 Object 上定义的,并且它接受一个 Object 参数,我们实际上并不关心函数的返回类型是什么,因此通配符.

I could have provided another type variable R as the result type of both functions, to ensure they're the same type, but it turns out this isn't necessary. Since equals() is defined on Object and it takes an Object argument, we don't actually care what the function return types are, hence the wildcards.

以下是如何使用它来仅使用字符串字段来评估 OP 的示例类:

Here's how to use this to evaluate the OP's example classes using just the string fields:

ClassA a = ... ;
ClassB b = ... ;
if (equiv(ClassA::getStrA, ClassB::getStrB).test(a, b)) {
    // they're equivalent
}

作为一种变体,我们可能还需要原始特化以避免不必要的装箱:

As a variation, we might also want a primitive specialization in order to avoid unnecessary boxing:

static <T,U> BiPredicate<T,U> equivInt(ToIntFunction<? super T> tf,
                                       ToIntFunction<? super U> uf) {
    return (t, u) -> tf.applyAsInt(t) == uf.applyAsInt(u);
}

这让我们可以基于单个字段构建等价函数.如果我们想基于多个字段来评估等价性怎么办?我们可以通过链接 and() 方法来组合任意数量的 BiPredicates.以下是如何创建一个函数,该函数使用来自 OP 示例的类的 intString 字段来评估等价性.为此,最好将函数存储在一个变量中而不是使用它,尽管这可能全部内联(我认为这会使它不可读):

This lets us construct equivalence functions based on a single field. What if we want to evaluate equivalence based on multiple fields? We can combine an arbitrary number of BiPredicates by chaining the and() method. Here's how to create a function that evaluates equivalence using the int and String fields of the classes from the OP's example. For this, it's probably best to store the function in a variable separately from using it, though this can probably all be inlined (which I think will make it unreadable):

BiPredicate<ClassA, ClassB> abEquiv =
    equivInt(ClassA::getIntA, ClassB::getIntB)
        .and(equiv(ClassA::getStrA, ClassB::getStrB));

if (abEquiv.test(a, b)) {
    // they're equivalent
}

最后一个例子,在为两个类创建等价函数时,能够为字段提取结果提供等价函数是非常强大的.例如,假设我们要提取两个 String 字段,并且如果提取的字符串是相等的,则认为它们是等效的,忽略大小写.以下代码的结果为 true:

As a final example, it's quite powerful to be able to provide an equivalence function for the field extraction results when creating an equivalence function for two classes. For example, suppose we want to extract two String fields and consider them equivalent if the extracted strings are equals, ignoring case. The following code results in true:

equiv(ClassA::getStrA, ClassB::getStrB, String::equalsIgnoreCase)
    .test(new ClassA(2, "foo", true),
          new ClassB(3, "FOO", false))

2.平等"系列

第二部分是评估两个集合在某种意义上是否相等".问题是在集合框架中,相等的概念被定义为一个 List 只能等于另一个 List,而一个 Set 只能等于另一个 Set.因此,其他类型的 Collection 永远不能等于 List 或 Set.参见的规范Collection.equals() 对这一点的一些讨论.

The second part is to evaluate whether two collections are "equals" in some sense. The problem is that in the Collections Framework, the notion of equality for is defined such that a List can only be equal to another List, and a Set can only be equal to another Set. It follows that a Collection of some other type can never be equal to either a List or a Set. See the specification of Collection.equals() for some discussion of this point.

这显然与 OP 想要的不一致.正如 OP 所建议的那样,我们并不真正想要平等",但我们想要一些需要为其提供定义的其他属性.基于 OP 的示例,以及 Przemek Gumulajanos,似乎我们希望两个集合中的元素以某种方式一一对应.我将其称为 bijection,这在数学上可能不精确,但它似乎足够接近.此外,每对元素之间的对应关系应该是等价,如上面定义的那样.

This is clearly at odds with what the OP wants. As suggested by the OP, we don't really want "equality," but we want some other property for which we need to provide a definition. Based on the OP's examples, and some suggestions in other answers by Przemek Gumula and janos, it seems like we want the elements in the two collections to somehow be in one-for-one correspondence. I'll call this a bijection which might not be mathematically precise, but it seems close enough. Furthermore, the correspondence between each pair of elements should be equivalence as defined above.

计算这个有点微妙,因为我们有自己的等价关系.我们不能使用许多内置的 Collections 操作,因为它们都使用 equals().我的第一次尝试是这样的:

Computing this is a bit subtle, since we have our own equivalence relation. We can't use many of the built-in Collections operations, since they all use equals(). My first attempt was this:

// INCORRECT
static <T,U> boolean isBijection(Collection<T> c1,
                                 Collection<U> c2,
                                 BiPredicate<? super T, ? super U> pred) {
    return c1.size() == c2.size() &&
           c1.stream().allMatch(t -> c2.stream()
                                       .anyMatch(u -> pred.test(t, u)));
}

(这与 Przemek Gumula 给出的基本相同.)这有问题,归结为可能性一个集合中的多个元素对应于另一个集合中的单个元素,从而使元素不匹配.如果给定两个多重集,使用相等作为等价函数,这会产生奇怪的结果:

(This is essentially the same as given by Przemek Gumula.) This has problems, which boil down to the possibility of more than one element in the one collection corresponding to a single element in the other collection, leaving elements unmatched. This gives strange results if given two multisets, using equality as the equivalence function:

{a x 2, b}    // essentially {a, a, b}
{a, b x 2}    // essentially {a, b, b}

这个函数认为这两个多重集是一个双射,显然不是这样.如果等价函数允许多对一匹配,则会出现另一个问题:

This function considers these two multisets to be a bijection, which clearly isn't the case. Another problem occurs if the equivalence function allows many-to-one matching:

Set<String> set1 = new HashSet<>(Arrays.asList("foo", "FOO", "bar"));
Set<String> set2 = new HashSet<>(Arrays.asList("fOo", "bar", "quux"));

isBijection(set1, set2, equiv(s -> s, s -> s, String::equalsIgnoreCase))

结果为true,但如果集合以相反的顺序给出,则结果为false.这显然是错误的.

The result is true, but if the sets are given in the opposite order, the result is false. That's clearly wrong.

另一种算法是创建一个临时结构并删除匹配的元素.该结构必须考虑重复项,因此我们需要递减计数,并且仅在计数达到零时才删除元素.幸运的是,Java 8 的各种特性使这变得非常简单.这与 janos 的答案中使用的算法非常相似,尽管我已将等价函数提取到方法参数中.唉,由于我的等价函数可以有嵌套的等价函数,这意味着我无法探测映射(由等式定义).相反,我必须搜索地图的键,这意味着算法是 O(N^2).哦,好吧.

An alternative algorithm is to create a temporary structure and remove elements as they're matched. The structure has to account for duplicates, so we need to decrement the count and only remove the element when the count reaches zero. Fortunately, various Java 8 features make this pretty simple. This is quite similar to the algorithm used in the answer from janos, though I've extracted the equivalence function into a method parameter. Alas, since my equivalence function can have nested equivalence functions, it means I can't probe the map (which is defined by equality). Instead, I have to search the map's keys, which means the algorithm is O(N^2). Oh well.

然而,代码非常简单.首先,使用 groupingBy 从第二个集合生成频率图.然后,迭代第一个集合的元素,并搜索频率图的键以查找等效项.如果找到一个,则减少它的出现次数.注意传递给 Map.compute().这具有删除条目的副作用,而不是将映射设置为null.这有点像 API hack,但非常有效.

The code, however, is pretty simple. First, the frequency map is generated from the second collection using groupingBy. Then, the elements of the first collection are iterated, and the frequency map's keys are searched for an equivalent. If one is found, its occurrence count is decremented. Note the return value of null from the remapping function passed to Map.compute(). This has the side effect of removing the entry, not setting the mapping to null. It's a bit of an API hack, but it's quite effective.

对于第一个集合中的每个元素,必须在第二个集合中找到一个等价的元素,否则就会退出.处理完第一个集合的所有元素后,频率图中的所有元素也应该已经处理完毕,因此只需测试是否为空.

For every element in the first collection, an equivalent element in the second collection must be found, otherwise it bails out. After all elements of the first collection have been processed, all elements from the frequency map should also have been processed, so it's simply tested for being empty.

代码如下:

static <T,U> boolean isBijection(Collection<T> c1,
                                 Collection<U> c2,
                                 BiPredicate<? super T, ? super U> pred) {
    Map<U, Long> freq = c2.stream()
                          .collect(Collectors.groupingBy(u -> u, Collectors.counting()));
    for (T t : c1) {
        Optional<U> ou = freq.keySet()
                             .stream()
                             .filter(u -> pred.test(t, u))
                             .findAny();
        if (ou.isPresent()) {
            freq.compute(ou.get(), (u, c) -> c == 1L ? null : c - 1L);
        } else {
            return false;
        }
    }

    return freq.isEmpty();
}

这个定义是否正确尚不完全清楚.但直觉上似乎是人们想要的.不过它很脆弱.如果等价函数不对称,isBijection 将失败.还有一些自由度没有考虑在内.例如,假设集合是

It's not entirely clear whether this definition is the correct one. But it seems intuitively to be what people want. It's fragile, though. If the equivalence function isn't symmetric, isBijection will fail. There are also some degrees of freedom aren't accounted for. For example, suppose the collections are

{a, b}
{x, y}

And a 等价于 xy,但 b 只等价于 x.如果ax匹配,则isBijection的结果为false.但是如果 ay 匹配,结果将是 true.

And a is equivalent to both x and y, but b is only equivalent to x. If a is matched to x, the result of isBijection is false. But if a were matched to y, the result would be true.

组合起来

这是 OP 的示例,使用 equiv()equivInt()isBijection 函数编码:

Here's the OP's example, coded up using the equiv(), equivInt(), and isBijection functions:

List<ClassA> myList = Arrays.asList(new ClassA(1, "A", true),
                                    new ClassA(2, "B", true));

Set<ClassB> mySet = new HashSet<>(Arrays.asList(new ClassB(1, "A", false),
                                                new ClassB(2, "B", false)));

BiPredicate<ClassA, ClassB> abEquiv =
    equivInt(ClassA::getIntA, ClassB::getIntB)
        .and(equiv(ClassA::getStrA, ClassB::getStrB));

isBijection(myList, mySet, abEquiv)

结果是true.

这篇关于如何比较两个集合的“等价性"基于来自不同 Java 类的字段?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆