How to dynamically do filtering in Java 8?

Question

I know that in Java 8 I can do filtering like this:

List<User> olderUsers = users.stream().filter(u -> u.age > 30).collect(Collectors.toList());

But what if I have a collection and half a dozen filtering criteria, and I want to test combinations of those criteria?

For example, I have a collection of objects and the following criteria:

<1> Size
<2> Weight
<3> Length
<4> Top 50% by a certain order
<5> Top 20% by another ratio
<6> True or false by yet another criterion

And I want to test combinations of the above criteria, something like:

<1> -> <2> -> <3> -> <4> -> <5>
<1> -> <2> -> <3> -> <5> -> <4>
<1> -> <2> -> <5> -> <4> -> <3>
...
<1> -> <5> -> <3> -> <4> -> <2>
<3> -> <2> -> <1> -> <4> -> <5>
...
<5> -> <4> -> <3> -> <2> -> <1>

If each testing order may give me different results, how do I write a loop that automatically filters through all the combinations?

What I can think of is to use another method that generates the testing order, like the following:

int[][] getTestOrder(int criteriaCount)
{
 ...
}

So if criteriaCount is 2, it will return: {{1,2},{2,1}}
If criteriaCount is 3, it will return: {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}}
...

But then how do I implement it most efficiently with the concise filtering expressions that come with Java 8?

Recommended answer

Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go....

One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.

Another issue is that some of the OP's "filters" are context sensitive, such as "top 50% by a certain order". This can't be done with a simple filter(predicate) construct on a stream.

The key is to realize that lambdas do more than allow functions to be passed as arguments (to good effect): functions can also be stored in data structures, and computations can be performed on them. The most common computation is to take multiple functions and compose them.

Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:

class Widget {
    String name() { ... }
    int length() { ... }
    double weight() { ... }

    // constructors, fields, toString(), etc.
}

Let's start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:

List<Predicate<Widget>> allPredicates = Arrays.asList(
    w -> w.length() >= 10,
    w -> w.weight() > 40.0,
    w -> w.name().compareTo("c") > 0);

Given this list, we can permute them (probably not useful, since they're order independent) or select any subset we want. Let's say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that's a composite and of them all:

Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
    compositePredicate = compositePredicate.and(allPredicates.get(i));
}

This works, but it fails if the list is empty, and since we're doing functional programming now, mutating a variable in a loop is déclassé. But lo! This is a reduction! We can reduce all the predicates over the and operator to get a single composite predicate, like this:

Predicate<Widget> compositePredicate =
    allPredicates.stream()
                 .reduce(w -> true, Predicate::and);

(Credit: I learned this technique from @venkat_s. If you ever get a chance, go see him speak at a conference. He's good.)

Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)
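To make the reduction concrete, here is a minimal runnable sketch. The Widget class body, the sample data, and the helper names allOf, namesMatching and sampleWidgets are assumptions made up for illustration; only the reduce call itself comes from the text above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateCompositionDemo {

    // Hypothetical stand-in for the Widget POJO sketched earlier.
    static final class Widget {
        private final String name;
        private final int length;
        private final double weight;

        Widget(String name, int length, double weight) {
            this.name = name;
            this.length = length;
            this.weight = weight;
        }

        String name() { return name; }
        int length() { return length; }
        double weight() { return weight; }
    }

    // ANDs together however many predicates are supplied. For an empty
    // list the identity (w -> true) is returned, so every widget passes.
    static Predicate<Widget> allOf(List<Predicate<Widget>> predicates) {
        return predicates.stream().reduce(w -> true, Predicate::and);
    }

    // Applies the composite predicate and returns the surviving names.
    static List<String> namesMatching(List<Widget> widgets,
                                      List<Predicate<Widget>> predicates) {
        return widgets.stream()
                      .filter(allOf(predicates))
                      .map(Widget::name)
                      .collect(Collectors.toList());
    }

    // Made-up sample data, for illustration only.
    static List<Widget> sampleWidgets() {
        return Arrays.asList(
            new Widget("anvil", 12, 90.0),
            new Widget("bolt", 3, 0.5),
            new Widget("crate", 25, 45.0),
            new Widget("drum", 15, 30.0));
    }

    public static void main(String[] args) {
        List<Predicate<Widget>> predicates = Arrays.asList(
            w -> w.length() >= 10,
            w -> w.weight() > 40.0);

        // Two predicates combined with AND.
        System.out.println(namesMatching(sampleWidgets(), predicates));
        // Empty list: the identity predicate lets everything through.
        System.out.println(namesMatching(sampleWidgets(), Collections.emptyList()));
    }
}
```

With this sample data the first call keeps only anvil and crate, and the empty-list call returns all four widgets instead of failing, which is the case the plain loop could not handle.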

Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:

widgetList.stream()
          .filter(compositePredicate)
          .forEach(System.out::println);



Context Sensitive Filters

Now let's consider what I referred to as a "context sensitive" filter, exemplified by "top 50% in a certain order", say the top 50% of widgets by weight. "Context sensitive" isn't the best term for this but it's what I've got at the moment, and it is somewhat descriptive in that it's relative to the number of elements in the stream up to this point.

How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It's kind of like sorted() in a pipeline which can't tell which is the first element to output until it has read every single input element and has sorted them.

The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:

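// Assumes: import static java.util.Comparator.comparing;
//          import static java.util.stream.Collectors.toList;
List<Widget> temp =
    widgetList.stream()
              .sorted(comparing(Widget::weight).reversed())
              .collect(toList());
temp.stream()
    .limit((long)(temp.size() * 0.5))
    .forEach(System.out::println);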

This isn't complicated, but it's a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list's size in the 50% computation.

This is limiting, though, in that it's a "static" representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?

An important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function kind of like this ourselves:

Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
    List<Widget> temp =
        stream.sorted(comparing(Widget::weight).reversed())
              .collect(toList());
    return temp.stream()
               .limit((long)(temp.size() * 0.5));
}

A similar example might be to find the shortest three widgets:

Stream<Widget> shortestThree(Stream<Widget> stream) {
    return stream.sorted(comparing(Widget::length))
                 .limit(3);
}

Now we can write something that combines these stateful filters with ordinary stream operations:

shortestThree(
    top50PercentByWeight(
        widgetList.stream()
                  .filter(w -> w.length() >= 10)))
.forEach(System.out::println);

This works, but is kind of lousy because it reads "inside-out" and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it's still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.

A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be

UnaryOperator<Stream<Widget>>

If we were to put these into a list, the type of that list would be

List<UnaryOperator<Stream<Widget>>>

Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too many generics is to define our own type. Let's call one of our new things a Criterion. It turns out that there's little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:

@FunctionalInterface
public interface Criterion {
    Stream<Widget> apply(Stream<Widget> s);
}

Now we can create a list of criteria like this:

List<Criterion> criteria = Arrays.asList(
    this::shortestThree,
    this::lengthGreaterThan20
);

(We'll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it's still somewhat limiting. First, it can't be combined with ordinary predicates. Second, there are a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.

This creates a criterion that selects the top N widgets, given a comparator:

Criterion topN(Comparator<Widget> cmp, long n) {
    return stream -> stream.sorted(cmp).limit(n);
}

This creates a criterion that selects the top fraction pct of widgets (for example, 0.50 for the top 50%), given a comparator:

Criterion topPercent(Comparator<Widget> cmp, double pct) {
    return stream -> {
        List<Widget> temp =
            stream.sorted(cmp).collect(toList());
        return temp.stream()
                   .limit((long)(temp.size() * pct));
    };
}

And this creates a criterion from an ordinary predicate:

Criterion fromPredicate(Predicate<Widget> pred) {
    return stream -> stream.filter(pred);
}

Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:

List<Criterion> criteria = Arrays.asList(
    fromPredicate(w -> w.length() > 10),                    // longer than 10
    topN(comparing(Widget::length), 4L),                    // shortest 4
    topPercent(comparing(Widget::weight).reversed(), 0.50)  // heaviest 50%
);
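As a hedged illustration of how these factory methods behave individually, here is a self-contained sketch. The three factories and the Criterion interface follow the text above; the Widget class body, the sample data, and the helpers names and sampleWidgets are assumptions invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CriterionFactoryDemo {

    // Hypothetical stand-in for the Widget POJO sketched earlier.
    static final class Widget {
        private final String name;
        private final int length;
        private final double weight;

        Widget(String name, int length, double weight) {
            this.name = name;
            this.length = length;
            this.weight = weight;
        }

        String name() { return name; }
        int length() { return length; }
        double weight() { return weight; }
    }

    @FunctionalInterface
    interface Criterion {
        Stream<Widget> apply(Stream<Widget> s);
    }

    // First n widgets in the order defined by cmp.
    static Criterion topN(Comparator<Widget> cmp, long n) {
        return stream -> stream.sorted(cmp).limit(n);
    }

    // First pct (a fraction such as 0.50) of the widgets in cmp order.
    // The elements are collected first because the size must be known.
    static Criterion topPercent(Comparator<Widget> cmp, double pct) {
        return stream -> {
            List<Widget> temp = stream.sorted(cmp).collect(Collectors.toList());
            return temp.stream().limit((long) (temp.size() * pct));
        };
    }

    // Wraps an ordinary predicate as a Criterion.
    static Criterion fromPredicate(Predicate<Widget> pred) {
        return stream -> stream.filter(pred);
    }

    // Applies one criterion and returns the names of the survivors.
    static List<String> names(Criterion c, List<Widget> widgets) {
        return c.apply(widgets.stream())
                .map(Widget::name)
                .collect(Collectors.toList());
    }

    // Made-up sample data, for illustration only.
    static List<Widget> sampleWidgets() {
        return Arrays.asList(
            new Widget("anvil", 12, 90.0),
            new Widget("bolt", 3, 0.5),
            new Widget("crate", 25, 45.0),
            new Widget("drum", 15, 30.0));
    }

    public static void main(String[] args) {
        List<Widget> widgets = sampleWidgets();
        // Longer than 10.
        System.out.println(names(fromPredicate(w -> w.length() > 10), widgets));
        // Longest 2 (note the reversed comparator).
        System.out.println(names(
            topN(Comparator.comparingInt(Widget::length).reversed(), 2L), widgets));
        // Heaviest 50%.
        System.out.println(names(
            topPercent(Comparator.comparingDouble(Widget::weight).reversed(), 0.50), widgets));
    }
}
```

Note that topN takes the first n elements in comparator order, so "longest" needs a reversed comparator, while a plain comparing(Widget::length) yields the shortest ones.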

Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:

Criterion allCriteria =
    criteria.stream()
            .reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));

The identity function c -> c is clear, but the second arg is a bit tricky. Given a stream s we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.

Now that we've composed all the criteria into one, we can apply the result to a stream of widgets like so:

allCriteria.apply(widgetList.stream())
           .forEach(System.out::println);

This is still a bit inside-out, but it's fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.
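To tie this back to the original question, here is one possible sketch of running every ordering of a criteria list. It is not part of the original answer: the permutations, compose and run helpers, the WIDGETS and CRITERIA constants, the Widget class body and the sample data are all assumptions made up for illustration. The permutation helper is a plain recursive one and returns orderings in the same sequence as the getTestOrder examples in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CriteriaPermutationDemo {

    // Hypothetical stand-in for the Widget POJO sketched earlier.
    static final class Widget {
        private final String name;
        private final int length;
        private final double weight;

        Widget(String name, int length, double weight) {
            this.name = name;
            this.length = length;
            this.weight = weight;
        }

        String name() { return name; }
        int length() { return length; }
        double weight() { return weight; }
    }

    @FunctionalInterface
    interface Criterion {
        Stream<Widget> apply(Stream<Widget> s);
    }

    static Criterion topPercent(Comparator<Widget> cmp, double pct) {
        return stream -> {
            List<Widget> temp = stream.sorted(cmp).collect(Collectors.toList());
            return temp.stream().limit((long) (temp.size() * pct));
        };
    }

    // Made-up sample data, for illustration only.
    static final List<Widget> WIDGETS = Arrays.asList(
        new Widget("anvil", 12, 90.0),
        new Widget("bolt", 3, 0.5),
        new Widget("crate", 25, 45.0),
        new Widget("drum", 15, 30.0));

    static final List<Criterion> CRITERIA = Arrays.asList(
        s -> s.filter(w -> w.length() > 10),                             // 0: longer than 10
        s -> s.sorted(Comparator.comparingInt(Widget::length)).limit(2), // 1: shortest 2
        topPercent(Comparator.comparingDouble(Widget::weight).reversed(), 0.50)); // 2: heaviest 50%

    // All orderings of items; n items give n! lists, so keep n small.
    static <T> List<List<T>> permutations(List<T> items) {
        List<List<T>> result = new ArrayList<>();
        if (items.isEmpty()) {
            result.add(new ArrayList<>());
            return result;
        }
        for (int i = 0; i < items.size(); i++) {
            List<T> rest = new ArrayList<>(items);
            T head = rest.remove(i);          // remove by index, not by value
            for (List<T> tail : permutations(rest)) {
                List<T> perm = new ArrayList<>();
                perm.add(head);
                perm.addAll(tail);
                result.add(perm);
            }
        }
        return result;
    }

    // Chains the criteria left to right into a single Criterion.
    static Criterion compose(List<Criterion> criteria) {
        return criteria.stream()
                       .reduce(s -> s, (c1, c2) -> s -> c2.apply(c1.apply(s)));
    }

    // Applies CRITERIA in the given index order and returns surviving names.
    static List<String> run(List<Integer> order) {
        List<Criterion> ordered =
            order.stream().map(CRITERIA::get).collect(Collectors.toList());
        return compose(ordered).apply(WIDGETS.stream())
                               .map(Widget::name)
                               .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (List<Integer> order : permutations(Arrays.asList(0, 1, 2))) {
            System.out.println(order + " -> " + run(order));
        }
    }
}
```

With this sample data the orderings do not all agree: order [1, 0, 2] ends up with no widgets at all, while [2, 0, 1] keeps anvil and crate. That is the order sensitivity the question was worried about.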

The functional programming gurus are probably saying "He just reinvented ... !" which is probably true. I'm sure this has probably been invented somewhere already, but it's new to Java, because prior to lambda, it just wasn't feasible to write Java code that uses these techniques.

I've cleaned up and posted the complete sample code in a gist.
