How to dynamically do filtering in Java 8?

Question

I know in Java 8, I can do filtering like this :

List<User> olderUsers = users.stream().filter(u -> u.age > 30).collect(Collectors.toList());

But what if I have a collection and half a dozen filtering criteria, and I want to test the combination of the criteria ?

For example I have a collection of objects and the following criteria :

<1> Size
<2> Weight
<3> Length
<4> Top 50% by a certain order
<5> Top 20% by another certain ratio
<6> True or false by yet another criterion

And I want to test the combination of the above criteria, something like :

<1> -> <2> -> <3> -> <4> -> <5>
<1> -> <2> -> <3> -> <5> -> <4>
<1> -> <2> -> <5> -> <4> -> <3>
...
<1> -> <5> -> <3> -> <4> -> <2>
<3> -> <2> -> <1> -> <4> -> <5>
...
<5> -> <4> -> <3> -> <3> -> <1>

If each testing order may give me different results, how to write a loop to automatically filter through all the combinations ?

What I can think of is to use another method that generates the testing order like the following :

int[][] getTestOrder(int criteriaCount)
{
 ...
}

So if the criteriaCount is 2, it will return : {{1,2},{2,1}}
If the criteriaCount is 3, it will return : {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}}
...

But then how to most efficiently implement it with the filtering mechanism in concise expressions that comes with Java 8 ?

Recommended answer

Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go....

One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.

Another issue is that some of the OP's "filters" are context sensitive, such as "top 50% by a certain order". This can't be done with a simple filter(predicate) construct on a stream.

The key is to realize that, while lambdas allow functions to be passed as arguments (to good effect) it also means that they can be stored in data structures and computations can be performed on them. The most common computation is to take multiple functions and compose them.

Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:

class Widget {
    String name() { ... }
    int length() { ... }
    double weight() { ... }

    // constructors, fields, toString(), etc.
}

Let's start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:

List<Predicate<Widget>> allPredicates = Arrays.asList(
    w -> w.length() >= 10,
    w -> w.weight() > 40.0,
    w -> w.name().compareTo("c") > 0);

Given this list, we can permute them (probably not useful, since they're order independent) or select any subset we want. Let's say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that's a composite and of them all:

Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
    compositePredicate = compositePredicate.and(allPredicates.get(i));
}

This works, but it fails if the list is empty, and since we're doing functional programming now, mutating a variable in a loop is declassé. But lo! This is a reduction! We can reduce all the predicates over the and operator to get a single composite predicate, like this:

Predicate<Widget> compositePredicate =
    allPredicates.stream()
                 .reduce(w -> true, Predicate::and);

(Credit: I learned this technique from @venkat_s. If you ever get a chance, go see him speak at a conference. He's good.)

Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)

Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:

widgetList.stream()
          .filter(compositePredicate)
          .forEach(System.out::println);
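
To make the empty-list behavior concrete, here is a minimal self-contained sketch of the same reduction. The Widget class in it is a hypothetical stand-in with only the three getters assumed above, and the names allOf, sampleWidgets, samplePredicates and namesMatching are made up for illustration; they are not part of the original answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateReduceSketch {
    // Hypothetical minimal Widget; the real class would have more members.
    static final class Widget {
        private final String name;
        private final int length;
        private final double weight;

        Widget(String name, int length, double weight) {
            this.name = name;
            this.length = length;
            this.weight = weight;
        }

        String name() { return name; }
        int length() { return length; }
        double weight() { return weight; }
    }

    // Combines any number of predicates with logical AND.
    // With an empty list only the identity (t -> true) remains, so everything passes.
    static <T> Predicate<T> allOf(List<Predicate<T>> predicates) {
        return predicates.stream().reduce(t -> true, Predicate::and);
    }

    // Illustrative sample data (assumed values).
    static List<Widget> sampleWidgets() {
        return Arrays.asList(
            new Widget("a", 5, 50.0),
            new Widget("d", 12, 45.0),
            new Widget("e", 15, 30.0));
    }

    // The same three predicates as in the list above.
    static List<Predicate<Widget>> samplePredicates() {
        return Arrays.asList(
            w -> w.length() >= 10,
            w -> w.weight() > 40.0,
            w -> w.name().compareTo("c") > 0);
    }

    // Filters the widgets with the composite predicate and returns their names.
    static List<String> namesMatching(List<Widget> widgets,
                                      List<Predicate<Widget>> predicates) {
        return widgets.stream()
                      .filter(allOf(predicates))
                      .map(Widget::name)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [d]
        System.out.println(namesMatching(sampleWidgets(), samplePredicates()));
        // prints [a, d, e]: no predicates means nothing is filtered out
        System.out.println(namesMatching(sampleWidgets(), Collections.emptyList()));
    }
}
```

Because the predicates are stateless, shuffling the list before reducing it should not change the result; only the stateful criteria discussed next are order dependent.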

Context Sensitive Filters

Now let's consider what I referred to as a "context sensitive" filter, represented by an example like "top 50% by a certain order", say the top 50% of widgets by weight. "Context sensitive" isn't the best term for this but it's what I've got at the moment, and it is somewhat descriptive in that it's relative to the number of elements in the stream up to this point.

How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It's kind of like sorted() in a pipeline which can't tell which is the first element to output until it has read every single input element and has sorted them.

The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:

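// Assumes static imports of java.util.Comparator.comparing
// and java.util.stream.Collectors.toList.
List<Widget> temp =
    list.stream()
        .sorted(comparing(Widget::weight).reversed())
        .collect(toList());
temp.stream()
    .limit((long)(temp.size() * 0.5))
    .forEach(System.out::println);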

This isn't complicated, but it's a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list's size in the 50% computation.

This is limiting, though, in that it's a "static" representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?

An important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function kind of like this ourselves:

Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
    List<Widget> temp =
        stream.sorted(comparing(Widget::weight).reversed())
              .collect(toList());
    return temp.stream()
               .limit((long)(temp.size() * 0.5));
}

A similar example might be to find the shortest three widgets:

Stream<Widget> shortestThree(Stream<Widget> stream) {
    return stream.sorted(comparing(Widget::length))
                 .limit(3);
}

Now we can write something that combines these stateful filters with ordinary stream operations:

shortestThree(
    top50PercentByWeight(
        widgetList.stream()
                  .filter(w -> w.length() >= 10)))
.forEach(System.out::println);

This works, but is kind of lousy because it reads "inside-out" and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it's still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.

A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be

UnaryOperator<Stream<Widget>>

If we were to put these into a list, the type of that list would be

List<UnaryOperator<Stream<Widget>>>

Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too many generics is to define our own type. Let's call one of our new things a Criterion. It turns out that there's little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:

@FunctionalInterface
public interface Criterion {
    Stream<Widget> apply(Stream<Widget> s);
}

Now we can create a list of criteria like this:

List<Criterion> criteria = Arrays.asList(
    this::shortestThree,
    this::lengthGreaterThan20  // not shown; assumed to take and return a Stream<Widget>
);

(We'll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it's still somewhat limiting. First, it can't be combined with ordinary predicates. Second, there are a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.

This creates a criterion that selects the top N widgets, given a comparator:

Criterion topN(Comparator<Widget> cmp, long n) {
    return stream -> stream.sorted(cmp).limit(n);
}

This creates a criterion that selects the top p percent of widgets, given a comparator:

Criterion topPercent(Comparator<Widget> cmp, double pct) {
    return stream -> {
        List<Widget> temp =
            stream.sorted(cmp).collect(toList());
        return temp.stream()
                   .limit((long)(temp.size() * pct));
    };
}

And this creates a criterion from an ordinary predicate:

Criterion fromPredicate(Predicate<Widget> pred) {
    return stream -> stream.filter(pred);
}

Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:

List<Criterion> criteria = Arrays.asList(
    fromPredicate(w -> w.length() > 10),                    // longer than 10
    topN(comparing(Widget::length), 4L),                    // shortest 4 (ascending by length)
    topPercent(comparing(Widget::weight).reversed(), 0.50)  // heaviest 50%
);

Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:

Criterion allCriteria =
    criteria.stream()
            .reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));

The identity function c -> c is clear, but the second arg is a bit tricky. Given a stream s we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.

Now that we've composed all the criteria, we can apply it to a stream of widgets like so:

allCriteria.apply(widgetList.stream())
           .forEach(System.out::println);
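
As a rough check that the reduction applies criteria left to right, and that the order really can change the result, here is a small self-contained sketch. The Widget class, the sample data, and the names compose, run, HEAVIEST_HALF and LONGEST_TWO are assumptions made for this illustration; topN and topPercent follow the definitions above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CriterionCompositionSketch {
    // Hypothetical minimal Widget; the real class would have more members.
    static final class Widget {
        private final String name;
        private final int length;
        private final double weight;

        Widget(String name, int length, double weight) {
            this.name = name;
            this.length = length;
            this.weight = weight;
        }

        String name() { return name; }
        int length() { return length; }
        double weight() { return weight; }
    }

    @FunctionalInterface
    interface Criterion {
        Stream<Widget> apply(Stream<Widget> s);
    }

    static Criterion topN(Comparator<Widget> cmp, long n) {
        return stream -> stream.sorted(cmp).limit(n);
    }

    static Criterion topPercent(Comparator<Widget> cmp, double pct) {
        return stream -> {
            List<Widget> temp = stream.sorted(cmp).collect(Collectors.toList());
            return temp.stream().limit((long) (temp.size() * pct));
        };
    }

    // Left-to-right composition: the first criterion in the list runs first.
    // An empty list reduces to the identity criterion.
    static Criterion compose(List<Criterion> criteria) {
        return criteria.stream()
                       .reduce(s -> s, (c1, c2) -> s -> c2.apply(c1.apply(s)));
    }

    static final Criterion HEAVIEST_HALF =
        topPercent(Comparator.comparingDouble(Widget::weight).reversed(), 0.50);
    static final Criterion LONGEST_TWO =
        topN(Comparator.comparingInt(Widget::length).reversed(), 2);

    // Applies the composed criteria to assumed sample data and returns the names.
    static List<String> run(List<Criterion> criteria) {
        List<Widget> widgets = Arrays.asList(
            new Widget("a", 1, 40.0),
            new Widget("b", 2, 30.0),
            new Widget("c", 3, 20.0),
            new Widget("d", 4, 10.0));
        return compose(criteria).apply(widgets.stream())
                                .map(Widget::name)
                                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // heaviest half is {a, b}; ordered longest first: prints [b, a]
        System.out.println(run(Arrays.asList(HEAVIEST_HALF, LONGEST_TWO)));
        // longest two are {d, c}; the heavier half of those: prints [c]
        System.out.println(run(Arrays.asList(LONGEST_TWO, HEAVIEST_HALF)));
    }
}
```

A Criterion built this way holds no stream of its own, so the same object can be reused across runs as long as each run is given a fresh stream.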

This is still a bit inside-out, but it's fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.
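
For the "run all the permutations" part of the question, one possible sketch is below. To keep it short it uses plain integers in place of Widget and UnaryOperator<Stream<Integer>> in place of Criterion; the permutations helper, the three sample stages, and the sample data are all assumptions for illustration. Note that the number of orderings grows as n!, so this is only practical for a handful of criteria.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PermutationSketch {
    // Hypothetical helper: returns every ordering of the given items (n! lists).
    static <T> List<List<T>> permutations(List<T> items) {
        List<List<T>> result = new ArrayList<>();
        if (items.isEmpty()) {
            result.add(new ArrayList<>());   // exactly one ordering of nothing
            return result;
        }
        for (int i = 0; i < items.size(); i++) {
            List<T> rest = new ArrayList<>(items);
            T head = rest.remove(i);         // pick the i-th item as the first stage
            for (List<T> tail : permutations(rest)) {
                List<T> perm = new ArrayList<>();
                perm.add(head);
                perm.addAll(tail);
                result.add(perm);
            }
        }
        return result;
    }

    // Three sample stages standing in for Criterion objects.
    static List<UnaryOperator<Stream<Integer>>> sampleStages() {
        UnaryOperator<Stream<Integer>> evens = s -> s.filter(x -> x % 2 == 0);
        UnaryOperator<Stream<Integer>> largestHalf = s -> {
            List<Integer> temp =
                s.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            return temp.stream().limit(temp.size() / 2);
        };
        UnaryOperator<Stream<Integer>> smallestTwo = s -> s.sorted().limit(2);
        return Arrays.asList(evens, largestHalf, smallestTwo);
    }

    // Runs every ordering of the stages over the same data, keyed by stage indices.
    static Map<List<Integer>, List<Integer>> runAll() {
        List<UnaryOperator<Stream<Integer>>> stages = sampleStages();
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Map<List<Integer>, List<Integer>> results = new LinkedHashMap<>();
        for (List<Integer> order : permutations(Arrays.asList(0, 1, 2))) {
            Stream<Integer> s = data.stream();   // a fresh stream for each ordering
            for (int idx : order) {
                s = stages.get(idx).apply(s);
            }
            results.put(order, s.collect(Collectors.toList()));
        }
        return results;
    }

    public static void main(String[] args) {
        runAll().forEach((order, out) -> System.out.println(order + " -> " + out));
    }
}
```

With this sample data, the ordering [0, 1, 2] yields [6, 8] while [2, 0, 1] yields an empty list, which illustrates why the order of context sensitive criteria matters.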

The functional programming gurus are probably saying "He just reinvented ... !" which is probably true. I'm sure this has probably been invented somewhere already, but it's new to Java, because prior to lambda, it just wasn't feasible to write Java code that uses these techniques.

I've cleaned up and posted the complete sample code in a gist.
