如何比较多行? [英] How to compare multiple rows?

查看:23
本文介绍了如何比较多行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想比较两个连续的行 icol2i-1(按 col1 排序>).

I'd like to compare two consecutive rows i with i-1 of col2 (sorted by col1).

如果 i-th 行的 item_iitem_[i-1]_row 不同,我想递增item_[i-1] 的计数加 1.

If item_i of the i-th row and the item_[i-1]_row are different, I'd like to increment the count of item_[i-1] by 1.

+--------------+
| col1 col2    |
+--------------+
| row_1 item_1 |
| row_2 item_1 |
| row_3 item_2 |
| row_4 item_1 |
| row_5 item_2 |
| row_6 item_1 |
+--------------+

在上面的例子中,如果我们一次向下扫描两行,我们看到 row_2row_3 是不同的,因此我们向 item_1 添加一个.接下来,我们看到row_3row_4 不同,然后在item_2 上加一个.继续直到我们结束:

In the above example, if we scan two rows at a time downwards, we see that row_2 and row_3 are different therefore we add one to item_1. Next, we see that row_3 is different from row_4, then add one to item_2. Continue until we end up with:

+-------------+
|  col2  col3 |
+-------------+
|  item_1  2  |
|  item_2  2  |
+-------------+

推荐答案

您可以结合使用窗口函数和聚合来执行此操作.窗口函数用于获取col2的下一个值(使用col1进行排序).然后聚合计算我们遇到差异的次数.这是在下面的代码中实现的:

You can use a combination of a window function and an aggregate to do this. The window function is used to get the next value of col2 (using col1 for ordering). The aggregate then counts the times we encounter a differences. This is implemented in the code below:

val data = Seq(
  ("row_1", "item_1"),
  ("row_2", "item_1"),
  ("row_3", "item_2"),
  ("row_4", "item_1"),
  ("row_5", "item_2"),
  ("row_6", "item_1")).toDF("col1", "col2")

import org.apache.spark.sql.expressions.Window
val q = data.
  withColumn("col2_next",
    coalesce(lead($"col2", 1) over Window.orderBy($"col1"), $"col2")).
  groupBy($"col2").
  agg(sum($"col2" =!= $"col2_next" cast "int") as "col3")

scala> q.show
17/08/22 10:15:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
+------+----+
|  col2|col3|
+------+----+
|item_1|   2|
|item_2|   2|
+------+----+

这篇关于如何比较多行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆