PySpark - Append previous and next row to current row

Views: 43 | Published: 2021/11/14 23:13:08 | Tags: python, apache-spark, dataframe, pyspark, apache-spark-sql


Question

Let's say I have a PySpark data frame like so:

1 0 1 0
0 0 1 1
0 1 0 1

How can I append the previous row and the next row to the current row, like so:

1 0 1 0 0 0 0 0 0 0 1 1
0 0 1 1 1 0 1 0 0 1 0 1
0 1 0 1 0 0 1 1 0 0 0 0

I'm familiar with the .withColumn() method for adding columns, but I'm not sure what I would put in this field.

The "0 0 0 0" entries are placeholder values, used where a row has no previous row (the first row) or no next row (the last row).

Recommended answer

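You can use pyspark.sql.functions.lead() and pyspark.sql.functions.lag(), but first you need a way to order your rows. If you don't already have a column that determines the order, you can create one using pyspark.sql.functions.monotonically_increasing_id().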

Then use this ordering column in conjunction with a Window function.
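To make the intent of lag() and lead() concrete, here is a minimal plain-Python sketch of the same idea. It is only a model of the semantics, not Spark code: the helper name append_prev_and_next is hypothetical, and the rows are assumed to already be in the desired order.

```python
# Plain-Python model of lag/lead with a default value (illustration only).
rows = [
    (1, 0, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
]


def append_prev_and_next(rows, default=0):
    """Return each row followed by its previous row and its next row.

    Missing neighbours (before the first row, after the last row) are
    filled with `default`, mirroring lag(..., default=0) / lead(..., default=0).
    """
    width = len(rows[0]) if rows else 0
    filler = (default,) * width
    result = []
    for i, row in enumerate(rows):
        prev_row = rows[i - 1] if i > 0 else filler              # like lag()
        next_row = rows[i + 1] if i + 1 < len(rows) else filler  # like lead()
        result.append(row + prev_row + next_row)
    return result


for combined in append_prev_and_next(rows):
    print(*combined)
# 1 0 1 0 0 0 0 0 0 0 1 1
# 0 0 1 1 1 0 1 0 0 1 0 1
# 0 1 0 1 0 0 1 1 0 0 0 0
```

In Spark the rows of a DataFrame have no inherent position, which is why the window needs an explicit ordering column before lag() and lead() can be applied.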

For example, if you had the following DataFrame df:

df.show()
#+---+---+---+---+
#|  a|  b|  c|  d|
#+---+---+---+---+
#|  1|  0|  1|  0|
#|  0|  0|  1|  1|
#|  0|  1|  0|  1|
#+---+---+---+---+

You could do:

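from pyspark.sql import Window
import pyspark.sql.functions as f

cols = df.columns  # original columns, captured before the helper "id" column is added
# Add an ordering column; the ids are increasing but not necessarily consecutive.
df = df.withColumn("id", f.monotonically_increasing_id())
w = Window.orderBy("id")  # no partitionBy: all rows are handled in a single partition
df.select(
    "*",
    # lag() reads the previous row, lead() the next one; default=0 fills the edges
    *([f.lag(f.col(c), default=0).over(w).alias("prev_" + c) for c in cols] +
      [f.lead(f.col(c), default=0).over(w).alias("next_" + c) for c in cols])
).drop("id").show()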
#+---+---+---+---+------+------+------+------+------+------+------+------+
#|  a|  b|  c|  d|prev_a|prev_b|prev_c|prev_d|next_a|next_b|next_c|next_d|
#+---+---+---+---+------+------+------+------+------+------+------+------+
#|  1|  0|  1|  0|     0|     0|     0|     0|     0|     0|     1|     1|
#|  0|  0|  1|  1|     1|     0|     1|     0|     0|     1|     0|     1|
#|  0|  1|  0|  1|     0|     0|     1|     1|     0|     0|     0|     0|
#+---+---+---+---+------+------+------+------+------+------+------+------+
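
A note on the id column: the Spark documentation describes the ids from monotonically_increasing_id() as monotonically increasing and unique, but not consecutive. In the current implementation the partition ID goes in the upper 31 bits and the record number within each partition in the lower 33 bits. The sketch below models that layout in plain Python; the function name model_monotonic_id and the partition sizes are assumptions made up for illustration, and this is not Spark code.

```python
# Plain-Python model of the documented id layout (illustration only).
RECORD_BITS = 33  # lower 33 bits hold the record number within a partition


def model_monotonic_id(partition_id, record_number):
    """Combine a partition ID and a per-partition record number into one id."""
    return (partition_id << RECORD_BITS) + record_number


# Hypothetical DataFrame split into two partitions holding 3 and 2 rows.
partition_sizes = [3, 2]
ids = [
    model_monotonic_id(p, r)
    for p, size in enumerate(partition_sizes)
    for r in range(size)
]
print(ids)
# [0, 1, 2, 8589934592, 8589934593]
```

The gaps do not matter for the answer above, because the window only uses the id for ordering. Also keep in mind that a window with orderBy but no partitionBy moves all rows into a single partition, so this approach suits DataFrames that fit comfortably on one executor.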
