How could I add a column with incremental values to a DataFrame in PySpark?
Question
I have a DataFrame called 'df' like the following:
+-------+-------+-------+
| Atr1 | Atr2 | Atr3 |
+-------+-------+-------+
| A | A | A |
+-------+-------+-------+
| B | A | A |
+-------+-------+-------+
| C | A | A |
+-------+-------+-------+
I want to add a new column to it with incremental values and get the following updated DataFrame:
+-------+-------+-------+-------+
| Atr1 | Atr2 | Atr3 | Atr4 |
+-------+-------+-------+-------+
| A | A | A | 1 |
+-------+-------+-------+-------+
| B | A | A | 2 |
+-------+-------+-------+-------+
| C | A | A | 3 |
+-------+-------+-------+-------+
How could I get it?
Answer
If you only need incremental values (like an ID), and there is no constraint that the numbers be consecutive, you can use monotonically_increasing_id(). The only guarantee this function gives is that the generated values are increasing and unique for each row; however, the values themselves can differ between executions.
from pyspark.sql.functions import monotonically_increasing_id

# withColumn returns a new DataFrame; the generated 64-bit IDs are
# increasing and unique, but not necessarily consecutive or starting at 1.
df = df.withColumn("Atr4", monotonically_increasing_id())