How could I add a column with incremental values to a DataFrame in PySpark?
Question
I have a DataFrame called 'df' like the following:
+-------+-------+-------+
| Atr1 | Atr2 | Atr3 |
+-------+-------+-------+
| A | A | A |
+-------+-------+-------+
| B | A | A |
+-------+-------+-------+
| C | A | A |
+-------+-------+-------+
I want to add a new column to it with incremental values and get the following updated DataFrame:
+-------+-------+-------+-------+
| Atr1 | Atr2 | Atr3 | Atr4 |
+-------+-------+-------+-------+
| A | A | A | 1 |
+-------+-------+-------+-------+
| B | A | A | 2 |
+-------+-------+-------+-------+
| C | A | A | 3 |
+-------+-------+-------+-------+
How could I get it?
Answer
If you only need incremental values (like an ID), and there is no constraint that the numbers be consecutive, you can use monotonically_increasing_id(). The only guarantee this function gives is that the generated values are increasing and unique for each row; however, the values themselves can differ between executions.
from pyspark.sql.functions import monotonically_increasing_id

# withColumn returns a new DataFrame; the generated 64-bit IDs are
# increasing and unique, but not necessarily consecutive or starting at 1.
df = df.withColumn("Atr4", monotonically_increasing_id())