Spark - Monotonically increasing id not working as expected in dataframe?


Question

I have a dataframe df in Spark which looks something like this:

scala> df.show()
+--------+--------+
|columna1|columna2|
+--------+--------+
|     0.1|     0.4|
|     0.2|     0.5|
|     0.1|     0.3|
|     0.3|     0.6|
|     0.2|     0.7|
|     0.2|     0.8|
|     0.1|     0.7|
|     0.5|     0.5|
|     0.6|    0.98|
|     1.2|     1.1|
|     1.2|     1.2|
|     0.4|     0.7|
+--------+--------+

I tried to include an id column with the following code

import org.apache.spark.sql.functions.monotonicallyIncreasingId

val df_id = df.withColumn("id", monotonicallyIncreasingId)

but the id column is not what I expect:

scala> df_id.show()
+--------+--------+----------+
|columna1|columna2|        id|
+--------+--------+----------+
|     0.1|     0.4|         0|
|     0.2|     0.5|         1|
|     0.1|     0.3|         2|
|     0.3|     0.6|         3|
|     0.2|     0.7|         4|
|     0.2|     0.8|         5|
|     0.1|     0.7|8589934592|
|     0.5|     0.5|8589934593|
|     0.6|    0.98|8589934594|
|     1.2|     1.1|8589934595|
|     1.2|     1.2|8589934596|
|     0.4|     0.7|8589934597|
+--------+--------+----------+

As you can see, it goes well from 0 to 5 but then the next id is 8589934592 instead of 6 and so on.

So what is wrong here? Why is the id column not properly indexed here?

Answer

It works as expected. This function is not intended for generating consecutive values. Instead, it encodes the partition number and the record's index within that partition. From the function's documentation:

The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
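You can verify this layout by decoding the ids yourself. A minimal sketch in plain Scala (no Spark needed; `decodeId` is a hypothetical helper, not a Spark API):

```scala
// Split a monotonically-increasing id into its two components:
// the partition ID lives in the upper 31 bits, the record number
// within the partition in the lower 33 bits.
def decodeId(id: Long): (Long, Long) = {
  val partitionId  = id >> 33                 // upper 31 bits
  val recordNumber = id & ((1L << 33) - 1)    // lower 33 bits
  (partitionId, recordNumber)
}

decodeId(5L)           // => (0, 5)  -- partition 0, record 5
decodeId(8589934592L)  // => (1, 0)  -- partition 1, record 0
```

So the "jump" in the question is simply the point where the data crosses from partition 0 into partition 1: 8589934592 is 1L << 33, i.e. partition 1, record 0.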

As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.

If you want consecutive numbers, use RDD.zipWithIndex.
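A sketch of how that could look for the DataFrame in the question (assumes the column names shown above and a `SparkSession` named `spark`; note that `zipWithIndex` triggers an extra Spark job to count records per partition):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// zipWithIndex assigns consecutive Long indices across all partitions.
val rddWithId = df.rdd.zipWithIndex.map { case (row, idx) =>
  Row.fromSeq(row.toSeq :+ idx)
}

// Extend the original schema with the new id column.
val schema = StructType(df.schema.fields :+ StructField("id", LongType, nullable = false))

val df_id = spark.createDataFrame(rddWithId, schema)
```

Here `df_id` has ids 0, 1, 2, ... 11 with no gaps, at the cost of round-tripping through the RDD API.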
