Spark - Monotonically increasing id not working as expected in dataframe?


Question

I have a dataframe df in Spark which looks something like this:

scala> df.show()
+--------+--------+
|columna1|columna2|
+--------+--------+
|     0.1|     0.4|
|     0.2|     0.5|
|     0.1|     0.3|
|     0.3|     0.6|
|     0.2|     0.7|
|     0.2|     0.8|
|     0.1|     0.7|
|     0.5|     0.5|
|     0.6|    0.98|
|     1.2|     1.1|
|     1.2|     1.2|
|     0.4|     0.7|
+--------+--------+

I tried to add an id column with the following code:

// `monotonicallyIncreasingId` is deprecated; `monotonically_increasing_id`
// is the current name in org.apache.spark.sql.functions
import org.apache.spark.sql.functions.monotonically_increasing_id

val df_id = df.withColumn("id", monotonically_increasing_id())

but the id column is not what I expect:

scala> df_id.show()
+--------+--------+----------+
|columna1|columna2|        id|
+--------+--------+----------+
|     0.1|     0.4|         0|
|     0.2|     0.5|         1|
|     0.1|     0.3|         2|
|     0.3|     0.6|         3|
|     0.2|     0.7|         4|
|     0.2|     0.8|         5|
|     0.1|     0.7|8589934592|
|     0.5|     0.5|8589934593|
|     0.6|    0.98|8589934594|
|     1.2|     1.1|8589934595|
|     1.2|     1.2|8589934596|
|     0.4|     0.7|8589934597|
+--------+--------+----------+

As you can see, it goes well from 0 to 5 but then the next id is 8589934592 instead of 6 and so on.

So what is wrong here? Why is the id column not properly indexed here?

Answer

It works as expected. This function is not intended for generating consecutive values. Instead, it encodes the partition number and the record index within each partition. As the Spark documentation explains:

The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.

As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
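The jump seen in the question's output follows directly from that bit layout. A minimal pure-Scala sketch of the encoding (`makeId` is a hypothetical helper for illustration, not Spark's actual implementation):

```scala
// Mimic the documented layout: upper 31 bits = partition index,
// lower 33 bits = record number within that partition.
// (`makeId` is an illustrative helper, not part of Spark's API.)
def makeId(partition: Long, record: Long): Long = (partition << 33) | record

// Two partitions with 3 records each, as in the docs' example:
val ids = for (p <- 0L to 1L; r <- 0L to 2L) yield makeId(p, r)
println(ids.mkString(", "))
// → 0, 1, 2, 8589934592, 8589934593, 8589934594
```

The first id of the second partition is `1L << 33 = 8589934592`, which is exactly where the question's output jumps: the `df` there evidently had two partitions, with the first six rows in partition 0.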

If you want consecutive numbers, use RDD.zipWithIndex.
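A hedged sketch of that approach, assuming an existing SparkSession named `spark` and the `df` from the question (note that `zipWithIndex` triggers an extra Spark job, so it is costlier than `monotonically_increasing_id`):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Zip every row with a consecutive 0-based index, then append it
// as a new column by rebuilding the DataFrame with an extended schema.
val indexed = df.rdd.zipWithIndex.map { case (row, idx) =>
  Row.fromSeq(row.toSeq :+ idx)
}
val schema = StructType(df.schema.fields :+ StructField("id", LongType, nullable = false))
val df_id = spark.createDataFrame(indexed, schema)
```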
