Spark DataFrame: How to add an index column (aka distributed data index)

Views: 72 | Published: 2020/9/3 23:38:24 | Tags: scala apache-spark dataframe apache-spark-sql


I read data from a CSV file, but it has no index column.

I want to add a column that numbers the rows from 1 to the row count.

How can I do this in Scala? Thanks.

With Scala you can use:

import org.apache.spark.sql.functions._

// monotonically_increasing_id() replaces the deprecated monotonicallyIncreasingId
df.withColumn("id", monotonically_increasing_id())

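You can refer to this example and the Scala docs. Note that the generated IDs are unique and increasing but not guaranteed to be consecutive, so they will not necessarily run from 1 to the number of rows.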

With PySpark you can use:

from pyspark.sql.functions import monotonically_increasing_id 

df_index = df.withColumn("id", monotonically_increasing_id())
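To see why these IDs are usually not a clean 1..N sequence, here is a minimal pure-Python sketch that needs no Spark. It assumes the layout described in the Spark documentation for the current implementation (partition index in the upper 31 bits, record number within the partition in the lower 33 bits). The function names and the partition sizes are hypothetical, chosen only for illustration. The second function sketches the idea behind a consecutive index: offset each partition by the number of rows in the partitions before it, which is roughly what `RDD.zipWithIndex` does.

```python
from itertools import accumulate

RECORD_BITS = 33  # assumed layout: lower 33 bits hold the record number


def monotonic_id(partition_index: int, record_number: int) -> int:
    """Mimic the documented layout: partition index in the upper bits,
    record number within the partition in the lower 33 bits."""
    return (partition_index << RECORD_BITS) | record_number


def monotonic_ids(partition_sizes):
    """IDs for every row, given the number of rows in each partition."""
    return [
        monotonic_id(p, r)
        for p, size in enumerate(partition_sizes)
        for r in range(size)
    ]


def consecutive_indices(partition_sizes, start=1):
    """Consecutive indices: shift each partition by the total number of
    rows in all partitions that come before it."""
    offsets = [0] + list(accumulate(partition_sizes))[:-1]
    return [
        start + offset + r
        for offset, size in zip(offsets, partition_sizes)
        for r in range(size)
    ]


if __name__ == "__main__":
    sizes = [3, 3]  # hypothetical: two partitions with three rows each
    print(monotonic_ids(sizes))        # [0, 1, 2, 8589934592, 8589934593, 8589934594]
    print(consecutive_indices(sizes))  # [1, 2, 3, 4, 5, 6]
```

The jump from 2 to 8589934592 (that is, 1 << 33) marks the start of the second partition. If a strict 1..N index is required, the usual options are `zipWithIndex` on the underlying RDD or `row_number()` over an ordered window. Both cost more than `monotonically_increasing_id()`, because they need per-partition counts or a global ordering.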
