How to add an incremental ID column to a table in Spark SQL

Published: 2021/11/14 22:29:30. Tags: apache-spark, apache-spark-sql, spark-dataframe, apache-spark-mllib


Problem description


I'm working on a Spark MLlib algorithm. My dataset has this form:


"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":... (there are more fields similar to these)


I'm trying to convert the string values to numeric values, so I tried using zipWithUniqueId to get a unique number for each string value. For some reason, I'm not able to save the modified dataset to disk. Can I do this with Spark SQL, or is there a better approach?
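zipWithUniqueId does not produce consecutive numbers. Its documented scheme gives items in the k-th of n partitions the ids k, n+k, 2n+k, and so on, which avoids running a Spark job but leaves gaps whenever partitions have different sizes. The plain-Scala sketch below does not use Spark at all; it only reproduces that id scheme over a hypothetical list of "partitions". The helper `simulateZipWithUniqueId` and the sample values are illustrative assumptions, not part of any Spark API.

```scala
// Hypothetical, Spark-free simulation of the id scheme documented for RDD.zipWithUniqueId:
// the item at position i of the k-th of n partitions gets id i * n + k.
object ZipWithUniqueIdSketch {
  def simulateZipWithUniqueId[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] = {
    val n = partitions.length.toLong // number of partitions
    partitions.zipWithIndex.flatMap { case (items, k) =>
      // i is the position of the item inside its own partition
      items.zipWithIndex.map { case (item, i) => (item, i * n + k) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: one partition with three values, one with a single value
    val partitions = Seq(Seq("XXXX", "YYYY", "ZZZZ"), Seq("ABC"))
    simulateZipWithUniqueId(partitions).foreach(println)
  }
}
```

With these two unevenly sized partitions the ids come out as 0, 2, 4 and 1, so id 3 is never used. The ids are unique, which is enough for encoding, but they should not be read as row positions.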

Recommended answer

Scala

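import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index", monotonically_increasing_id()) // unique and increasing, but not consecutive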

Java

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
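The ids produced by monotonically_increasing_id() are guaranteed to be unique and increasing, but not consecutive. Spark's documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The Spark-free sketch below only reproduces that bit layout; `monotonicId` is a hypothetical helper written for illustration, not a Spark function.

```scala
// Hypothetical helper mimicking the documented layout of monotonically_increasing_id():
// partition index in the upper 31 bits, record number within the partition in the lower 33 bits.
object MonotonicIdSketch {
  def monotonicId(partitionIndex: Int, recordNumber: Long): Long =
    (partitionIndex.toLong << 33) + recordNumber

  def main(args: Array[String]): Unit = {
    // Two partitions with three records each
    val ids = for {
      p <- 0 until 2
      r <- 0L until 3L
    } yield monotonicId(p, r)
    println(ids.mkString(", "))
  }
}
```

Running `main` prints `0, 1, 2, 8589934592, 8589934593, 8589934594`: the second partition starts at 1L << 33, so the "index" column jumps between partitions and should not be treated as a row number.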
