Cast multiple columns in a DataFrame

Views: 118 | Published: 2020/9/29 23:38:20 | Tags: scala, apache-spark, dataframe, casting, databricks


Problem description

I'm on Databricks and I'm working on a classification problem. I have a DataFrame with 2000+ columns. I want to cast all the columns that will become features to double.

val array45 = data.columns.drop(1)

for (element <- array45) {
  data.withColumn(element, data(element).cast("double"))
}
data.printSchema()

The cast to double works, but the result is not saved back into the DataFrame called data. If I create a new DataFrame inside the loop, it does not exist outside the for loop. I do not want to use a UDF.

How can I solve this?

EDIT: Thanks to both of you for your answers! I don't know why, but the answers from Shaido and Raul take a long time to compute. I think it comes from Databricks.

Recommended answer

You can simply write a function that casts a column to DoubleType and use that function in the select method.

The function:

import org.apache.spark.sql.Column
import org.apache.spark.sql.types._

// Cast a single column to DoubleType
def func(column: Column) = column.cast(DoubleType)

Use the function in select as follows:

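import org.apache.spark.sql.functions._

// All column names except the first one
val array45 = data.columns.drop(1)
// Cast every selected column to double in a single select and display the result
data.select(array45.map(name => func(col(name))): _*).show(false)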
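Note that the select above only displays the result and, because it selects only array45, leaves the first column out. If the first column needs to survive (the question suggests it is not a feature, though that is an assumption here), a minimal sketch could keep it unchanged and assign the result to a new value. The object name CastWithSelect, the helper castFeatures, and the three-column sample DataFrame are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object CastWithSelect {
  // Cast every column except the first to double in one select.
  // Assumption: the first column is the one to leave untouched.
  def castFeatures(df: DataFrame): DataFrame = {
    val first: Column = col(df.columns.head)
    val features: Array[Column] =
      df.columns.drop(1).map(name => col(name).cast(DoubleType).as(name))
    df.select((first +: features): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-with-select")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the 2000+ column DataFrame from the question
    val data = Seq(("a", "1", "2.5"), ("b", "3", "4")).toDF("label", "f1", "f2")

    // DataFrames are immutable, so the cast result must be bound to a new value
    val casted = castFeatures(data)
    casted.printSchema()

    spark.stop()
  }
}
```

Because everything happens in a single select, the result is one projection no matter how many columns are cast.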

I hope the answer is helpful.
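As for why the loop in the question appears to do nothing: a DataFrame is immutable, so withColumn returns a new DataFrame and the loop discards it on every iteration. A loop-shaped alternative, sketched below under the same assumption that only the first column is excluded, threads the result through foldLeft. The names CastWithFoldLeft and castFeatures and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastWithFoldLeft {
  // Start from df and replace one column per step, passing the
  // updated DataFrame on to the next step instead of discarding it.
  def castFeatures(df: DataFrame): DataFrame =
    df.columns.drop(1).foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast("double"))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-with-foldleft")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the real data
    val data = Seq(("a", "1", "2.5"), ("b", "3", "4")).toDF("label", "f1", "f2")

    val casted = castFeatures(data)
    casted.printSchema()

    spark.stop()
  }
}
```

The Spark API documentation for withColumn warns that calling it many times in a loop adds one projection per call and can produce very large plans, and it recommends a single select instead. With 2000+ columns that may contribute to slow computation, although the source does not confirm the cause of the slowness mentioned in the EDIT.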
