Converting a string to double in a dataframe

Posted: 2021/11/14 22:10:03 | Tags: apache-spark, apache-spark-sql


Question

I have built a dataframe using concat, which produces a string:

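import sqlContext.implicits._
import org.apache.spark.sql.functions.{concat, lit}

val df = sc.parallelize(Seq((1.0, 2.0), (3.0, 4.0))).toDF("k", "v")
df.registerTempTable("df")

// concat turns the two double columns into one string column, joined by a comma
val dfConcat = df.select(concat($"k", lit(","), $"v").as("test"))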

dfConcat: org.apache.spark.sql.DataFrame = [test: string]

+-------------+
|         test|
+-------------+
|      1.0,2.0|
|      3.0,4.0|
+-------------+

How can I convert it back to double?

I have tried casting to DoubleType, but I get null:

import org.apache.spark.sql.types._

val testDouble = dfConcat.select(dfConcat("test").cast(DoubleType).as("test"))

+----+
|test|
+----+
|null|
|null|
+----+

and a UDF throws a NumberFormatException at runtime:

import org.apache.spark.sql.functions._

val toDbl = udf[Double, String](_.toDouble)

val testDouble = dfConcat
  .withColumn("test", toDbl(dfConcat("test")))
  .select("test")

Answer

You cannot convert it to double because "1.0,2.0" is simply not a valid representation of a double. If you want an array, just use the array function:

import org.apache.spark.sql.functions.array

df.select(array($"k", $"v").as("test"))
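Both failures in the question share the cause stated in this answer: the cast and the UDF each try to read the entire string "1.0,2.0" as a single number. The plain-Scala sketch below reproduces the UDF's `_.toDouble` call outside Spark, using only the standard library; the object name and the `tryParse` helper are hypothetical, chosen for illustration.

```scala
import scala.util.{Failure, Success, Try}

object WholeStringParse {
  // Hypothetical helper: parse a string the same way the question's UDF does (`_.toDouble`),
  // but wrap the result in Try so the failure can be inspected instead of crashing the job.
  def tryParse(s: String): Try[Double] = Try(s.toDouble)

  def main(args: Array[String]): Unit = {
    // A single number parses; the comma-joined pair from concat does not.
    for (s <- Seq("1.0", "1.0,2.0")) {
      tryParse(s) match {
        case Success(d) => println(s"$s -> $d")
        case Failure(e) => println(s"$s -> ${e.getClass.getSimpleName}")
      }
    }
  }
}
```

Running it prints `1.0 -> 1.0` for the single number and reports `NumberFormatException` for the comma-joined pair, the same exception the UDF raises at runtime.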

You can also try splitting the string and casting the result, but it is far from optimal:

import org.apache.spark.sql.types.{ArrayType, DoubleType}
import org.apache.spark.sql.functions.split

dfConcat.select(split($"test", ",").cast(ArrayType(DoubleType)))
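Splitting works because each comma-separated piece is a valid double on its own. For a single row, the effect of `split($"test", ",").cast(ArrayType(DoubleType))` can be approximated in plain Scala as below. This is a sketch, not Spark's implementation: `parseDoubles` is a hypothetical helper, and mapping an unparsable token to `None` only loosely mirrors the null element a Spark cast can produce.

```scala
import scala.util.Try

object SplitThenParse {
  // Hypothetical helper approximating split-then-cast for one row:
  // split on commas first, then parse each token separately.
  // A token that is not a valid double becomes None instead of failing the whole row.
  def parseDoubles(s: String): Seq[Option[Double]] =
    s.split(",").toSeq.map(token => Try(token.toDouble).toOption)

  def main(args: Array[String]): Unit = {
    Seq("1.0,2.0", "3.0,4.0", "3.0,abc").foreach { row =>
      println(s"$row -> ${parseDoubles(row)}")
    }
  }
}
```

For "3.0,abc" the sketch keeps the valid 3.0 and marks only the bad token as `None`, rather than throwing for the whole row as the original UDF did.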
