Explicit cast reading .csv with case class Spark 2.1.0
Problem description
I have the following case class:
case class OrderDetails(OrderID: String, ProductID: String, UnitPrice: Double,
                        Qty: Int, Discount: Double)
I am trying to read this csv: https://github.com/xsankar/fdps-v3/blob/master/data/NW-Order-Details.csv
Here is my code:
val spark = SparkSession.builder.master(sparkMaster).appName(sparkAppName).getOrCreate()
import spark.implicits._
val orderDetails = spark.read.option("header","true").csv(inputFiles + "NW-Order-Details.csv").as[OrderDetails]
The error is:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast `UnitPrice` from string to double as it may truncate
The type path of the target object is:
- field (class: "scala.Double", name: "UnitPrice")
- root class: "es.own3dh2so4.OrderDetails"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Why can't it be cast if all the fields are "double" values? What am I not understanding?
Spark version 2.1.0, Scala version 2.11.7
Answer
You just need to explicitly cast your field to a Double:
import org.apache.spark.sql.types.DoubleType

val orderDetails = spark.read
  .option("header", "true")
  .csv(inputFiles + "NW-Order-Details.csv")
  .withColumn("UnitPrice", 'UnitPrice.cast(DoubleType))
  .as[OrderDetails]
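As an alternative sketch, you can supply an explicit schema to the reader so every column is parsed with its declared type up front and no cast is needed afterwards. The example below writes a tiny one-row CSV to a temp directory purely so it is self-contained; the file name, row values, and `local[*]` master are illustrative assumptions, not part of the original question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.master("local[*]").appName("orders-schema").getOrCreate()
import spark.implicits._

case class OrderDetails(OrderID: String, ProductID: String, UnitPrice: Double,
                        Qty: Int, Discount: Double)

// Hypothetical sample data, written out so the example runs on its own.
val dir = Files.createTempDirectory("orders").toFile
val csvFile = new java.io.File(dir, "orders.csv")
val pw = new java.io.PrintWriter(csvFile)
pw.write("OrderID,ProductID,UnitPrice,Qty,Discount\n10248,11,14.0,12,0.0\n")
pw.close()

// Declare the column types instead of casting after the read.
val schema = StructType(Seq(
  StructField("OrderID", StringType),
  StructField("ProductID", StringType),
  StructField("UnitPrice", DoubleType),
  StructField("Qty", IntegerType),
  StructField("Discount", DoubleType)
))

val orderDetails = spark.read
  .option("header", "true")
  .schema(schema)              // columns arrive already typed
  .csv(csvFile.getPath)
  .as[OrderDetails]
```

With a user-supplied schema the `AnalysisException` never occurs, because the reader produces `double` and `int` columns directly rather than strings that would need an up-cast.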
On a side note, by Scala (and Java) convention, your case class constructor parameters should be lower camel case:
case class OrderDetails(orderID: String,
                        productID: String,
                        unitPrice: Double,
                        qty: Int,
                        discount: Double)
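Note that in this CSV more than one column needs a numeric type, so a single `withColumn` may not be enough. A minimal sketch of casting (and renaming to camel case) all columns in one `select`, using hypothetical in-memory data in place of the real file:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, IntegerType}

val spark = SparkSession.builder.master("local[*]").appName("orders-cast").getOrCreate()
import spark.implicits._

case class OrderDetails(orderID: String, productID: String, unitPrice: Double,
                        qty: Int, discount: Double)

// Hypothetical raw data: every column arrives as a string, as with an
// uninferred CSV read.
val raw = Seq(("10248", "11", "14.0", "12", "0.0"))
  .toDF("OrderID", "ProductID", "UnitPrice", "Qty", "Discount")

// Cast and rename each column in a single pass, then bind to the case class.
val orderDetails = raw.select(
  col("OrderID").as("orderID"),
  col("ProductID").as("productID"),
  col("UnitPrice").cast(DoubleType).as("unitPrice"),
  col("Qty").cast(IntegerType).as("qty"),
  col("Discount").cast(DoubleType).as("discount")
).as[OrderDetails]
```

This keeps all the type conversions in one place instead of chaining a `withColumn` per field.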