Spark Scala: Cannot up cast from string to int as it may truncate
Question
I got this exception while playing with Spark.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `price` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "price")
- root class: "org.spark.code.executable.Main.Record"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
How can this exception be solved? Here is the code:
import java.sql.Timestamp
import org.apache.spark.sql.Encoders

object Main {

  case class Record(transactionDate: Timestamp, product: String, price: Int, paymentType: String, name: String, city: String, state: String, country: String,
                    accountCreated: Timestamp, lastLogin: Timestamp, latitude: String, longitude: String)

  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "C:\\winutils\\")

    val schema = Encoders.product[Record].schema

    val df = SparkConfig.sparkSession.read
      .option("header", "true")
      .csv("SalesJan2009.csv")

    import SparkConfig.sparkSession.implicits._
    val ds = df.as[Record]
    //ds.groupByKey(body => body.state).count().show()

    import org.apache.spark.sql.expressions.scalalang.typed.{
      count => typedCount,
      sum => typedSum
    }

    ds.groupByKey(body => body.state)
      .agg(typedSum[Record](_.price).name("sum(price)"))
      .withColumnRenamed("value", "group")
      .alias("Summary by state")
      .show()
  }
}
Answer
You read the CSV file first and then tried to convert it to a Dataset whose case class has a different schema. It is better to pass the schema derived from the case class when reading the CSV file, as below:
import org.apache.spark.sql.{Encoders, SparkSession}

val spark = SparkSession.builder()
  .master("local")
  .appName("test")
  .getOrCreate()

import spark.implicits._

val schema = Encoders.product[Record].schema

val ds = spark.read
  .option("header", "true")
  .schema(schema)                                 // pass the schema explicitly
  .option("timestampFormat", "MM/dd/yyyy HH:mm")  // match the CSV's timestamp format
  .csv(path)                                      // csv path
  .as[Record]                                     // convert to Dataset[Record]
The default timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, so you also need to pass your custom timestampFormat.
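The exception message itself offers a second way out: adding an explicit cast to the input data. As a sketch (assuming the same `df` read as all-string columns in the question, and noting that the Timestamp columns of `Record` would need similar casts), that alternative would look like:

```scala
// Sketch of the "explicit cast" alternative from the error message.
// Assumes `df` was read without a schema, so every column is a string.
import org.apache.spark.sql.functions.col

val ds = df
  .withColumn("price", col("price").cast("int")) // explicit string -> int cast
  .as[Record]
```

Passing the schema up front, as shown above, is usually cleaner because it fixes every column in one place instead of casting them one by one.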
Hope this helps.