Spark Scala: Cannot up cast from string to int as it may truncate


Question

I got this exception while playing with Spark:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `price` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "price")
- root class: "org.spark.code.executable.Main.Record"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

How can this exception be solved? Here is the code:

import java.sql.Timestamp

import org.apache.spark.sql.Encoders

object Main {

  case class Record(transactionDate: Timestamp, product: String, price: Int, paymentType: String,
                    name: String, city: String, state: String, country: String,
                    accountCreated: Timestamp, lastLogin: Timestamp, latitude: String, longitude: String)

  def main(args: Array[String]): Unit = {

    System.setProperty("hadoop.home.dir", "C:\\winutils\\")

    val schema = Encoders.product[Record].schema

    val df = SparkConfig.sparkSession.read
      .option("header", "true")
      .csv("SalesJan2009.csv")

    import SparkConfig.sparkSession.implicits._
    val ds = df.as[Record]

    //ds.groupByKey(body => body.state).count().show()

    import org.apache.spark.sql.expressions.scalalang.typed.{
      count => typedCount,
      sum => typedSum
    }

    ds.groupByKey(body => body.state)
      .agg(typedSum[Record](_.price).name("sum(price)"))
      .withColumnRenamed("value", "group")
      .alias("Summary by state")
      .show()
  }
}

Answer

You read the csv file first and then tried to convert it to a dataset that has a different schema. It is better to pass the schema derived from the case class while reading the csv file, as below:

import org.apache.spark.sql.{Encoders, SparkSession}

val spark = SparkSession.builder()
  .master("local")
  .appName("test")
  .getOrCreate()

val schema = Encoders.product[Record].schema

val ds = spark.read
  .option("header", "true")
  .schema(schema)                                // pass the schema instead of inferring strings
  .option("timestampFormat", "MM/dd/yyyy HH:mm") // pass the timestamp format used in the file
  .csv(path)                                     // csv path
  .as[Record]                                    // convert to a typed Dataset

The default timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, so you also need to pass your custom timestampFormat.
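The exception message itself also offers the other route: keep the string columns from the plain csv read and add explicit casts before converting to the Dataset. A minimal sketch of that alternative, assuming df is the DataFrame from the question (every column read as string) and the same MM/dd/yyyy HH:mm timestamp layout:

import org.apache.spark.sql.functions.{col, to_timestamp}

// Cast the string columns to the types Record expects, then convert
// to the typed Dataset (spark implicits must be in scope for .as[Record]).
val castDs = df
  .withColumn("price", col("price").cast("int"))
  .withColumn("transactionDate", to_timestamp(col("transactionDate"), "MM/dd/yyyy HH:mm"))
  .withColumn("accountCreated", to_timestamp(col("accountCreated"), "MM/dd/yyyy HH:mm"))
  .withColumn("lastLogin", to_timestamp(col("lastLogin"), "MM/dd/yyyy HH:mm"))
  .as[Record]

Passing the schema up front is the cleaner fix, since it avoids reading everything as strings and casting afterwards.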

Hope this helps.

