How to convert a simple DataFrame to a DataSet in Spark Scala with a case class?
Question
I am trying to convert a simple DataFrame to a DataSet, following the example in the Spark SQL programming guide: https://spark.apache.org/docs/latest/sql-programming-guide.html
case class Person(name: String, age: Int)
import spark.implicits._
val path = "examples/src/main/resources/people.json"
val peopleDS = spark.read.json(path).as[Person]
peopleDS.show()
But it fails with the following exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "age")
- root class: ....
Can anyone help me?
Edit: I noticed that it works with Long instead of Int! Why is that?
Also:
val primitiveDS = Seq(1,2,3).toDS()
val augmentedDS = primitiveDS.map(i => ("var_" + i.toString, (i + 1).toLong))
augmentedDS.show()
augmentedDS.as[Person].show()
prints:
+-----+---+
| _1| _2|
+-----+---+
|var_1| 2|
|var_2| 3|
|var_3| 4|
+-----+---+
Exception in thread "main"
org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_1, _2];
Can anyone help me understand what is happening here?
Recommended answer
If you change Int to Long (or BigInt) it works fine:
case class Person(name: String, age: Long)
import spark.implicits._
val path = "examples/src/main/resources/people.json"
val peopleDS = spark.read.json(path).as[Person]
peopleDS.show()
Output:
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
spark.read.json parses numbers as Long by default, which is the safer choice. You can change the column type afterwards using cast or UDFs.
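For example, here is a minimal sketch of the casting approach, in case you still want an Int field in the case class. The PersonInt name is hypothetical, and Option[Int] is used because people.json contains a null age:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

case class PersonInt(name: String, age: Option[Int])  // Option handles the null age in people.json

val peopleIntDS = spark.read.json(path)
  .withColumn("age", col("age").cast(IntegerType))    // cast bigint -> int before applying the case class
  .as[PersonInt]
peopleIntDS.show()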
Edit 2:
To answer your 2nd question, you need to name the columns correctly before the conversion to Person will work:
val primitiveDS = Seq(1,2,3).toDS()
val augmentedDS = primitiveDS
  .map(i => ("var_" + i.toString, (i + 1).toLong))
  .withColumnRenamed("_1", "name")
  .withColumnRenamed("_2", "age")
augmentedDS.as[Person].show()
Output:
+-----+---+
| name|age|
+-----+---+
|var_1| 2|
|var_2| 3|
|var_3| 4|
+-----+---+
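An alternative sketch, not from the original answer: instead of renaming columns, map the values straight into the Person case class (with age: Long as above), which gives you a typed Dataset[Person] directly and the columns take the case class field names:

val primitiveDS = Seq(1, 2, 3).toDS()
val augmentedDS = primitiveDS
  .map(i => Person("var_" + i.toString, (i + 1).toLong))  // build Person values directly
augmentedDS.show()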