How to cast an array of structs in a Spark DataFrame using selectExpr?
Question
How can I cast an array of structs in a Spark DataFrame?
Let me explain what I am trying to do with an example. We'll start by creating a DataFrame that contains an array of rows and nested rows. My integers have not been cast yet in the DataFrame; they are created as strings:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val rows1 = Seq(
  Row("1", Row("a", "b"), "8.00", Seq(Row("1", "2"), Row("12", "22"))),
  Row("2", Row("c", "d"), "9.00", Seq(Row("3", "4"), Row("33", "44")))
)
val rows1Rdd = spark.sparkContext.parallelize(rows1, 4)
val schema1 = StructType(
  Seq(
    StructField("id", StringType, true),
    StructField("s1", StructType(
      Seq(
        StructField("x", StringType, true),
        StructField("y", StringType, true)
      )
    ), true),
    StructField("d", StringType, true),
    StructField("s2", ArrayType(StructType(
      Seq(
        StructField("u", StringType, true),
        StructField("v", StringType, true)
      )
    )), true)
  )
)
val df1 = spark.createDataFrame(rows1Rdd, schema1)
Here is the schema of the resulting DataFrame:
df1.printSchema
root
 |-- id: string (nullable = true)
 |-- s1: struct (nullable = true)
 |    |-- x: string (nullable = true)
 |    |-- y: string (nullable = true)
 |-- d: string (nullable = true)
 |-- s2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- u: string (nullable = true)
 |    |    |-- v: string (nullable = true)
What I want to do is cast every string that can be an integer to an integer. I tried the following, but it didn't work:
df1.selectExpr("CAST (id AS INTEGER) as id",
"STRUCT (s1.x, s1.y) AS s1",
"CAST (d AS DECIMAL) as d",
"Array (Struct(CAST (s2.u AS INTEGER), CAST (s2.v AS INTEGER))) as s2").show()
I got the following exception:
cannot resolve 'CAST(`s2`.`u` AS INT)' due to data type mismatch: cannot cast array<string> to int; line 1 pos 14;
Does anyone have the right query to cast all the values to INTEGER? I'd be grateful.
Thanks a lot,
Recommended answer
You should cast the whole column to a type that matches its full structure:
val result = df1.selectExpr(
  "CAST(id AS integer) id",
  "s1",
  "CAST(d AS decimal) d",
  "CAST(s2 AS array<struct<u:integer,v:integer>>) s2"
)
which should give you the following schema:
result.printSchema
root
 |-- id: integer (nullable = true)
 |-- s1: struct (nullable = true)
 |    |-- x: string (nullable = true)
 |    |-- y: string (nullable = true)
 |-- d: decimal(10,0) (nullable = true)
 |-- s2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- u: integer (nullable = true)
 |    |    |-- v: integer (nullable = true)
and the following data:
result.show
+---+-----+---+----------------+
| id| s1| d| s2|
+---+-----+---+----------------+
| 1|[a,b]| 8|[[1,2], [12,22]]|
| 2|[c,d]| 9|[[3,4], [33,44]]|
+---+-----+---+----------------+
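As a side note on why the original attempt failed: accessing a field on an array of structs, as in s2.u, extracts that field from every element, so the expression has type array<string> and cannot be cast to int directly. The sketch below, meant to be pasted into spark-shell, shows this and two possible alternatives to the CAST string: Column.cast with an explicit DataType, and the transform higher-order SQL function (this assumes Spark 2.4 or later). The names df, target, viaCast and viaTransform are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Local session for illustration only; in spark-shell this returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("cast-array-of-struct")
  .getOrCreate()

// A reduced version of the question's DataFrame: just the s2 column.
val schema = StructType(Seq(
  StructField("s2", ArrayType(StructType(Seq(
    StructField("u", StringType, true),
    StructField("v", StringType, true)
  ))), true)
))
val rows = Seq(Row(Seq(Row("1", "2"), Row("12", "22"))))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// s2.u pulls field u out of every element, so its type is an array of strings.
val uType = df.select(col("s2.u")).schema.head.dataType

// Alternative 1: cast the whole column to an explicit target DataType.
val target = ArrayType(StructType(Seq(
  StructField("u", IntegerType, true),
  StructField("v", IntegerType, true)
)))
val viaCast = df.select(col("s2").cast(target).as("s2"))

// Alternative 2 (assumes Spark 2.4+): rebuild each element with transform.
val viaTransform = df.selectExpr(
  "transform(s2, x -> named_struct('u', CAST(x.u AS INT), 'v', CAST(x.v AS INT))) AS s2"
)
```

Casting the whole column is the shortest option when only the field types change; transform is more flexible if fields also need to be renamed, dropped or computed per element.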