Spark: Union can only be performed on tables with the compatible column types. Struct<name,id> != Struct<id,name>
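Problem description
Error: Union can only be performed on tables with the compatible column types.
struct(tier:string,skyward_number:string,skyward_points:string) <> struct(skyward_number:string,tier:string,skyward_points:string) at the first column of the second table;
The order of the struct fields differs here, but everything else is the same.
DataFrame 1 schema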
root
|-- emcg_uuid: string (nullable = true)
|-- name: string (nullable = true)
|-- phone_no: string (nullable = true)
|-- dob: string (nullable = true)
|-- country: string (nullable = true)
|-- travel_type: string (nullable = true)
|-- gdpr_restricted_flg: string (nullable = false)
|-- gdpr_reason_code: string (nullable = false)
|-- document: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- skyward: struct (nullable = false)
| |-- tier: string (nullable = false)
| |-- skyward_number: string (nullable = false)
| |-- skyward_points: string (nullable = false)
DataFrame 2 schema
root
|-- emcg_uuid: string (nullable = true)
|-- name: string (nullable = true)
|-- phone_no: string (nullable = true)
|-- dob: string (nullable = true)
|-- country: string (nullable = true)
|-- travel_type: string (nullable = true)
|-- gdpr_restricted_flg: string (nullable = true)
|-- gdpr_reason_code: string (nullable = true)
|-- document: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- skyward: struct (nullable = false)
| |-- skyward_number: string (nullable = false)
| |-- tier: string (nullable = false)
| |-- skyward_points: string (nullable = false)
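The mismatch between the two skyward structs shown in the schemas above can be reproduced with the schema types alone, without any data. The following is a minimal sketch, assuming spark-sql is on the classpath; the object name StructOrderSketch and the value names are hypothetical and only serve the illustration. It shows that two StructType values with the same fields in a different order compare as unequal, even though their sets of field names are identical.

```scala
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical illustration object; not part of the original question.
object StructOrderSketch {
  // Field order of the `skyward` column in DataFrame 1.
  val skyward1: StructType = new StructType()
    .add("tier", StringType, nullable = false)
    .add("skyward_number", StringType, nullable = false)
    .add("skyward_points", StringType, nullable = false)

  // Field order of the `skyward` column in DataFrame 2.
  val skyward2: StructType = new StructType()
    .add("skyward_number", StringType, nullable = false)
    .add("tier", StringType, nullable = false)
    .add("skyward_points", StringType, nullable = false)

  def main(args: Array[String]): Unit = {
    // Same field names and types on both sides...
    println(skyward1.fieldNames.toSet == skyward2.fieldNames.toSet) // true
    // ...but the struct types differ because field order is part of the type.
    println(skyward1 == skyward2) // false
  }
}
```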
How can this be fixed?
Recommended answer
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.StructType

// Rebuilds each top-level struct column of df2 so that its fields follow the
// field order used in df1. Non-struct columns are left untouched.
def getStructRecursiveDataFrame(df1: DataFrame, df2: DataFrame, columns: Array[String]): DataFrame = {
  if (columns.isEmpty) {
    df2
  } else {
    val col_name = columns.head
    val col_schema = df1.schema.fields.find(_.name == col_name).get
    col_schema.dataType match {
      case structType: StructType =>
        // Read the struct's fields from df2 in df1's order and re-pack them.
        val updatedStructNames: Seq[Column] =
          structType.fieldNames.map(name => col(col_name + "." + name)).toSeq
        getStructRecursiveDataFrame(df1, df2.withColumn(col_name, struct(updatedStructNames: _*)), columns.tail)
      case _ =>
        getStructRecursiveDataFrame(df1, df2, columns.tail)
    }
  }
}

// Unions a and b on the columns they share, after aligning b's struct fields with a's.
// Note: the intersection goes through a Set, so the resulting column order is not guaranteed.
def unionByName(a: DataFrame, b: DataFrame): DataFrame = {
  val b_new_df = getStructRecursiveDataFrame(a, b, a.columns)
  val columns_seq = a.columns.toSet.intersect(b_new_df.columns.toSet).map(col).toSeq
  a.select(columns_seq: _*).union(b_new_df.select(columns_seq: _*))
}
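One side effect of the answer's code is worth noting: the common columns are computed through a Set, which does not guarantee iteration order, so the unioned DataFrame can list its columns in a different order than either input. Below is a minimal sketch of an order-preserving alternative, shown on plain arrays of column names rather than real DataFrames. The object ColumnOrderSketch and the method commonColumnsInOrder are hypothetical names introduced for this illustration and are not part of the original answer.

```scala
// Hypothetical helper; operates on column names only, no Spark required.
object ColumnOrderSketch {
  // Keeps the columns of `left` that also exist in `right`, in `left`'s order.
  def commonColumnsInOrder(left: Array[String], right: Array[String]): Array[String] = {
    val rightNames = right.toSet          // used only for membership checks
    left.filter(rightNames.contains)      // filter preserves the order of `left`
  }

  def main(args: Array[String]): Unit = {
    val a = Array("emcg_uuid", "name", "phone_no", "dob", "country", "skyward")
    val b = Array("skyward", "country", "name", "emcg_uuid", "phone_no", "dob")
    // prints: emcg_uuid, name, phone_no, dob, country, skyward
    println(commonColumnsInOrder(a, b).mkString(", "))
  }
}
```

Under the same assumptions, the result could be mapped through col and passed to select in place of the Set-based intersection, which would keep the first DataFrame's column order in the union.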
Result
[INFO] DATAFRAME-1 SCHEME
root
|-- emcg_uuid: string (nullable = true)
|-- name: string (nullable = true)
|-- phone_no: string (nullable = true)
|-- dob: string (nullable = true)
|-- country: string (nullable = true)
|-- travel_type: string (nullable = true)
|-- gdpr_restricted_flg: string (nullable = false)
|-- gdpr_reason_code: string (nullable = false)
|-- document: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- skyward: struct (nullable = false)
| |-- tier: string (nullable = false)
| |-- skyward_number: string (nullable = false)
| |-- skyward_points: string (nullable = false)
[INFO] DATAFRAME-2 SCHEME
root
|-- emcg_uuid: string (nullable = true)
|-- name: string (nullable = true)
|-- phone_no: string (nullable = true)
|-- dob: string (nullable = true)
|-- country: string (nullable = true)
|-- travel_type: string (nullable = true)
|-- gdpr_restricted_flg: string (nullable = true)
|-- gdpr_reason_code: string (nullable = true)
|-- document: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- skyward: struct (nullable = false)
| |-- skyward_number: string (nullable = false)
| |-- tier: string (nullable = false)
| |-- skyward_points: string (nullable = false)
[INFO] DATAFRAME SCHEME AFTER THE UNION
root
|-- skyward: struct (nullable = false)
| |-- skyward_number: string (nullable = false)
| |-- tier: string (nullable = false)
| |-- skyward_points: string (nullable = false)
|-- name: string (nullable = true)
|-- document: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- phone_no: string (nullable = true)
|-- travel_type: string (nullable = true)
|-- gdpr_restricted_flg: string (nullable = true)
|-- dob: string (nullable = true)
|-- gdpr_reason_code: string (nullable = true)
|-- country: string (nullable = true)
|-- emcg_uuid: string (nullable = true)
[INFO] TEST CASE FOR ANONYMIZATION VALIDATION
[INFO] INPUT DATA
+----+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|name|phone_no |travel_type|gdpr_restricted_flg|dob |gdpr_reason_code|country|emcg_uuid|document |skyward |
+----+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|ravi|8747436090|freq | |1988-05-28| |dubai |uuid_1 |Map(document_type -> passport, id -> A3343)|[123456,blue,687]|
|aaaa|8747436091|freg | |1988-06-25| |europe |uuid_2 |Map(document_type -> passport, id -> A3341)|[123456,blue,687]|
|bbbb|8747436092|reg | |1988-07-26| |india |uuid_3 |Map(document_type -> passport, id -> A3345)|[123456,blue,687]|
|cccc|8747436093|na | |1988-08-27| |georgia|uuid_4 |Map(document_type -> passport, id -> A3349)|[123456,blue,687]|
|dddd|8747436094|na | |1988-09-29| |swis |uuid_5 |Map(document_type -> passport, id -> B3343)|[123456,blue,687]|
|null|8747436095|freq | |1988-02-30| |us |uuid_6 |Map(document_type -> passport, id -> C3343)|[123456,blue,687]|
|null|8747436096|na | |1988-01-01| |canada |uuid_7 |Map(document_type -> null, id -> D3343) |[123456,blue,687]|
+----+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
[INFO] EXPECTED OUTPUT
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|name |phone_no |travel_type|gdpr_restricted_flg|dob |gdpr_reason_code|country|emcg_uuid|document |skyward |
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|DDDDDDD|9999999 |freq |Y |1988-05-XX|13-001 |XXXXXXX|uuid_1 |Map(document_type -> ZZZZZ, id -> HH343) |[123456,blue,687]|
|aaaa |8747436091|freg | |1988-06-25| |europe |uuid_2 |Map(document_type -> passport, id -> A3341)|[123456,blue,687]|
|DDDDDDD|9999999 |reg |Y |1988-07-XX|13-001 |XXXXXXX|uuid_3 |Map(document_type -> ZZZZZ, id -> HH345) |[123456,blue,687]|
|cccc |8747436093|na | |1988-08-27| |georgia|uuid_4 |Map(document_type -> passport, id -> A3349)|[123456,blue,687]|
|dddd |8747436094|na | |1988-09-29| |swis |uuid_5 |Map(document_type -> passport, id -> B3343)|[123456,blue,687]|
|null |8747436095|freq | |1988-02-30| |us |uuid_6 |Map(document_type -> passport, id -> C3343)|[123456,blue,687]|
|null |9999999 |na |Y |1988-01-XX|13-001 |XXXXXXX|uuid_7 |Map(document_type -> null, id -> HH343) |[123456,blue,687]|
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
[INFO] ACTUAL OUTPUT
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|name |phone_no |travel_type|gdpr_restricted_flg|dob |gdpr_reason_code|country|emcg_uuid|document |skyward |
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+
|DDDDDDD|9999999 |freq |Y |1988-05-XX|13-001 |XXXXXXX|uuid_1 |Map(document_type -> ZZZZZ, id -> HH343) |[UUUUU,blue,JJ7] |
|aaaa |8747436091|freg | |1988-06-25| |europe |uuid_2 |Map(document_type -> passport, id -> A3341)|[123456,blue,687]|
|DDDDDDD|9999999 |reg |Y |1988-07-XX|13-001 |XXXXXXX|uuid_3 |Map(document_type -> ZZZZZ, id -> HH345) |[UUUUU,blue,JJ7] |
|cccc |8747436093|na | |1988-08-27| |georgia|uuid_4 |Map(document_type -> passport, id -> A3349)|[123456,blue,687]|
|dddd |8747436094|na | |1988-09-29| |swis |uuid_5 |Map(document_type -> passport, id -> B3343)|[123456,blue,687]|
|null |8747436095|freq | |1988-02-30| |us |uuid_6 |Map(document_type -> passport, id -> C3343)|[123456,blue,687]|
|null |9999999 |na |Y |1988-01-XX|13-001 |XXXXXXX|uuid_7 |Map(document_type -> null, id -> HH343) |[UUUUU,blue,JJ7] |
+-------+----------+-----------+-------------------+----------+----------------+-------+---------+-------------------------------------------+-----------------+