Creating a dataframe-specific schema: StructField starting with a capital letter
Problem description
Apologies for the lengthy post for a seemingly simple curiosity, but I wanted to give full context...
In Databricks, I am creating a "row" of data based on a specific schema definition, and then inserting that row into an empty dataframe (also based on the same specific schema).
The schema definition is as follows:
from pyspark.sql.types import (
    ArrayType, DoubleType, LongType, StringType, StructField, StructType,
)

myschema_xb = StructType(
    [
        StructField("_xmlns", StringType(), True),
        StructField("_Version", DoubleType(), True),
        StructField("MyIds",
            ArrayType(
                StructType(
                    [
                        StructField("_ID", StringType(), True),
                        StructField("_ID_Context", StringType(), True),
                        StructField("_Type", LongType(), True),
                    ]
                ),
                True
            ),
            True
        ),
    ]
)
The row entry is therefore:
myRow = Row(
    _xmlns="http://some.where.com",
    _Version=12.3,
    MyIds=[
        Row(
            _ID="XY",
            _ID_Context="Exxwhy",
            _Type=9
        ),
        Row(
            _ID="9152",
            _ID_Context="LNUMB",
            _Type=21
        ),
    ]
)
Lastly, the databricks notebook code is:
mydf = spark.createDataFrame(sc.emptyRDD(), myschema_xb)
rows = [myRow]
rdf = spark.createDataFrame(rows, myschema_xb)
appended = mydf.union(rdf)
The call to rdf = spark.createDataFrame(rows, myschema_xb) raises the exception:
ValueError: Unexpected tuple 'h' with StructType
Now the part I am curious about: if I change the element MyIds to myIds (i.e. lowercase the first letter), the code works, and my new dataframe (appended) has the single row of data.
What does this exception mean, and why does it go away when I change the case of my element?
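One thing that may be relevant here, sketched under stated assumptions: in PySpark releases before 3.0, a Row built from keyword arguments sorts its fields by name, and uppercase letters sort before the underscore while lowercase letters sort after it. The plain-Python snippet below only models that sort order and does not call Spark. The helper name kwargs_row_order is hypothetical, and the idea that the sorted values are then matched to the schema by position is an assumption, not something confirmed for this runtime.

```python
# Field names in the order the schema declares them.
schema_order = ["_xmlns", "_Version", "MyIds"]


def kwargs_row_order(names):
    """Model the field order of a keyword-argument Row, ASSUMING the
    pre-3.0 PySpark behaviour of sorting field names alphabetically."""
    return sorted(names)


upper = kwargs_row_order(["_xmlns", "_Version", "MyIds"])
lower = kwargs_row_order(["_xmlns", "_Version", "myIds"])

print(upper)  # ['MyIds', '_Version', '_xmlns']
print(lower)  # ['_Version', '_xmlns', 'myIds']

# If values were paired with schema fields by position (an assumption),
# the array field (last in the schema) would receive whichever value
# ends up last in the sorted Row.
array_slot = len(schema_order) - 1
print(upper[array_slot])  # '_xmlns': a string would land in the array slot
print(lower[array_slot])  # 'myIds': the list lands in the array slot
```

Under that model, the capitalised name puts the "http://some.where.com" string where the array of structs is expected, and iterating a string yields single characters starting with 'h', which would match the error message. The lowercase name keeps the list in the last position, although the first two fields would still be paired out of order. If this is the cause, passing the row as a plain tuple in schema order should avoid the name-based sorting.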
(FYI, our Databricks runtime environment is Scala 2.11)
Thanks.