Scala Apache Spark: Nonstandard characters in column names
Question
I am calling the following:
propertiesDF.select(
  col("timestamp"),
  col("coordinates")(0) as "lon",
  col("coordinates")(1) as "lat",
  col("properties.tide (above mllw)") as "tideAboveMllw",
  col("properties.wind speed") as "windSpeed")
This gives me the following error:
org.apache.spark.sql.AnalysisException: No such struct field tide (above mllw) in air temperature, atmospheric pressure, dew point, dominant wave period, mean wave direction, name, program name, significant wave height, tide (above mllw):, visibility, water temperature, wind direction, wind speed;
Now there definitely is such a struct field. (The error message itself says so.)
Here is the schema:
root
|-- timestamp: long (nullable = true)
|-- coordinates: array (nullable = true)
| |-- element: double (containsNull = true)
|-- properties: struct (nullable = true)
| |-- air temperature: double (nullable = true)
| |-- atmospheric pressure: double (nullable = true)
| |-- dew point: double (nullable = true)
...
| |-- tide (above mllw):: string (nullable = true)
...
The input is read as JSON like this:
val df = sqlContext.read.json(dirName)
How do I handle parentheses in a column name?
Answer
You should avoid names like this in the first place, but you can either split the access path:
val df = spark.range(1).select(struct(
  lit(123).as("tide (above mllw)"),
  lit(1).as("wind speed")
).as("properties"))

df.select(col("properties").getItem("tide (above mllw)"))
// or
df.select(col("properties")("tide (above mllw)"))
or enclose the problematic field in backticks:
df.select(col("properties.`tide (above mllw)`"))
Both solutions assume your data contains the following structure (based on the access path you use in your queries):
df.printSchema
// root
// |-- properties: struct (nullable = false)
// | |-- tide (above mllw): integer (nullable = false)
// | |-- wind speed: integer (nullable = false)
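If you build access paths programmatically, it can help to backtick-quote any field name that is not a plain identifier. The `ColumnPath` helper below is a hypothetical sketch (not part of the Spark API) that mirrors Spark's quoting convention: names containing anything other than letters, digits, and underscores are wrapped in backticks, with embedded backticks escaped by doubling.

```scala
object ColumnPath {
  // Hypothetical helper: wrap a field name in backticks when it is not a
  // plain identifier; embedded backticks are doubled, which is the escape
  // convention Spark's SQL parser uses inside quoted identifiers.
  def quoteIfNeeded(name: String): String =
    if (name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) name
    else "`" + name.replace("`", "``") + "`"

  // Join struct field names into a single dotted access path,
  // quoting each segment only when necessary.
  def path(parts: String*): String = parts.map(quoteIfNeeded).mkString(".")
}
```

With this, the original query could be written as `col(ColumnPath.path("properties", "tide (above mllw)"))`, which produces the backticked path ``properties.`tide (above mllw)` `` without hand-escaping.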