Scala Apache Spark: Nonstandard characters in column names


Problem Description

I'm calling the following:

  propertiesDF.select(
        col("timestamp"), col("coordinates")(0) as "lon", 
        col("coordinates")(1) as "lat", 
        col("properties.tide (above mllw)") as "tideAboveMllw",
        col("properties.wind speed") as "windSpeed")

This gives me the following error:

org.apache.spark.sql.AnalysisException: No such struct field tide (above mllw) in air temperature, atmospheric pressure, dew point, dominant wave period, mean wave direction, name, program name, significant wave height, tide (above mllw):, visibility, water temperature, wind direction, wind speed;

Now there definitely is such a struct field. (The error message itself says so.)

Here is the schema:

 root
 |-- timestamp: long (nullable = true)
 |-- coordinates: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- properties: struct (nullable = true)
 |    |-- air temperature: double (nullable = true)
 |    |-- atmospheric pressure: double (nullable = true)
 |    |-- dew point: double (nullable = true)
          .
          .
          .
 |    |-- tide (above mllw):: string (nullable = true)
          .
          .
          .

The input is read as JSON like this:

val df = sqlContext.read.json(dirName)

How do I handle parentheses in a column name?

Recommended Answer

You should avoid names like this in the first place, but you can either split the access path:

val df = spark.range(1).select(struct(
  lit(123).as("tide (above mllw)"),
  lit(1).as("wind speed")
).as("properties"))

df.select(col("properties").getItem("tide (above mllw)"))

// or

df.select(col("properties")("tide (above mllw)"))

or enclose the problematic field in backticks:

df.select(col("properties.`tide (above mllw)`"))
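Applied to the query from the question, the backtick fix might look like the sketch below (an illustration, not part of the original answer). Note that the schema printed in the question shows the field name with a trailing colon, `tide (above mllw):`, which may itself be the mismatch, so the sketch quotes the name exactly as the schema reports it:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Sketch: the original select with the space/parenthesis fields backtick-quoted.
// Assumes `dirName` points at the JSON input from the question, and that the
// field name really ends with a colon, as the printed schema suggests.
val spark = SparkSession.builder.master("local[*]").getOrCreate()
val propertiesDF = spark.read.json(dirName)

propertiesDF.select(
  col("timestamp"),
  col("coordinates")(0).as("lon"),
  col("coordinates")(1).as("lat"),
  col("properties.`tide (above mllw):`").as("tideAboveMllw"),
  col("properties.`wind speed`").as("windSpeed"))
```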

Both solutions assume that your data contains the following structure (based on the access path you use in your queries):

df.printSchema
// root
//  |-- properties: struct (nullable = false)
//  |    |-- tide (above mllw): integer (nullable = false)
//  |    |-- wind speed: integer (nullable = false)
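If these fields are accessed frequently, one further option (not in the original answer, just a sketch building on the `df` defined above) is to select them once under backtick-free aliases, so that downstream code never needs the quoting again:

```scala
import org.apache.spark.sql.functions.col

// Sketch: pull the awkwardly named nested fields out under plain aliases.
// Assumes the `df` with the `properties` struct constructed earlier.
val flat = df.select(
  col("properties.`tide (above mllw)`").as("tideAboveMllw"),
  col("properties.`wind speed`").as("windSpeed"))

flat.printSchema
```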

