Accessing column names with periods - Spark SQL 1.3
Question
I have a DataFrame with fields whose names contain a period. When I attempt to use select() on them, Spark cannot resolve them, likely because '.' is used for accessing nested fields.
The error is:
scala> enrichData.select("google.com")
org.apache.spark.sql.AnalysisException: cannot resolve 'google.com' given input columns google.com, yahoo.com, ...
Is there a way to access these columns? Or an easy way to change the column names (since I can't select them, how can I rename them?)?
Recommended answer
A period in a column name makes Spark treat it as a nested field (a field within a field). To work around this, wrap the name in backticks (`). This should work:
scala> val df = Seq(("yr", 2000), ("pr", 12341234)).toDF("x.y", "e")
df: org.apache.spark.sql.DataFrame = [x.y: string, e: int]
scala> df.select("`x.y`").show
+---+
|x.y|
+---+
| yr|
| pr|
+---+
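If you would rather rename the column so that later code does not need backticks, a minimal sketch (continuing from the df above; it assumes withColumnRenamed matches the literal column name, so no backticks are needed there):

```scala
// Rename the dotted column once, then select it normally.
// withColumnRenamed compares plain name strings, so "x.y" is
// matched literally rather than parsed as a nested field.
val renamed = df.withColumnRenamed("x.y", "x_y")
renamed.select("x_y").show()
```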
You just need to wrap the column name in backticks (`).
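To clean up every affected column at once, you can fold over df.columns and replace the periods. This is a sketch assuming the standard DataFrame API (columns and withColumnRenamed), with underscore chosen arbitrarily as the replacement character:

```scala
// Replace '.' with '_' in every column name so none of them
// need backtick-quoting afterwards.
val cleaned = df.columns.foldLeft(df) { (acc, name) =>
  acc.withColumnRenamed(name, name.replace(".", "_"))
}
```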