List in the Case-When Statement in Spark SQL

Problem description

I'm trying to convert a DataFrame from long to wide, as suggested in "Pivot Spark Dataframe". However, the SQL seems to misinterpret the values in the countries list as columns of the table. Below are the messages I saw in the console, along with the sample data and code from the above link. Does anyone know how to resolve this?

Messages from the scala console:
scala> val myDF1 = sqlc2.sql(query)
org.apache.spark.sql.AnalysisException: cannot resolve 'US' given input columns id, tag, value;

id  tag  value
1   US    50
1   UK    100
1   Can   125
2   US    75
2   UK    150
2   Can   175

And I want:

id  US  UK   Can
1   50  100  125
2   75  150  175

I can create a list of the values I want to pivot on and then build a string containing the SQL query I need.

val countries = List("US", "UK", "Can")
val numCountries = countries.length - 1

var query = "select *, "
for (i <- 0 to numCountries-1) {
  query += "case when tag = " + countries(i) + " then value else 0 end as " + countries(i) + ", "
}
query += "case when tag = " + countries.last + " then value else 0 end as " + countries.last + " from myTable"

myDataFrame.registerTempTable("myTable")
val myDF1 = sqlContext.sql(query)

Solution

Country codes are literals and should be enclosed in quotes; otherwise the SQL parser will treat them as column names:

val caseClause = countries.map(
    x => s"""CASE WHEN tag = '$x' THEN value ELSE 0 END as $x"""
).mkString(", ")

val aggClause = countries.map(x => s"""SUM($x) AS $x""").mkString(", ")

val query = s"""
   SELECT id, $aggClause
   FROM (SELECT id, $caseClause FROM myTable) tmp
   GROUP BY id"""

sqlContext.sql(query)
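
To see the difference the quotes make, it can help to print the generated SQL before handing it to Spark. The following is a minimal sketch in plain Scala with no Spark dependency; the object name `CaseWhenQueryDemo` and the helpers `unquotedCase`, `quotedCase` and `buildQuery` are hypothetical names introduced here for illustration, not part of the original answer.

```scala
object CaseWhenQueryDemo {
  val countries = List("US", "UK", "Can")

  // Unquoted, as in the question: the parser reads US as a column name.
  def unquotedCase(x: String): String =
    s"CASE WHEN tag = $x THEN value ELSE 0 END AS $x"

  // Quoted, as in the answer: 'US' is a string literal.
  def quotedCase(x: String): String =
    s"CASE WHEN tag = '$x' THEN value ELSE 0 END AS $x"

  // Assemble the same SELECT ... GROUP BY shape as the answer's query.
  def buildQuery(genCase: String => String): String = {
    val caseClause = countries.map(genCase).mkString(", ")
    val aggClause = countries.map(x => s"SUM($x) AS $x").mkString(", ")
    s"SELECT id, $aggClause FROM (SELECT id, $caseClause FROM myTable) tmp GROUP BY id"
  }

  def main(args: Array[String]): Unit = {
    println(buildQuery(unquotedCase)) // contains: tag = US
    println(buildQuery(quotedCase))   // contains: tag = 'US'
  }
}
```

The first printed query compares `tag` against a bare identifier, which is what triggers the `cannot resolve 'US'` error; the second compares it against a string literal. Note that this sketch does no escaping, so it assumes the list values contain no single quotes.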

The question is: why even bother building SQL strings from scratch? The same logic can be expressed with the DataFrame API:

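import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, sum, when}
import sqlContext.implicits._  // enables $"..." (assumes sqlContext is in scope)

// value when tag matches x, otherwise 0; the result column is named x
def genCase(x: String) = {
  when($"tag" <=> lit(x), $"value").otherwise(0).alias(x)
}

// apply the aggregate f to column x and keep the column name
def genAgg(f: Column => Column)(x: String) = f(col(x)).alias(x)

df
 .select($"id" :: countries.map(genCase): _*)
 .groupBy($"id")
 // agg takes one leading Column before the varargs, hence the dummy column
 .agg($"id".alias("dummy"), countries.map(genAgg(sum)): _*)
 .drop("dummy")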
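
As a quick sanity check of what the CASE WHEN plus SUM combination is expected to return, the same long-to-wide reshaping can be mimicked on plain Scala collections using the sample data from the question. This is only an illustrative sketch without Spark; `PivotCheck`, `Record` and `pivot` are hypothetical names introduced here.

```scala
object PivotCheck {
  // One row of the long-format sample table from the question.
  case class Record(id: Int, tag: String, value: Int)

  val rows = List(
    Record(1, "US", 50), Record(1, "UK", 100), Record(1, "Can", 125),
    Record(2, "US", 75), Record(2, "UK", 150), Record(2, "Can", 175)
  )
  val countries = List("US", "UK", "Can")

  // For each id and each country, sum `value` where tag matches, else add 0.
  // This mirrors SUM(CASE WHEN tag = '<country>' THEN value ELSE 0 END).
  def pivot(rows: List[Record]): Map[Int, List[Int]] =
    rows.groupBy(_.id).map { case (id, group) =>
      id -> countries.map(c => group.map(r => if (r.tag == c) r.value else 0).sum)
    }

  def main(args: Array[String]): Unit =
    pivot(rows).toList.sortBy(_._1).foreach { case (id, sums) =>
      println((id :: sums).mkString("\t"))
    }
}
```

For the sample data this yields 50, 100, 125 for id 1 and 75, 150, 175 for id 2, matching the wide table the question asks for.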
