List in the Case-When Statement in Spark SQL
Problem description
I'm trying to convert a DataFrame from long to wide, as suggested in "Pivot Spark Dataframe". However, the SQL seems to misinterpret the countries list as a variable from the table. Below are the messages I saw in the console, followed by the sample data and code from the above link. Does anyone know how to resolve this?
Messages from the Scala console:

    scala> val myDF1 = sqlc2.sql(query)
    org.apache.spark.sql.AnalysisException: cannot resolve 'US' given input columns id, tag, value;
Sample data:

    id  tag  value
    1   US   50
    1   UK   100
    1   Can  125
    2   US   75
    2   UK   150
    2   Can  175

and I want:

    id  US  UK   Can
    1   50  100  125
    2   75  150  175

I can create a list with the values I want to pivot and then create a string containing the SQL query I need.

    val countries = List("US", "UK", "Can")
    val numCountries = countries.length - 1

    var query = "select *, "
    for (i <- 0 to numCountries-1) {
      query += "case when tag = " + countries(i) + " then value else 0 end as " + countries(i) + ", "
    }
    query += "case when tag = " + countries.last + " then value else 0 end as " + countries.last + " from myTable"

    myDataFrame.registerTempTable("myTable")
    val myDF1 = sqlContext.sql(query)
Solution

Country codes are literals and should be enclosed in quotes; otherwise the SQL parser will treat them as column names:
    val caseClause = countries.map(
      x => s"""CASE WHEN tag = '$x' THEN value ELSE 0 END as $x"""
    ).mkString(", ")

    val aggClause = countries.map(x => s"""SUM($x) AS $x""").mkString(", ")

    val query = s"""
      SELECT id, $aggClause
      FROM (SELECT id, $caseClause FROM myTable) tmp
      GROUP BY id"""

    sqlContext.sql(query)
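To see why the quotes matter, it can help to print the generated SQL fragments before handing them to Spark. The following is a minimal sketch in plain Scala with no Spark dependency; the names `unquoted` and `quoted` are introduced here for illustration only and do not appear in the original post.

```scala
// Plain Scala, no Spark needed: compare the SQL fragments produced
// with and without quotes around the country code.
val countries = List("US", "UK", "Can")

// Fragment as built in the question: the country code is inserted bare,
// so the SQL parser reads US as a column name.
val unquoted = countries.map { x =>
  "case when tag = " + x + " then value else 0 end as " + x
}

// Fragment as built in the answer: the code is wrapped in single quotes,
// so it is parsed as a string literal.
val quoted = countries.map { x =>
  s"CASE WHEN tag = '$x' THEN value ELSE 0 END as $x"
}

println(unquoted.head) // case when tag = US then value else 0 end as US
println(quoted.head)   // CASE WHEN tag = 'US' THEN value ELSE 0 END as US
```

The first fragment compares `tag` with something named `US`, which matches the `cannot resolve 'US'` error reported in the question; the second compares it with the string `'US'`.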
The question is, why even bother building SQL strings from scratch?
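    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, lit, sum, when}
    import sqlContext.implicits._  // enables the $"..." column syntax

    // One column per country: value where tag matches, 0 otherwise
    def genCase(x: String) = {
      when($"tag" <=> lit(x), $"value").otherwise(0).alias(x)
    }

    // Apply aggregate f to column x and keep the column name
    def genAgg(f: Column => Column)(x: String) = f(col(x)).alias(x)

    df
      .select($"id" :: countries.map(genCase): _*)
      .groupBy($"id")
      .agg($"id".alias("dummy"), countries.map(genAgg(sum)): _*)
      .drop("dummy")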
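As a sanity check on what both the SQL and the DataFrame versions compute, the same case-when-then-sum logic can be mimicked on the sample data with plain Scala collections. This is only an illustrative sketch, not Spark code; `Record` and `wide` are hypothetical names chosen for this example.

```scala
// Plain Scala collections, no Spark: mimic CASE WHEN + SUM ... GROUP BY id.
case class Record(id: Int, tag: String, value: Int)

val records = List(
  Record(1, "US", 50), Record(1, "UK", 100), Record(1, "Can", 125),
  Record(2, "US", 75), Record(2, "UK", 150), Record(2, "Can", 175)
)
val countries = List("US", "UK", "Can")

// For each id, sum `value` where tag matches the country, adding 0 otherwise.
val wide: Map[Int, List[Int]] = records.groupBy(_.id).map { case (id, group) =>
  id -> countries.map(c => group.map(r => if (r.tag == c) r.value else 0).sum)
}

// Print one line per id in the wide layout: id US UK Can
wide.toList.sortBy(_._1).foreach { case (id, sums) =>
  println((id :: sums).mkString(" "))
}
```

For the sample data this prints `1 50 100 125` and `2 75 150 175`, matching the wide table requested in the question.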