Applying aggregate function to every column of certain type
Question
So I wrote the basis (that doesn't work) of how to average every FloatType column in my data frame, like so:
import scala.collection.mutable.ListBuffer

val descript = df.dtypes
var decimalArr = new ListBuffer[String]()
for (i <- 0 until descript.length) {
  if (descript(i)._2 == "FloatType") {
    decimalArr += descript(i)._1
  }
}
// Build statistical arguments for DataFrame pass
var averageList = new ListBuffer[String]()
for (i <- 0 until decimalArr.length) {
  averageList += "avg(" + '"' + decimalArr(i) + '"' + ")"
}
// sample statistical call
val sampAvg = df.agg(averageList).show
The example that gets produced by averageList is:
ListBuffer(avg("offer_id"), avg("decision_id"), avg("offer_type_cd"), avg("promo_id"), avg("pymt_method_type_cd"), avg("cs_result_id"), avg("cs_result_usage_type_cd"), avg("rate_index_type_cd"), avg("sub_product_id"))
The clear problem is that val sampAvg = df.agg(averageList).show does not accept a ListBuffer as input. Even converting it with .toString doesn't work; agg wants org.apache.spark.sql.Column*. Does anyone know a way I can do what I am trying to do?
Side note: I'm on Spark 1.3.
Answer
You can first build a list of the aggregate expressions:
import org.apache.spark.sql.functions.{col, avg, lit}
val exprs = df.dtypes
.filter(_._2 == "DoubleType")
.map(ct => avg(col(ct._1))).toList
and either pattern match:
exprs match {
case h::t => df.agg(h, t:_*)
case _ => sqlContext.emptyDataFrame
}
or use a dummy column:
df.agg(lit(1).alias("_dummy"), exprs: _*).drop("_dummy")
If you want to use multiple functions you can flatMap, either explicitly:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, min, max}
val funs: List[(String => Column)] = List(min, max, avg)
val exprs: Array[Column] = df.dtypes
.filter(_._2 == "DoubleType")
.flatMap(ct => funs.map(fun => fun(ct._1)))
or use a for comprehension:
val exprs: Array[Column] = for {
cname <- df.dtypes.filter(_._2 == "DoubleType").map(_._1)
fun <- funs
} yield fun(cname)
Convert exprs to a List if you want to use the pattern-match approach.
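Either way, the combined min/max/avg expressions can then be applied with the dummy-column trick, which sidesteps the head-plus-varargs split entirely (a sketch, continuing from the exprs built above):

```scala
import org.apache.spark.sql.functions.lit

// lit(1) supplies the mandatory first argument to agg even when exprs is empty;
// the helper column is dropped from the result afterwards.
val summary = df
  .agg(lit(1).alias("_dummy"), exprs: _*)
  .drop("_dummy")
summary.show()
```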