Applying aggregate function to every column of certain type

Views: 145 Published: 2016/5/22 15:37:24 Tags: scala apache-spark apache-spark-sql


Question

So I wrote the basis (which doesn't work) for averaging every FloatType column in my data frame, like so:

import scala.collection.mutable.ListBuffer

val descript = df.dtypes

// Collect the names of all FloatType columns
var decimalArr = new ListBuffer[String]()
for (i <- 0 to (descript.length - 1)) {
  if (descript(i)._2 == "FloatType") {
    decimalArr += descript(i)._1
  }
}

// Build statistical arguments for the DataFrame pass
var averageList = new ListBuffer[String]()
for (i <- 0 to (decimalArr.length - 1)) {
  averageList += "avg(" + '"' + decimalArr(i) + '"' + ")"
}

// Sample statistical call (this is the line that fails: agg does not accept a ListBuffer[String])
val sampAvg = df.agg(averageList).show

An example of what averageList produces is:

ListBuffer(avg("offer_id"), avg("decision_id"), avg("offer_type_cd"), avg("promo_id"), avg("pymt_method_type_cd"), avg("cs_result_id"), avg("cs_result_usage_type_cd"), avg("rate_index_type_cd"), avg("sub_product_id"))

The clear problem is that val sampAvg = df.agg(averageList).show does not accept a ListBuffer as input. Even calling .toString on it doesn't work, because agg wants org.apache.spark.sql.Column*. Does anyone know a way to do something along the lines of what I am trying?

Side note: I am on Spark 1.3.

Recommended answer

You can first build a list of the aggregate expressions

import org.apache.spark.sql.functions.{col, avg, lit}

val exprs = df.dtypes
  .filter(_._2 == "DoubleType")
  .map(ct => avg(col(ct._1))).toList

and either pattern match

exprs match {
  case h::t => df.agg(h, t:_*)
  case _ => sqlContext.emptyDataFrame
}

or use a dummy column

df.agg(lit(1).alias("_dummy"), exprs: _*).drop("_dummy")
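
Both variants work around the same constraint: agg takes one required Column followed by a varargs Column*, so a collection cannot be passed as-is. The sketch below illustrates this in plain Scala without Spark. The names aggLike, splitHeadTail and withDummy are hypothetical stand-ins invented for illustration, and Strings take the place of Columns.

```scala
object VarargsSketch {
  // Hypothetical stand-in for a method shaped like agg(expr: Column, exprs: Column*):
  // one required argument followed by varargs. Strings replace Columns here.
  def aggLike(first: String, rest: String*): String =
    (first +: rest).mkString("agg(", ", ", ")")

  // Variant 1: split the list into head and tail so it fits the (first, rest*) shape.
  // An empty list has no head, so that case needs its own fallback.
  def splitHeadTail(exprs: List[String]): String = exprs match {
    case h :: t => aggLike(h, t: _*)
    case Nil    => "nothing to aggregate"
  }

  // Variant 2: supply a placeholder as the required first argument and splat
  // the whole list as the varargs part; this also works for an empty list.
  def withDummy(exprs: List[String]): String =
    aggLike("_dummy", exprs: _*)
}
```

The first variant needs an explicit fallback for the empty case, which is what sqlContext.emptyDataFrame provides in the answer. The second always has a first argument, at the cost of an extra column that has to be dropped afterwards.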

If you want to use multiple functions, you can flatMap either explicitly:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, min, max}

val funs: List[(String => Column)] = List(min, max, avg)

val exprs: Array[Column] = df.dtypes 
   .filter(_._2 == "DoubleType")
   .flatMap(ct => funs.map(fun => fun(ct._1)))

or using a for comprehension:

val exprs: Array[Column] = for {
    cname <-  df.dtypes.filter(_._2 == "DoubleType").map(_._1)
    fun <- funs
} yield fun(cname)
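
The two forms are equivalent: the for comprehension desugars into the same flatMap/map nesting, so both yield every function applied to every matching column, grouped by column. A minimal sketch in plain Scala that can be run without Spark; the sample dtypes array and the string-producing functions are made-up stand-ins for df.dtypes and for min, max and avg.

```scala
object ComboSketch {
  // Hypothetical stand-ins for min/max/avg: each turns a column name into a label.
  val funs: List[String => String] = List(
    (name: String) => s"min($name)",
    (name: String) => s"max($name)",
    (name: String) => s"avg($name)"
  )

  // Made-up (column name, type name) pairs, shaped like the result of df.dtypes.
  val dtypes: Array[(String, String)] = Array(
    ("price", "DoubleType"),
    ("id", "IntegerType"),
    ("score", "DoubleType")
  )

  // Explicit flatMap: for each matching column, apply every function.
  val viaFlatMap: Array[String] = dtypes
    .filter(_._2 == "DoubleType")
    .flatMap(ct => funs.map(fun => fun(ct._1)))

  // The same computation written as a for comprehension.
  val viaFor: Array[String] = for {
    cname <- dtypes.filter(_._2 == "DoubleType").map(_._1)
    fun   <- funs
  } yield fun(cname)
}
```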

Convert exprs to a List if you want to use the pattern match approach.
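
Note that the question asks about FloatType columns while the snippets above filter on "DoubleType". Since df.dtypes yields (column name, type name) pairs of strings, the type name can be treated as a parameter. A small sketch of that idea in plain Scala; columnsOfType is a hypothetical helper, not part of Spark.

```scala
object TypeFilterSketch {
  // Keep the names of all columns whose type string equals typeName.
  // dtypes has the shape of df.dtypes: (column name, type name) pairs.
  def columnsOfType(dtypes: Array[(String, String)], typeName: String): List[String] =
    dtypes.collect { case (name, tpe) if tpe == typeName => name }.toList
}
```

With a real DataFrame this would presumably be called as TypeFilterSketch.columnsOfType(df.dtypes, "FloatType"), and the resulting names mapped to avg(col(name)) as in the answer.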
