How can I retrieve the alias for a DataFrame in Spark

Question

I'm using Spark 2.0.2. I have a DataFrame that has an alias on it, and I'd like to be able to retrieve that. A simplified example of why I'd want that is below.

def check(ds: DataFrame) = {
   assert(ds.count > 0, s"${ds.getAlias} has zero rows!")
}

The above code of course fails because DataFrame has no getAlias function. Is there a way to do this?

Recommended answer

You can try something like this, but I wouldn't go so far as to claim it is supported:

  • Spark < 2.1:

import org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias
import org.apache.spark.sql.Dataset

def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
  case SubqueryAlias(alias, _) => Some(alias)
  case _ => None
}

  • Spark 2.1+:

    def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
      case SubqueryAlias(alias, _, _) => Some(alias)
      case _ => None
    }
    

  • Example usage:

    val plain = Seq((1, "foo")).toDF
    getAlias(plain)
    

    Option[String] = None
    

    val aliased = plain.alias("a dataset")
    getAlias(aliased)
    

    Option[String] = Some(a dataset)
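
To tie this back to the question, here is a minimal sketch of how the `check` function could use `getAlias`. It assumes Spark 2.1, where `SubqueryAlias` matches with three fields as in the answer above; on Spark < 2.1 the pattern would be `SubqueryAlias(alias, _)` instead. The fallback label `<unaliased dataset>` is a hypothetical placeholder, not something Spark provides. Like the answer itself, this relies on Catalyst internals rather than a supported public API, so the pattern may need adjusting on other Spark versions.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias

// Assumes the Spark 2.1 shape of SubqueryAlias (three fields), as in the answer.
def getAlias(ds: Dataset[_]): Option[String] = ds.queryExecution.analyzed match {
  case SubqueryAlias(alias, _, _) => Some(alias)
  case _ => None
}

// Accepts any Dataset; a DataFrame is a Dataset[Row], so it works here too.
def check(ds: Dataset[_]): Unit = {
  // Hypothetical fallback label for datasets that carry no alias.
  val name = getAlias(ds).getOrElse("<unaliased dataset>")
  assert(ds.count() > 0, s"$name has zero rows!")
}
```

Calling `check(plain.alias("a dataset"))` passes for a non-empty dataset, while an empty aliased dataset would fail the assertion with the alias in the message.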
    
