Is it possible to alias columns programmatically in spark sql?

Problem description

In spark SQL (perhaps only HiveQL) one can do:

select sex, avg(age) as avg_age
from humans
group by sex

which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?

After zero323's answer, I need to add the constraint that:

The column-to-be-renamed's name may not be known/guaranteed or even addressable. In textual SQL, using "select EXPR as NAME" removes the requirement to have an intermediate name for EXPR. This is also the case in the example above, where "avg(age)" could get a variety of auto-generated names (which also vary among spark releases and sql-context backends).
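
To make that constraint concrete, here is a minimal sketch of the DataFrame-API equivalent of the query above (assuming "humans" is an existing DataFrame with "sex" and "age" columns); the aggregate column comes back with an auto-generated name rather than one chosen by the caller:

    import org.apache.spark.sql.functions.avg

    // DataFrame-API version of the SQL query above; "humans" is assumed
    // to be an existing DataFrame with "sex" and "age" columns.
    val grouped = humans.groupBy("sex").agg(avg("age"))

    // The aggregate column gets an auto-generated name such as "avg(age)",
    // which can vary among Spark releases and sql-context backends.
    println(grouped.columns.mkString(", "))  // e.g. sex, avg(age)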

Answer

Turns out def toDF(colNames: String*): DataFrame does exactly that. Pasting from the 2.11.7 documentation:

def toDF(colNames: String*): DataFrame

Returns a new DataFrame with columns renamed. This can be quite
convenient in conversion from a RDD of tuples into a DataFrame
with meaningful names. For example:

    val rdd: RDD[(Int, String)] = ...
    rdd.toDF()  // this implicit conversion creates a DataFrame
                // with column name _1 and _2
    rdd.toDF("id", "name")  // this creates a DataFrame with
                            // column name "id" and "name"
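
Applied to the example in the question, a short sketch (again assuming a DataFrame named "humans") of how toDF sidesteps the need to know the aggregate column's auto-generated name:

    import org.apache.spark.sql.functions.avg

    // toDF renames all columns positionally, so the auto-generated name
    // of the aggregate column never has to be referenced.
    val result = humans.groupBy("sex").agg(avg("age")).toDF("sex", "avg_age")

    println(result.columns.mkString(", "))  // sex, avg_age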
