Is it possible to alias columns programmatically in spark sql?


Question

In spark SQL (perhaps only HiveQL) one can do:

select sex, avg(age) as avg_age
from humans
group by sex

which would result in a DataFrame with columns named "sex" and "avg_age".

How can avg(age) be aliased to "avg_age" without using textual SQL?
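One way to do this is with `Column.alias` on the aggregate expression. The sketch below is a minimal, hypothetical example: the local `SparkSession` setup and the sample rows are assumptions for illustration, not part of the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val humans = Seq(("m", 30), ("f", 25), ("f", 35)).toDF("sex", "age")

// avg(age) aliased programmatically, with no textual SQL involved:
val result = humans.groupBy("sex").agg(avg("age").alias("avg_age"))
// result now has columns "sex" and "avg_age"
```

`alias` (or its synonym `as`) attaches the name directly to the expression, mirroring "select EXPR as NAME" in textual SQL.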

After zero323's answer, I need to add the constraint that:

The column-to-be-renamed's name may not be known/guaranteed or even addressable. In textual SQL, using "select EXPR as NAME" removes the requirement to have an intermediate name for EXPR. This is also the case in the example above, where "avg(age)" could get a variety of auto-generated names (which also vary among spark releases and sql-context backends).

Answer

Turns out def toDF(colNames: String*): DataFrame does exactly that. Pasting from 2.11.7 documentation:

def toDF(colNames: String*): DataFrame

Returns a new DataFrame with columns renamed. This can be quite
convenient in conversion from a RDD of tuples into a DataFrame
with meaningful names. For example:

    val rdd: RDD[(Int, String)] = ...
    rdd.toDF()  // this implicit conversion creates a DataFrame
                // with column name _1 and _2
    rdd.toDF("id", "name")  // this creates a DataFrame with
                            // column name "id" and "name"
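Applied to the aggregation from the question, `toDF` renames by position, so the auto-generated name of the aggregate column never has to be known or addressed. A minimal sketch, again assuming a hypothetical local session and sample data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val humans = Seq(("m", 30), ("f", 25), ("f", 35)).toDF("sex", "age")

// agg(avg("age")) yields an auto-generated column name (e.g. "avg(age)",
// varying by release); toDF replaces it positionally without referencing it:
val renamed = humans.groupBy("sex").agg(avg("age")).toDF("sex", "avg_age")
```

This satisfies the constraint above: the rename is purely positional, so it works the same regardless of what name the backend generated for the expression.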
