Usage of spark DataFrame "as" method

Views: 301 · Published: 2016/5/22 15:15:43 · Tags: scala apache-spark apache-spark-sql


Question

I am looking at the spark.sql.DataFrame documentation.

There is

    def as(alias: String): DataFrame
        Returns a new DataFrame with an alias set.
        Since 1.3.0

What is the purpose of this method? How is it used? Could someone give an example?

I have not managed to find anything about this method online, and the documentation is practically non-existent. I have not managed to create any kind of alias using this method.

Answer

Spark <= 1.5

It is more or less equivalent to SQL table aliases:

    SELECT *
    FROM table AS alias;

Example usage adapted from the PySpark alias documentation:

    import org.apache.spark.sql.functions.col

    case class Person(name: String, age: Int)

    val df = sqlContext.createDataFrame(
        Person("Alice", 2) :: Person("Bob", 5) :: Nil)

    // Give each side of the self-join its own alias
    val df_as1 = df.as("df1")
    val df_as2 = df.as("df2")

    // Qualify column names with the alias to say which side is meant
    val joined_df = df_as1.join(
        df_as2, col("df1.name") === col("df2.name"), "inner")
    joined_df.select(
        col("df1.name"), col("df2.name"), col("df2.age")).show()

Output:

    +-----+-----+---+
    | name| name|age|
    +-----+-----+---+
    |Alice|Alice|  2|
    |  Bob|  Bob|  5|
    +-----+-----+---+

The same thing using a SQL query:

    df.registerTempTable("df")
    sqlContext.sql("""SELECT df1.name, df2.name, df2.age
                      FROM df AS df1 JOIN df AS df2
                      ON df1.name = df2.name""")

What is the purpose of this method?

Mostly to avoid ambiguous column references, for example when a DataFrame is joined with itself and both sides expose columns with the same names.
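
To make the ambiguity concrete, here is a minimal sketch of what happens with and without aliases. It assumes Spark 2.0 or later running with a local master, so it builds a SparkSession instead of using the sqlContext from the example above. The object name AliasAmbiguityDemo and its method names are made up for illustration. The unaliased select is expected to fail with an AnalysisException because the joined result carries two "age" columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.util.Try

object AliasAmbiguityDemo {
  // Assumption: Spark 2.0+ with a local master, purely for illustration.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("alias-ambiguity-demo")
    .getOrCreate()
  import spark.implicits._

  val df: DataFrame = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")

  // Self-join without aliases: the result has two "age" columns,
  // so an unqualified reference to "age" cannot be resolved.
  def unaliasedSelect: Try[DataFrame] =
    Try(df.join(df, "name").select("age"))

  // Same join with aliases: each side's columns can be named explicitly.
  def aliasedSelect: DataFrame =
    df.as("l")
      .join(df.as("r"), col("l.name") === col("r.name"))
      .select(col("l.name"), col("l.age"), col("r.age"))

  def main(args: Array[String]): Unit = {
    println(s"unaliased select succeeded: ${unaliasedSelect.isSuccess}")
    aliasedSelect.show()
    spark.stop()
  }
}
```

The alias only exists on the DataFrame it was set on, so the qualified names "l.age" and "r.age" are resolvable only in expressions built on top of the aliased join.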

Spark 1.6+

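There is also a new as[U](implicit arg0: Encoder[U]): Dataset[U], which converts a DataFrame to a Dataset of a given type. An implicit Encoder[U] must be in scope; for case classes it usually comes from import sqlContext.implicits._. For example:

    df.as[Person]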
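
As a rough illustration of what the typed variant gives you, the sketch below converts a DataFrame to a Dataset[Person] and then filters and maps over Person fields directly. It assumes Spark 2.0 or later with a local master and uses a SparkSession; on Spark 1.6 the session setup would go through SQLContext instead. The object name TypedAsDemo and the method namesOlderThan are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Defined at the top level so that Spark can derive an Encoder for it.
case class Person(name: String, age: Int)

object TypedAsDemo {
  // Assumption: Spark 2.0+ with a local master, purely for illustration.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("typed-as-demo")
    .getOrCreate()
  import spark.implicits._ // brings the implicit Encoder[Person] into scope

  val df: DataFrame = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")

  // as[Person] matches columns to case class fields by name.
  val people: Dataset[Person] = df.as[Person]

  // Typed operations refer to Person fields and are checked at compile time.
  def namesOlderThan(minAge: Int): Array[String] =
    people.filter(_.age > minAge).map(_.name).collect()

  def main(args: Array[String]): Unit = {
    println(namesOlderThan(3).mkString(", ")) // prints: Bob
    spark.stop()
  }
}
```

Note that as[U] and as(alias) share a name but are unrelated: the first changes the static type of the rows, the second only attaches a name for qualifying columns.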