Usage of Spark DataFrame "as" method
Question
I am looking at spark.sql.DataFrame documentation.
There is:

```scala
def as(alias: String): DataFrame
```

Returns a new DataFrame with an alias set. Since 1.3.0
What is the purpose of this method? How is it used? Can there be an example?
I have not managed to find anything about this method online and the documentation is pretty non-existent. I have not managed to make any kind of alias using this method.
Solution

Spark <= 1.5
It is more or less equivalent to SQL table aliases:
```sql
SELECT * FROM table AS alias;
```
Example usage adapted from the PySpark `alias` documentation:

```scala
import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int)

val df = sqlContext.createDataFrame(
  Person("Alice", 2) :: Person("Bob", 5) :: Nil)

// Alias the same DataFrame twice so each side of the self-join has its own name
val df_as1 = df.as("df1")
val df_as2 = df.as("df2")

// Qualified names ("df1.name", "df2.name") resolve against the matching alias
val joined_df = df_as1.join(
  df_as2, col("df1.name") === col("df2.name"), "inner")

joined_df.select(
  col("df1.name"), col("df2.name"), col("df2.age")).show
```
Output:

```
+-----+-----+---+
| name| name|age|
+-----+-----+---+
|Alice|Alice|  2|
|  Bob|  Bob|  5|
+-----+-----+---+
```
Same thing using an SQL query:

```scala
df.registerTempTable("df")
sqlContext.sql("""SELECT df1.name, df2.name, df2.age
                  FROM df AS df1 JOIN df AS df2
                  ON df1.name == df2.name""")
```
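> What is the purpose of this method?

Pretty much avoiding ambiguous column references, for example when a DataFrame is joined with itself and both sides have columns with the same names.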
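To make the ambiguity concrete, here is a minimal sketch of the same self-join as a standalone program. It assumes Spark 2.x or later (it uses `SparkSession` rather than the `sqlContext` from the example above) and a local master; the names `AliasSelfJoin` and `selfJoin` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Same shape as the Person case class used in the example above.
case class Person(name: String, age: Int)

object AliasSelfJoin {
  // Hypothetical helper: self-joins on `name` and returns (left name, right age) pairs.
  def selfJoin(spark: SparkSession): Array[(String, Int)] = {
    import spark.implicits._ // provides toDF for Seq[Person]

    val df = Seq(Person("Alice", 2), Person("Bob", 5)).toDF()

    // Both sides of a self-join expose `name` and `age`, so an unqualified
    // col("name") on the joined result cannot be resolved to one side.
    // Giving each side its own alias lets every reference be qualified.
    val left = df.as("l")
    val right = df.as("r")

    left
      .join(right, col("l.name") === col("r.name"), "inner")
      .select(col("l.name"), col("r.age"))
      .collect()
      .map(row => (row.getString(0), row.getInt(1)))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x or later, running locally.
    val spark = SparkSession.builder()
      .appName("alias-self-join")
      .master("local[*]")
      .getOrCreate()
    try {
      selfJoin(spark).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Selecting a bare `name` on the joined result instead fails analysis because Spark cannot tell which side's `name` is meant; qualifying through the `l` and `r` aliases removes that ambiguity.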
Spark 1.6+
There is also a new

```scala
as[U](implicit arg0: Encoder[U]): Dataset[U]
```

which is used to convert a `DataFrame` to a `Dataset` of a given type. For example: `df.as[Person]`
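A minimal sketch of the typed conversion, assuming Spark 2.x or later (where `DataFrame` is an alias for `Dataset[Row]` and `as[U]` keeps the signature shown above) and a local master. `TypedConversion` and `olderThan` are hypothetical names; the `Encoder[Person]` comes from `spark.implicits._`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class so that spark.implicits._ can derive an Encoder[Person].
case class Person(name: String, age: Int)

object TypedConversion {
  // Hypothetical helper: builds an untyped DataFrame, converts it with as[Person],
  // then filters with a typed lambda over Person objects.
  def olderThan(spark: SparkSession, minAge: Int): Array[Person] = {
    import spark.implicits._ // provides toDF and the implicit Encoder[Person]

    val df = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")

    // as[Person] maps columns to case-class fields by name.
    val ds: Dataset[Person] = df.as[Person]

    // Typed operation: the lambda receives Person values, not Rows.
    ds.filter(p => p.age > minAge).collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x or later, running locally.
    val spark = SparkSession.builder()
      .appName("as-typed")
      .master("local[*]")
      .getOrCreate()
    try {
      olderThan(spark, 3).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Because `as[Person]` matches columns to fields by name, the DataFrame needs `name` and `age` columns with compatible types; otherwise the conversion fails with an analysis error.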