Get a specific row from a Spark DataFrame


Question

Is there any alternative for df[100, c("column")] in Scala Spark DataFrames? I want to select a specific row from a column of a Spark DataFrame, for example the 100th row, as in the R expression above.

Answer

Firstly, you must understand that DataFrames are distributed, which means you can't access them in a typical procedural way; you must run an analysis first. Although you are asking about Scala, I suggest you read the PySpark documentation, because it has more examples than the documentation for the other language bindings.

However, continuing with my explanation, I would use some methods of the RDD API, because every DataFrame exposes its underlying RDD as an attribute. Please see my example below, and notice how I take the second record.

# Build a small example DataFrame (sqlContext comes from the PySpark shell).
df = sqlContext.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])

myIndex = 1
values = (df.rdd.zipWithIndex()                       # pair each Row with its index
            .filter(lambda pair: pair[1] == myIndex)  # keep only the wanted index
            .map(lambda pair: tuple(pair[0]))         # drop the index, keep the values
            .collect())

print(values[0])
# ('b', 2)

Hopefully, someone can offer another solution with fewer steps.
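Since the question asks about Scala, the same zipWithIndex approach translates almost line for line. Below is a minimal sketch, assuming a SparkSession named spark is in scope (for example in spark-shell); the data and column names mirror the Python example:

// Sketch of the same approach in Scala; assumes a SparkSession
// named `spark` is in scope (e.g. in spark-shell).
import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("letter", "name")
val myIndex = 1L

val values = df.rdd
  .zipWithIndex()                           // pair each Row with its Long index
  .filter { case (_, i) => i == myIndex }   // keep only the wanted index
  .map { case (row, _) => row }             // drop the index, keep the Row
  .collect()

println(values(0))
// [b,2]

As in the Python version, zipWithIndex may trigger a job over the whole RDD, so this works for ad-hoc lookups but is not suited to repeated random access.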
