Get a specific row from a Spark DataFrame
Question
Is there any alternative to df[100, c("column")] for Scala Spark DataFrames? I want to select a specific row from a column of a Spark DataFrame, for example the 100th row, as in the R expression above.
Answer
First, you must understand that DataFrames are distributed. That means you can't access them in the typical procedural, index-based way; you have to analyze the data first. Although you are asking about Scala, I suggest you read the PySpark documentation, because it has more examples than any of the other language bindings' docs.
Continuing with the explanation, I will use some methods of the RDD API, because every DataFrame exposes an RDD as an attribute. See the example below, and notice how the second record is extracted.
df = sqlContext.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "name"])

myIndex = 1
values = (df.rdd.zipWithIndex()                     # pair each Row with its index: (Row, i)
          .filter(lambda pair: pair[1] == myIndex)  # keep only the row at myIndex
          .map(lambda pair: tuple(pair[0]))         # drop the index, keep the row values
          .collect())

print(values[0])
# ('b', 2)
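To see why this pipeline picks out exactly one row, the same zipWithIndex/filter/map logic can be sketched in plain Python over a local list (a stand-in for the distributed RDD; the names `rows` and `my_index` are illustrative, not part of the Spark API):

```python
# Plain-Python sketch of the RDD pipeline above.
# zipWithIndex pairs each element with its position, filter keeps only the
# wanted position, and map strips the index off again.
rows = [("a", 1), ("b", 2), ("c", 3)]

my_index = 1
pairs = list(zip(rows, range(len(rows))))      # like rdd.zipWithIndex()
kept = [p for p in pairs if p[1] == my_index]  # like .filter(...)
values = [p[0] for p in kept]                  # like .map(...).collect()

print(values[0])
# ('b', 2)
```

The difference in Spark is that `zip`, the filter, and the map all run in parallel across partitions, and only `collect()` brings the surviving row back to the driver.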
Hopefully, someone can offer another solution with fewer steps.