What is the equivalent of OFFSET in Spark SQL?

Views: 42 Published: 2021/11/14 21:48:58 Tags: apache-spark apache-spark-sql spark-dataframe


Question

I got a result set of 100 rows using Spark SQL, and I want the final result to contain rows 6 through 15. In SQL we use OFFSET to skip rows; for example, OFFSET 5 LIMIT 10 returns rows 6 to 15. How can I achieve the same in Spark SQL?

Recommended answer

I don't think Spark SQL supports OFFSET, so I use an id column as the filter condition instead. Each query retrieves at most N rows.
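The mapping from OFFSET/LIMIT to an id filter can be sketched as below. This is only an illustration: the helper name id_range_filter and its parameters are hypothetical, and it assumes the id column holds consecutive integers starting at first_id with no gaps, which Spark does not guarantee.

```python
def id_range_filter(offset, limit, id_col="C0", first_id=1):
    """Return a WHERE condition selecting the rows OFFSET/LIMIT would return.

    Hypothetical helper. Only valid when id_col holds consecutive integers
    starting at first_id and the result is ordered by id_col.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    lower = first_id + offset - 1  # exclusive lower bound
    upper = lower + limit          # inclusive upper bound
    return "%s > %d AND %s <= %d" % (id_col, lower, id_col, upper)


# OFFSET 5 LIMIT 10 on ids 1..100 selects ids 6 through 15.
print(id_range_filter(5, 10))  # prints: C0 > 5 AND C0 <= 15
```

If the ids have gaps, the range still bounds the batch but may return fewer than limit rows.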

Here is my sample code:

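from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# Load a headerless CSV; the query below expects columns named C0, C1, ...
df = sqlContext.read.format('com.databricks.spark.csv')\
        .options(header='false', inferschema='true')\
        .load('your.csv')
sqlContext.registerDataFrameAsTable(df, "table")

batch_size = 10 ** 5
# Find the id range so we know where to start and where to stop.
res = sqlContext.sql("select min(C0), max(C0) from table").collect()
# Read the aggregates by position; their column names vary across Spark versions.
index = int(res[0][0]) - 1  # exclusive lower bound of the first batch
N_max = int(res[0][1])
while index < N_max:
    prev = index
    # Fetch the ids in the range (index, index + batch_size].
    sql = "select C0, C1, C2, C3 from table \
            where C0 > '%s' and C0 <= '%s' \
            order by C0 limit %d" % (index, index+batch_size, batch_size)
    res = sqlContext.sql(sql).collect()
    # do something ...

    # If the processing above did not advance index, move on to the next batch.
    if index < prev + batch_size:
        index = prev + batch_size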
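When the id column has gaps, or there is no id column at all, a different technique from the one above is to number the rows with the row_number() window function and filter on that number. The sketch below only builds the query string; the helper name paged_query, the table name events, and the alias rn are all hypothetical, and the identifiers are interpolated without escaping, so it should only be given trusted names.

```python
def paged_query(table, order_col, offset, limit, columns):
    """Build a query that emulates OFFSET/LIMIT with row_number().

    Hypothetical helper: rows are numbered 1, 2, 3, ... in order_col order,
    then rows offset+1 through offset+limit are kept.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    cols = ", ".join(columns)
    return (
        "SELECT {cols} FROM ("
        "SELECT {cols}, row_number() OVER (ORDER BY {order_col}) AS rn "
        "FROM {table}) numbered "
        "WHERE rn > {lo} AND rn <= {hi} ORDER BY rn"
    ).format(cols=cols, order_col=order_col, table=table,
             lo=offset, hi=offset + limit)


# Rows 6 through 15, i.e. the equivalent of OFFSET 5 LIMIT 10.
print(paged_query("events", "C0", 5, 10, ["C0", "C1"]))
```

The resulting string could then be passed to sqlContext.sql(...). A window with no PARTITION BY moves all rows into a single partition, so this is likely to be slow on large tables; the id-range filter avoids that cost when a suitable id exists.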
