What is the equivalent of OFFSET in Spark SQL?
Question
I have a result set of 100 rows from Spark SQL, and I want the final result to contain only rows 6 through 15. In SQL we skip rows with OFFSET: for example, OFFSET 5 LIMIT 10 returns rows 6 to 15. How can I achieve the same in Spark SQL?
Recommended answer
As far as I can tell, Spark SQL does not support OFFSET. As a workaround, I use the id column as a filter condition and retrieve only N rows per query.
Here is my sample code:
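from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
# Load a headerless CSV; spark-csv names the columns C0, C1, ...
df = sqlContext.read.format('com.databricks.spark.csv')\
    .options(header='false', inferschema='true')\
    .load('your.csv')
sqlContext.registerDataFrameAsTable(df, "table")
batch_size = 10 ** 5
# Find the id range so the loop knows where to start and stop.
# Read the aggregates by position, since their column names vary by Spark version.
res = sqlContext.sql("select min(C0), max(C0) from table").collect()
index = int(res[0][0]) - 1
N_max = int(res[0][1])
while index < N_max:
    prev = index
    # Fetch ids in (index, index + batch_size], at most batch_size rows.
    sql = "select C0, C1, C2, C3 from table \
        where C0 > '%s' and C0 <= '%s' \
        order by C0 limit %d" % (index, index+batch_size, batch_size)
    res = sqlContext.sql(sql).collect()
    # do something ...
    # Advance to the next id window unless the processing step already did.
    if index < prev + batch_size:
        index = prev + batch_size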
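
The loop above pages by id range, which only lines up with row positions when the ids are dense and consecutive. As a rough alternative sketch (not part of the original answer), OFFSET/LIMIT can also be emulated by numbering the rows with ROW_NUMBER() and filtering on that number. This assumes a Spark version whose SQL dialect supports window functions. The helper names row_bounds and offset_limit_query and the table name my_table are hypothetical, chosen only for illustration.

```python
def row_bounds(offset, limit):
    """Translate OFFSET/LIMIT into inclusive 1-based row numbers."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return offset + 1, offset + limit


def offset_limit_query(table, order_col, offset, limit):
    """Build a query string that emulates OFFSET/LIMIT with ROW_NUMBER().

    Hypothetical helper: it only formats SQL text and does not touch Spark.
    """
    first, last = row_bounds(offset, limit)
    return (
        "SELECT * FROM ("
        "SELECT *, ROW_NUMBER() OVER (ORDER BY {col}) AS rn FROM {tbl}"
        ") numbered WHERE rn BETWEEN {first} AND {last} ORDER BY rn"
    ).format(col=order_col, tbl=table, first=first, last=last)


# OFFSET 5 LIMIT 10 corresponds to rows 6 through 15.
query = offset_limit_query("my_table", "C0", offset=5, limit=10)
print(query)
# With a live session, the string could then be passed to sqlContext.sql(query).
```

A window with no PARTITION BY clause generally forces Spark to sort all rows in a single partition, so this sketch is likely to suit modest result sets (such as the 100 rows in the question) better than very large tables.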