How to run spark job sequentially?

Question

I have a use case in which there is a table with one column that holds a sequence of SQL queries.

I want to run these SQL queries in a Spark program one after the other, not in parallel. This is because the SQL query in the Nth row depends on the (N-1)th row.

Now, due to this constraint, I want to execute these queries sequentially, one after the other, rather than in parallel. How can I achieve this?

Answer

I think you could use something like this:

val listOfQueryRows = spark.sqlContext.table("foo_db.table_of_queries")
  .select(col("sql_query"))
  .orderBy(col("query_index"))  // guarantees the execution order
  .collect()                    // brings the (small) set of query rows to the driver

// Iterate on the driver, running each query to completion before starting the next.
listOfQueryRows.foreach(queryRow => spark.sql(queryRow.getString(0)))

This selects all your queries from the sql_query column, orders them by the index given in query_index, and collects them into listOfQueryRows on the driver. The list is then iterated over sequentially, executing the query for each row in turn.
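As a minimal, cluster-free sketch of the same order-then-iterate pattern: the rows below are in-memory stand-ins for the assumed foo_db.table_of_queries table, with the same (query_index, sql_query) shape, and each spark.sql call is replaced by a println.

```scala
// Stand-in rows for the queries table: (query_index, sql_query).
// Deliberately out of order, to show that sorting restores the dependency order.
val rows = Seq(
  (2, "INSERT INTO results SELECT * FROM staging"), // depends on row 1
  (1, "CREATE TABLE staging AS SELECT * FROM raw"),
  (3, "DROP TABLE staging")                         // depends on row 2
)

// Same shape as the Spark code: order by the index, then iterate sequentially.
val ordered = rows.sortBy(_._1).map(_._2)

// In the real job each element would be passed to spark.sql(...);
// here we just print the queries in execution order.
ordered.foreach(println)
```

Because ordered is a plain local collection, the foreach runs strictly one element at a time, which is exactly why collecting to the driver (rather than using a distributed foreach) gives the sequential behaviour the question asks for.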
