How to refresh a table and do it concurrently?
Question
I'm using Spark Streaming 2.1. I'd like to periodically refresh some cached tables (loaded by a Spark-provided DataSource such as Parquet, MySQL, or a user-defined data source).
How do I refresh the table?
Suppose I load a table with
spark.read.format("").load().createTempView("my_table")
and it is also cached:
spark.sql("cache table my_table")
Is the following code enough to refresh the table, so that when the table is loaded next, it will automatically be cached again?
spark.sql("refresh table my_table")
Or do I have to do it manually with:
spark.table("my_table").unpersist
spark.read.format("").load().createOrReplaceTempView("my_table")
spark.sql("cache table my_table")
Is it safe to refresh the table concurrently?

By concurrent I mean using a ScheduledThreadPoolExecutor to do the refresh work on a thread other than the main thread.
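The scheduling part of the question can be sketched independently of Spark. Below is a minimal Java sketch using a single-threaded scheduled executor (created via `Executors`, which is backed by `ScheduledThreadPoolExecutor`). The `refreshAction` is a hypothetical stand-in for the unpersist/reload/cache sequence shown above; `scheduleWithFixedDelay` ensures consecutive refresh runs never overlap each other, though it does not by itself make a refresh atomic with respect to queries running on other threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TableRefresher {
    // Stand-in for the Spark refresh sequence, e.g.:
    //   spark.table("my_table").unpersist()
    //   spark.read.format(...).load().createOrReplaceTempView("my_table")
    //   spark.sql("cache table my_table")
    private final Runnable refreshAction;

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "table-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });

    public TableRefresher(Runnable refreshAction) {
        this.refreshAction = refreshAction;
    }

    public void start(long periodMillis) {
        // Fixed delay: the next run is scheduled only after the previous
        // one finishes, so two refreshes can never run at the same time.
        scheduler.scheduleWithFixedDelay(
            refreshAction, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(3);
        TableRefresher refresher = new TableRefresher(() -> {
            System.out.println("refreshing my_table");
            done.countDown();
        });
        refresher.start(50);
        done.await(5, TimeUnit.SECONDS); // wait for three refresh cycles
        refresher.stop();
    }
}
```

One design note: with a single-threaded executor a slow refresh simply delays the next one instead of piling up concurrent refreshes, which is usually what you want for cache maintenance.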
What will happen if Spark is using the cached table at the moment I call refresh on it?
Answer
Spark 2.2.0 introduced a feature for refreshing a table's metadata when the table has been updated by Hive or some external tool.
You can do this with the API:
spark.catalog.refreshTable("my_table")
This API invalidates and refreshes the cached metadata (and cached data) for that table to keep it consistent; the invalidated cache is lazily refilled the next time the table is scanned.