Refresh cached dataframe?


Question

We have a small Hive table (around 50,000 records) that is updated once a day.

We cache a DataFrame for this table and join it with Spark Streaming data. How do we refresh the DataFrame when new data is loaded into the underlying Hive table?

// max is org.apache.spark.sql.functions.max (static import)
DataFrame tempApp = hiveContext.table("emp_data");

// Get the max load date
Date max_date = tempApp.select(max("load_date")).collect()[0].getDate(0);

// Get data for the latest date and cache it. This will be joined with the stream data.
DataFrame emp = hiveContext.table("emp_data").where("load_date='" + max_date + "'").cache();

// Get messages from the Kafka stream
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(....);

JavaDStream<Record> kafkaRecs = messages.map(Record::parseFromMessage);

kafkaRecs.foreachRDD(rdd -> {
    DataFrame recordDataFrame = hiveContext.createDataFrame(rdd, Record.class);

    DataFrame joinedDataSet = recordDataFrame.join(emp,
        recordDataFrame.col("application").equalTo(emp.col("emp_id")));
    // joinedDataSet: do further processing
});

Recommended answer

You can do it manually. Something like this:

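DataFrame refresh(DataFrame orig) {
    // Drop the previously cached version, if there is one.
    if (orig != null) {
        orig.unpersist();
    }
    // Get the DataFrame as you normally would; the read from the question is used here.
    Date maxDate = hiveContext.table("emp_data").select(max("load_date")).collect()[0].getDate(0);
    DataFrame res = hiveContext.table("emp_data").where("load_date='" + maxDate + "'");
    res.cache();
    return res;
}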

Now call this once a day, or whenever you wish to refresh, like this:

   // join_df should be a field: a local captured by a lambda cannot be reassigned
   join_df = refresh(join_df);

This unpersists the previous version (removes it from the cache), reads the new one, and caches it. So in practice the DataFrame is refreshed.

Note that caching is lazy: after a refresh, the DataFrame is only persisted in memory once it has been used for the first time. If you would rather pay that cost up front, run an action such as count() on it right after the refresh.
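The answer leaves open how the "once a day" call gets triggered from inside a streaming job. One possible arrangement, sketched below without any Spark dependency, is a small holder that reloads its value once it is older than a given age. The class name `RefreshableCache` and its parameters `loader`, `releaser`, and `maxAge` are hypothetical and not part of Spark; the current time is passed in explicitly so the behaviour is easy to check.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Spark API): holds a value and reloads it
 * once it is older than maxAge.
 */
final class RefreshableCache<T> {
    private final Supplier<T> loader;    // assumed: reads the table and calls cache()
    private final Consumer<T> releaser;  // assumed: calls unpersist() on the old value
    private final Duration maxAge;       // e.g. Duration.ofHours(24)
    private T value;
    private Instant loadedAt;

    RefreshableCache(Supplier<T> loader, Consumer<T> releaser, Duration maxAge) {
        this.loader = loader;
        this.releaser = releaser;
        this.maxAge = maxAge;
    }

    /** Returns the held value, reloading it first if it is missing or stale at time now. */
    synchronized T get(Instant now) {
        boolean stale = loadedAt == null || !now.isBefore(loadedAt.plus(maxAge));
        if (stale) {
            if (value != null) {
                releaser.accept(value);  // release the previous version
            }
            value = loader.get();        // load the new version
            loadedAt = now;
        }
        return value;
    }
}
```

In the job from the question, the loader could wrap the Hive read followed by cache(), the releaser could be `df -> df.unpersist()`, and `get(Instant.now())` could be called at the top of the foreachRDD body, which runs on the driver for every batch. A fixed age is only one policy; comparing the table's latest load_date against the one already loaded would be another.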
