How to cache Dataframe in Apache Ignite


Problem description




I am writing code to cache RDBMS data using a Spark SQLContext JDBC connection. Once the Dataframe is created, I want to cache that resultset using Apache Ignite so that other applications can make use of it. Here is the code snippet.

object test {

  def main(args: Array[String]) {

      val configuration = new Configuration()
      val config = "src/main/scala/config.xml"

      val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
      val sc = new SparkContext(sparkConf)
      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      val sql_dump1 = sqlContext.read.format("jdbc")
        .option("url", "jdbc URL")
        .option("driver", "com.mysql.jdbc.Driver")
        .option("dbtable", mysql_table_statement)
        .option("user", "username")
        .option("password", "pass")
        .load()

      val ic = new IgniteContext[Integer, Integer](sc, config)

      val sharedrdd = ic.fromCache("hbase_metadata")

      // How to cache the sql_dump1 dataframe?

  }
}

Now the question is how to cache a dataframe. IgniteRDD has a savePairs method, but it accepts the key and value as RDD[Integer]; I have a dataframe, and even if I convert it to an RDD I would only get RDD[Row]. The savePairs method built around an RDD of Integer seems too specific: what if I have an RDD of String as the value? Is it good to cache the dataframe, or is there a better approach to cache the resultset?

Solution

There is no reason to store a DataFrame in an Ignite cache (shared RDD), since you won't benefit from it much: at the very least, you won't be able to execute Ignite SQL over the DataFrame.

I would suggest doing the following:

  • provide a CacheStore implementation for the hbase_metadata cache that will preload all the data from your underlying database. Then you can preload all the data into the cache using the Ignite.loadCache method. Here you may find an example of how to use a JDBC persistent store along with an Ignite cache (shared RDD).
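As a rough illustration of this first option, here is a minimal sketch. It assumes config.xml already defines a cache named hbase_metadata whose cacheStoreFactory points at a JDBC-backed store (for example Ignite's CacheJdbcPojoStore mapped to the MySQL table); the value type String and the config path are assumptions for illustration, not taken from the question:

```scala
import org.apache.ignite.Ignition

object PreloadExample {
  def main(args: Array[String]): Unit = {
    // Start a node from the same Spring XML the IgniteContext uses.
    val ignite = Ignition.start("src/main/scala/config.xml")

    // Obtain the cache declared in config.xml.
    val cache = ignite.cache[Integer, String]("hbase_metadata")

    // loadCache delegates to CacheStore.loadCache on every server node,
    // which runs the configured SQL and fills the cache with all rows
    // from the underlying table.
    cache.loadCache(null)

    println(s"preloaded ${cache.size()} entries")
  }
}
```

Once the cache is preloaded this way, ic.fromCache("hbase_metadata") in the Spark job sees the same data without going back to MySQL.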

Alternatively, you can get sql_dump1 as you're doing, iterate over each row, and store each row individually in the shared RDD using the IgniteRDD.savePairs method. After this is done you can query the data with Ignite Shared RDD SQL, as mentioned above.
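A minimal sketch of this second option follows. The key/value choice is hypothetical: it assumes the first column of each row holds an integer key and joins the remaining columns into one pipe-delimited String value, so the IgniteContext would be typed [Integer, String] rather than [Integer, Integer] as in the question:

```scala
// Pure helper: turn one row's values into an (Integer, String) pair.
// Assumes the first value is the integer key (hypothetical choice);
// the remaining values become a pipe-delimited String payload.
def rowToPair(values: Seq[Any]): (Integer, String) = {
  val key = Integer.valueOf(values.head.toString.toInt)
  val payload = values.tail.mkString("|")
  (key, payload)
}

// With Spark and Ignite in scope, this would be used roughly as:
//   val ic        = new IgniteContext[Integer, String](sc, config)
//   val sharedRdd = ic.fromCache("hbase_metadata")
//   sharedRdd.savePairs(sql_dump1.rdd.map(row => rowToPair(row.toSeq)))
```

savePairs writes the whole pair RDD partition-by-partition into the Ignite cache, after which the data is visible to any other Ignite client, not just this Spark job.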
