Loading a SparkR data frame in Hive

Problem description

I need to load a DataFrame created in SparkR into a Hive table.

# created a data frame df_test
df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))

# initialized the Hive context
sc <- sparkR.init()
hiveContext <- sparkRHive.init(sc)

# used saveAsTable to save the data frame "df_test" to a Hive table named "table_hive"
saveAsTable(df_test, "table_hive")

16/08/24 23:08:36 ERROR RBackendHandler: saveAsTable on 13 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.SparkStrategies$DDLStrategy$.apply(SparkStrategies.scala:392)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.execution

This throws the error above. Kindly help.

Answer

Having HiveContext in scope is not enough. Each data frame is bound to the specific SQLContext / SparkSession instance it was created with, and df_test was clearly created with a different context than hiveContext.

Let's illustrate that with an example:

 Welcome to
    ____              __ 
   / __/__  ___ _____/ /__ 
  _\ \/ _ \/ _ `/ __/  '_/ 
 /___/ .__/\_,_/_/ /_/\_\   version  1.6.1 
    /_/ 


 Spark context is available as sc, SQL context is available as sqlContext
> library(magrittr)
> createDataFrame(sqlContext, mtcars) %>% saveAsTable("foo")
16/08/24 20:22:13 ERROR RBackendHandler: saveAsTable on 22 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
  java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.SparkStrategies$DDLStrategy$.apply(SparkStrategies.scala:392)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.execu
>
> hiveContext <- sparkRHive.init(sc)
> createDataFrame(hiveContext, mtcars) %>% saveAsTable("foo")
NULL
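
Applied to the question's own code, a minimal sketch of the fix (keeping the original names df_test and table_hive) is to create the data frame from hiveContext instead of sqlContext:

# initialize the Hive-backed context before creating the data frame
sc <- sparkR.init()
hiveContext <- sparkRHive.init(sc)

# build df_test against hiveContext so it is bound to the right context
df_test <- createDataFrame(hiveContext,
                           data.frame(mon  = c(1, 2, 3, 4, 5),
                                      year = c(2011, 2012, 2013, 2014, 2015)))

# the data frame and the context now match, so this persists a Hive table
saveAsTable(df_test, "table_hive")

# optional sanity check: query the table back through the same context
head(sql(hiveContext, "SELECT * FROM table_hive"))

Because df_test and table_hive now go through the same Hive-backed context, the "Tables created with SQLContext must be TEMPORARY" error no longer applies.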
