Locking Hive table from Spark HiveContext


Question

We are setting up Hive for our data-warehousing needs, using Spark for processing and Hive as storage. Our files are very small (<10 KB) but huge in number, and the requirement is to serve data in near real time. So my approach is to create a partition on the Hive table that marks data as CURRENT or PAST: keep publishing new data into CURRENT, and after a certain interval aggregate it and move it to the PAST partition. While the move operation is in progress I need to lock the table, since queries could otherwise return inaccurate data.
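
Roughly, the move step I have in mind would look like the following sketch (the partition column state and the data columns id and payload are placeholders for illustration; the real schema is not shown here):

// Compact the many small CURRENT files into the PAST partition.
// `state`, `id` and `payload` are placeholder names, not the real schema.
hiveContext.sql(
  """INSERT INTO TABLE ma.t26013_75 PARTITION (state = 'PAST')
    |SELECT id, payload FROM ma.t26013_75 WHERE state = 'CURRENT'""".stripMargin)

// Drop the CURRENT partition once its rows have been copied to PAST.
hiveContext.sql("ALTER TABLE ma.t26013_75 DROP PARTITION (state = 'CURRENT')")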

From the Hive CLI there is no issue:

hive> LOCK TABLE t26013_75 exclusive;
OK
Time taken: 0.106 seconds

But when I try the same from Spark:

scala> val hiveContext = new HiveContext(sc)
16/04/07 07:14:55 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@723fadfe

scala> hiveContext.sql("LOCK TABLE ma.t26013_75 exclusive")
16/04/07 07:15:00 INFO parse.ParseDriver: Parsing command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:00 INFO parse.ParseDriver: Parse Completed
16/04/07 07:15:00 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
16/04/07 07:15:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/07 07:15:01 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
16/04/07 07:15:01 INFO hive.metastore: Connected to metastore.
16/04/07 07:15:02 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO parse.ParseDriver: Parsing command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:02 INFO parse.ParseDriver: Parse Completed
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=parse start=1460027702353 end=1460027702784 duration=431 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Semantic Analysis Completed
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1460027702785 end=1460027702832 duration=47 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=compile start=1460027702328 end=1460027702841 duration=513 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Starting command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1460027702325 end=1460027702861 duration=536 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO lockmgr.DummyTxnManager: Concurrency mode is disabled, not creating a lock manager
16/04/07 07:15:02 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: lock Table LockManager not specified
    at org.apache.hadoop.hive.ql.exec.DDLTask.lockTable(DDLTask.java:2880)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:405)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $line68.$read$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $line68.$read$$iwC$$iwC.<init>(<console>:42)
    at $line68.$read$$iwC.<init>(<console>:44)
    at $line68.$read.<init>(<console>:46)
    at $line68.$read$.<init>(<console>:50)
    at $line68.$read$.<clinit>(<console>)
    at $line68.$eval$.<init>(<console>:7)
    at $line68.$eval$.<clinit>(<console>)
    at $line68.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

16/04/07 07:15:02 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1460027702841 end=1460027702880 duration=39 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1460027702880 end=1460027702880 duration=0 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 ERROR client.ClientWrapper: 
======================
HIVE FAILURE OUTPUT
======================
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified

======================
END HIVE FAILURE OUTPUT
======================

org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:349)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC.<init>(<console>:42)
    at $iwC.<init>(<console>:44)
    at <init>(<console>:46)
    at .<init>(<console>:50)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Answer

I looked at the log more carefully and fixed the problem by adding the following:

scala> hiveContext.setConf("hive.support.concurrency","true")
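
With concurrency support enabled, the lock can be taken and later released from the same HiveContext. As a sketch, the sequence around the move becomes (UNLOCK TABLE is standard HiveQL, mirroring the CLI syntax):

hiveContext.setConf("hive.support.concurrency", "true")

// Hold the exclusive lock only for the duration of the move.
hiveContext.sql("LOCK TABLE ma.t26013_75 EXCLUSIVE")
// ... aggregate the CURRENT data and move it to PAST ...
hiveContext.sql("UNLOCK TABLE ma.t26013_75")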

I do not know why this is required; I already have hive-site.xml in the hive/conf location.

It may be because my hive-site.xml at spark/conf contains only the following entry:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

Going forward, it may make sense to add this parameter to spark/conf/hive-site.xml as well.
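
For reference, a sketch of what spark/conf/hive-site.xml might then look like; the extra property simply mirrors the setConf call above:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- suggested addition: enable Hive's lock manager for this Spark client -->
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
</configuration>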

Thanks for your time.
