Spark SQL 2.1.1 Thrift server - unable to move source HDFS to target
Question
It is related to this question: [create table xxx as select * from yyy sometimes get error].

When using the Spark thrift server, executing a statement like

create table xxx as select * from yyy

more than once only succeeds the first time; later tries always fail, due to java.io.IOException: Filesystem closed or to doAs problems. (A minimal reproduction sketch follows below.)
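For completeness, here is a minimal sketch of how the failure can be reproduced over JDBC against the thrift server. The host/port, user, and table names (xxx1, xxx2, yyy) are placeholders, not details from the original setup:

import java.sql.DriverManager

object CtasRepro {
  def main(args: Array[String]): Unit = {
    // Register the Hive JDBC driver and connect to the Spark thrift server.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "root", "")
    val stmt = conn.createStatement()
    stmt.execute("create table xxx1 as select * from yyy") // first CTAS: succeeds
    stmt.execute("create table xxx2 as select * from yyy") // later CTAS: fails as below
    stmt.close()
    conn.close()
  }
}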
The full error stack trace:

17/05/29 08:44:53 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:92)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2892)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1640)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:676)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:675)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:768)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 40 more
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:798)
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2966)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
... 59 more
17/05/29 08:44:53 ERROR thriftserver.SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000;
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This is a normal create table ... as select log:

17/05/29 08:42:30 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/05/29 08:42:30 INFO scheduler.DAGScheduler: ResultStage 1 (run at AccessController.java:0) finished in 2.079 s
17/05/29 08:42:30 INFO scheduler.DAGScheduler: Job 1 finished: run at AccessController.java:0, took 2.100557 s
17/05/29 08:42:30 INFO metastore.HiveMetaStore: 2: get_table : db=task tbl=task_106
17/05/29 08:42:30 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_106
17/05/29 08:42:30 INFO metastore.HiveMetaStore: 2: get_table : db=task tbl=task_106
17/05/29 08:42:30 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_106
17/05/29 08:42:30 INFO metadata.Hive: Replacing src:hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_106/.hive-staging_hive_2017-05-29_08-42-26_232_2514893773205547001-1/-ext-10000/part-00000, dest: hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_106/part-00000, Status:true
17/05/29 08:42:30 INFO metadata.Hive: Replacing src:hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_106/.hive-staging_hive_2017-05-29_08-42-26_232_2514893773205547001-1/-ext-10000/part-00001, dest: hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_106/part-00001, Status:true
This is the failing one: after some get_table calls it executes a drop_table, which then causes the Filesystem closed error and finally the "unable to move source" failure:
17/05/29 08:42:50 INFO cluster.YarnScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
17/05/29 08:42:50 INFO scheduler.DAGScheduler: ResultStage 6 (run at AccessController.java:0) finished in 2.567 s
17/05/29 08:42:50 INFO scheduler.DAGScheduler: Job 3 finished: run at AccessController.java:0, took 2.819549 s
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_database: task
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: task
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_database: task
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_database: task
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=task tbl=task_107
17/05/29 08:42:51 INFO metastore.HiveMetaStore: 6: drop_table : db=task tbl=task_107
17/05/29 08:42:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=drop_table : db=task tbl=task_107
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:51 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/05/29 08:42:52 INFO metastore.hivemetastoressimpl: deleting hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107
17/05/29 08:42:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
17/05/29 08:42:52 INFO metastore.hivemetastoressimpl: Deleted the diretory hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107
17/05/29 08:42:52 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
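The Filesystem closed error originates in DFSClient.checkOpen, which is consistent with Hadoop's FileSystem cache: FileSystem.get returns one shared instance per (scheme, authority, user), so a close() issued anywhere (for example during the drop_table above) invalidates that instance for every other caller. The snippet below only illustrates that cache behavior; it is not the thrift server's code, and the namenode URI is copied from the logs:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SharedFsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Both calls return the same cached instance for this URI and user.
    val fs1 = FileSystem.get(new URI("hdfs://jzf-01:9000"), conf)
    val fs2 = FileSystem.get(new URI("hdfs://jzf-01:9000"), conf)
    println(fs1 eq fs2)              // true: one shared object
    fs1.close()                      // closes the shared instance for everyone
    fs2.exists(new Path("/tmp"))     // throws java.io.IOException: Filesystem closed
  }
}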
Answer

Try setting hive.exec.stagingdir in your hive-site.xml like this:

<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive/spark-${user.name}</value>
</property>

This worked for a customer who upgraded from 1.6.2 to 2.1.1 and who had that same problem with CTAS. On our dev cluster, doing this got us past your particular error, but we still have some HDFS permission issues we are working through. Hope this helps.
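If editing hive-site.xml is not convenient, the same property can presumably also be set on the session itself. The sketch below does this from spark-shell; the value mirrors the XML above, and it is untested on the reporter's cluster:

// Sketch: set the staging dir on the SparkSession instead of in hive-site.xml.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ctas-staging-dir")  // placeholder name
  .config("hive.exec.stagingdir", "/tmp/hive/spark-" + sys.props("user.name"))
  .enableHiveSupport()
  .getOrCreate()

The likely reason the setting helps is that CTAS then stages its output under /tmp/hive rather than in a .hive-staging directory inside the target table path, so the final move no longer depends on a directory that a concurrent drop_table may have deleted.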