Hive/Hadoop intermittent failure: Unable to move source to destination


Problem description

There have been some SO articles about the Hive/Hadoop "Unable to move source" error, and many of them point to permission problems.

However, at my site I saw the same error, and I am quite sure it is not related to permissions. This is because the problem is intermittent: it worked one day but failed on another.

I thus looked more deeply into the error message. It was complaining about a failure to move from a source path of

.../.hive-staging_hive.../-ext-10000/part-00000-${long-hash}

to a destination path of

.../part-00000-${long-hash}

Does this observation ring a bell with anyone?

This error was triggered by a super simple test query: just inserting a row into a test table (see below).

Error message

org.apache.hadoop.hive.ql.metadata.HiveException: 
Unable to move source 
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/.hive-staging_hive_2018-02-02_23-02-13_065_2316479064583526151-5/-ext-10000/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000 
to destination 
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000;

Query that triggered this error (but only intermittently)

insert into testTable1
values (2);

Solution

Thanks for all the help. I have found a solution. I am providing my own answer here.

The problem was with a CTAS (create table as ...) operation that preceded the failing insert command: the CTAS had closed the file system inappropriately. The telltale sign was an IOException: Filesystem closed message shown together with the failing HiveException: Unable to move source ... to destination operation. (I found the log message in my Spark Thrift Server's log, not my application's log.) Note in the stack trace below that the move fails inside Hive.moveFile while checking whether the path is in an HDFS encryption zone; that check goes through a DFSClient that had already been closed, which is why a move with nothing wrong permission-wise still fails.

Caused by: java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
  at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
  at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
  at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
  at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)

The solution was actually from another SO article: https://stackoverflow.com/a/47067350/1168041

But here I provide an excerpt in case that article is gone:

Add this property to hdfs-site.xml:

<property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
</property> 

Reason: Spark and HDFS use the same API (at the bottom they use the same FileSystem instance).

When beeline closes a FileSystem instance, it also closes the Thrift server's FileSystem instance. The second time beeline tries to get the instance, it will always report "Caused by: java.io.IOException: Filesystem closed".

Please check this issue here:

https://issues.apache.org/jira/browse/SPARK-21725
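
To make the shared-instance behavior concrete, here is a minimal sketch of the Hadoop FileSystem cache semantics described above. This is my own illustration, not code from the quoted answer, and the hdfs://namenodeHA/ URI is only a placeholder for a reachable cluster:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI uri = URI.create("hdfs://namenodeHA/"); // placeholder cluster URI

        // Default behavior: FileSystem.get() returns one cached instance per
        // (scheme, authority, user), so fs1 and fs2 are the SAME object.
        FileSystem fs1 = FileSystem.get(uri, conf);
        FileSystem fs2 = FileSystem.get(uri, conf);
        System.out.println(fs1 == fs2); // true

        fs1.close(); // closes the shared instance for every holder
        // fs2.exists(new Path("/tmp"));
        // -> would now throw java.io.IOException: Filesystem closed

        // With the cache disabled, each get() returns a fresh instance,
        // so closing one no longer breaks the others.
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);
        FileSystem fs3 = FileSystem.get(uri, conf);
        FileSystem fs4 = FileSystem.get(uri, conf);
        System.out.println(fs3 == fs4); // false
        fs3.close();
        System.out.println(fs4.exists(new Path("/tmp"))); // still usable
        fs4.close();
    }
}

The first close() invalidates every holder of the shared instance, which is exactly what happened between the CTAS and the subsequent insert in my case.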

I was not using beeline but the problem with CTAS was the same.
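
One practical note from me (my own suggestion, not part of the quoted answer): since my case involved the Spark Thrift Server, the same Hadoop property can also be supplied through Spark's spark.hadoop.* configuration passthrough instead of editing hdfs-site.xml, e.g. in spark-defaults.conf:

spark.hadoop.fs.hdfs.impl.disable.cache true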

My test sequence:

insert into testTable1
values (11)

create table anotherTable as select 1

insert into testTable1
values (12)

Before the fix, any insert would fail after the create table as … After the fix, this problem was gone.
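
For completeness, a quick hypothetical way to sanity-check that the property was actually picked up by the Hadoop configuration on the classpath (my addition, not part of the original fix):

import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckCacheSwitch {
    public static void main(String[] args) {
        // HdfsConfiguration also loads hdfs-site.xml as a default resource,
        // which a plain Configuration does not do by itself.
        HdfsConfiguration conf = new HdfsConfiguration();
        // Expect "true" once the fix is in place.
        System.out.println(conf.getBoolean("fs.hdfs.impl.disable.cache", false));
    }
}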
