Hive/Hadoop intermittent failure: Unable to move source to destination
There have been some SO articles about the Hive/Hadoop
"Unable to move source" error. Many of them point to a permission problem.
However, on my site I saw the same error, and I am quite sure it is not related to permissions. This is because the problem is intermittent -- it worked one day but failed on another day.
I thus looked more deeply into the error message. It was complaining about a failure to move from a
.../.hive-staging_hive.../-ext-10000/part-00000-${long-hash}
source path to a destination path of
.../part-00000-${long-hash}
in the same table folder. Does this observation ring a bell with anyone?
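For context, here is a minimal sketch (illustrative only, not Hive's actual code; <ts> and <long-hash> are placeholders for the real timestamp and hash) of what that final move amounts to: an HDFS rename of the staged part file into the table directory.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingMoveSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hive stages INSERT output under .hive-staging_hive... and then
        // moves the part file into the table's final location.
        Path src = new Path("/apps/hive/warehouse/some_db.db/testTable1/"
                + ".hive-staging_hive_<ts>/-ext-10000/part-00000-<long-hash>");
        Path dst = new Path("/apps/hive/warehouse/some_db.db/testTable1/"
                + "part-00000-<long-hash>");

        // "Unable to move source ... to destination" wraps a failure at this step.
        System.out.println("moved: " + fs.rename(src, dst));
    }
}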
This error was triggered by a super simple test query: just insert a row into a test table (see below)
Error message
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/.hive-staging_hive_2018-02-02_23-02-13_065_2316479064583526151-5/-ext-10000/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000
to destination
hdfs://namenodeHA/apps/hive/warehouse/some_db.db/testTable1/part-00000-832944cf-7db4-403b-b02e-55b6e61b1af1-c000;
Query that triggered this error (but only intermittently)
insert into testTable1
values (2);
Thanks for all the help. I have found a solution. I am providing my own answer here.
The problem was with a "CTAS" create table as ...
operation that preceded the failing insert
command: the CTAS closed the file system inappropriately. The telltale sign was an IOException: Filesystem closed
message shown together with the failing HiveException: Unable to move source ... to destination
operation. (I found the log message in my Spark Thrift Server log, not my application log.)
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
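To see why a file system closed elsewhere can break this insert, here is a minimal sketch (assuming default Hadoop client behavior; this is not code from Hive or Spark): FileSystem.get() returns a shared, cached instance per URI and user, so one caller's close() invalidates every other holder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheClosedDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // With the default FileSystem cache, both calls return the SAME instance.
        FileSystem fs1 = FileSystem.get(conf);
        FileSystem fs2 = FileSystem.get(conf);
        System.out.println("same instance: " + (fs1 == fs2)); // true

        fs1.close(); // e.g. the preceding CTAS closes its handle...

        // ...and any later use of the shared instance throws
        // java.io.IOException: Filesystem closed
        fs2.exists(new Path("/tmp"));
    }
}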
The solution actually came from another SO article: https://stackoverflow.com/a/47067350/1168041
But here I provide an excerpt in case that article disappears:
Add the property to hdfs-site.xml:
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
Reason: Spark and HDFS use the same API (at the bottom they use the same instance).
When beeline closes a FileSystem instance, it closes the Thrift server's FileSystem instance too. The second time beeline tries to get the instance, it will always report "Caused by: java.io.IOException: Filesystem closed".
Please check this issue here:
https://issues.apache.org/jira/browse/SPARK-21725
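Outside that excerpt, a quick sketch of what the property changes (using the standard Hadoop Configuration and FileSystem APIs): with fs.hdfs.impl.disable.cache set to true, each FileSystem.get() returns a fresh instance, so closing one handle no longer breaks the others.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheDisabledDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The same setting as the hdfs-site.xml fix above, applied programmatically.
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);

        // With the cache disabled, each call returns a DISTINCT instance.
        FileSystem fs1 = FileSystem.get(conf);
        FileSystem fs2 = FileSystem.get(conf);
        System.out.println("same instance: " + (fs1 == fs2)); // false

        fs1.close();                  // closing one handle...
        fs2.exists(new Path("/tmp")); // ...no longer affects the other
        fs2.close();
    }
}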
I was not using beeline,
but the problem with CTAS was the same.
My test sequence:
insert into testTable1
values (11);
create table anotherTable as select 1;
insert into testTable1
values (12);
Before the fix, any insert would fail after the create table as …
After the fix, this problem was gone.