Rebuild index failed on Hive on Azure HDInsight with Tez

Problem description

I am trying to create indexes in Hive on Azure HDInsight with Tez enabled. I can create the indexes successfully, but I cannot rebuild them: the job fails with this output:

Map 1: -/-  Reducer 2: 0/1  
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1421234198072_0091_1_01, diagnostics=[Vertex Input: measures initializer failed.]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1421234198072_0091_1_00, diagnostics=[Vertex received Kill in INITED state.]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I created my table and indexes with the following job:

DROP TABLE IF EXISTS Measures;
CREATE TABLE Measures(
    topology string,
    val double,
    date timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE LOCATION 'wasb://<mycontainer>@<mystorage>.blob.core.windows.net/';

CREATE INDEX measures_index_topology ON TABLE Measures (topology) AS 'COMPACT' WITH DEFERRED REBUILD;
CREATE INDEX measures_index_date ON TABLE Measures (date) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;

Where am I going wrong, and why does the index rebuild fail? Best regards

Recommended answer

It looks like Tez might have a problem with generating an index on an empty table. I was able to get the same error as you (without using the JSON SerDe), and if you look at the application logs for the DAG that fails, you might see something like:

java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:299)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getSplits(TezGroupedSplitsInputFormat.java:68)
    at org.apache.tez.mapreduce.hadoop.MRHelpers.generateOldSplits(MRHelpers.java:263)
    at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:139)
    at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
    at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
    ...

If you populate the table with a single dummy record, it seems to work fine. I used:

INSERT INTO TABLE Measures SELECT market,0,0 FROM hivesampletable limit 1;

After that, the index rebuild was able to run without error.
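
Put together, the workaround might look like the sketch below. It assumes the table and index names from the question and the hivesampletable used in the answer; the initial SELECT is only an illustrative way to see whether the table has any rows and is not part of the original answer. Note that the answer reproduced the error without the JSON SerDe, so behaviour with the SerDe-backed table may differ.

```sql
-- Illustrative check (an assumption, not from the answer):
-- returns no rows if the table is empty, which is the case that failed above.
SELECT * FROM Measures LIMIT 1;

-- If the table is empty, insert a single dummy record (statement from the answer).
INSERT INTO TABLE Measures SELECT market, 0, 0 FROM hivesampletable LIMIT 1;

-- Retry the index rebuilds from the question.
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;
```

The dummy record stays in the table afterwards, so queries may need to filter it out until real data is loaded.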
