Hive query not using index
I am analyzing the impact of using an index on a Hive table. I created a table with 5 columns (COL1, COL2, COL3, COL4, COL5) and loaded 100000 rows into it. I also created an index on COL1 of this table.
I ran a select * with a WHERE clause on COL1, which is the indexed column.
I see no improvement in query run-time compared to when I ran the same query before creating the index.
I did an EXPLAIN on my select query and it shows TableScan instead of IndexScan and I am unable to figure out why it's not using the index.
Please help.
You can check this and this, but basically it goes as follows:
Create the index
CREATE INDEX .. ON TABLE...
Build the index
ALTER INDEX .. ON .. REBUILD;
Use the index
INSERT OVERWRITE DIRECTORY '/tmp/indexes/..' SELECT `_bucketname`, `_offsets` FROM default__t_..__...;
SET hive.index.compact.file=/tmp/indexes/x;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT ... from ... where ... group by ...;
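To make the three steps concrete, here is a minimal sketch with the blanks filled in by made-up names. The table t, the index t_col1_idx, the directory /tmp/indexes/t_col1_result and the filter value are all hypothetical and not from the question. It also assumes a Hive release older than 3.0, since index support was removed in 3.0.

```sql
-- Hypothetical names throughout: table t, index t_col1_idx,
-- result directory /tmp/indexes/t_col1_result, filter value 42.

-- 1. Create a compact index on col1. With DEFERRED REBUILD the index
--    stays empty until it is rebuilt explicitly.
CREATE INDEX t_col1_idx ON TABLE t (col1)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- 2. Build (populate) the index table.
ALTER INDEX t_col1_idx ON t REBUILD;

-- 3. Query the index table for the predicate and write the matching
--    file names and offsets to a directory. The index table name follows
--    the default__<table>_<index>__ pattern.
INSERT OVERWRITE DIRECTORY '/tmp/indexes/t_col1_result'
SELECT `_bucketname`, `_offsets`
FROM default__t_t_col1_idx__
WHERE col1 = 42;

-- 4. Point the compact index input format at that result, then run the
--    real query with the same predicate.
SET hive.index.compact.file=/tmp/indexes/t_col1_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT * FROM t WHERE col1 = 42;
```

As far as I know, Hive will only pick the index by itself when hive.optimize.index.filter is set to true (it is off by default), and even then a compact index is skipped for inputs smaller than hive.optimize.index.filter.compact.minsize. That could explain the plain TableScan on a 100000-row table, but verify both settings against your Hive version.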
Hope it helps