hbase-indexer: Solr numFound differs from the HBase table row count


Problem description


My team recently started using hbase-indexer on CDH to index HBase table columns into Solr. After deploying the hbase-indexer server (called the Key-Value Store Indexer in CDH) and starting our tests, we found that the row count of the HBase table differs from the document count of the Solr index:

We used Phoenix to count the HBase table rows:

0: jdbc:phoenix:slave1,slave2,slave3:2181> SELECT /*+ NO_INDEX */  COUNT(1) FROM C_PICRECORD;

+------------------------------------------+
|                 COUNT(1)                 |
+------------------------------------------+
| 4084355                                  |
+------------------------------------------+

And we used the Solr Web UI to count the documents in the Solr index:

numFound : 4060479
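
The same count can also be read without the Web UI by querying Solr's select handler with rows=0 and reading numFound from the JSON response. The following is only a minimal sketch: the base URL and the collection name are hypothetical placeholders, not values from this setup.

```python
import json
from urllib.parse import urlencode


def count_query_url(solr_base_url, collection):
    """Build a select URL that matches every document but returns none of them."""
    # solr_base_url and collection are hypothetical placeholders.
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base_url}/{collection}/select?{params}"


def parse_num_found(response_text):
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    # Sample response body shaped like Solr's JSON output for a rows=0 query.
    sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 4060479, "start": 0, "docs": []}}'
    print(count_query_url("http://localhost:8983/solr", "my_collection"))
    print(parse_num_found(sample))
```

Against a live cluster, the URL would be fetched with urllib.request.urlopen and the response body passed to parse_num_found.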

We could not find any errors in the hbase-indexer log or the Solr log, yet the HBase table and the Solr index really do contain different numbers of rows. Has anyone run into this situation? I don't know how to proceed.

Solution

My understanding:

HBase row count - Solr document count (numFound) = missing records

4084355 - 4060479 = 23876 (records that are in HBase but missing from Solr)
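
The count difference only says how many records are missing, not which ones. One way to find them is to export the row keys from HBase and the document ids from Solr and compare the two sets. This is a rough sketch that assumes the Solr document id equals the HBase row key (check your indexer configuration) and that both key lists fit in memory; the function name is illustrative.

```python
def compare_keys(hbase_row_keys, solr_doc_ids):
    """Return (missing_in_solr, extra_in_solr) as sorted lists.

    Assumption: each Solr document id is the HBase row key it was built from.
    """
    hbase_set = set(hbase_row_keys)
    solr_set = set(solr_doc_ids)
    missing_in_solr = sorted(hbase_set - solr_set)
    extra_in_solr = sorted(solr_set - hbase_set)
    return missing_in_solr, extra_in_solr


if __name__ == "__main__":
    # Tiny made-up key lists, only to show the shape of the result.
    missing, extra = compare_keys(["r1", "r2", "r3"], ["r1", "r3"])
    print("missing in Solr:", missing)
    print("extra in Solr:", extra)
    # The arithmetic from the counts reported above.
    print(4084355 - 4060479)
```

Comparing in both directions is deliberate: if Solr also holds documents for rows that no longer exist in HBase, the plain count difference understates the number of missing records.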

The Key-Value Store Indexer service uses the Lily HBase NRT Indexer to index the stream of records being added to HBase tables.

The NRT indexer processes incremental changes only, not the whole data set.

In my experience, these are the possible reasons:

1) The NRT indexer worked initially and then stopped working for a while (for example, because of a health issue). Records written during that time may not have been indexed, so the counts can diverge.

2) The NRT indexer relies on the WAL (write-ahead log). If the WAL is switched off while inserting records into HBase (sometimes done for performance reasons), the NRT indexer will not pick up those records.

Possible solutions:

1) Delete the Solr documents and reload the data from HBase into Solr. You can run the HBase batch indexer for this: it works on the whole data set, not on incremental data.

2) As part of the data-flow pipeline, write a MapReduce program that inserts the data into Solr (this is what we did in one of our implementations).
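
To illustrate the second option, here is a plain-Python sketch of the part such a job would do when writing to Solr: turn rows into documents, group them into batches, and build a POST to Solr's JSON update handler. It is not a MapReduce program and not the implementation mentioned above; the update URL, the field names, and the batch size are hypothetical.

```python
import json
from urllib import request


def to_solr_doc(row_key, columns):
    """Map one HBase row to a Solr document (assumes the id field is the row key)."""
    doc = dict(columns)
    doc["id"] = row_key
    return doc


def batches(items, size):
    """Yield lists of at most `size` items, preserving order."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_update_request(update_url, docs):
    """Build (but do not send) a JSON POST carrying a list of documents."""
    body = json.dumps(docs).encode("utf-8")
    return request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Made-up rows; the URL below is a placeholder for your collection's update handler.
    rows = [("r1", {"plate": "A1"}), ("r2", {"plate": "B2"}), ("r3", {"plate": "C3"})]
    docs = [to_solr_doc(key, cols) for key, cols in rows]
    for batch in batches(docs, 2):
        req = build_update_request("http://localhost:8983/solr/my_collection/update", batch)
        print(req.get_method(), req.full_url, len(batch))
        # request.urlopen(req) would send the batch to a live Solr instance.
```

Sending documents in batches rather than one at a time keeps the number of HTTP round trips down; a commit can be issued once after the last batch.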
