Does adding a secondary index in Cassandra index historical data?


Question

If I add an index on a column of an existing column family later on, will it index the historical data too, or only the data that comes in after the index is added?

In the question "When does Cassandra DB index data after updating a column as secondary index", the accepted answer says it will index only data that is inserted after the index is created.

I tried creating a CF with an index on a column (I am using Cassandra 1.0.7):

create column family users with comparator=UTF8Type and column_metadata=[{column_name: full_name, validation_class: UTF8Type}, {column_name: birth_date, validation_class: LongType, index_type: KEYS}, {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

I added some data, then removed the index with drop index users.birth_date and added it back by updating the CF:

update column family users with comparator=UTF8Type and column_metadata=[{column_name: full_name, validation_class: UTF8Type}, {column_name: birth_date, validation_class: LongType, index_type: KEYS}, {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

Then I added some data again.

But when I query on birth_date, I get the historical data too.

Can someone clear up my confusion on this? Are there two ways to create an index, one that covers historical data and one that does not?

Solution

Earlier versions of Cassandra may not have built indexes for historical data, but according to the code from Cassandra 1.2 onward, index creation is an asynchronous process that does cover historical data when you add a secondary index:

https://github.com/apache/cassandra/blob/cassandra-1.2.15/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java#L240

In your scenario, you removed the index and then added it again. Because the old index files were already loaded and had not been removed from disk, Cassandra linked them for use again. Otherwise, it would have attempted to build them.
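
To make the difference between building an index and merely tracking new writes concrete, here is a minimal toy model in Python. It is not Cassandra's code: the ToyColumnFamily class and its methods are hypothetical names invented for illustration, and the build runs synchronously here, whereas the answer above describes the real build as asynchronous. It only shows that when index creation scans the existing rows, a query on the indexed column returns rows written before the index existed.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Toy in-memory stand-in for a column family; not Cassandra's actual code."""

    def __init__(self):
        self.rows = {}      # row key -> {column name: value}
        self.indexes = {}   # indexed column -> {value: set of row keys}

    def insert(self, key, columns):
        old = self.rows.get(key, {})
        self.rows[key] = {**old, **columns}
        # Every write keeps the existing indexes up to date.
        for column, index in self.indexes.items():
            if column in columns:
                if column in old:
                    index[old[column]].discard(key)
                index[columns[column]].add(key)

    def create_index(self, column):
        # The build scans rows that were written before the index existed.
        index = defaultdict(set)
        for key, cols in self.rows.items():
            if column in cols:
                index[cols[column]].add(key)
        self.indexes[column] = index

    def drop_index(self, column):
        self.indexes.pop(column, None)

    def get_by(self, column, value):
        if column not in self.indexes:
            raise LookupError(f"no index on {column}")
        return sorted(self.indexes[column].get(value, set()))


users = ToyColumnFamily()
users.insert("alice", {"birth_date": 1975, "state": "TX"})  # historical row
users.create_index("birth_date")                             # index added later
users.insert("bob", {"birth_date": 1975, "state": "UT"})     # row written afterwards
print(users.get_by("birth_date", 1975))  # ['alice', 'bob']
```

In this model both the historical row and the later row come back, which matches what the question observed after re-adding the index.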

If you are not sure whether your secondary indexes are in sync, you can rebuild them with:

nodetool rebuild_index
