Versioning in Cassandra


Question

I have a requirement to implement versioning using Cassandra.

The following is my column family definition:

create table file_details(id text primary key, fname text, version int, mimetype text);

I have a secondary index created on the fname column.

Whenever I do an insert for the same fname, the version should be incremented. And when I retrieve a row by fname, it should return the row with the latest version.

Please suggest what approach should be taken.

Recommended answer

If it's not possible to relax the requirement that versions increase by 1, one option is to use counters.

Create a table for the data:

create table file_details(id text primary key, fname text, mimetype text);

And a separate table for the version:

create table file_details_version(id text primary key, version counter);

This needs to be a separate table because, apart from its primary key columns, a table must contain either only counter columns or no counter columns.

An update can then be done with:

insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
update file_details_version set version = version + 1 where id = 'id1';

A read from file_details will then always return the latest data, and you can find the latest version number from file_details_version.
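As a minimal sketch, the two reads described above could look like this, reusing the 'id1' key from the update example. These are two independent queries, so nothing guarantees that they observe the same update.

```cql
-- Latest data for the file (each insert overwrites the previous values).
select id, fname, mimetype from file_details where id = 'id1';

-- Latest version number, read from the separate counter table.
select version from file_details_version where id = 'id1';
```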

There are numerous problems with this, though. You can't do atomic batches with counters, so the two updates are not atomic: some failure scenarios could lead to only the insert into file_details being persisted. Also, there is no read isolation, so if you read during an update you may get inconsistent data between the two tables. Finally, counter updates in Cassandra are not tolerant of failures, so if a failure happens during a counter update you may double count, i.e. increment the version too much.

I think all solutions involving counters will hit these issues. You could avoid counters by generating a unique ID (e.g. a large random number) for each update and inserting that into a row in a separate table. The version would then be the number of IDs in the row. Now you can do atomic updates, and the count would be tolerant of failures, because retrying an insert with the same ID does not add a second entry. However, the read time would be O(number of updates), and reads would still not be isolated.
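A rough sketch of this counter-free alternative is shown below. The table name file_details_updates, the column update_id, and the UUID value are hypothetical and chosen only for illustration. The sketch assumes the ID is generated by the client and reused unchanged if the write is retried, which is what makes a retry harmless.

```cql
-- Hypothetical table: one partition per file id, one entry per update ID.
create table file_details_updates(id text, update_id uuid, primary key (id, update_id));

-- Both tables are ordinary (non-counter) tables, so a logged batch can be used.
-- The batch is atomic, but readers may still see one write before the other.
begin batch
  insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
  -- The UUID is generated client-side; a retry must send the same value.
  insert into file_details_updates(id, update_id) values ('id1', 123e4567-e89b-12d3-a456-426614174000);
apply batch;

-- The version is the number of update IDs stored for the file.
-- This reads every entry in the partition, hence O(number of updates).
select count(*) from file_details_updates where id = 'id1';
```

Generating the ID on the server inside the statement would defeat the purpose, since a retried write would then receive a fresh ID and be counted twice.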
