Ways to implement data versioning in Cassandra


Question

Can you share your thoughts on how you would implement data versioning in Cassandra?

Suppose that I need to version records in a simple address book. (Address book records are stored as Rows in a ColumnFamily.) I expect that the history:

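  • will rarely be used
  • will be used all at once, presented in a "time machine" fashion
  • will not exceed a few hundred versions for a single record
  • will not expire.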

I'm considering the following approaches:

  • Convert the address book to a Super Column Family and store multiple versions of each address book record in one Row, keyed by time stamp, as super columns.

  • Create a new Super Column Family to store old records or changes to the records. Such a structure would look as follows:

{
    'address book row key': {
        'time stamp1': {
            'first name': 'new name',
            'modified by': 'user id',
        },
        'time stamp2': {
            'first name': 'new name',
            'modified by': 'user id',
        },
    },
    'another address book row key': {
        'time stamp': {
            ....

  • Store versions as serialized (JSON) objects in a new ColumnFamily, representing sets of versions as rows and individual versions as columns (modelled after Simple Document Versioning with CouchDB).
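To make the last option concrete, here is a minimal sketch in Python. It does not talk to Cassandra at all: `versions_cf`, `save_version` and `history` are hypothetical names, and a plain dict stands in for the ColumnFamily (row key maps to columns, column name is a time stamp, column value is the serialized JSON).

```python
import json
import time

# Hypothetical in-memory stand-in for the versions ColumnFamily:
# row key -> {column name (time stamp in microseconds) -> serialized JSON}
versions_cf = {}

def save_version(record_key, record, timestamp=None):
    """Store one serialized snapshot of `record` as a new column in its row."""
    if timestamp is None:
        timestamp = time.time_ns() // 1000  # microseconds since the epoch
    row = versions_cf.setdefault(record_key, {})
    row[timestamp] = json.dumps(record, sort_keys=True)
    return timestamp

def history(record_key):
    """Return every version of a record, oldest first, as (time stamp, dict)."""
    row = versions_cf.get(record_key, {})
    return [(ts, json.loads(row[ts])) for ts in sorted(row)]

save_version('address book row key', {'first name': 'old name'}, timestamp=1)
save_version('address book row key',
             {'first name': 'new name', 'modified by': 'user id'}, timestamp=2)
```

Calling `history('address book row key')` then returns the whole set of versions in one pass, which matches the expectation that the history is read all at once.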

Recommended answer

If you can add the assumption that address books typically have fewer than 10,000 entries in them, then using one row per address book time line in a super column family would be a decent approach.

A row would look like:

{'address_book_18f3a8': {
  1290635938721704: {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'},
  1290636018401680: {'entry1': 'entry1_stuff_v2', ...},
  ...
}}

where the row key identifies the address book, each super column name is a time stamp, and the subcolumns represent the address book's contents for that version.

This would allow you to read the latest version of an address book with only one query and also write a new version with a single insert.

The reason I suggest this only if address books have fewer than 10,000 elements is that a super column must be completely deserialized when you read even a single subcolumn. Overall, that's not too bad in this case, but it's something to keep in mind.
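As a rough illustration of the one-row-per-address-book layout, the sketch below models the super column family with nested Python dicts. It is not Cassandra client code: `address_books`, `insert_version`, `read_latest` and `read_as_of` are hypothetical names, and each function corresponds to what would be a single read or write against the row.

```python
# Hypothetical in-memory model of the super column family:
# row key -> {super column name (time stamp) -> {subcolumn name -> value}}
address_books = {}

def insert_version(book_key, timestamp, entries):
    """Write a complete new version as one super column (a single insert)."""
    address_books.setdefault(book_key, {})[timestamp] = dict(entries)

def read_latest(book_key):
    """Return (time stamp, entries) for the newest version, or None."""
    row = address_books.get(book_key)
    if not row:
        return None
    latest = max(row)
    return latest, row[latest]

def read_as_of(book_key, timestamp):
    """'Time machine' read: the newest version written at or before `timestamp`."""
    row = address_books.get(book_key, {})
    candidates = [ts for ts in row if ts <= timestamp]
    if not candidates:
        return None
    chosen = max(candidates)
    return chosen, row[chosen]

insert_version('address_book_18f3a8', 1290635938721704,
               {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'})
insert_version('address_book_18f3a8', 1290636018401680,
               {'entry1': 'entry1_stuff_v2'})
```

Note that every read here hands back the whole version dict, which mirrors the point above: the full super column is materialized even if you only want one entry.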

An alternative approach would be to use a single row per version of the address book, and use a separate CF with a time line row per address book like:

{'address_book_18f3a8': {1290635938721704: some_uuid1, 1290636018401680: some_uuid2...}}

Here, some_uuid1 and some_uuid2 correspond to the row key for those versions of the address book. The downside to this approach is that it requires two queries every time the address book is read. The upside is that it lets you efficiently read only select parts of an address book.
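The two-lookup pattern can be sketched the same way. Again this is only an in-memory model with hypothetical names (`timelines`, `book_versions`, `write_version`, `read_latest_entries`); the two dicts stand in for the time line CF and the CF holding one row per version.

```python
import uuid

# Hypothetical in-memory stand-ins for the two column families.
timelines = {}      # address book key -> {time stamp -> version row key}
book_versions = {}  # version row key -> {entry name -> entry value}

def write_version(book_key, timestamp, entries):
    """Store a version in its own row and record its key in the time line row."""
    version_key = str(uuid.uuid4())
    book_versions[version_key] = dict(entries)
    timelines.setdefault(book_key, {})[timestamp] = version_key
    return version_key

def read_latest_entries(book_key, entry_names):
    """Read selected entries from the newest version using two lookups."""
    # Query 1: find the row key of the newest version in the time line row.
    timeline = timelines[book_key]
    version_key = timeline[max(timeline)]
    # Query 2: fetch only the requested columns from that version's row.
    row = book_versions[version_key]
    return {name: row[name] for name in entry_names if name in row}

write_version('address_book_18f3a8', 1290635938721704,
              {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'})
write_version('address_book_18f3a8', 1290636018401680,
              {'entry1': 'entry1_stuff_v2', 'entry2': 'entry2_stuff'})
```

The second lookup is where the benefit shows up: because each version is an ordinary row, it can ask for just the columns it needs instead of pulling in the entire version.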
