在Cassandra中实现数据版本控制的方法 [英] Ways to implement data versioning in Cassandra

查看:440
本文介绍了在Cassandra中实现数据版本控制的方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

您可以分享您的想法,如何在Cassandra中实现数据版本控制。



假设我需要一个简单的通讯录中的版本记录。 (地址簿记录作为Rows存储在ColumnFamily中)。
我希望历史:




  • 很少使用


  • 将不会有超过几百个版本的单个记录。


  • b ul>
  • 将地址簿转换为超级列族,并将多个版本的地址簿记录存储在一个按键(通过时间戳)作为超级列。


  • 创建新的超级列族以存储记录的旧记录或更改。
    这样的结构如下:



    {
    '地址簿行键':{
    'time stamp1':{
    'first name':'new name',
    'by':user id',
    },

     'time stamp2':{
    'name':'new name',
    'by':'user id',
    } b},

    '另一个地址簿行键':{
    ' {
    ....


  • 将版本存储为新ColumnFamilly中附带的序列化(JSON)对象。将版本集表示为行和版本作为列。 (在使用CouchDB的简单文档版本控制之后建模)



解决方案

如果您可以添加假设地址簿通常包含少于10,000个条目,



一行将如下所示:

  {'address_book_18f3a8':
{1290635938721704:{'entry1':'entry1_stuff','entry2':'entry2_stuff'}},
{1290636018401680 :{'entry1':'entry1_stuff_v2',...},
...
}


b $ b

其中行键标识地址簿,每个超级列名称是时间戳,子列表示该版本的地址簿内容。



这将允许您只读取一个查询的最新版本的通讯录,并用一个插入来写一个新版本。



我建议使用这个如果地址簿少于10,000个元素,那就是当你读取一个子列时,超级列必须完全反序列化。



另一种方法是在每个版本的通讯录中使用一行,并且使用单独的CF和每个地址簿的时间行行,如:

  {'address_book_18f3a8':{1290635938721704:some_uuid1,1290636018401680: some_uuid2 ...}} 

这里,some_uuid1和some_uuid2对应于地址簿。这种方法的缺点是每次读取地址簿时需要两个查询。它的优点是,它可以让你有效地只读选择的地址簿的部分。


Can you share your thoughts how would you implement data versioning in Cassandra.

Suppose that I need to version records in an simple address book. (Address book records are stored as Rows in a ColumnFamily). I expect that the history:

  • will be used infrequently
  • will be used all at once to present it in a "time machine" fashion
  • there won't be more versions than few hundred to a single record.
  • history won't expire.

I'm considering the following approach:

  • Convert the address book to Super Column Family and store multiple version of address book records in one Row keyed (by time stamp) as super columns.

  • Create new Super Column Family to store old records or changes to the records. Such structure would look as follows:

    { 'address book row key': { 'time stamp1': { 'first name': 'new name', 'modified by': 'user id', },

    'time stamp2': {
            'first name': 'new name',
            'modified by': 'user id',
        },
    },
    

    'another address book row key': { 'time stamp': { ....

  • Store versions as serialized (JSON) object attached in new ColumnFamilly. Representing sets of version as rows and versions as columns. (modelled after Simple Document Versioning with CouchDB)

解决方案

If you can add the assumption that address books typically have fewer than 10,000 entries in them, then using one row per address book time line in a super column family would be a decent approach.

A row would look like:

{'address_book_18f3a8':
  {1290635938721704: {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'}},
  {1290636018401680: {'entry1': 'entry1_stuff_v2', ...},
  ...
}

where the row key identifies the address book, each super column name is a time stamp, and the subcolumns represent the address book's contents for that version.

This would allow you to read the latest version of an address book with only one query and also write a new version with a single insert.

The reason I suggest using this if address books are less than 10,000 elements is that super columns must be completely deserialized when you read even a single subcolumn. Overall, not that bad in this case, but it's something to keep in mind.

An alternative approach would be to use a single row per version of the address book, and use a separate CF with a time line row per address book like:

{'address_book_18f3a8': {1290635938721704: some_uuid1, 1290636018401680: some_uuid2...}}

Here, some_uuid1 and some_uuid2 correspond to the row key for those versions of the address book. The downside to this approach is that it requires two queries every time the address book is read. The upside is that it lets you efficiently read only select parts of an address book.

这篇关于在Cassandra中实现数据版本控制的方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆