Is it possible to avoid tombstone problems with Cassandra?


Question

I am writing code for a CMS using Cassandra as the database system.

One of the strengths of the CMS is that it pre-calculates all sorts of things using a backend computer that runs permanently against the data that changes in the CMS.

For example, the CMS tells the list system that a page was created or changed. The list system saves that information in a table called list. That information is just a one-liner that tells me which page has to be worked on.

Column family: list
   Row: concerned website (e.g. http://www.example.com/)
     Column: full URI (e.g. http://www.example.com/this/page)
        Value: true (because you need something for the column to exist)

Once in a while (most often less than a second after a simple page edit), the list backend system wakes up, sees that a certain page changed, and starts working on it by updating all the lists that include (or no longer include) that page as an element. This allows the front end to instantly know the number of elements in a list and to read lists very quickly without running complex queries at the time the list is needed (as opposed to what many CMSs do using SQL).

In effect, I am using the list table as a TODO list: a set of pages I have to work on. The front end adds page references to that list, and the backend deletes them once it is done with them. As a result, I can end up with a very large number of tombstones in the list table. The real-world effect: I had tombstone failures and the system started failing in random places. Once the list stops working, many other things in the system stop working and the websites become unusable.

I decreased the time it takes Cassandra to take care of tombstones in that specific table (and a few others), but I am wondering whether I am using Cassandra as intended. Is there a better way to handle a TODO list of this sort in this environment?

As a side note: the TODO list may be worked on from several different backend computers. On a small system, you are likely to have only one backend running against the list data; on larger systems with thousands of users, you may well have 2 or 3 backends just to handle lists. So having the data in Cassandra is very practical for sharing it quickly between computers.

Recommended answer

You have essentially implemented a queue, which is considered an anti-pattern for Cassandra: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

There are workarounds, and things people do to make queues behave better, but it's a hard game to play. Be sure to use LeveledCompactionStrategy and not the default; this will help a lot with smaller workloads. Consider workarounds like time-boxing the partitions (rows in old Thrift terminology) and what's in the article linked above, but you may want to look for a different solution.
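To make the time-boxing idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not part of the original setup: the table name `list_todo`, the `site` / `bucket` / `uri` columns, and the one-hour bucket width. The idea is that each TODO entry is written into a partition keyed by (site, time bucket), so a worker only reads a few recent partitions instead of scanning one ever-growing partition full of tombstones.

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket width; tune it to how quickly the backend drains the list.
BUCKET_WIDTH = timedelta(hours=1)

# Hypothetical CQL schema: one partition per (site, bucket) instead of one
# per site, with LeveledCompactionStrategy as the answer suggests.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS list_todo (
    site   text,
    bucket timestamp,
    uri    text,
    PRIMARY KEY ((site, bucket), uri)
) WITH compaction = {'class': 'LeveledCompactionStrategy'}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta = BUCKET_WIDTH) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    whole_buckets = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + whole_buckets * width


def buckets_to_scan(now: datetime, lookback: timedelta,
                    width: timedelta = BUCKET_WIDTH) -> list:
    """List bucket starts, oldest first, covering [now - lookback, now]."""
    current = bucket_start(now - lookback, width)
    last = bucket_start(now, width)
    buckets = []
    while current <= last:
        buckets.append(current)
        current += width
    return buckets
```

A writer would insert each changed page with `bucket_start(now)` as the bucket value, and a worker would query only the partitions returned by `buckets_to_scan(now, lookback)`. Once a bucket has been fully processed and is older than the lookback window, it is never read again, so the tombstones it holds stop affecting reads.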
