Deleting a column in Cassandra for a large dataset

Question

We have a redundant column that we'd like to delete from our Cassandra database (version 2.1.15). It is a text column that accounts for the majority of the data on disk (15 nodes x 1.8 TB per node).

The easiest option seems to be an ALTER TABLE to drop that column and then let Cassandra compaction take care of the rest (we also run Cassandra Reaper to manage repairs). However, given the size of the dataset, I'm concerned that I will knock over the cluster with a massive delete.

Another option I've considered is a process that runs through the keyspace setting the value to null. I think this would have the same effect as dropping the column, but it would be more under our control (though it also requires writing something to do it).
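For comparison, the null-setting pass could be sketched roughly as below. This is a hypothetical sketch, not a tested tool: it assumes the default Murmur3Partitioner, a single-column partition key and no clustering columns, and the keyspace, table and column names are placeholders. `session` is assumed to behave like a DataStax Python driver `Session` (`prepare()` and `execute()`).

```python
"""Hypothetical sketch: walk the token ring in chunks and set one column to
null, pausing between chunks. Assumes Murmur3Partitioner, a single-column
partition key and no clustering columns; all names are placeholders."""

import time

MIN_TOKEN = -(2 ** 63)   # Murmur3 minimum token, used as an exclusive lower bound
MAX_TOKEN = 2 ** 63 - 1  # largest token a row can have


def token_ranges(splits):
    """Split (MIN_TOKEN, MAX_TOKEN] into `splits` contiguous (lo, hi] ranges."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    bounds = [MIN_TOKEN + i * width for i in range(splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def cql_statements(keyspace, table, pk, column):
    """Return (select, update) CQL templates with ? bind markers."""
    select = (f"SELECT {pk} FROM {keyspace}.{table} "
              f"WHERE token({pk}) > ? AND token({pk}) <= ?")
    # Setting a column to null is a delete: it writes one tombstone per row.
    update = f"UPDATE {keyspace}.{table} SET {column} = null WHERE {pk} = ?"
    return select, update


def null_out(session, keyspace, table, pk, column, splits=1024, pause_s=0.1):
    """Null `column` for every row, one token range at a time.

    Returns the number of UPDATEs issued (one tombstone each)."""
    select, update = cql_statements(keyspace, table, pk, column)
    sel = session.prepare(select)  # ? markers need prepared statements
    upd = session.prepare(update)
    written = 0
    for lo, hi in token_ranges(splits):
        for row in session.execute(sel, (lo, hi)):
            session.execute(upd, (row[0],))
            written += 1
        time.sleep(pause_s)  # crude throttle between ranges
    return written
```

Note the design consequence: every UPDATE writes a tombstone, so this approach generates exactly the per-row delete load the question worries about; the pause only spreads it out over time.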

Would anyone have any advice on how to approach this?

Thanks!

Recommended answer

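Dropping a column does not rewrite the data or write a tombstone for every value: Cassandra records when the column was dropped and treats the existing values as deleted. The column value becomes unavailable immediately, and the column data is removed in the next compaction cycle.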
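The behaviour described above (hidden at once, physically removed only at compaction) can be illustrated with a toy model. This is a simplification written for illustration, not Cassandra's implementation: it assumes each cell carries a write timestamp and that the drop is stored as a single drop timestamp per column, and all class and function names are hypothetical.

```python
"""Toy model (not Cassandra's code): a dropped column is hidden from reads as
soon as the drop is recorded, and its cells disappear from disk only when
compaction rewrites the SSTable. Simplifying assumption: each cell has a
write timestamp and the schema stores one drop timestamp per column."""

from dataclasses import dataclass, field


@dataclass
class Cell:
    value: str
    write_ts: int  # e.g. microseconds, like a CQL write timestamp


@dataclass
class SSTable:
    rows: dict  # partition key -> {column name: Cell}


@dataclass
class Schema:
    dropped: dict = field(default_factory=dict)  # column -> drop timestamp

    def is_dropped(self, column, cell):
        ts = self.dropped.get(column)
        return ts is not None and cell.write_ts <= ts


def read(sstable, schema, key):
    """Reads skip dropped cells immediately; nothing on disk changes."""
    row = sstable.rows.get(key, {})
    return {c: cell.value for c, cell in row.items()
            if not schema.is_dropped(c, cell)}


def compact(sstable, schema):
    """Compaction writes a new SSTable that no longer contains dropped cells."""
    return SSTable({k: {c: cell for c, cell in row.items()
                        if not schema.is_dropped(c, cell)}
                    for k, row in sstable.rows.items()})
```

In this model the drop itself is a single metadata write; the disk space is reclaimed only as each SSTable is rewritten.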

If you want to expedite removal of the column data before compaction gets to it, you can run nodetool upgradesstables to remove the data after you have used the ALTER TABLE command to change the column's metadata.
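One way to keep the extra I/O from hitting the whole cluster at once is to run the drop once and then rewrite SSTables one node at a time. The sketch below only builds and optionally runs the commands; host names, keyspace and table are placeholders. It assumes `nodetool -h <host>` can reach each node over JMX (if JMX is restricted to localhost, run the same command on each node instead) and that your nodetool supports `-a` (`--include-all-sstables`); verify with `nodetool help upgradesstables` on your version.

```python
"""Hypothetical sketch: drop the column once, then run upgradesstables on
one node at a time. Hosts, keyspace and table are placeholders."""

import shlex
import subprocess


def drop_column_cql(keyspace, table, column):
    # Schema change: run once (e.g. in cqlsh); it propagates to all nodes.
    return f"ALTER TABLE {keyspace}.{table} DROP {column};"


def upgradesstables_cmd(host, keyspace, table):
    # -a also rewrites SSTables already on the current format; without it
    # those SSTables are skipped and the dropped data stays on disk.
    return ["nodetool", "-h", host, "upgradesstables", "-a", keyspace, table]


def rolling_upgrade(hosts, keyspace, table, dry_run=True):
    """Print (dry run) or execute the command for each host in turn.

    subprocess.run blocks until nodetool exits, so hosts are processed
    strictly one after another."""
    cmds = [upgradesstables_cmd(h, keyspace, table) for h in hosts]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Keeping `dry_run=True` first lets you review the command list; between nodes you might also check `nodetool compactionstats` before moving on.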

See the documentation: docs.datastax.com/en/cql/3.1/cql/cql_reference/alter_table_r.html