Tombstoned cells without DELETE

Question

I'm running a Cassandra cluster:

Software version: 2.0.9
Nodes: 3
Replication factor: 2

I have a very simple table that I insert into and update:

CREATE TABLE link_list (
      url text,
      visited boolean,
      PRIMARY KEY ((url))
    );

The rows have no TTL, and I'm not issuing any DELETEs. As soon as I run my application, it slows down quickly because of the growing number of tombstoned cells:

Read 3 live and 535 tombstoned cells

The count reaches thousands within a few minutes.

My question is: what generates those tombstoned cells if I'm not doing any deletions?

// Update

This is the implementation I use to talk to Cassandra through the DataStax Java driver (com.datastax.driver):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// DAO, Model, Link, VisitedLink and Cassandra.DB come from the poster's own project
public class LinkListDAOCassandra implements DAO {

    public void save(Link link) {
        save(new VisitedLink(link.getUrl(), false));
    }

    @Override
    public void save(Model model) {
        save((Link) model);
    }

    // Changing "visited" also changes this URL's entry in the secondary index
    public void update(VisitedLink link) {
        String cql = "UPDATE link_list SET visited = ? WHERE url = ?";
        Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getVisited(), link.getUrl());
    }

    // Inserts the link only if link_list_inserted does not contain it yet
    // (check-then-insert: two concurrent callers can both pass the check)
    public void save(VisitedLink link) {
        String cql = "SELECT url FROM link_list_inserted WHERE url = ?";

        if (Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl()).all().isEmpty()) {
            cql = "INSERT INTO link_list_inserted (url) VALUES (?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl());

            cql = "INSERT INTO link_list (url, visited) VALUES (?,?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl(), link.getVisited());
        }
    }

    public VisitedLink getByUrl(String url) {
        String cql = "SELECT * FROM link_list WHERE url = ?";

        // execute(cql, cl, values...) expects a consistency level before the bind values
        Row row = Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, url).one();
        return row == null ? null : new VisitedLink(row.getString("url"), row.getBool("visited"));
    }

    public List<Link> getLinks(int limit) {
        List<Link> links = new ArrayList<>();

        // Served by the secondary index on "visited"
        String cql = "SELECT * FROM link_list WHERE visited = False LIMIT ?";

        for (Row row : Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, limit)) {
            try {
                links.add(new Link(new URL(row.getString("url"))));
            } catch (MalformedURLException e) {
                // Skip rows whose url is not a valid URL
            }
        }

        return links;
    }
}

This is the implementation of execute:

public ResultSet execute(String cql, ConsistencyLevel cl, Object... values) {
    // Prepares the statement again on every call
    PreparedStatement statement = getSession().prepare(cql).setConsistencyLevel(cl);
    BoundStatement boundStatement = statement.bind(values);

    return getSession().execute(boundStatement);
}
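The execute helper above sends a new prepare request for every query it runs. A common alternative is to prepare each distinct CQL string once and reuse the result. The sketch below assumes a hypothetical `PreparedStatementCache` class (it is not part of the driver). With the DataStax driver you could build it as `new PreparedStatementCache<>(session::prepare)`, since `Session.prepare(String)` returns a `PreparedStatement`; the `main` method uses a stand-in prepare function so the sketch runs without a cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: prepares each distinct CQL string once and reuses the result.
// S is the statement type, e.g. com.datastax.driver.core.PreparedStatement.
final class PreparedStatementCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> prepare;

    PreparedStatementCache(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    // Returns the cached statement, calling prepare only the first time cql is seen
    S get(String cql) {
        return cache.computeIfAbsent(cql, prepare);
    }

    public static void main(String[] args) {
        AtomicInteger prepareCalls = new AtomicInteger();
        // Stand-in for session::prepare so the sketch runs without a cluster
        PreparedStatementCache<String> statements = new PreparedStatementCache<>(cql -> {
            prepareCalls.incrementAndGet();
            return "prepared: " + cql;
        });

        String cql = "UPDATE link_list SET visited = ? WHERE url = ?";
        for (int i = 0; i < 1000; i++) {
            statements.get(cql);
        }
        System.out.println("prepare calls: " + prepareCalls.get());
    }
}
```

Running `main` prints `prepare calls: 1`. When statements are shared like this, set the consistency level on each `BoundStatement` (via its `setConsistencyLevel` method) rather than on the cached `PreparedStatement`, because different callers may ask for different levels on the same query.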

// Update 2

An interesting finding from cfstats: only one table has tombstones, link_list_visited, which is the secondary index on the visited column. Does this mean that updating a column that has a secondary index creates tombstones?

Table (index): link_list.link_list_visited
                SSTable count: 2
                Space used (live), bytes: 5055920
                Space used (total), bytes: 5055991
                SSTable Compression Ratio: 0.3491883995187955
                Number of keys (estimate): 256
                Memtable cell count: 15799
                Memtable data size, bytes: 1771427
                Memtable switch count: 1
                Local read count: 85703
                Local read latency: 2.805 ms
                Local write count: 484690
                Local write latency: 0.028 ms
                Pending tasks: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used, bytes: 32
                Compacted partition minimum bytes: 8240
                Compacted partition maximum bytes: 7007506
                Compacted partition mean bytes: 3703162
                Average live cells per slice (last five minutes): 3.0
                Average tombstones per slice (last five minutes): 674.0


Answer

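The only major differences between a secondary index and an extra column family that you maintain by hand as an index are that the secondary index only contains entries for data on the current node (it holds no information about other nodes' data), and that the index changes caused by an update on the primary table are applied atomically with that update. Apart from that, you can treat it as a regular column family with the same weak spots: a high number of updates on the primary column family leads to a high number of deletes on the index table, because each update of an indexed column on the primary table is translated into a delete plus an insert on the index table. Those deletes in the index table are the source of the tombstones. In your case, every link is inserted with visited = false and later updated to true, so the index entries for false keep accumulating tombstones that the getLinks query (WHERE visited = False) has to scan past, which is consistent with the "Read 3 live and 535 tombstoned cells" trace. Cassandra deletes are logical: a tombstone stays on disk until compaction purges it, and compaction only does so once gc_grace_seconds (10 days by default) has passed; repair has to run within that window so the tombstone reaches every replica.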
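To see why the insert-then-update pattern in LinkListDAOCassandra piles up tombstones, here is a toy in-memory model of a local index on visited. It is a hypothetical `ToyVisitedIndex` class, not Cassandra's actual index code, and it simplifies a lot: the index is partitioned by the indexed value, changing a row's value turns its old index entry into a tombstone, and a simplified compaction drops a tombstone only once it is older than gc_grace_seconds. The counts 538/535 and the example.com URLs are made up purely to mirror the trace in the question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy model of a local secondary index on link_list.visited (illustration only).
// Index partition key = indexed value; each cell = a url, either live or a tombstone.
final class ToyVisitedIndex {
    private static final long LIVE = -1L; // marker for a live cell; deletion times are >= 0

    private final Map<String, Boolean> table = new HashMap<>();             // url -> visited
    private final Map<Boolean, Map<String, Long>> index = new HashMap<>(); // visited -> (url -> cell)

    // INSERT / UPDATE link_list SET visited = ? WHERE url = ?, at time nowSeconds
    void write(String url, boolean visited, long nowSeconds) {
        Boolean previous = table.put(url, visited);
        if (previous != null && previous != visited) {
            // The old index entry is not removed in place: it becomes a tombstone
            index.get(previous).put(url, nowSeconds);
        }
        index.computeIfAbsent(visited, v -> new HashMap<>()).put(url, LIVE);
    }

    // {live, tombstoned} cells that a read of "WHERE visited = value" has to scan
    long[] scan(boolean value) {
        long live = 0;
        long tombstones = 0;
        for (long cell : index.getOrDefault(value, Collections.emptyMap()).values()) {
            if (cell == LIVE) live++; else tombstones++;
        }
        return new long[] {live, tombstones};
    }

    // Simplified compaction: purge tombstones deleted before now - gcGraceSeconds
    void compact(long nowSeconds, long gcGraceSeconds) {
        long gcBefore = nowSeconds - gcGraceSeconds;
        for (Map<String, Long> partition : index.values()) {
            partition.values().removeIf(cell -> cell != LIVE && cell < gcBefore);
        }
    }

    private static void print(String label, long[] counts) {
        System.out.println(label + ": " + counts[0] + " live, " + counts[1] + " tombstoned");
    }

    public static void main(String[] args) {
        ToyVisitedIndex idx = new ToyVisitedIndex();
        for (int i = 0; i < 538; i++) {
            idx.write("http://example.com/" + i, false, 0); // save(): new links start unvisited
        }
        for (int i = 0; i < 535; i++) {
            idx.write("http://example.com/" + i, true, 60); // update(): marked visited a minute later
        }
        long gcGrace = 864_000; // Cassandra's default gc_grace_seconds (10 days)

        print("before compaction", idx.scan(false));
        idx.compact(3_600, gcGrace);
        print("compacted after 1 hour", idx.scan(false));
        idx.compact(60 + gcGrace + 1, gcGrace);
        print("compacted after gc_grace_seconds", idx.scan(false));
    }
}
```

Running `main` prints `3 live, 535 tombstoned` before compaction and after the one-hour compaction, and `3 live, 0 tombstoned` only once gc_grace_seconds has elapsed: in this model the tombstones come purely from updates, and compaction alone cannot remove them until the grace period is over.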

Hope it helps!
