How to efficiently use batch writes to Cassandra using the DataStax Java driver?


Problem description

I need to write in batches to Cassandra using the DataStax Java driver. This is my first time trying to use batches with the driver, so I have some confusion -

Below is my code, in which I build a Statement object, add it to a Batch, and set the ConsistencyLevel to QUORUM as well.

import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.RegularStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import com.datastax.driver.core.exceptions.QueryExecutionException;
import com.datastax.driver.core.exceptions.QueryValidationException;
import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import static com.datastax.driver.core.querybuilder.QueryBuilder.insertInto;

Session session = null;
Cluster cluster = null;

// we build the cluster and session objects here, and we use DowngradingConsistencyRetryPolicy as well
// cluster = builder.withSocketOptions(socketOpts).withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)

public void insertMetadata(List<AddressMetadata> listAddress) {
    // what is the purpose of unloggedBatch here?
    Batch batch = QueryBuilder.unloggedBatch();

    try {
        for (AddressMetadata data : listAddress) {
            RegularStatement insert = insertInto("test_table").values(
                    new String[] { "address", "name", "last_modified_date", "client_id" },
                    new Object[] { data.getAddress(), data.getName(), data.getLastModifiedDate(), 1 });
            // is this the right way to set the consistency level for a Batch?
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
            batch.add(insert);
        }

        // now execute the batch
        session.execute(batch);
    } catch (NoHostAvailableException e) {
        // log the exception
    } catch (QueryExecutionException e) {
        // log the exception
    } catch (QueryValidationException e) {
        // log the exception
    } catch (IllegalStateException e) {
        // log the exception
    } catch (Exception e) {
        // log the exception
    }
}

And below is my AddressMetadata class -

public class AddressMetadata {

    private String name;
    private String address;
    private Date lastModifiedDate;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getAddress() {
        return address;
    }

    public void setAddress(String address) {
        this.address = address;
    }

    public Date getLastModifiedDate() {
        return lastModifiedDate;
    }

    public void setLastModifiedDate(Date lastModifiedDate) {
        this.lastModifiedDate = lastModifiedDate;
    }
}

Now my question is: is the way I am using Batch to insert into Cassandra with the DataStax Java driver correct? And what about retry policies - if batch statement execution fails, what happens? Will it be retried?

And is there any better way of doing batch writes to Cassandra using the Java driver?

Answer

First, a bit of a rant:

The batch keyword in Cassandra is not a performance optimization for batching together large buckets of data for bulk loads.

Batches are used to group together atomic operations, actions that you expect to occur together. Batches guarantee that if a single part of your batch is successful, the entire batch is successful.

Using batches will probably not make your mass ingestion run faster


what is the purpose of unloggedBatch here?

Cassandra uses a mechanism called batch logging in order to ensure a batch's atomicity. By specifying an unlogged batch, you are turning off this functionality, so the batch is no longer atomic and may fail with partial completion. Naturally, there is a performance penalty for logging your batches and ensuring their atomicity; using unlogged batches removes this penalty.

There are some cases in which you may want to use unlogged batches, to ensure that requests (inserts) that belong to the same partition are sent together. If you batch operations together and they need to be performed in different partitions / nodes, you are essentially creating more work for your coordinator. See specific examples of this in Ryan's blog.
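To illustrate the grouping idea above, here is a minimal, driver-free sketch of bucketing records by partition key before building one unlogged batch per bucket. It assumes address is the partition key of test_table; the class name PartitionGrouping and the simplified AddressMetadata stand-in are illustrative, not part of the driver API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionGrouping {

    // Minimal stand-in for the AddressMetadata class from the question.
    static class AddressMetadata {
        final String address;
        final String name;
        AddressMetadata(String address, String name) {
            this.address = address;
            this.name = name;
        }
    }

    // Bucket records by partition key so each batch targets a single
    // partition; one unlogged batch would then be built per map entry.
    static Map<String, List<AddressMetadata>> groupByPartition(List<AddressMetadata> rows) {
        return rows.stream().collect(Collectors.groupingBy(r -> r.address));
    }

    public static void main(String[] args) {
        List<AddressMetadata> rows = Arrays.asList(
                new AddressMetadata("addr1", "alice"),
                new AddressMetadata("addr2", "bob"),
                new AddressMetadata("addr1", "carol"));
        Map<String, List<AddressMetadata>> groups = groupByPartition(rows);
        System.out.println(groups.get("addr1").size()); // addr1 holds 2 rows
        System.out.println(groups.size());              // 2 distinct partitions
    }
}
```

With this in place, the loop in insertMetadata would iterate over the map entries and build one unlogged batch per partition, instead of one batch spanning many partitions.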

Now my question is: is the way I am using Batch to insert into Cassandra with the DataStax Java driver correct?

I don't see anything wrong with your code here; it just depends on what you're trying to achieve. Dig into that blog post I shared for more insight.

And what about retry policies - if batch statement execution fails, what happens? Will it be retried?

A batch will not retry on its own if it fails. The driver does have retry policies, but you have to apply those separately.

The default retry policy in the Java driver only retries in these scenarios:


  • On a read timeout, if enough replicas replied but the data was not retrieved.

  • On a write timeout, if we timed out while writing the distributed log used by batch statements.

Read more about the default policy and consider less conservative policies based on your use case.
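The driver's retry policies are configured when building the Cluster (as the question's setup code does with DowngradingConsistencyRetryPolicy). If you need retries beyond what a policy covers, another option is a small application-level retry loop around session.execute(batch). A minimal, driver-free sketch of such a helper; the names RetryHelper, executeWithRetry, and maxAttempts are illustrative, not part of the driver API:

```java
import java.util.concurrent.Callable;

public class RetryHelper {

    // Retry the given action up to maxAttempts times, rethrowing the
    // last failure if every attempt fails. In real code the action
    // would be something like () -> session.execute(batch).
    static <T> T executeWithRetry(Callable<T> action, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // a real version might sleep / back off here
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = executeWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Be careful with blanket retries of batches: an unlogged batch that failed part-way may have partially applied, so only retry operations that are safe to apply twice (idempotent inserts like the ones in the question are typically fine).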
