Slow insert and saveAll performance on Spring Data Cassandra repository

Problem description

I am trying to insert 1500 records into Cassandra using Spring. I have a list of POJOs holding these 1500 records, and when I call saveAll or insert on this data it takes 30 seconds to complete. Can someone suggest a way for me to get this done faster? I am currently running Cassandra 3.11.2 as a single-node test cluster.

Entity POJO:

package com.samplepoc.pojo;

import static org.springframework.data.cassandra.core.cql.PrimaryKeyType.PARTITIONED;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("health")
public class POJOHealth
{
    @PrimaryKeyColumn(type=PARTITIONED)
    UUID primkey;
    @Column
    String col1;
    @Column
    String col2;
    @Column
    String col3;
    @Column
    String col4;
    @Column
    String col5;
    @Column
    Date ts;
    @Column
    boolean stale;
    @Column
    String col6;
    @Column
    String col7;
    @Column
    String col8;
    @Column
    String col9;
    @Column
    Map<String,String> data_map = new HashMap<String,String>();

    public POJOHealth(
             String col1,
             String col2,
             String col3,
             String col4,
             String col5,
             String col6,
             String col7,
             String col8,
             String col9,
             boolean stale,
             Date ts,
             Map<String,String> data_map
             )
    {
        this.primkey = UUID.randomUUID();
        this.col1=col1;
        this.col2=col2;
        this.col3=col3;
        this.col4=col4;
        this.col5=col5;
        this.col6=col6;
        this.col7=col7;
        this.col8=col8;
        this.col9=col9;
        this.ts=ts;
        this.data_map = data_map;
        this.stale=stale;
    }

    // getters & setters omitted
}

Persistence service snippet:

public void persist(List<POJOHealth> l_POJO)
{
        System.out.println("Enter Persist: "+new java.util.Date());

        List<POJOHealth> l_POJO_stale = repository_name.findByCol1AndStale("sample", false);
        System.out.println("Retrieve Old: "+new java.util.Date());

        l_POJO_stale.forEach(s -> s.setStale(true));
        System.out.println("Set Stale: "+new java.util.Date());

        repository_name.saveAll(l_POJO_stale);
        System.out.println("Save stale: "+new java.util.Date());

        try 
        {
            repository_name.insert(l_POJO);
        } 
        catch (Exception e) 
        {
            System.out.println("Error in persisting new data");
        }
        System.out.println("Insert complete: "+new java.util.Date());
}

Recommended answer

I don't know about Spring, but the Java driver it uses can do the inserts asynchronously. When you save records one at a time like this, the latency to your instance dictates your throughput, not the efficiency of your query. For example, with a 10 ms latency to the C* coordinator, saving one at a time takes 30 seconds (10 ms there plus 10 ms back, times 1,500).

If you issue all of them with executeAsync at the same time and block until they all complete, you should be able to do 1,500 in less than a second unless your hardware is very underpowered (pretty much anything beyond a Raspberry Pi should handle that, at least in bursts). That said, if your app has any concurrency you don't want every thread sending 1,000 inserts at the same time, so putting some kind of in-flight throttle in place (e.g. a Semaphore with a limit of 128) would be a very good idea.
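
For reference, here is a minimal sketch of that approach using the DataStax Java driver 3.x directly (the driver underneath Spring Data Cassandra), bypassing the repository. The contact point, keyspace name, and the columns in the INSERT statement are illustrative assumptions, not taken from the question:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.Semaphore;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.MoreExecutors;

public class AsyncInsertSketch
{
    public static void main(String[] args) throws Exception
    {
        // Contact point and keyspace are assumptions for this sketch.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("sample_keyspace"))
        {
            // Prepare once, bind per row; only a few columns shown for brevity.
            PreparedStatement insert = session.prepare(
                "INSERT INTO health (primkey, col1, stale) VALUES (?, ?, ?)");

            // In-flight throttle: at most 128 outstanding requests at any time.
            Semaphore inFlight = new Semaphore(128);
            List<ResultSetFuture> futures = new ArrayList<>();

            for (int i = 0; i < 1500; i++)
            {
                inFlight.acquire(); // blocks while 128 requests are already in flight
                BoundStatement bound = insert.bind(UUID.randomUUID(), "sample", false);
                ResultSetFuture future = session.executeAsync(bound);
                // Release the permit when this request completes (success or failure).
                future.addListener(inFlight::release, MoreExecutors.directExecutor());
                futures.add(future);
            }

            // Block until every insert has finished.
            for (ResultSetFuture future : futures)
            {
                future.getUninterruptibly();
            }
        }
    }
}

With up to 128 requests in flight, the 1,500 inserts complete in a handful of round trips rather than 1,500 sequential ones, which is where the speedup comes from.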
