Slow insert and saveAll performance on Spring Data Cassandra repository

Problem description

I am trying to insert 1,500 records into Cassandra using Spring. I have a list of POJOs holding these 1,500 records, and when I call saveAll or insert on this data, the operation takes 30 seconds to complete. Can someone suggest a way to get this done faster? I am currently running Cassandra 3.11.2 as a single-node test cluster.

Entity POJO:

package com.samplepoc.pojo;

import static org.springframework.data.cassandra.core.cql.PrimaryKeyType.PARTITIONED;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("health")
public class POJOHealth
{
    @PrimaryKeyColumn(type=PARTITIONED)
    UUID primkey;
    @Column
    String col1;
    @Column
    String col2;
    @Column
    String col3;
    @Column
    String col4;
    @Column
    String col5;
    @Column
    Date ts;
    @Column
    boolean stale;
    @Column
    String col6;
    @Column
    String col7;
    @Column
    String col8;
    @Column
    String col9;
    @Column
    Map<String,String> data_map = new HashMap<String,String>();

    public POJOHealth(
             String col1,
             String col2,
             String col3,
             String col4,
             String col5,
             String col6,
             String col7,
             String col8,
             String col9,
             boolean stale,
             Date ts,
             Map<String,String> data_map
             )
    {
        this.primkey = UUID.randomUUID();
        this.col1=col1;
        this.col2=col2;
        this.col3=col3;
        this.col4=col4;
        this.col5=col5;
        this.col6=col6;
        this.col7=col7;
        this.col8=col8;
        this.col9=col9;
        this.ts=ts;
        this.data_map = data_map;
        this.stale=stale;
    }

    //getters & setters omitted
}

Persist service snippet:

public void persist(List<POJOHealth> l_POJO)
{
        System.out.println("Enter Persist: "+new java.util.Date());

        // Fetch the current (non-stale) rows for col1 = "sample"
        List<POJOHealth> l_POJO_stale = repository_name.findBycol1AndStale("sample",false);
        System.out.println("Retrieve Old: "+new java.util.Date());

        // Mark them as stale in memory
        l_POJO_stale.forEach(s -> s.setStale(true));
        System.out.println("Set Stale: "+new java.util.Date());

        // Write the updated rows back
        repository_name.saveAll(l_POJO_stale);
        System.out.println("Save stale: "+new java.util.Date());

        try 
        {
            // Insert the new batch of records
            repository_name.insert(l_POJO);
        } 
        catch (Exception e) 
        {
            System.out.println("Error in persisting new data: " + e);
        }
        System.out.println("Insert complete: "+new java.util.Date());
}

Recommended answer

I don't know about Spring, but the Java driver it uses can do the inserts asynchronously. When you save records one at a time like this, the latency to your instance dictates your throughput, not the efficiency of your query. For example, assuming a 10 ms latency to the C* coordinator, saving one at a time will take 30 seconds: 10 ms there plus 10 ms back is 20 ms per insert, and 20 ms × 1,500 = 30 s.
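
To see why round-trip latency rather than query cost sets the ceiling, here is a toy model in plain Java with no Cassandra involved. `LatencyModel`, `fakeWrite`, and the 20 ms round trip are illustrative assumptions: each simulated write costs nothing except a fixed delay, scheduled with `CompletableFuture.delayedExecutor` (Java 9+). One version waits for each write before starting the next; the other starts them all and then blocks until every one has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

public final class LatencyModel {
    // Hypothetical stand-in for a write: its only cost is a fixed round trip
    static CompletableFuture<Void> fakeWrite(long rttMillis) {
        Executor afterRtt = CompletableFuture.delayedExecutor(rttMillis, TimeUnit.MILLISECONDS);
        return CompletableFuture.runAsync(() -> { }, afterRtt);
    }

    // One request at a time: every write waits for the previous round trip
    public static long sequentialMillis(int n, long rttMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            fakeWrite(rttMillis).join();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // All requests in flight at once, then block until every one has completed
    public static long concurrentMillis(int n, long rttMillis) {
        long start = System.nanoTime();
        List<CompletableFuture<Void>> pending = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            pending.add(fakeWrite(rttMillis));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int n = 100;         // kept small so the demo runs quickly
        long rttMillis = 20; // 10 ms there + 10 ms back, as in the answer
        System.out.println("sequential: " + sequentialMillis(n, rttMillis) + " ms");
        System.out.println("concurrent: " + concurrentMillis(n, rttMillis) + " ms");
    }
}
```

With these numbers the sequential loop needs at least 100 × 20 ms = 2 s, while the concurrent version finishes in roughly one round trip plus scheduling overhead. A real cluster also does server-side work per write, so the real-world gain will be smaller than this idealized model suggests.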

If you issue all of them with executeAsync at the same time and then block until they have all completed, you should be able to do 1,500 in less than a second unless your hardware is very underpowered (pretty much anything more capable than a Raspberry Pi should be able to handle that, at least in bursts). That said, if your app has any concurrency, you don't want each caller sending 1,000 inserts at the same time, so putting some kind of in-flight throttle in place (e.g. a Semaphore with a limit of 128) would be a very good idea.
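
The executeAsync-plus-throttle approach can be sketched independently of any particular driver version. The sketch assumes a caller-supplied `asyncWrite` function that starts one insert and returns a `CompletionStage`; `ThrottledAsyncWriter`, `writeAll`, and `maxInFlight` are hypothetical names, and 128 is simply the limit suggested above. A `Semaphore` caps the number of writes in flight, and `CompletableFuture.allOf(...).join()` blocks until every write has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public final class ThrottledAsyncWriter {
    /**
     * Starts one async write per item, keeping at most maxInFlight outstanding,
     * then blocks until all of them have completed.
     * Assumption: asyncWrite starts a single insert (e.g. via the driver's
     * executeAsync) and returns without waiting for the result.
     */
    public static <T> void writeAll(List<T> items,
                                    Function<T, CompletionStage<?>> asyncWrite,
                                    int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<?>> pending = new ArrayList<>(items.size());
        for (T item : items) {
            permits.acquire(); // blocks while maxInFlight writes are outstanding
            CompletableFuture<?> write;
            try {
                write = asyncWrite.apply(item).toCompletableFuture();
            } catch (RuntimeException e) {
                permits.release(); // the write never started, so return its permit
                throw e;
            }
            // Free the permit when the write finishes, whether it succeeded or failed
            write.whenComplete((result, error) -> permits.release());
            pending.add(write);
        }
        // Wait for every write; join() rethrows a failure as a CompletionException
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            rows.add(i);
        }
        // Stand-in write that completes immediately; swap in a real async insert
        writeAll(rows, row -> CompletableFuture.completedFuture(row), 128);
        System.out.println("wrote " + rows.size() + " rows");
    }
}
```

With the DataStax Java driver 4.x, `CqlSession.executeAsync(Statement)` already returns a `CompletionStage<AsyncResultSet>`, so `asyncWrite` could be something like `e -> session.executeAsync(insert.bind(...))` for a prepared `insert` statement (bind values elided here). Driver 3.x, used by older Spring Data Cassandra releases, returns a Guava `ResultSetFuture` instead, which would need adapting to a `CompletionStage` first. Acquiring the permit before starting each write, rather than inside the callback, is what makes the submitting loop itself pause once `maxInFlight` requests are outstanding.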