How to increase the write performance in Cassandra?


Question

I have a column family called Emails and I am saving mails into it. It takes 100+ seconds to write 5000 mails.

I am using an i3 processor with 8 GB of RAM. My data center has 6 nodes with replication factor = 2.

Does the size of the data we store in Cassandra affect performance? What factors affect write performance, and how do I increase it?

Thanks in advance.

Solution

Some of the factors you are asking about are:

• connection speed and latency between the client and the cluster, and between machines in the cluster (as mentioned by @omnibear)
• the replication factor you are using - if you insert emails one after another, the replication factor may affect the latency of each single operation, which adds up to a higher total time. You may consider batching write operations.
• you've written that you use i3/8gb - is that the configuration of the client or of the server machines? The configuration of the server machines, especially the amount of memory and the other processes running on them, may obviously affect performance.
• commit log and data files location - it is recommended to place the commit log on a separate physical disk from the data files
• compaction strategy - I bet it does not matter in your case, but in general it also affects write performance. Cassandra first writes data to the commit log and the memtable, then memtables are flushed to SSTables, and finally SSTables are merged (which is called compaction). The parameters of this process can be tuned to improve performance in particular use cases; it is worth reading about the write path in C*.
• you can also browse the DataStax documentation notes regarding performance: (http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_throughput_c.html), (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) and (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html)
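
To put numbers on the latency points above: 5000 mails in 100 seconds works out to about 20 ms per mail if they are written strictly one after another. The sketch below is only a back-of-the-envelope model, not a measurement and not Cassandra driver code. The function name and the `in_flight` parameter are hypothetical, and it assumes per-write latency stays constant and the cluster keeps up as more writes overlap, which a real cluster will only do up to a point.

```python
import math


def estimated_total_seconds(n_writes, per_write_latency_s, in_flight=1):
    """Idealized total time when `in_flight` writes overlap at once.

    Assumption: latency per write is constant and the server is never
    the bottleneck. Real clusters saturate, so treat this as a lower bound.
    """
    if n_writes < 0 or per_write_latency_s < 0 or in_flight < 1:
        raise ValueError("invalid arguments")
    # Writes proceed in "rounds" of at most `in_flight` requests each.
    rounds = math.ceil(n_writes / in_flight)
    return rounds * per_write_latency_s


# Latency implied by the question: 100 s for 5000 sequential writes.
observed_latency_s = 100 / 5000

for in_flight in (1, 10, 50):
    total = estimated_total_seconds(5000, observed_latency_s, in_flight)
    print(f"in_flight={in_flight}: about {total:.1f} s")
```

If the sequential estimate matches what you observe, the time is dominated by round-trip latency rather than by data size, and overlapping or grouping writes is the first thing to try.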

As an aside, maybe you should consider increasing the replication factor to 3, because rf=2 will not give you much: with consistency level = quorum, a quorum of 2 replicas means both must respond, so if one node fails, requests for the data it replicates will fail. If you decide to use rf=3 with cl=quorum, you still have to read from and write to 2 nodes to achieve strong consistency, but losing a node will not make the cluster unavailable.
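
The quorum arithmetic behind that advice can be checked with a few lines. This is a minimal sketch for a single data center, using the usual quorum formula (replication factor divided by 2, rounded down, plus 1); the helper names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """How many replicas of a row can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(
        f"rf={rf}: quorum={quorum(rf)}, "
        f"tolerated replica failures={tolerated_replica_failures(rf)}"
    )
```

Both rf=2 and rf=3 need 2 replicas to answer, but only rf=3 leaves a spare replica when one node is down.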
