NoSpamLogger.java Maximum memory usage reached Cassandra


Problem Description



I have a 5-node Cassandra cluster with ~650 GB of data on each node and a replication factor of 3. I have recently started seeing the following error in /var/log/cassandra/system.log.

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB

I have attempted to increase file_cache_size_in_mb, but the same error soon returns. I have tried going as high as 2GB for this parameter, but to no avail.
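For reference, the parameter referred to above lives in cassandra.yaml and sizes the off-heap chunk cache that backs SSTable reads; the "Maximum memory usage reached" message is logged when that cache is exhausted. A minimal excerpt, with the 2GB value tried here shown purely as an illustration:

    # cassandra.yaml (excerpt) -- illustrative value only
    # Sizes the off-heap cache of SSTable chunks used by reads. When the cache
    # fills up, Cassandra logs the NoSpamLogger "Maximum memory usage reached"
    # message shown above (reads still proceed, but without the benefit of the
    # cache for those chunks).
    file_cache_size_in_mb: 2048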

When the error happens, the CPU utilisation soars and the read latencies become terribly erratic. I see this surge show up approximately every half hour. Note the timings in the list below.

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB

Two of my tables are partitioned by hour, and the partitions are large. For example, here are their outputs from nodetool tablestats:

    Read Count: 4693453
    Read Latency: 0.36752741680805157 ms.
    Write Count: 561026
    Write Latency: 0.03742310516803143 ms.
    Pending Flushes: 0
        Table: raw_data
        SSTable count: 55
        Space used (live): 594395754275
        Space used (total): 594395754275
        Space used by snapshots (total): 0
        Off heap memory used (total): 360753372
        SSTable Compression Ratio: 0.20022598072758296
        Number of keys (estimate): 45163
        Memtable cell count: 90441
        Memtable data size: 685647925
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 126710
        Local write latency: 0.096 ms
        Pending flushes: 0
        Percent repaired: 52.99
        Bloom filter false positives: 167775
        Bloom filter false ratio: 0.16152
        Bloom filter space used: 264448
        Bloom filter off heap memory used: 264008
        Index summary off heap memory used: 31060
        Compression metadata off heap memory used: 360458304
        Compacted partition minimum bytes: 51
        **Compacted partition maximum bytes: 3449259151**
        Compacted partition mean bytes: 16642499
        Average live cells per slice (last five minutes): 1.0005435888450147
        Maximum live cells per slice (last five minutes): 42
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0



    Read Count: 4712814
    Read Latency: 0.3356051004771247 ms.
    Write Count: 643718
    Write Latency: 0.04168356951335834 ms.
    Pending Flushes: 0
        Table: customer_profile_history
        SSTable count: 20
        Space used (live): 9423364484
        Space used (total): 9423364484
        Space used by snapshots (total): 0
        Off heap memory used (total): 6560008
        SSTable Compression Ratio: 0.1744084338623116
        Number of keys (estimate): 69
        Memtable cell count: 35242
        Memtable data size: 789595302
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 2307
        Local read latency: NaN ms
        Local write count: 51772
        Local write latency: 0.076 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 384
        Bloom filter off heap memory used: 224
        Index summary off heap memory used: 400
        Compression metadata off heap memory used: 6559384
        Compacted partition minimum bytes: 20502
        **Compacted partition maximum bytes: 4139110981**
        Compacted partition mean bytes: 708736810
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0

Here are the cfhistograms outputs:

cdsdb/raw_data histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             0.00             61.21              0.00           1955666               642
75%             1.00             73.46              0.00          17436917              4768
95%             3.00            105.78              0.00         107964792             24601
98%             8.00            219.34              0.00         186563160             42510
99%            12.00            315.85              0.00         268650950             61214
Min             0.00              6.87              0.00                51                 0
Max            14.00           1358.10              0.00        3449259151           7007506

cdsdb/customer_profile_history histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             0.00             73.46              0.00         223875792             61214
75%             0.00             88.15              0.00         668489532            182785
95%             0.00            152.32              0.00        1996099046            654949
98%             0.00            785.94              0.00        3449259151           1358102
99%             0.00            943.13              0.00        3449259151           1358102
Min             0.00             24.60              0.00              5723                 4
Max             0.00           5839.59              0.00        5960319812           1955666

Could you please suggest a way forward to mitigate this issue?

Solution

Based on the cfhistograms output posted, the partitions are enormous.

The 95th-percentile partition size of the raw_data table is 107MB, with a maximum of 3.44GB. The 95th-percentile partition size of customer_profile_history is 1.99GB, with a maximum of 5.96GB.

This clearly relates to the problem you notice every half hour, as these huge partitions are written out to SSTables. The data model has to change, and given the partition sizes above it is better to use a partition interval of "minute" instead of "hour". A 2GB partition would then shrink to roughly a 33MB partition (2GB spread across 60 one-minute buckets).
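To make the suggestion concrete, here is a minimal CQL sketch of minute-level bucketing, assuming a typical time-series layout; the table and column names are hypothetical, since the original schema is not shown in the question:

    -- Hypothetical sketch only: the real cdsdb.raw_data schema is not posted above.
    -- The time bucket in the partition key moves from hour to minute granularity,
    -- so each partition holds roughly 1/60th of the data it does today.
    CREATE TABLE cdsdb.raw_data_by_minute (
        source_id   text,
        bucket      timestamp,   -- event time truncated to the minute (was: the hour)
        event_time  timestamp,
        payload     blob,
        PRIMARY KEY ((source_id, bucket), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC);

The application would then compute the minute bucket on every write and fan reads out over the buckets that cover the requested time range.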

The recommendation is to keep partitions to a maximum of roughly 100MB. Although we can theoretically store more than 100MB, performance will suffer: remember that every read of such a partition moves more than 100MB of data over the wire. In your case it is over 2GB, with all the performance implications that come with it.
