Cassandra Wide Vs Skinny Rows for large columns

Problem Description

I need to insert 60GB of data into Cassandra per day.

This breaks down into
100 sets of keys
150,000 keys per set
4KB of data per key
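
(For scale: 100 sets x 150,000 keys x 4 KB per key is roughly 60 GB of raw values per day, before row and column overhead.)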

In terms of write performance, am I better off using
1 row per set with 150,000 keys per row
10 rows per set with 15,000 keys per row
100 rows per set with 1,500 keys per row
1000 rows per set with 150 keys per row
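
For what it's worth, all four layouts above can be produced by the same writer if the row key is derived from the data key, e.g. by hashing each key into a fixed number of buckets per set. A minimal sketch, assuming illustrative names (set_id, data_key and BUCKETS are not from the original setup):

import zlib

BUCKETS = 100   # 1, 10, 100 or 1000 rows per set, per the options above

def row_key(set_id, data_key):
    # Map each of the 150,000 keys in a set onto one of BUCKETS rows.
    # The mapping is deterministic, so reads know which row to hit.
    bucket = zlib.crc32(data_key.encode('utf-8')) % BUCKETS
    return '%s:%03d' % (set_id, bucket)

# Example: every data key for 'set_017' lands in one of
# 'set_017:000' ... 'set_017:099'.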

Another variable to consider, my data expires after 24 hours so I am using TTL=86400 to automate expiration
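
With pycassa (the client used in the test method below), the TTL is attached per insert; a minimal sketch, assuming a keyspace name and host (both illustrative) and the stuff column family shown in the configuration below:

import pycassa

pool = pycassa.ConnectionPool('my_keyspace', ['127.0.0.1:9160'])   # keyspace/host assumed
stuff = pycassa.ColumnFamily(pool, 'stuff')

packed_value = b'\x00' * 4000   # stands in for one packed value of 1000 4-byte floats

# Every column written with ttl=86400 expires 24 hours later,
# so no explicit cleanup pass is needed.
stuff.insert('some_stuff_id', {'some_stuff_column': packed_value}, ttl=86400)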

More specific details about my configuration:

CREATE TABLE stuff (
  stuff_id text,
  stuff_column text,
  value blob,
  PRIMARY KEY (stuff_id, stuff_column)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=39600 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'tombstone_compaction_interval': '43200', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

Access pattern details:
The 4KB value is a set of 1000 4-byte floats packed into a string.

A typical request is going to need a random selection of 20 - 60 of those floats.
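
A small sketch of that packing and selection step using only the standard struct and random modules (the original packing code is not shown in the question, so the byte order here is an assumption):

import random
import struct

levels = [random.random() for _ in range(1000)]     # 1000 floats per key

# 1000 x 4-byte floats -> a 4000-byte string (the ~4KB value).
packed = struct.pack('<1000f', *levels)

# A typical request needs 20 - 60 of those floats at random positions.
wanted = sorted(random.sample(range(1000), random.randint(20, 60)))
values = [struct.unpack_from('<f', packed, 4 * i)[0] for i in wanted]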

Initially, those floats are all stored in the same logical row and column. A logical row here represents a set of data at a given time if it were all written to one row with 150,000 columns.

As time passes, some of the data is updated: within a logical row in the set of columns, a random set of levels within the packed string gets new values. Instead of updating in place, the new levels are written to a new logical row, combined with other new data, to avoid rewriting all of the data that is still valid. This leads to fragmentation, since multiple rows now need to be accessed to retrieve that set of 20 - 60 values. A request will now typically read the same column across 1 - 5 different rows.
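
In pycassa terms, that fragmented read looks roughly like a multiget of the same column across the candidate rows; a sketch with hypothetical row keys (the real naming scheme for the rewrite rows is not given in the question):

import pycassa

pool = pycassa.ConnectionPool('my_keyspace', ['127.0.0.1:9160'])   # keyspace/host assumed
stuff = pycassa.ColumnFamily(pool, 'stuff')

# The 1 - 5 logical rows that may each hold a fragment of the set.
candidate_rows = ['set_017:042', 'set_017:042:rewrite_1', 'set_017:042:rewrite_2']

fragments = stuff.multiget(candidate_rows, columns=['some_stuff_column'])
for key, cols in fragments.items():
    packed = cols['some_stuff_column']
    # Newer fragments override the stale levels from older rows.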

Test Method

I wrote 5 samples of random data for each configuration and averaged the results. Rates were calculated as Bytes_written / (time * 10^6), with time measured in seconds at millisecond precision. Pycassa was used as the Cassandra interface, with its batch insert operator. Each insert writes multiple columns to a single row; insert sizes are limited to 12 MB, and the queue is flushed at 12 MB or less. Sizes do not account for row and column overhead, just data. The data source and data sink are on the same network, on different systems.
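
A rough reconstruction of that measurement loop with pycassa's batch mutator (queue size, key/column names and the payload are assumptions based on the description; pycassa's queue_size counts mutations rather than bytes, so ~3000 x 4 KB approximates the 12 MB flush limit):

import time
import pycassa

pool = pycassa.ConnectionPool('my_keyspace', ['127.0.0.1:9160'])   # keyspace/host assumed
stuff = pycassa.ColumnFamily(pool, 'stuff')

payload = b'\x00' * 4000            # stands in for one packed value of 1000 4-byte floats
rows, keys_per_row = 100, 1500      # one of the four tested layouts

start = time.time()
batch = stuff.batch(queue_size=3000)          # ~3000 x 4KB is roughly 12 MB per flush
for r in range(rows):
    for k in range(keys_per_row):
        batch.insert('row_%05d' % r, {'col_%06d' % k: payload}, ttl=86400)
batch.send()                                  # flush whatever is still queued
elapsed = time.time() - start

bytes_written = rows * keys_per_row * len(payload)
print('%.1f MBps' % (bytes_written / (elapsed * 1e6)))   # Bytes_written / (time * 10^6)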

Write Results

Keep in mind there are a number of other variables in play due to the complexity of the Cassandra configuration.

1 row per set, 150,000 keys per row: 14 MBps
10 rows per set, 15,000 keys per row: 15 MBps
100 rows per set, 1,500 keys per row: 18 MBps
1000 rows per set, 150 keys per row: 11 MBps

Recommended Answer

The answer depends on what your data retrieval pattern is, and how your data is logically grouped. Broadly, here is what I think:

  • Wide row (1 row per set): This could be the best solution, as it prevents a request from hitting several nodes at once, and with secondary indexing or composite column names you can quickly filter data to your needs (a pycassa sketch of such column filtering follows this list). This is best if you need to access one set of data per request. However, doing too many multigets on wide rows can increase memory pressure on the nodes and degrade performance.
  • Skinny row (1000 rows per set): On the other hand, a wide row can give rise to read hotspots in the cluster. This is especially the case if you need to make a high volume of requests for a subset of data that lives entirely in one wide row. In that case, skinny rows will distribute your requests more uniformly throughout the cluster and avoid hotspots. Also, in my experience, "skinnier" rows tend to behave better with multigets.
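
As a concrete illustration of the column filtering mentioned in the wide-row bullet, pycassa can fetch an explicit list of columns or a contiguous slice of column names from a single wide row, so a request never has to pull all 150,000 columns; a sketch with hypothetical names:

import pycassa

pool = pycassa.ConnectionPool('my_keyspace', ['127.0.0.1:9160'])   # keyspace/host assumed
stuff = pycassa.ColumnFamily(pool, 'stuff')

# Explicit columns from one wide row...
subset = stuff.get('set_017', columns=['key_000042', 'key_001337'])

# ...or a contiguous slice, relying on the sorted order of stuff_column.
window = stuff.get('set_017', column_start='key_000100',
                   column_finish='key_000160', column_count=60)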

I would suggest analyzing your data access pattern and finalizing your data model based on that, rather than the other way around.
