Cassandra read timeout because some response size limit is reached?


Question


I encountered strange behavior on Cassandra 3.0:

I have the following table:

CREATE TABLE table (
  id text,
  ts text,
  score decimal,
  type text,
  values text,
  PRIMARY KEY (id, ts)
) WITH CLUSTERING ORDER BY (ts DESC) 


And the following query (which returns instantly):

SELECT * FROM keyspace.table WHERE id='someId' AND ts IN ('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06');

If I add one more day to the IN clause, the response never arrives (not even after 10 minutes!!!!):


SELECT * FROM keyspace.table WHERE id='someId' AND ts IN ('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06', '2017-11-07');


The 'values' column may contain large JSON data. Is there some flag in cassandra.yaml with a size threshold, or something like that? I guess adding another day to the query hits some limit somewhere... In the Cassandra system.log I didn't see anything relevant to this.

Answer


If the query succeeds on one node but not on another, and it works with one fewer value in the IN clause, I would guess this is a memory pressure issue. To rule out a query-parsing problem, you can rewrite your query as:

SELECT * FROM myTable WHERE id = 'x' AND ts >= '2017-10-15' AND ts <= '2017-11-07';
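This range form returns the same rows as the 24-value IN list only because ts holds zero-padded 'YYYY-MM-DD' strings, whose lexicographic order matches calendar order. The Python sketch below illustrates that assumption; it is not part of the original answer, and days_between and in_text_range are hypothetical helper names. If ts ever held values such as '2017-10-20T10:00' or an unpadded '2017-10-9', the range would also match rows that the IN list does not.

```python
from datetime import date, timedelta
from typing import List


def days_between(start: str, end: str) -> List[str]:
    """All calendar days from start to end inclusive, as 'YYYY-MM-DD' strings."""
    day, last = date.fromisoformat(start), date.fromisoformat(end)
    days = []
    while day <= last:
        days.append(day.isoformat())  # isoformat() is zero-padded, e.g. 2017-10-05
        day += timedelta(days=1)
    return days


def in_text_range(ts: str, lo: str, hi: str) -> bool:
    """Mimic CQL `ts >= lo AND ts <= hi` on a text column (plain string comparison)."""
    return lo <= ts <= hi


in_list = days_between("2017-10-15", "2017-11-07")  # the 24 IN values from the question
# Zero-padded ISO days sort lexicographically in calendar order,
# so every IN value also satisfies the range predicate.
print(sorted(in_list) == in_list)  # True
print(all(in_text_range(d, "2017-10-15", "2017-11-07") for d in in_list))  # True
```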


The IN clause is only really useful once you start bucketing your data. Bucketing is a good approach if you have hotspots, or if you see one node carrying much higher load than the others.


To bucket your data you would want to do something like:

CREATE TABLE myTable (
  id text,
  ts text,
  score decimal,
  type text,
  values text,
  PRIMARY KEY ((id, ts), type)
) WITH CLUSTERING ORDER BY (type DESC);

Your data would now be partitioned by id AND day. Your query would then become what you have now:

SELECT * FROM myTable WHERE id = 'x' AND ts IN ('2017-01-01');
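Once ts is part of the partition key, a predicate like ts >= ... AND ts <= ... can no longer be served as a targeted read: CQL expects = or IN on every partition-key column (token() queries and ALLOW FILTERING scans aside). The client therefore has to enumerate the day buckets it wants. A minimal Python sketch, assuming one bucket per calendar day and ts stored as 'YYYY-MM-DD'; bucket_keys is a hypothetical helper, not something from the original answer:

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def bucket_keys(entity_id: str, start: str, end: str) -> Iterator[Tuple[str, str]]:
    """Yield the (id, ts) partition keys for every day from start to end inclusive.

    Assumes the ((id, ts), type) schema: one partition per id per calendar day,
    with ts stored as a zero-padded 'YYYY-MM-DD' string.
    """
    day, last = date.fromisoformat(start), date.fromisoformat(end)
    while day <= last:
        yield (entity_id, day.isoformat())
        day += timedelta(days=1)


# The 24 days from the question map to 24 separate partitions.
keys = list(bucket_keys("someId", "2017-10-15", "2017-11-07"))
```

Each key then becomes one WHERE id = ? AND ts = ? lookup, or one value in a small IN list.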


This will distribute data better across the HDDs and let Cassandra parallelize the work better. It WILL NOT fix the memory pressure issue, though. To fix that, you would want to move the aggregation of data from the coordinator to your application layer.


This means running N separate SELECT ... WHERE id='x' AND ts='2017-01-01'; queries, one per day.
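A minimal Python sketch of that client-side fan-out, assuming a thread pool in the application. fetch_all_days and the query_one_day callback are hypothetical names; with the DataStax Python driver (cassandra-driver), the callback could wrap session.execute(prepared, (entity_id, day)) on a prepared SELECT * FROM myTable WHERE id = ? AND ts = ?. The callback is injected so the merge logic can be shown without a live cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

Row = TypeVar("Row")


def fetch_all_days(
    query_one_day: Callable[[str, str], Sequence[Row]],
    entity_id: str,
    days: Iterable[str],
    max_workers: int = 8,
) -> List[Row]:
    """Run one single-partition query per day and merge the results client-side.

    query_one_day(entity_id, day) is a hypothetical callback that returns the
    rows of one day bucket, e.g. by executing a prepared per-day SELECT.
    """
    days = list(days)
    merged: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the queries concurrently but yields results in the
        # order of `days`, so the merged list stays grouped by day.
        for rows in pool.map(lambda d: query_one_day(entity_id, d), days):
            merged.extend(rows)
    return merged
```

The idea is that each request touches one small partition, so no single coordinator has to assemble every day's values at once; the combining happens in the application. cassandra-driver also provides execute_concurrent_with_args in cassandra.concurrent, which performs a similar fan-out over a prepared statement without a hand-rolled pool.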
