Cassandra read timeout


Problem description


I am pulling a large amount of data from Cassandra 2.0, but unfortunately I get a timeout exception. My table:

CREATE KEYSPACE StatisticsKeyspace
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };


CREATE TABLE StatisticsKeyspace.HourlyStatistics(
KeywordId text,
Date timestamp,
HourOfDay int,
Impressions int,
Clicks int,
AveragePosition double,
ConversionRate double,
AOV double,
AverageCPC double,
Cost double,
Bid double,
PRIMARY KEY(KeywordId, Date, HourOfDay)
);
CREATE INDEX ON StatisticsKeyspace.HourlyStatistics(Date);

My query:

SELECT KeywordId, Date, HourOfDay, Impressions, Clicks, AveragePosition, ConversionRate, AOV, AverageCPC, Bid
FROM StatisticsKeyspace.hourlystatistics 
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24'

I've changed the timeout settings in my cassandra.yaml file:

read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 40000
cas_contention_timeout_in_ms: 3000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 60000

But it still throws a timeout after approximately 10 seconds. Any ideas how I can fix this problem?

Solution

If you are using the DataStax Java driver, paging is enabled by default with a fetch size of 5000 rows. If you still get a timeout, you can try reducing the fetch size using

public Statement setFetchSize(int fetchSize)


If you are using the CLI, you may need to experiment with some kind of manual pagination:

SELECT KeywordId, Date, HourOfDay, Impressions, Clicks, AveragePosition, ConversionRate, AOV, AverageCPC, Bid
FROM StatisticsKeyspace.hourlystatistics 
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24' 
LIMIT 100;

SELECT * FROM ....  WHERE token(KeywordId) > token([Last KeywordId received]) AND ...
LIMIT 100;
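
The token-based pattern above can be driven from a client loop. What follows is only a sketch in Python under stated assumptions: `fetch_page` is a hypothetical callable standing in for the `token(KeywordId) > token(...)` query with `LIMIT`, `make_fetch_page` fakes it in memory so the loop runs without a cluster, and `token` is a toy ordering rather than Cassandra's real partitioner. Because `LIMIT` can cut a partition in the middle, the loop drops the last partition of every full page and re-reads it on the next round.

```python
def token(key):
    # Toy stand-in for the partitioner token (assumption): any deterministic
    # total order over keys is enough to illustrate the paging loop.
    return key[::-1]


def make_fetch_page(rows):
    """Hypothetical in-memory replacement for the paged CQL query."""
    ordered = sorted(rows, key=lambda r: (token(r["KeywordId"]), r["HourOfDay"]))

    def fetch_page(last_key, limit):
        if last_key is None:
            matching = ordered  # first page: no token restriction
        else:
            matching = [r for r in ordered
                        if token(r["KeywordId"]) > token(last_key)]
        return matching[:limit]

    return fetch_page


def fetch_all(fetch_page, limit=100):
    """Collect all rows by paging on the token of the partition key."""
    result = []
    last_key = None
    while True:
        page = fetch_page(last_key, limit)
        if len(page) < limit:
            # A short page means nothing was cut off: this is the last page.
            result.extend(page)
            return result
        # A full page may end in the middle of a partition. Keep only the
        # partitions before the last one; the last one is read again next.
        tail_key = page[-1]["KeywordId"]
        complete = [r for r in page if r["KeywordId"] != tail_key]
        if not complete:
            raise ValueError("limit does not exceed the size of one partition")
        result.extend(complete)
        last_key = complete[-1]["KeywordId"]
```

With a real driver, `fetch_page` would execute the two CQL statements shown above (the first without a token restriction, the following ones with it). The page size has to be larger than the largest partition the query can return, otherwise the loop cannot make progress and raises.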

To rule out cluster issues, you can try a SELECT with a LIMIT of 1; if even that times out, there may be an underlying problem.

Hope that helps.

If you are still experiencing performance issues with your query, I would look at your secondary index, since the amount of data transferred seems reasonable (only 'small' data types are returned). If I am right, changing the fetch size will not change much.

Instead, do you insert dates only into your "Date" (timestamp) column? If you are inserting actual timestamps, the secondary index on this column will be very slow because of its high cardinality. If you insert a date only, the timestamp defaults to date + "00:00:00" + TZ, which should reduce the cardinality and thus improve the look-up speed (watch out for timezone issues!). To be absolutely sure, try a secondary index on a column with a different data type, such as an int for Date (counting the days since 1970-01-01 or something similar).
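
As an illustration of the int-for-Date idea, the day count could be computed on the client before inserting. This is a minimal sketch in Python; the function name `day_number` and the choice of UTC as the reference timezone are assumptions made for the example, not something the schema above prescribes.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def day_number(ts):
    """Return the number of days since 1970-01-01 for an aware datetime.

    The timestamp is converted to UTC first, so the same instant always
    maps to the same day regardless of the timezone it was recorded in.
    """
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


# Bounds of the query range from the question.
start = day_number(datetime(2014, 3, 22, tzinfo=timezone.utc))
end = day_number(datetime(2014, 3, 24, tzinfo=timezone.utc))

# Timezone pitfall: 01:00 on 2014-03-22 at UTC+3 is still 2014-03-21 in UTC.
shifted = day_number(datetime(2014, 3, 22, 1, 0,
                              tzinfo=timezone(timedelta(hours=3))))
```

Every row of a given day then carries the same small integer, which keeps the number of distinct indexed values at one per day instead of one per timestamp.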
