Cassandra database overwhelmed?

Question

I created a table in a Cassandra database with the following statement:

CREATE TABLE table(
  num int,
  part_key int,
  val1 int,
  val2 float,
  val3 text,
  ...,
  PRIMARY KEY((part_key),num)
);

The table stores data from a technical device. The partition key part_key is 1 for every record, because I want to execute range queries on only one server. I know this is a bad use case for Cassandra, but I need to do this for a comparison.

The clustering column num is the record number (from 1 to 8,000,000).

Each record has about 400 other values of type float, int, and text. I inserted 8,000,000 records into this table (43 GB) and wanted to run queries like:

SELECT num, val1, val45, val90 
FROM ks.table 
WHERE part_key=1 AND num>9999 AND num<20001;

I executed the query in cqlsh and got "operation timed out". So I changed read_request_timeout_in_ms and range_request_timeout_in_ms in the cassandra.yaml file to 60000 (1 minute).

When executing the query again I got "Error 10054: the existing connection was closed by the remote host" after 5 minutes. The DataStax Cassandra Community Server 2.0.11 service was no longer running on the server.

I restarted the service and tried again, and the service crashed again. After that I could not even restart the service and had to reboot the server. I also tried the Cassandra cpp-driver, and it could not execute this query either.

Smaller queries like

... AND num<1000;

work.

My question is: did I do something wrong? I know Cassandra works better with more nodes, but I thought it would only need some more time. Is it possible that Cassandra is unable to execute a query like that?

Thanks!

Server:

Intel(R) Xeon(R) CPU E5504 @ 2.00GHz 2.00GHz (2 processors) / 16GB RAM

CPU utilization: 50% - 60%, dropping to around 30% after 15 seconds / RAM: 2.9 GB the whole time

Edit:

My Cassandra keyspace is now 60GB and small queries like

... AND num<10;

and even the inserts time out. Sometimes the service crashes... Can someone who has an idea explain that? One answer said that a node with 43GB in a cluster with more nodes is not the same as in my cluster with only one node. Can somebody explain this?

Thanks!

Recommended answer

One of the key issues here is that cqlsh, in the version of C* that you are running, does not page through results. This means the entire result set has to be serialized at query time, which, given your data model, will be quite large (as pointed out by kha). I would try performing similar queries using a paging-enabled driver, and of course make sure that you have sufficient network bandwidth for returning the data.
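As a rough illustration of what a paging-enabled read could look like, here is a minimal sketch assuming the DataStax Python driver (cassandra-driver), which the question does not mention. The helper name stream_rows and the page size of 500 are hypothetical choices for illustration; the query text is copied from the question, so ks.table is the asker's placeholder name. With paging, the driver fetches one page of rows at a time while you iterate, instead of materializing the whole result set in one response.

```python
# Sketch only: assumes a session object from the DataStax Python driver.
# The query is taken from the question; ks.table is the asker's placeholder.
QUERY = (
    "SELECT num, val1, val45, val90 "
    "FROM ks.table "
    "WHERE part_key = 1 AND num > %s AND num < %s"
)


def stream_rows(session, lo, hi, fetch_size=500):
    """Yield rows with lo < num < hi, letting the driver page the results.

    `stream_rows` and the default of 500 are illustrative, not from the
    original post. Note that this changes the page size for every query
    later issued through the same session.
    """
    # Rows per page; the driver requests the next page as iteration proceeds.
    session.default_fetch_size = fetch_size
    for row in session.execute(QUERY, (lo, hi)):
        yield row
```

With a real cluster, the session would typically come from Cluster([...]).connect() in cassandra.cluster, and the rows could then be consumed with a plain for loop over stream_rows(session, 9999, 20001).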

43GB should be easily handled by a single C* node, although operating a C* cluster with only a single node sacrifices almost all of the benefits that C* offers.
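To put the numbers from the question in perspective, the following back-of-the-envelope sketch estimates the average record size and an upper bound for the data behind the requested range. It uses only the figures reported in the question (43 GB, 8,000,000 records, the range 9999 < num < 20001) and assumes 1 GB = 10^9 bytes; the function names are hypothetical. On-disk size includes storage overhead and the query selects only four columns, so treat the results as a rough ceiling rather than a measurement.

```python
TOTAL_BYTES = 43 * 10**9    # 43 GB, as reported in the question (decimal GB assumed)
TOTAL_RECORDS = 8_000_000   # records inserted, as reported in the question


def bytes_per_record(total_bytes=TOTAL_BYTES, total_records=TOTAL_RECORDS):
    """Average on-disk bytes per record."""
    return total_bytes / total_records


def rows_in_range(lo_exclusive, hi_exclusive):
    """Number of integer keys with lo_exclusive < num < hi_exclusive."""
    return max(0, hi_exclusive - lo_exclusive - 1)


def full_row_bytes(row_count, per_record=None):
    """Upper bound on bytes touched if every column of each row were read."""
    if per_record is None:
        per_record = bytes_per_record()
    return row_count * per_record


per_record = bytes_per_record()                     # 5375.0 bytes per record
rows = rows_in_range(9999, 20001)                   # 10001 rows
print(f"{per_record:.0f} bytes per record on average")
print(f"{rows} rows, at most {full_row_bytes(rows) / 1e6:.1f} MB of full rows")
print(f"a page of 500 rows is at most {full_row_bytes(500) / 1e6:.1f} MB")
```

Under these assumptions a record averages about 5.4 KB, so the size of a page of results is bounded by the page size rather than by the width of the requested range.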
