Cassandra returns no data even though the data exists

Problem description

I have a keyspace with a replication factor of 3. I am inserting data into Cassandra (a 4-node cluster with a single data center) with write consistency level ONE. After the inserts complete, I read the data with consistency level QUORUM (2). Sometimes, however, I get no data even though it exists, and after some time the same query returns it. I don't know why Cassandra behaves like this.

My column family schema:

CREATE TABLE input_data_profile.input_log_profile_1 (
    cid text,
    ctdon bigint,
    ctdat bigint,
    email text,
    addrs set<frozen<udt_addrs>>,
    asset set<frozen<udt_asset>>,
    cntno set<frozen<udt_cntno>>,
    dob frozen<udt_date>,
    dvc set<frozen<udt_dvc>>,
    eaka set<text>,
    edmn text,
    educ set<frozen<udt_educ>>,
    gen tinyint,
    hobby set<text>,
    income set<frozen<udt_income>>,
    interest set<text>,
    lang set<frozen<udt_lang>>,
    levnt set<frozen<udt_levnt>>,
    like map<text, frozen<set<text>>>,
    loc set<frozen<udt_loc>>,
    mapp set<text>,
    name frozen<udt_name>,
    params map<text, frozen<set<text>>>,
    prfsn set<frozen<udt_prfsn>>,
    rel set<frozen<udt_rel>>,
    rel_s tinyint,
    skills_prfsn set<frozen<udt_skill_prfsn>>,
    snw set<frozen<udt_snw>>,
    sport set<text>,
    status tinyint,
    z_addrs tinyint,
    z_asset tinyint,
    z_cntno tinyint,
    z_dob tinyint,
    z_dvc tinyint,
    z_eaka tinyint,
    z_educ tinyint,
    z_email tinyint,
    z_gen tinyint,
    z_hobby tinyint,
    z_income tinyint,
    z_interest tinyint,
    z_lang tinyint,
    z_levnt tinyint,
    z_like tinyint,
    z_loc tinyint,
    z_mapp tinyint,
    z_name tinyint,
    z_params tinyint,
    z_prfsn tinyint,
    z_rel tinyint,
    z_rel_s tinyint,
    z_skills_prfsn tinyint,
    z_snw tinyint,
    z_sport tinyint,
    PRIMARY KEY (cid, ctdon, ctdat, email)
) WITH CLUSTERING ORDER BY (ctdon ASC, ctdat ASC, email ASC);
CREATE INDEX input_log_profile_1_z_snw_idx ON input_data_profile.input_log_profile_1 (z_snw);
CREATE INDEX input_log_profile_1_z_prfsn_idx ON input_data_profile.input_log_profile_1 (z_prfsn);
CREATE INDEX input_log_profile_1_z_hobby_idx ON input_data_profile.input_log_profile_1 (z_hobby);
CREATE INDEX input_log_profile_1_z_rel_idx ON input_data_profile.input_log_profile_1 (z_rel);
CREATE INDEX input_log_profile_1_z_gen_idx ON input_data_profile.input_log_profile_1 (z_gen);
CREATE INDEX input_log_profile_1_z_mapp_idx ON input_data_profile.input_log_profile_1 (z_mapp);
CREATE INDEX input_log_profile_1_z_dvc_idx ON input_data_profile.input_log_profile_1 (z_dvc);
CREATE INDEX input_log_profile_1_z_skills_prfsn_idx ON input_data_profile.input_log_profile_1 (z_skills_prfsn);
CREATE INDEX input_log_profile_1_z_eaka_idx ON input_data_profile.input_log_profile_1 (z_eaka);
CREATE INDEX input_log_profile_1_z_name_idx ON input_data_profile.input_log_profile_1 (z_name);
CREATE INDEX input_log_profile_1_z_cntno_idx ON input_data_profile.input_log_profile_1 (z_cntno);
CREATE INDEX input_log_profile_1_z_educ_idx ON input_data_profile.input_log_profile_1 (z_educ);
CREATE INDEX input_log_profile_1_z_loc_idx ON input_data_profile.input_log_profile_1 (z_loc);
CREATE INDEX input_log_profile_1_z_email_idx ON input_data_profile.input_log_profile_1 (z_email);
CREATE INDEX input_log_profile_1_z_interest_idx ON input_data_profile.input_log_profile_1 (z_interest);
CREATE INDEX input_log_profile_1_z_asset_idx ON input_data_profile.input_log_profile_1 (z_asset);
CREATE INDEX input_log_profile_1_z_like_idx ON input_data_profile.input_log_profile_1 (z_like);
CREATE INDEX input_log_profile_1_z_rel_s_idx ON input_data_profile.input_log_profile_1 (z_rel_s);
CREATE INDEX input_log_profile_1_z_lang_idx ON input_data_profile.input_log_profile_1 (z_lang);
CREATE INDEX input_log_profile_1_z_addrs_idx ON input_data_profile.input_log_profile_1 (z_addrs);
CREATE INDEX input_log_profile_1_z_dob_idx ON input_data_profile.input_log_profile_1 (z_dob);
CREATE INDEX input_log_profile_1_z_income_idx ON input_data_profile.input_log_profile_1 (z_income);
CREATE INDEX input_log_profile_1_z_sport_idx ON input_data_profile.input_log_profile_1 (z_sport);
CREATE INDEX input_log_profile_1_z_params_idx ON input_data_profile.input_log_profile_1 (z_params);

I need to process the data field by field, so I indexed the status column of every field. I want to improve the read and write TPS. Please suggest some modifications to the schema.

Recommended answer

If I understand you correctly, you are really asking two questions here:

First, you are writing data with CL=ONE and reading it with CL=QUORUM, and you are wondering why you do not always retrieve the data you have written even though you can retrieve it later. If that is correct, this is expected Cassandra behavior. With CL=ONE, the coordinator sends the write to all 3 replicas but reports success to the client as soon as the first one acknowledges it. If you then read with QUORUM before the write has reached the other replicas, the two replicas that answer your read may both be ones that have not applied it yet, so you can get no data (or stale data) back. A read is only guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor (W + R > RF); here 1 + 2 = 3, which is not greater than 3, whereas writing at QUORUM as well (2 + 2 = 4 > 3) guarantees the overlap. This is the eventual consistency part of Cassandra. If you are trying to read the data immediately after a successful write, that is likely the cause of your problem, since "read after write" is an anti-pattern in Cassandra and most other distributed systems.
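The quorum-overlap argument above can be checked exhaustively with a small toy model. This sketch is plain Python, not the Cassandra driver; `replicas_required` and `read_can_miss_write` are hypothetical helper names, and it assumes a single partition with RF = 3 whose replicas are numbered 0..2. It enumerates every set of replicas that could hold an acknowledged write and every set a read could contact, and reports whether some read can miss every replica that has the write.

```python
from itertools import combinations


def replicas_required(level: str, rf: int) -> int:
    """How many replica acknowledgements a consistency level waits for."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def read_can_miss_write(write_cl: str, read_cl: str, rf: int = 3) -> bool:
    """True if some set of replicas that acknowledged the write shares no node
    with some set of replicas that answers the read."""
    w = replicas_required(write_cl, rf)
    r = replicas_required(read_cl, rf)
    nodes = range(rf)
    return any(
        not set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


if __name__ == "__main__":
    print(read_can_miss_write("ONE", "QUORUM"))     # True:  1 + 2 = 3, not > 3
    print(read_can_miss_write("QUORUM", "QUORUM"))  # False: 2 + 2 = 4 > 3
```

The model deliberately ignores hinted handoff, read repair, and the other background mechanisms that eventually copy the write to the remaining replicas, which is why the same query starts returning data after a while.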

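Second, you are using secondary indexes incorrectly in your schema. If you are using indexes so that you can query on those fields, that is an anti-pattern, especially with as many as you have (24). Secondary indexes in Cassandra are expensive: each index is kept locally on every node, so every insert also has to update all 24 indexes on each replica, and a query that filters on an indexed column without a partition key has to consult nodes covering every token range in the cluster. They should only be used in rare cases where the indexed column has low cardinality. See https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html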
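To make the cost of the indexes concrete, here is a toy in-memory model. It assumes, hypothetically, that a partition key is placed by `crc32` on RF consecutive nodes and that each node indexes only the rows it stores; `ToyNode`, `ToyCluster`, and the `z_col*` names are illustrative stand-ins, not Cassandra APIs. It counts the extra index writes caused by inserts and how many nodes a key lookup versus an index lookup involves.

```python
import zlib
from collections import defaultdict


class ToyNode:
    """One node: the rows it replicates plus its own *local* secondary index."""

    def __init__(self):
        self.rows = {}                 # cid -> row (dict of column -> value)
        self.index = defaultdict(set)  # (column, value) -> cids stored on THIS node only
        self.index_writes = 0

    def write(self, cid, row, indexed_columns):
        self.rows[cid] = row
        for col in indexed_columns:
            self.index[(col, row[col])].add(cid)
            self.index_writes += 1     # one extra local write per indexed column


class ToyCluster:
    """Hypothetical placement: crc32(partition key) picks a start node, then RF consecutive nodes."""

    def __init__(self, n_nodes, rf, indexed_columns):
        self.nodes = [ToyNode() for _ in range(n_nodes)]
        self.rf = rf
        self.indexed_columns = list(indexed_columns)

    def replicas(self, cid):
        start = zlib.crc32(cid.encode()) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.rf)]

    def insert(self, cid, row):
        # Every replica of the partition stores the row AND updates its local indexes.
        for i in self.replicas(cid):
            self.nodes[i].write(cid, row, self.indexed_columns)

    def get_by_key(self, cid):
        """Partition-key lookup: only the replicas of this one partition are involved."""
        owners = self.replicas(cid)
        return self.nodes[owners[0]].rows.get(cid), len(owners)

    def find_by_index(self, column, value):
        """Index lookup without a partition key: no node knows where matching rows live,
        so the toy asks every node and deduplicates the answers."""
        hits = set()
        for node in self.nodes:
            hits |= node.index.get((column, value), set())
        return hits, len(self.nodes)


if __name__ == "__main__":
    z_columns = [f"z_col{i}" for i in range(24)]  # stand-ins for the 24 indexed z_* columns
    cluster = ToyCluster(n_nodes=4, rf=3, indexed_columns=z_columns)
    for n in range(100):
        cluster.insert(f"cid-{n}", {col: n % 2 for col in z_columns})
    print(sum(node.index_writes for node in cluster.nodes))  # 7200
    print(cluster.get_by_key("cid-7")[1])                    # 3
    hits, contacted = cluster.find_by_index("z_col0", 1)
    print(len(hits), contacted)                              # 50 4
```

With 100 rows, RF = 3, and 24 indexed columns, the toy performs 100 × 3 × 24 = 7200 index writes on top of the base-table writes, and its index lookup involves all 4 nodes while the key lookup involves only the 3 replicas of one partition. Real Cassandra only needs enough replicas to cover every token range, so the exact node count can differ, but the fan-out is still cluster-wide.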

If you need to query by that many columns, you need to re-evaluate your data model: Cassandra is optimized for a table-per-query approach, in which each query filters only on the primary key columns of its own table. Building an application of any reasonable complexity therefore requires denormalizing your data into multiple tables. This is one of the trade-offs you make in exchange for the performance, high availability, and scalability that Cassandra provides. If you truly need to run ad-hoc queries on your data, I suggest you look at a different datastore.
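Here is a minimal sketch of the table-per-query idea applied to the per-field processing described in the question, assuming the z_* columns are the status flags that drive that processing. The two in-memory dictionaries stand in for two hypothetical Cassandra tables: one keyed by `cid`, and one keyed by `(field, status, bucket)`, so that "which cids have field F in status S?" becomes a set of partition-key reads instead of a secondary-index query. The bucket component and `N_BUCKETS` are an added assumption that keeps a single status value from turning into one huge partition; all names here are illustrative.

```python
import zlib
from collections import defaultdict

N_BUCKETS = 16  # hypothetical: spreads each (field, status) value over several partitions


def bucket_of(cid: str) -> int:
    """Deterministic bucket for a cid (assumption: crc32 modulo N_BUCKETS)."""
    return zlib.crc32(cid.encode()) % N_BUCKETS


class ProfileStore:
    """Two query-specific 'tables' that the writer keeps in sync (in-memory toy)."""

    def __init__(self):
        # Table 1, query "statuses of one profile": PRIMARY KEY (cid)
        self.status_by_cid = {}
        # Table 2, query "which cids have field F in status S?":
        # PRIMARY KEY ((field, status, bucket), cid)
        self.cids_by_field_status = defaultdict(set)

    def set_status(self, cid: str, field: str, new_status: int) -> None:
        statuses = self.status_by_cid.setdefault(cid, {})
        old_status = statuses.get(field)
        b = bucket_of(cid)
        if old_status is not None:
            # Denormalization cost: the writer deletes the stale row in table 2 itself.
            self.cids_by_field_status[(field, old_status, b)].discard(cid)
        statuses[field] = new_status
        self.cids_by_field_status[(field, new_status, b)].add(cid)

    def cids_with(self, field: str, status: int) -> set:
        """Read every bucket of one (field, status); each read is a partition-key lookup."""
        result = set()
        for b in range(N_BUCKETS):
            result |= self.cids_by_field_status.get((field, status, b), set())
        return result


if __name__ == "__main__":
    store = ProfileStore()
    store.set_status("cid-1", "z_snw", 0)
    store.set_status("cid-2", "z_snw", 0)
    store.set_status("cid-1", "z_snw", 1)       # cid-1 has finished processing snw
    print(sorted(store.cids_with("z_snw", 0)))  # ['cid-2']
    print(sorted(store.cids_with("z_snw", 1)))  # ['cid-1']
```

The price of denormalization shows up in `set_status`: the writer must remove the stale row from the second table itself. In Cassandra the writes to both tables would typically be grouped in a logged BATCH, so that either all of them are eventually applied or none are.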
