How to avoid secondary indexes in Cassandra?

Question

I have repeatedly heard that secondary indexes in Cassandra are only a convenience and do not improve performance. The only case where they are recommended is a low-cardinality column, such as a gender column that has only two values (male or female).

Consider the following example:

CREATE TABLE users ( 
userID uuid, 
firstname text, 
lastname text, 
state text, 
zip int, 
PRIMARY KEY (userID) 
);

Right now I cannot run this query unless I create a secondary index on the firstname column of users:

SELECT * FROM users WHERE firstname = 'john';

How do I denormalize this table so that I can run this query? Is using a composite key, as in the table below, the only efficient way? Are there any other alternatives or suggestions?

CREATE TABLE users ( 
    userID uuid, 
    firstname text, 
    lastname text, 
    state text, 
    zip int, 
    PRIMARY KEY (firstname,userID) 
    );


Recommended answer

To come up with a good data model, you first need to identify ALL the queries you would like to perform. If you only need to look up users by their firstname (or by firstname and userID), then your second design is fine.
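
As a rough illustration of what the second design (PRIMARY KEY (firstname, userID)) supports, the queries below restrict the partition key, and optionally the clustering column as well. The UUID value is a made-up placeholder, not data from the question.

    -- All users named john: a single-partition read
    SELECT * FROM users WHERE firstname = 'john';

    -- One specific john: partition key plus clustering column
    SELECT * FROM users
    WHERE firstname = 'john'
      AND userID = 123e4567-e89b-12d3-a456-426614174000;

Note that with this key a lookup by userID alone is no longer efficient, since userID is only a clustering column; Cassandra would require ALLOW FILTERING for it.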

If you also need to look up users by their last name, you could create another table with the same fields but a primary key of (lastname, userID). Obviously you will need to update both tables at the same time. Data duplication is fine in Cassandra.
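
A minimal sketch of that second table and of one way to write to both tables together. The table name users_by_lastname and the sample values are assumptions for illustration; users here is the question's second design, keyed by (firstname, userID). A logged batch is one option for ensuring both inserts are eventually applied, not the only one.

    -- Hypothetical name; same fields, keyed by last name
    CREATE TABLE users_by_lastname (
        userID uuid,
        firstname text,
        lastname text,
        state text,
        zip int,
        PRIMARY KEY (lastname, userID)
    );

    -- Write the same user to both tables in one logged batch
    BEGIN BATCH
        INSERT INTO users (firstname, userID, lastname, state, zip)
        VALUES ('John', 123e4567-e89b-12d3-a456-426614174000, 'Smith', 'CA', 94105);
        INSERT INTO users_by_lastname (lastname, userID, firstname, state, zip)
        VALUES ('Smith', 123e4567-e89b-12d3-a456-426614174000, 'John', 'CA', 94105);
    APPLY BATCH;

A batch that spans partitions like this costs more than two independent writes, so it is a trade-off between consistency of the two tables and write latency.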

Still, if you are concerned about the space needed for two or more tables, you could create a single users table partitioned by user ID, plus additional lookup tables for the fields you want to query by:

CREATE TABLE users ( 
    userID uuid, 
    firstname text, 
    lastname text, 
    state text, 
    zip int, 
    PRIMARY KEY (userID) 
);

CREATE TABLE users_by_firstname (
    firstname text,
    userid uuid,
    PRIMARY KEY (firstname, userid)
);

The disadvantage of this solution is that you need two queries to retrieve users by their first name:

SELECT userid FROM users_by_firstname WHERE firstname = 'Joe';
SELECT * FROM users WHERE userid IN (...);
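
The lookup table also has to be maintained on every write. The sketch below assumes the users and users_by_firstname tables defined above and uses made-up values; it shows one possible way to keep them in sync when a user is created and when the first name later changes.

    -- Creating a user: write the row and its lookup entry together
    BEGIN BATCH
        INSERT INTO users (userID, firstname, lastname, state, zip)
        VALUES (123e4567-e89b-12d3-a456-426614174000, 'Joe', 'Smith', 'NY', 10001);
        INSERT INTO users_by_firstname (firstname, userid)
        VALUES ('Joe', 123e4567-e89b-12d3-a456-426614174000);
    APPLY BATCH;

    -- Renaming a user: the old lookup entry must be removed explicitly
    BEGIN BATCH
        UPDATE users SET firstname = 'Joseph'
        WHERE userID = 123e4567-e89b-12d3-a456-426614174000;
        DELETE FROM users_by_firstname
        WHERE firstname = 'Joe' AND userid = 123e4567-e89b-12d3-a456-426614174000;
        INSERT INTO users_by_firstname (firstname, userid)
        VALUES ('Joseph', 123e4567-e89b-12d3-a456-426614174000);
    APPLY BATCH;

The application needs to know the old first name to issue the DELETE, which usually means reading the users row before updating it.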

Hope this helps.
