可以在UUID列上创建索引吗? [英] Can an index be created on a UUID Column?
问题描述
是否可以在Cassandra的UUID / TIMEUUID列上创建索引?我正在测试一种模型设计,该模型在UUID列上有一个索引,但是对该列的查询总是返回找到的0行。
Is it possible to create an index on a UUID/TIMEUUID column in Cassandra? I'm testing out a model design which would have an index on a UUID column, but queries on that column always return 0 rows found.
我有一个像这样的表:
create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));
我使用以下命令创建索引:
I create an index with this command:
create index idx on some_data (run_id) ;
在创建此索引时,CQL不会引发任何错误。
No errors are thrown by CQL when I create this index.
我在表中有一小部分测试数据:
I have a small bit of test data in the table:
site_id | user_id | run_id | value
---------+---------+--------------------------------------+-----------------
1 | 1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d | 3
但是,当我运行查询时:
However, when I run the query:
select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d
CQLSH仅返回:(0行)
CQLSH just returns: (0 rows)
如果我使用 int
表示 run_id
,则索引的行为符合预期。
If I use an int
for the run_id
then the index behaves as expected.
推荐答案
是的,您可以在UUID上创建二级索引。真正的问题是你应该吗?
Yes, you can create a secondary index on a UUID. The real question is "should you?"
在任何情况下,我都按照您的步骤进行操作。
In any case, I followed your steps, and got it to work.
Connected to Test Cluster at 192.168.23.129:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
aploetz@cqlsh> use stackoverflow ;
aploetz@cqlsh:stackoverflow> create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));
aploetz@cqlsh:stackoverflow> create index idx on some_data (run_id) ;
aploetz@cqlsh:stackoverflow> INSERT INTO some_data (site_id, user_id, run_id, value) VALUES (1,1,9e118af0-ac92-11e4-81ae-8d1bc921f26d,3);
aploetz@cqlsh:stackoverflow> select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;
code=2200 [Invalid query] message="unconfigured columnfamily usr_rec3"
aploetz@cqlsh:stackoverflow> select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;
site_id | user_id | run_id | value
---------+---------+--------------------------------------+-------
1 | 1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d | 3
(1 rows)
不过请注意,当我运行此程序时命令,它失败了:
Notice though, that when I ran this command, it failed:
select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d
您确定不是要从 some_data
中选择
此外,在高基数列(例如UUID)上创建二级索引通常也不是一个好主意。如果需要通过 run_id
进行查询,则应重新访问数据模型并提供一个合适的查询表来提供服务。
Also, creating secondary indexes on high-cardinality columns (like a UUID) is generally not a good idea. If you need to query by run_id
, then you should revisit your data model and come up with an appropriate query table to serve that.
说明:
- 通常,不建议使用二级索引。在新书 Cassandra高可用性中,Robbie Strickland
- 仅仅因为列属于UUID数据类型并不一定使其具有高基数,所以将它们标识为反模式。
- 这更是您的数据模型问题。但是,知道UUID的性质及其唯一性的潜在目的是在引爆危险信号。
- 将这两点放在一起,在UUID上创建索引没有任何意义。听起来很吸引我。如果它是我的集群,并且(更重要的是)我以后必须支持它,那我就不会这样做。
- Using secondary indexes in general is not considered good practice. In the new book Cassandra High Availability, Robbie Strickland identifies their use as an anti-pattern, due to poor performance.
- Just because a column is of the UUID data type doesn't necessarily make it high-cardinality. That's more of a data model question for you. But knowing the nature of UUIDs and their underlying purpose toward being unique, is setting off red flags.
- Put these two points together, and there isn't anything about creating an index on a UUID that sounds appealing to me. If it were my cluster, and (more importantly) I had to support it later, I wouldn't do it.
这篇关于可以在UUID列上创建索引吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!