可以在UUID列上创建索引吗? [英] Can an index be created on a UUID Column?

查看:984
本文介绍了可以在UUID列上创建索引吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以在Cassandra的UUID / TIMEUUID列上创建索引?我正在测试一种模型设计,该模型在UUID列上有一个索引,但是对该列的查询总是返回找到的0行。

Is it possible to create an index on a UUID/TIMEUUID column in Cassandra? I'm testing out a model design which would have an index on a UUID column, but queries on that column always return 0 rows found.

我有一个像这样的表:

create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));

我使用以下命令创建索引:

I create an index with this command:

create index idx on some_data (run_id) ;

在创建此索引时,CQL不会引发任何错误。

No errors are thrown by CQL when I create this index.

我在表中有一小部分测试数据:

I have a small bit of test data in the table:

 site_id | user_id | run_id                               | value
---------+---------+--------------------------------------+-----------------
       1 |       1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d |               3

但是,当我运行查询时:

However, when I run the query:

select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d

CQLSH仅返回:(0行)

CQLSH just returns: (0 rows)

如果我使用 int 表示 run_id ,则索引的行为符合预期。

If I use an int for the run_id then the index behaves as expected.

推荐答案

是的,您可以在UUID上创建二级索引。真正的问题是你应该吗?

Yes, you can create a secondary index on a UUID. The real question is "should you?"

在任何情况下,我都按照您的步骤进行操作。

In any case, I followed your steps, and got it to work.

Connected to Test Cluster at 192.168.23.129:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
aploetz@cqlsh> use stackoverflow ;
aploetz@cqlsh:stackoverflow> create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));
aploetz@cqlsh:stackoverflow> create index idx on some_data (run_id) ;
aploetz@cqlsh:stackoverflow> INSERT INTO some_data (site_id, user_id, run_id, value) VALUES (1,1,9e118af0-ac92-11e4-81ae-8d1bc921f26d,3);
aploetz@cqlsh:stackoverflow> select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;
code=2200 [Invalid query] message="unconfigured columnfamily usr_rec3"
aploetz@cqlsh:stackoverflow> select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;

 site_id | user_id | run_id                               | value
---------+---------+--------------------------------------+-------
       1 |       1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d |     3

(1 rows)

不过请注意,当我运行此程序时命令,它失败了:

Notice though, that when I ran this command, it failed:

select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d

您确定不是要从 some_data 中选择

此外,在高基数列(例如UUID)上创建二级索引通常也不是一个好主意。如果需要通过 run_id 进行查询,则应重新访问数据模型并提供一个合适的查询表来提供服务。

Also, creating secondary indexes on high-cardinality columns (like a UUID) is generally not a good idea. If you need to query by run_id, then you should revisit your data model and come up with an appropriate query table to serve that.

说明:


  • 通常,不建议使用二级索引。在新书 Cassandra高可用性中,Robbie Strickland

  • 仅仅因为列属于UUID数据类型并不一定使其具有高基数,所以将它们标识为反模式。

  • 这更是您的数据模型问题。但是,知道UUID的性质及其唯一性的潜在目的是在引爆危险信号。

  • 将这两点放在一起,在UUID上创建索引没有任何意义。听起来很吸引我。如果它是我的集群,并且(更重要的是)我以后必须支持它,那我就不会这样做。

  • Using secondary indexes in general is not considered good practice. In the new book Cassandra High Availability, Robbie Strickland identifies their use as an anti-pattern, due to poor performance.
  • Just because a column is of the UUID data type doesn't necessarily make it high-cardinality. That's more of a data model question for you. But knowing the nature of UUIDs and their underlying purpose toward being unique, is setting off red flags.
  • Put these two points together, and there isn't anything about creating an index on a UUID that sounds appealing to me. If it were my cluster, and (more importantly) I had to support it later, I wouldn't do it.

这篇关于可以在UUID列上创建索引吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆