在客户端验证行比使用整个主键的二级索引更好? [英] Validating row at client side better than secondary index with whole primary key?

查看:15
本文介绍了在客户端验证行比使用整个主键的二级索引更好?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 cassandra 中,众所周知应该非常谨慎地使用二级索引.

In cassandra, it's well known that secondary indexes should be used very sparingly.

如果我有一张桌子,例如:

If I have a table for example:

User(username, usertype, email, etc..)

这里的用户名是分区键.现在我想支持当且仅当 usertype 是特定值 X 时返回特定用户(将给出用户名)的操作.

Here username is partition key. Now I want to support operation which returns a specific user(username will be given) if and only if usertype is a specific value X.

有两种方法可以做到:

一:在用户类型、可能的值('A'、'B'、'C')上创建二级索引用户名是分区键.

One: Create a secondary index on usertype, possible values ('A', 'B', 'C') and username is partition key.

SELECT * FROM user WHERE username='something' AND usertype='A';

二:

我可以将带有用户名的行提取到客户端,然后检查用户类型是否为 A.

I can just fetch the row with username to client and then check if usertype is A.

哪种方法更好?还请考虑一个宽行(不是很大,10 秒)的场景,其中并非分区的所有行都可能具有给定值(这需要一些客户端过滤).

Which approach is better? Please also consider a wide row(not so big, 10s) scenario where not all rows of a partition might have the given value (which requires some client side filtering).

关于二级索引,我不清楚的是如何在特定节点中查找数据.

What I'm not clear about secondary indexes is how data is looked up in a particular node.

例如:SELECT * FROM user WHERE username='something' AND usertype='A'

例如 usertype hidden CF has data 'A'-> 'jhon', 'miller', 'chris',...等,100个用户名

For example usertype hidden CF has data 'A'-> 'jhon', 'miller', 'chris',...etc, 100 usernames

并且带有分区键的查询与用户类型一起给出,它是否扫描所有这 100 个用户名以与用户名某物"匹配,或者它是否首先按用户名获取并查看用户类型列,如果它与A"匹配?它究竟是如何进行搜索的?给定索引的查询费用是基于低基数数据,并且每个数据都映射到多行?

And the query with partition key is given along with usertype does it scan through all these 100 usernames to match with the username 'something' or does it just fetches by username first and sees the usertype column if it matches with 'A'? How exactly it does that searching? How does the query fares given the index is on low cardinality data and each one is mapped to many rows?

如果重要的话,我使用 java 作为客户端.

I'm using java as client if that matters.

更新:我知道我可以在这个特定示例中使用集群(用户类型)键,但我想知道我问过的权衡.我的原始表格要复杂得多.

Update: I understand that I can use clustering (usertype) key for this particular example but I wanted to know the trade off I've asked. My original tables are much more complex.

推荐答案

在这个例子中,假设我创建了一个表来按船和 id 跟踪船员:

For this example, let's say I create a table to keep track of crew members by ship and id:

CREATE TABLE crewByShip (
  ship text,
  id int,
  firstname text,
  lastname text,
  gender text,
  PRIMARY KEY(ship,id));

我将在 gender 上创建一个索引:

And I'll create an index on gender:

CREATE INDEX crewByShipG_idx ON crewByShip(gender);

插入一些数据后,我的表格如下所示:

After inserting some data, my table looks like this:

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+-----------
 Serenity |  1 |     Hoban |      M | Washburne
 Serenity |  2 |      Zoey |      F | Washburne
 Serenity |  3 |   Malcolm |      M |  Reynolds
 Serenity |  4 |    Kaylee |      F |      Frye
 Serenity |  5 |  Sheppard |      M |      Book
 Serenity |  6 |     Jayne |      M |      Cobb
 Serenity |  7 |     Simon |      M |       Tam
 Serenity |  8 |     River |      F |       Tam
 Serenity |  9 |     Inara |      F |     Serra

现在我将打开跟踪,并使用 PRIMARY KEY 查询一个不同的行,但也通过我们对 gender 的索引进行限制.

Now I'll turn tracing on, and query a distinct row with the PRIMARY KEY, but also restrict by our index on gender.

aploetz@cqlsh:stackoverflow2> tracing on;
aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                                                                                                                                                                                                                                       | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                                                                                                                                                                                                                             Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 |              0
                                                                                                                                                                                                          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 |           3715
                                                                                                                                                                                                                                                                      Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 |           4846
                                                                                                                                                                                                                                                Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5730
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5757
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5793
                                                                                                                                                                                                                                                              Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5848
                                                                                                                                                                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           5856
                                                                                                                                                                                                                Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           7056
                                                                                                                                                                                                                                               Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 |           7080
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7143
                                                                                                                                                                                                                                                                Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7578
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7742
                                                                                                                                                                                              Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7807
                                                                                                                                                                                                                                  Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7851
                                                                                                                                                                                                                                          Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10848
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10936
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11007
                                                                                                                                                                                                                           Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11130
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11139
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11155
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11253
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11262
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 |          11281
                                                                                                                                                                                                                                           Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 |          11369
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11375
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11383
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 |          11409
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 |          11415
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11430
                                                                                                                                                                                                                                                             Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11490
                                                                                                                                                                                                                                                                                               Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 |          13679

现在,我将重新运行相同的查询,但在 gender 上没有多余的索引.

Now, I'll re-run the same query, but without the superfluous index on gender.

aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                        | timestamp                  | source         | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                              Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 |              0
          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 |             87
                                                       Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 |            246
                                 Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 |           1185
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1197
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1215
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1249
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1278
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3309
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3333
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           3368
                            Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           4607
                                              Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4633
                                               Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4643
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4678
                                Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4683
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 |           4697
                                                                                Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 |           5676

如您所见,带有二级索引的查询的source_elapsed"是没有索引的相同查询(返回同一行)的两倍多.

As you can see, the "source_elapsed" for the query with the secondary index was more than twice what it was for the same query (which returned the same row) without the index.

我认为我们可以肯定地说,在宽行表中的低基数列上使用二级索引效果不佳.现在虽然我不会说过滤客户端是一个好主意,在这种情况下,如果结果集很小,它可能是更好的选择.

I think we can definitely say that using a secondary index on a low-cardinality column in a wide-row table will not perform well. Now while I won't say that filtering client-side is a good idea, in this case, with a small result set, it probably would be the better option.

这篇关于在客户端验证行比使用整个主键的二级索引更好?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆