Cassandra CQLEngine Allow Filtering


Question

I'm using the Python Cassandra cqlengine extension. I created a many-to-many table, but I get an error when filtering a query on the user_applications model. I have read several resources about this problem, but I still do not fully understand it.

Source: https://ohioedge.com/2017/07/05/cassandra-primary-key-partitioning-key-clustering-key-a-simple-explanation/

Cassandra Allow Filtering

Is ALLOW FILTERING in Cassandra efficient for the following query?

Database Model:

class UserApplications(BaseModel):
    __table_name__ = "user_applications"

    user_id = columns.UUID(required=True, primary_key=True, index=True)
    application_id = columns.UUID(required=True, primary_key=True, index=True)
    membership_id = columns.UUID(required=True, primary_key=True, index=True)

Error Message:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

Python CQLEngine Code:

q = UserApplications.filter(membership_id=r.membership_id,
                            user_id=r.user_id,
                            application_id=r.application_id)

CQLEngine SQL Statements:

SELECT "id", "status", "created_date", "update_date" FROM db.user_applications WHERE "membership_id" = %(0)s AND "user_id" = %(1)s AND "application_id" = %(2)s LIMIT 10000

Describe Table Result:

CREATE TABLE db.user_applications (
    id uuid,
    user_id uuid,
    application_id uuid,
    membership_id uuid,
    created_date timestamp,
    status int,
    update_date timestamp,
    PRIMARY KEY (id, user_id, application_id, membership_id)
) WITH CLUSTERING ORDER BY (user_id ASC, application_id ASC, membership_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX user_applications_membership_id_idx ON db.user_applications (membership_id);

Waiting for your help.

Recommended Answer

You are getting this error because your query does not include the ALLOW FILTERING flag. If you add ALLOW FILTERING to the end of the query, it should work.

Using ALLOW FILTERING in a Cassandra query allows Cassandra to filter out rows after loading them (possibly after loading every row in the table). In the case of your query, for example, the only way Cassandra can execute it is to retrieve all the rows from the user_applications table and then filter out the ones that do not have the requested value for each of the columns you are restricting.

Using ALLOW FILTERING can have unpredictable performance outcomes; the actual performance depends on how the data is distributed in your table. If your table contains, for example, 1 million rows and 95% of them have the requested values for the columns you are restricting, the query will still be relatively efficient and you should use ALLOW FILTERING. On the other hand, if your table contains 1 million rows and only 2 of them contain the requested values, the query is extremely inefficient: Cassandra will load 999,998 rows for nothing. In general, if your queries require ALLOW FILTERING, you should probably rethink your schema or add secondary indexes on the columns you query often.
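To make the arithmetic above concrete, here is a minimal pure-Python sketch of "load every row, then filter". It is only a toy model and does not use Cassandra or the driver; the helper names (filtered_scan, build_table), the "m-1" value and the scaled-down table size of 10,000 rows are assumptions made for illustration.

```python
def filtered_scan(rows, **wanted):
    """Read every row, then keep only the rows whose columns match `wanted`."""
    matches, rows_read = [], 0
    for row in rows:
        rows_read += 1  # every row is loaded, whether or not it matches
        if all(row[col] == val for col, val in wanted.items()):
            matches.append(row)
    return matches, rows_read


def build_table(total_rows, matching_rows, target):
    """Build a toy table in which the first `matching_rows` rows carry `target`."""
    return [
        {"id": i, "membership_id": target if i < matching_rows else "other"}
        for i in range(total_rows)
    ]


TOTAL = 10_000  # hypothetical size, scaled down from the 1 million rows in the text

# Selective case: only 2 rows match, so almost every read is wasted.
selective, read_selective = filtered_scan(build_table(TOTAL, 2, "m-1"), membership_id="m-1")
wasted_selective = read_selective - len(selective)

# Broad case: 95% of the rows match, so few reads are wasted.
broad, read_broad = filtered_scan(build_table(TOTAL, 9_500, "m-1"), membership_id="m-1")
wasted_broad = read_broad - len(broad)

print(wasted_selective, wasted_broad)
```

Both scans read the same number of rows; what changes is how many of those reads were useful. With the figures from the answer, the same subtraction gives 1,000,000 - 2 = 999,998 wasted rows.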

In your case, I suggest making the columns membership_id, user_id and application_id a composite partition key. If you do so, you will no longer need to filter out any rows after loading, because all rows that have the same values for the three columns will reside in the same partition (on the same physical node). You should provide all three values in the query (you are already doing so in the query in your question). Here is how you can do so:

CREATE TABLE db.user_applications (
    user_id uuid,
    application_id uuid,
    membership_id uuid,
    created_date timestamp,
    status int,
    update_date timestamp,
    PRIMARY KEY ((user_id, application_id, membership_id))
);
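The effect of the composite partition key can be sketched in plain Python as a dictionary keyed by the three columns. This is a toy stand-in, not the Cassandra driver or cqlengine; the PartitionedTable class and its insert and select methods are hypothetical names introduced only for this illustration.

```python
import uuid


class PartitionedTable:
    """Toy model of a table whose partition key is
    (user_id, application_id, membership_id) with no clustering columns."""

    def __init__(self):
        self._partitions = {}

    def insert(self, user_id, application_id, membership_id, **columns):
        # Writing the same full key again overwrites the row (an upsert).
        self._partitions[(user_id, application_id, membership_id)] = columns

    def select(self, user_id, application_id, membership_id):
        # All three key values locate the partition directly; no other rows are read.
        return self._partitions.get((user_id, application_id, membership_id))


table = PartitionedTable()
user, app, member = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()

table.insert(user, app, member, status=1)
table.insert(uuid.uuid4(), uuid.uuid4(), uuid.uuid4(), status=0)

row = table.select(user, app, member)          # direct lookup by the full key
missing = table.select(user, app, uuid.uuid4())  # unknown key: no row

table.insert(user, app, member, status=2)      # same key again replaces the row
updated = table.select(user, app, member)
```

Two consequences of this design are worth keeping in mind. Because the schema has no clustering columns, each combination of the three values identifies exactly one row. And because the lookup needs the full key, a query that supplies only one or two of the three columns still cannot be served directly from this table.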
