How to batch select data from Cassandra effectively?


Question

I know Cassandra doesn't support batch queries, and using IN is not recommended because it can degrade performance. But I have to fetch the data by id, for example:

select * from visit where id in ([visit_id array])

Table definition:

CREATE TABLE visit (
    enterprise_id int,
    id text,
    ........
    PRIMARY KEY (enterprise_id, id)
);

The array may have thousands of elements. Is there any way to do this efficiently?

Recommended answer

Large IN queries create GC pauses and heap pressure, which leads to slower overall performance. When you execute a large IN query, you are waiting on a single coordinator node to give you a response. It keeps all those queries and their responses in the heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing.

Approach 1:

Try to convert your IN query into a range query (>=, <=):

SELECT * FROM visit WHERE enterprise_id = ? AND id >= ? AND id <= ?
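
One caveat with the range form: it returns every row whose id falls between the two bounds, not only the requested ones, so it tends to pay off when the wanted ids are dense within that range. A minimal sketch of the client-side bookkeeping, assuming plain ASCII ids (so that Java's String ordering agrees with how Cassandra orders a text clustering column); the class and method names are hypothetical and not part of any driver:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Cassandra driver.
final class RangeBounds {
    /** Smallest and largest wanted id: the two bind values for "id >= ? AND id <= ?". */
    static String[] bounds(List<String> wantedIds) {
        return new String[] { Collections.min(wantedIds), Collections.max(wantedIds) };
    }

    /** Keeps only the fetched ids that were actually requested, in fetch order. */
    static List<String> keepWanted(List<String> fetchedIds, List<String> wantedIds) {
        Set<String> wanted = new HashSet<>(wantedIds);
        List<String> kept = new ArrayList<>();
        for (String id : fetchedIds) {
            if (wanted.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

If the range would pull in far more rows than were requested, the other approaches are probably a better fit.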

Approach 2:

Use executeAsync. Java example:

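import com.datastax.driver.core.*;  // DataStax Java driver 3.x (ResultSetFuture)
import java.util.ArrayList;
import java.util.List;

// Place inside any class. Assumes the visit table has a text column "name".
static List<String> fetchNames(Session session, int enterpriseId, List<String> visitIds) {
    PreparedStatement statement = session.prepare(
        "SELECT * FROM visit WHERE enterprise_id = ? AND id = ?");

    // Issue one single-row query per id; they run concurrently.
    List<ResultSetFuture> futures = new ArrayList<>();
    for (String visitId : visitIds) {
        futures.add(session.executeAsync(statement.bind(enterpriseId, visitId)));
    }

    // Collect results in submission order, skipping ids that matched no row.
    List<String> results = new ArrayList<>();
    for (ResultSetFuture future : futures) {
        Row row = future.getUninterruptibly().one();
        if (row != null) {
            results.add(row.getString("name"));
        }
    }
    return results;
}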
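
With thousands of ids, submitting every query at once can itself put pressure on the cluster and the client, so it may help to cap the number of requests in flight. The following is a sketch of that idea using only the JDK, not the driver's own API: `asyncLookup` is a hypothetical stand-in for whatever wraps `executeAsync`, and `ThrottledLookup`, `fetchAll` and `maxInFlight` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper, not part of the Cassandra driver.
final class ThrottledLookup {
    /**
     * Runs asyncLookup for every id with at most maxInFlight requests pending
     * at any time, and returns the results in the same order as ids.
     */
    static <T> List<T> fetchAll(List<String> ids,
                                Function<String, CompletableFuture<T>> asyncLookup,
                                int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (String id : ids) {
            permits.acquireUninterruptibly(); // waits while maxInFlight requests are pending
            CompletableFuture<T> future;
            try {
                future = asyncLookup.apply(id);
            } catch (RuntimeException e) {
                permits.release(); // the request never started, so give the slot back
                throw e;
            }
            // Free the slot when the request finishes, whether it succeeded or failed.
            future.whenComplete((value, error) -> permits.release());
            futures.add(future);
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            results.add(future.join()); // a failed lookup surfaces as CompletionException
        }
        return results;
    }
}
```

Because each id is its own request, a failure only requires retrying that one lookup rather than the whole set. With driver 3.x, a `ResultSetFuture` would first need adapting to a `CompletableFuture`; a suitable value for `maxInFlight` depends on the cluster and is best found by measurement.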

Approach 3:

If possible, avoid the IN query altogether by creating another table. Whenever data that you would otherwise fetch with the IN query is inserted or updated, also write it to the new table. You can then read from the new table without an IN query.

Sources:
http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
