Performance of token range based queries on partition keys?

Published: 2020/9/29 20:37:34 | Tags: cassandra, datastax-enterprise, cassandra-3.0

Problem description

I am selecting all records from Cassandra nodes based on the token ranges of my partition key.

Below is the code:

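// Imports assume the DataStax Java driver 3.x (com.datastax.driver.core).
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TokenRange;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Cached cluster handle; a field of the enclosing class, shown here for completeness.
private static Cluster cluster;

public static synchronized List<Object[]> getTokenRanges(final Session session) {
  if (cluster == null) {
    cluster = session.getCluster();
  }
  Metadata metadata = cluster.getMetadata();
  return unwrapTokenRanges(metadata.getTokenRanges());
}

private static List<Object[]> unwrapTokenRanges(Set<TokenRange> wrappedRanges) {
  final int tokensSize = 2;
  List<Object[]> tokenRanges = new ArrayList<>();
  for (TokenRange tokenRange : wrappedRanges) {
    // unwrap() splits a range that wraps around the ring into non-wrapping pieces
    List<TokenRange> unwrappedTokenRangeList = tokenRange.unwrap();
    for (TokenRange unwrappedTokenRange : unwrappedTokenRangeList) {
      Object[] objects = new Object[tokensSize];
      objects[0] = unwrappedTokenRange.getStart().getValue(); // start token (exclusive)
      objects[1] = unwrappedTokenRange.getEnd().getValue();   // end token (inclusive)
      tokenRanges.add(objects);
    }
  }
  return tokenRanges;
}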

getTokenRanges gives me all the token ranges of the vnodes across all nodes.

Then I use these token ranges to query Cassandra. objects[0] holds the start token of a vnode range and objects[1] holds its end token.

This generates the following query:

SELECT * FROM my_key_space.tablename WHERE token(id) > <start token number> AND token(id) <= <end token number>;

In the query above, the id column is the partition key.

In Cassandra it is not recommended to perform range queries. So, will this query be performant?

From what I know, this query will touch only the individual partition/vnode and will not touch multiple partitions, so there should not be any performance issue. Is this correct?

Cassandra version: 3.x

Recommended answer

Queries on token ranges are performant, and Spark uses them for effective data fetching. But you need to keep the following in mind: getTokenRanges will give you all existing token ranges, but there are some edge cases. The last range runs from some positive number to the negative number that marks the start of the first range, and as such, your query won't return anything for it. Basically, you miss the data between MIN_TOKEN and the first token, and between the last token and MAX_TOKEN. The Spark Connector generates different CQL statements depending on the token. In addition, you need to route each query to the correct node; this can be done via setRoutingToken.
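To illustrate why the wrap-around range returns nothing with the plain "token(id) > start AND token(id) <= end" filter, here is a minimal sketch that uses plain long values in place of driver Token objects. The class name WrapAroundRangeDemo, its methods, and the sample token values are all hypothetical and chosen only for illustration; the ring-aware check is a simplification of how a wrapping range has to be treated, not the driver's implementation.

```java
final class WrapAroundRangeDemo {

    /** Naive filter, mirroring "token(id) > start AND token(id) <= end". */
    static boolean naiveMatch(long token, long start, long end) {
        return token > start && token <= end;
    }

    /** Ring-aware filter: when start >= end the range wraps past the largest token. */
    static boolean ringMatch(long token, long start, long end) {
        if (start < end) {
            return naiveMatch(token, start, end);
        }
        // Wrapping range: everything after start, plus everything up to end.
        return token > start || token <= end;
    }

    /** Counts how many of the given row tokens fall into the range (start, end]. */
    static int count(long[] rowTokens, long start, long end, boolean ringAware) {
        int matches = 0;
        for (long t : rowTokens) {
            boolean hit = ringAware ? ringMatch(t, start, end) : naiveMatch(t, start, end);
            if (hit) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical ring with node tokens -50, 0 and 50; the last range is (50, -50].
        long[] rowTokens = {-80L, -50L, -10L, 30L, 50L, 90L};
        long start = 50L;
        long end = -50L;
        System.out.println("naive: " + count(rowTokens, start, end, false));
        System.out.println("ring-aware: " + count(rowTokens, start, end, true));
    }
}
```

The two halves of the ring-aware condition correspond to two separate CQL statements: one with only "token(id) > start" and one with only "token(id) <= end".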

A similar approach could be used in Java code (full code):

    Metadata metadata = cluster.getMetadata();
    List<TokenRange> ranges = new ArrayList<>(metadata.getTokenRanges());
    Collections.sort(ranges);
    System.out.println("Processing " + (ranges.size()+1) + " token ranges...");

    // After sorting, the first range starts at the smallest token in the ring
    Token minToken = ranges.get(0).getStart();
    String baseQuery = "SELECT id, col1 FROM test.range_scan WHERE ";
    // Maps each query string to the token used to route it
    Map<String, Token> queries = new HashMap<>();
    // generate queries for every range
    for (int i = 0; i < ranges.size(); i++) {
        TokenRange range = ranges.get(i);
        Token rangeStart = range.getStart();
        Token rangeEnd = range.getEnd();
        if (i == 0) {
            // extra query for everything up to and including the smallest token
            queries.put(baseQuery + "token(id) <= " + minToken, minToken);
            queries.put(baseQuery + "token(id) > " + rangeStart + " AND token(id) <= " + rangeEnd, rangeEnd);
        } else if (rangeEnd.equals(minToken)) {
            // wrap-around range: only a lower bound, covers everything after the last token
            queries.put(baseQuery + "token(id) > " + rangeStart, rangeEnd);
        } else {
            queries.put(baseQuery + "token(id) > " + rangeStart + " AND token(id) <= " + rangeEnd, rangeEnd);
        }
    }

    // Note: this could be sped up by using async queries, but it is fine for illustration
    long rowCount = 0;
    for (Map.Entry<String, Token> entry: queries.entrySet()) {
        SimpleStatement statement = new SimpleStatement(entry.getKey());
        // route the query to a replica that owns this token
        statement.setRoutingToken(entry.getValue());
        ResultSet rs = session.execute(statement);
        // .... process data
    }
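
The note about async queries can be sketched as follows. This is only an illustration of the fan-out pattern under stated assumptions, not the answer's code: the class ParallelRangeScan, the method countRows and the runQuery function are hypothetical, and runQuery stands in for whatever executes one per-range statement and returns its row count (with the driver, that would be the place to execute the statement with its routing token set).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class ParallelRangeScan {

    /**
     * Runs every per-range query on a fixed-size thread pool and sums the row counts.
     * runQuery is a hypothetical stand-in for executing one query and counting its rows.
     */
    static long countRows(List<String> queries, Function<String, Long> runQuery, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit all queries first so they can run concurrently.
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (String query : queries) {
                futures.add(CompletableFuture.supplyAsync(() -> runQuery.apply(query), pool));
            }
            // Then wait for each result and accumulate the total.
            long total = 0;
            for (CompletableFuture<Long> future : futures) {
                total += future.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake executor for demonstration: pretends each query returns as many rows as it has characters.
        List<String> queries = Arrays.asList("q1", "q22", "q333");
        long total = countRows(queries, q -> (long) q.length(), 2);
        System.out.println("rows: " + total);
    }
}
```

Bounding the pool size keeps the number of in-flight range scans limited, which matters because each range query can return a large amount of data.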
