通过cqlsh请求对Cassandra执行减法的最佳方法是什么? [英] What is the best way to perform a subtraction on Cassandra by cqlsh request?

查看:80
本文介绍了通过cqlsh请求对Cassandra执行减法的最佳方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在Cassandra上发出cqlsh请求。这个想法是分析来自基因组中注释的不同基因寄存器的平均长度。输入数据后,我有以下athaliana.tab表:

I am trying to make a cqlsh request on Cassandra. The idea is analyse the average length from different 'gene' registers that had been annotated in a genome. After enter the data, I have this athaliana.tab table:

 id                                   | chr | comments          | end      | orf | sense | start    | type
--------------------------------------+-----+-------------------+----------+-----+-------+----------+------
 d2ab2520-6734-11e5-955c-234085c1edec |   1 | gene_id AT1G16340 |  5590338 |   0 |     - |  5590241 |  CDS
 d4169c00-6734-11e5-955c-234085c1edec |   1 | gene_id AT1G16610 |  5676495 |   . |     - |  5676429 | exon
 a8c792c0-6734-11e5-955c-234085c1edec |   1 | gene_id AT1G07485 |  2301889 |   0 |     + |  2301665 |  CDS
 3bd5c0a0-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G51980 | 19326916 |   . |     - | 19326733 | exon
 263b5b60-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G44990 | 17007808 |   . |     - | 17007542 | gene
 67989a50-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G63110 | 23405144 |   . |     + | 23404821 |  UTR
 26f7f4a0-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G45180 | 17101207 |   0 |     + | 17101109 |  CDS
 3743dc70-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G50840 | 18841644 |   0 |     + | 18840965 |  CDS
 e5099940-6734-11e5-955c-234085c1edec |   1 | gene_id AT1G20620 |  7145780 |   . |     + |  7145691 | exon
 2ba30620-6735-11e5-955c-234085c1edec |   1 | gene_id AT1G48180 | 17793717 |   . |     - | 17792449 | gene

这个想法是获得最终起​​始列数据之间的差值并计算平均值。我已经尝试过这种方式:

The idea is obtain the subtraction between end-start columns data and calculate the average. I have tried by this way:

SELECT avg(end-start) FROM athaliana.tab WHERE chr = '1' AND type = 'gene';

,但结果为:

SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 1:14 no viable alternative at input '-' (SELECT avg([end]-...)">

有些想法?任何帮助将不胜感激。

Some ideas? Any help will be really appreciated.

推荐答案

Cassandra旨在实现最大的读写效率,并且不内置支持应用于多个列的函数,例如 plus 也不用于聚集,即适用于多行的函数,例如 avg count

Cassandra is designed for maximum read and write efficiency and has no built-in support for functions applying to multiple columns such as plus or minus nor for aggregates, i.e., functions applying to multiple rows, such as avg or count.

直到最近,解决方法是将函数的结果存储在

Until recently, the workaround was to store the results of your functions in a separate column and perform aggregations in your own application code.

但是,Cassandra 2.2和3.0现在提供了用户定义的函数(UDF)和用户定义的聚合(UDA)。您现在可以定义自己的f功能,然后让Cassandra为您调用它们。请参见 UDF UDA 此博客文章

However, Cassandra 2.2 and 3.0 now offer User-Defined Functions (UDFs) and User-Defined Aggregates (UDAs). You can now define your own functions, then have Cassandra invoke them for you. See the CQL spec chapters on UDFs and UDAs or this blog post.

首先,创建以下函数和聚合:

First, create the following functions and aggregates:

USE athaliana;

CREATE FUNCTION minus (x int, y int) RETURNS NULL ON NULL INPUT RETURNS int LANGUAGE java AS 'return x-y;';

CREATE FUNCTION avgState (state tuple<int,bigint>, val int) CALLED ON NULL INPUT RETURNS tuple<int,bigint>  LANGUAGE java AS '
    if (val != null) {
      state.setInt(0, state.getInt(0)+1);
      state.setLong(1, state.getLong(1)+val.intValue());
    }
    return state;
';

CREATE FUNCTION avgFinal (state tuple<int,bigint>) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS '
    double r = 0;
    if (state.getInt(0) == 0) return null;
    r = state.getLong(1);
    r /= state.getInt(0);
    return Double.valueOf(r);
  ';

CREATE AGGREGATE avg(int) SFUNC avgState STYPE tuple<int,bigint> FINALFUNC avgFinal INITCOND (0, 0);

然后可以用以下方式重写查询:

Then your query could be rewritten this way:

SELECT avg(minus(end,start)) FROM athaliana.tab WHERE chr = '1' AND type = 'gene';

请注意,只应在单个分区键上使用聚合;如果不是这种情况,则会发出警告。

Note that you should only use aggregates on a single partition key; if this is not the case, a warning will be raised.

这篇关于通过cqlsh请求对Cassandra执行减法的最佳方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆