如何获得前五的记录在cassandra 2.2 [英] How to get top 5 records in cassandra 2.2

查看:162
本文介绍了如何获得前五的记录在cassandra 2.2的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要帮助。我有一个查询,它按日期(不是日期+时间)和总金额获得前5名记录。

I need a help. I have a query which get top 5 records group by date (not date + time) and sum of amount.

我写了以下内容,但它返回所有记录,前5个记录

I wrote the following but it returns all the records not just top 5 records

CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java 
AS 'String date = datetime.substring(0,10); Double count = (Double) state.get(date);  if (count == null) count = Double.parseDouble(amount); else count = count +  Double.parseDouble(amount); state.put(date, count); return state;' ;


CREATE OR REPLACE AGGREGATE groupbyandsum(text, text) 
SFUNC state_groupbyandsum
STYPE map<text, double>
INITCOND {};

select groupbyandsum(datetime, amout) from warehouse;

您能帮忙得到5条记录。

Could you please help out to get just 5 records.

推荐答案

这里有一种方法。您的群组状态函数可能是这样:

Here's one way to do that. Your group by state function could be like this:

CREATE FUNCTION state_group_and_total( state map<text, double>, type text, amount double )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
     Double count = (Double) state.get(type);
     if (count == null)
         count = amount;
     else
         count = count + amount;
     state.put(type, count);
     return state;
';

这将建立一个由查询WHERE子句选择的所有金额行的映射。现在棘手的部分是如何保持只有前N个。一种方法是使用FINALFUNC,它在所有行都被放入地图后执行。所以这里有一个函数,使用循环来查找地图中的最大值,并将其移动到结果图。因此,要找到最高的N,它会遍历地图N次(有更有效的算法,但是它只是一个快速和肮脏的例子)。

That will build up a map of all the amount rows selected by your query WHERE clause. Now the tricky part is how to keep just the top N. One way to do it is by using a FINALFUNC which gets executed after all the rows have been put in the map. So here's a function to do that using a loop to find the maximum value in the map and move it to a result map. So to find the top N it would iterate over the map N times (there are more efficient algorithms than this, but it's just a quick and dirty example).

找到前两个的示例:

CREATE FUNCTION topFinal (state map<text, double>)
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
    java.util.Map<String, Double> inMap = new java.util.HashMap<String, Double>(),
                                  outMap = new java.util.HashMap<String, Double>();

    inMap.putAll(state);

    int topN = 2;
    for (int i = 1; i <= topN; i++) {
        double maxVal = -1;
        String moveKey = null;
        for (java.util.Map.Entry<String, Double> entry : inMap.entrySet()) {

            if (entry.getValue() > maxVal) {
                maxVal = entry.getValue();
                moveKey = entry.getKey();
            }
        }
        if (moveKey != null) {
            outMap.put(moveKey, maxVal);
            inMap.remove(moveKey);
        }
    }

    return outMap;
';

然后最后需要定义AGGREGATE来调用你定义的两个函数:

Then lastly you need to define the AGGREGATE to call the two functions you defined:

CREATE OR REPLACE AGGREGATE group_and_total(text, double) 
     SFUNC state_group_and_total 
     STYPE map<text, double> 
     FINALFUNC topFinal
     INITCOND {};

所以让我们看看是否有效。

So let's see if that works.

CREATE table test (partition int, clustering text, amount double, PRIMARY KEY (partition, clustering));
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2015', 99.1);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2016', 18.12);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2017', 44.889);
SELECT * from test;

 partition | clustering | amount
-----------+------------+--------
         1 |       2015 |   99.1
         1 |       2016 |  18.12
         1 |       2017 | 44.889

现在,鼓轮...

SELECT group_and_total(clustering, amount) from test where partition=1;

 agg.group_and_total(clustering, amount)
-------------------------------------------
            {'2015': 99.1, '2017': 44.889}

所以你看到它保持前2行基于金额。

So you see it kept the top 2 rows based on the amount.

注意,键不会按照排序顺序,因为它是一个地图,我不认为我们可以控制地图中的键顺序,所以在FINALFUNC排序将浪费资源。如果你需要地图排序,那么你可以在客户端。

Note that the keys won't be in sorted order since it's a map, and I don't think we can control the key order in the map, so sorting in the FINALFUNC would be a waste of resources. If you need the map sorted then you could do that in the client.

我想你可以在state_group_and_total函数中做更多的工作,沿。这可能更好,以防止地图太大。

I think you could do more work in the state_group_and_total function to drop items from the map as you go along. That might be better to keep the map from getting too big.

这篇关于如何获得前五的记录在cassandra 2.2的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆