How to perform a SELECT in the results returned from a GROUP BY Druid?

Problem description

I am having a hard time converting the simple SQL query below into a Druid query:

SELECT country, city, COUNT(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;

So far I have come up with this query:

{
  "queryType": "groupBy",
  "dataSource" : "people_data",
  "granularity": "all",
  "metric" : "num_of_pages",
  "dimensions": ["country", "city"],
  "filter" : {
      "type" : "and",
      "fields" : [
          {
            "type": "in",
            "dimension": "name",
            "values": ["Mary"]
          },
          {
            "type" : "javascript",
            "dimension" : "email",
            "function" : "function(value) { return (value.length !== 0) }"
          }
      ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": [ "2016-07-20/2016-07-21" ]
}

The query above runs, but it does not seem that the groupBy on the Druid datasource is even being evaluated, since my output contains people with names other than Mary. Does anyone have any input on how to make this work?

Recommended answer

The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
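
As a rough sketch of what this means for the SQL in the question: only the grouped columns go into dimensions, while name appears only in the filter. The selector filter type and the output name row_count are choices made here for illustration, not taken from the original post; the datasource, the count metric, and the interval are copied from the question's query.

```python
import json

# Sketch of a Druid native groupBy query for:
#   SELECT country, city, COUNT(*) FROM people_data
#   WHERE name = 'Mary' GROUP BY country, city
# Assumption: the datasource has a "count" metric created at ingestion,
# as the longSum aggregation in the question implies.
query = {
    "queryType": "groupBy",
    "dataSource": "people_data",
    "granularity": "all",
    # Only the grouped columns are listed as dimensions.
    "dimensions": ["country", "city"],
    # name is used for filtering only, never returned as a column.
    "filter": {"type": "selector", "dimension": "name", "value": "Mary"},
    "aggregations": [
        {"type": "longSum", "name": "row_count", "fieldName": "count"}
    ],
    "intervals": ["2016-07-20/2016-07-21"],
}

payload = json.dumps(query, indent=2)
print(payload)
```

The printed JSON is what would be sent as the body of the query request; how it is submitted depends on your Druid setup.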

Strictly speaking, even the SQL query does not make sense. If, for a given combination of country and city, there are many different values of name and street, how do you squeeze them into a single row? You have to aggregate them, e.g. by using the max function.
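
The point about aggregation can be checked with a small in-memory SQLite table. This is only an illustration of the SQL semantics, not of Druid itself; the rows are invented, and only the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample rows (country, city, name); values are invented.
rows = [
    ("US", "Boston", "Mary"),
    ("US", "Boston", "Alice"),
    ("US", "Austin", "Mary"),
    ("FR", "Paris", "Mary"),
    ("FR", "Paris", "Mary"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_data (country TEXT, city TEXT, name TEXT)")
conn.executemany("INSERT INTO people_data VALUES (?, ?, ?)", rows)

# Filter first, then group: each output row is one (country, city) pair.
filtered = conn.execute(
    "SELECT country, city, COUNT(*) FROM people_data "
    "WHERE name = 'Mary' GROUP BY country, city ORDER BY country, city"
).fetchall()

# Without the filter, a non-grouped column such as name has to be
# aggregated (here with MAX) to fit into the single row per group.
aggregated = conn.execute(
    "SELECT country, city, MAX(name), COUNT(*) FROM people_data "
    "GROUP BY country, city ORDER BY country, city"
).fetchall()

print(filtered)
print(aggregated)
```

In the second query the Boston group contains both Mary and Alice, yet it still yields a single row, which is why name has to pass through an aggregate.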

In this case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and include a corresponding aggregation over the metric, e.g. max(name_metric).
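
A possible shape for such a query is sketched below. It rests on assumptions the answer does not state: name_metric is a hypothetical column, it is assumed to have been ingested as a numeric metric so that Druid's longMax aggregator applies, and the output names are made up. Treat it as a starting point rather than a tested query.

```python
import json

# Assumption: "name_metric" is a hypothetical numeric metric ingested from
# the same source field as the "name_dim" dimension. longMax is a numeric
# aggregator, so this sketch does not apply to a string-valued column.
query = {
    "queryType": "groupBy",
    "dataSource": "people_data",
    "granularity": "all",
    "dimensions": ["country", "city"],
    "aggregations": [
        {"type": "longSum", "name": "row_count", "fieldName": "count"},
        # The max(name_metric) aggregation suggested in the answer.
        {"type": "longMax", "name": "max_name_metric", "fieldName": "name_metric"},
    ],
    "intervals": ["2016-07-20/2016-07-21"],
}

payload = json.dumps(query, indent=2)
print(payload)
```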

Please note that if these columns (name etc.) have high-cardinality values, i.e. many distinct values, that will kill Druid's roll-up feature.
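
To see why, here is a deliberately simplified model of roll-up: events with identical dimension values are merged into one stored row. It ignores timestamps and query granularity, and the events are invented for illustration.

```python
# Hypothetical raw events falling into one time bucket; values are invented.
events = [
    {"country": "US", "city": "Boston", "name": "Mary"},
    {"country": "US", "city": "Boston", "name": "Alice"},
    {"country": "US", "city": "Boston", "name": "Bob"},
    {"country": "FR", "city": "Paris", "name": "Mary"},
]


def rolled_up_rows(events, dimensions):
    """Count rows left after merging events with identical dimension values."""
    return len({tuple(e[d] for d in dimensions) for e in events})


# With only country and city as dimensions, the three Boston events merge.
coarse = rolled_up_rows(events, ["country", "city"])
# Adding name as a dimension keeps every event as its own row.
fine = rolled_up_rows(events, ["country", "city", "name"])

print(coarse, fine)
```

With name included as a dimension, nothing merges in this sample, so the stored row count equals the raw event count.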
