SQL query for GROUP BY, return groups that match the conditions of LEAST, COALESCE


Problem description

I want to write an SQL query calling for several columns with a bit complicated conditions. I'm working on R Studio using RMySQL package. My server is MySQL.

The table looks like this.

organisation    Tour_ID             A           B           C           D  
Ikea                  a    2018-04-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  a    2018-06-01  2018-05-03  2018-05-29          NA   
Ikea                  a    2018-04-02  2018-05-01  2018-07-08  2018-05-26 
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01
Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  c    2018-06-01  2018-05-03          NA          NA   
Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26

Here is what I want to do:

  1. Filter the rows where organisation = Ikea.
  2. Group by Tour_ID, like this:

organisation    Tour_ID             A           B           C           D  
Ikea                  a    2018-04-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  a    2018-06-01  2018-05-03  2018-05-29          NA   
Ikea                  a    2018-04-02  2018-05-01  2018-07-08  2018-05-26 


Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01


Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  c    2018-06-01  2018-05-03          NA          NA   
Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26

  • In each group of Tour_ID, look at the earliest date in columns A, B, C and D. If the earliest date among the four columns in the group is between 2018-05-01 and 2018-05-31, return the entire group. If a row contains NA values, I want to ignore the NAs and use the earliest date among the remaining values. For example, for the group Tour_ID = a, the earliest date is 2018-04-01, so it doesn't meet the criteria.

    In conclusion, only the groups where Tour_ID = b and Tour_ID = c match the conditions. The result should be:

    organisation    Tour_ID             A           B           C           D
    Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
    Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
    Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01
    Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
    Ikea                  c    2018-06-01  2018-05-03          NA          NA   
    Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26
    

    How should I write an SQL query? Here is my attempt, but I just don't know how to do groupby, and how to return the entire group not just the rows with the earliest date.

    SELECT *
    FROM myTable 
    WHERE organisation LIKE 'Ikea' AND
    GROUP BY 'Tour_ID' AND
    LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) >= '2018-05-01' AND
    LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) < '2018-06-01';
    

    ('2019-01-01' is to replace NAs)

    Thank you for your help!

    ADDED: Following the answer by Gordon, here I rewrote the SQL statement.

    "SELECT t.* FROM myTable JOIN (SELECT organisation, Tour_ID 
                                                       FROM myTable
                                                       WHERE organisation LIKE 'Ikea' AND
                                                       GROUP BY organisation, Tour_ID
                                                       HAVING LEAST(COALESCE(MIN(A), '2119-01-01'), 
                                                                    COALESCE(MIN(B), '2119-01-01'), 
                                                                    COALESCE(MIN(C), '2119-01-01'), 
                                                                    COALESCE(MIN(D), '2119-01-01')) >= '2018-05-01' AND
                                                              LEAST(COALESCE(MIN(A), '2119-01-01'), 
                                                                    COALESCE(MIN(B), '2119-01-01'), 
                                                                    COALESCE(MIN(C), '2119-01-01'), 
                                                                    COALESCE(MIN(D), '2119-01-01')) < '2018-06-01'
                                                      ) tt
                                                      ON tt.Tour_ID = t.Tour_ID AND
                                                         tt.organisation = t.organisation"
    

    I then ran dbGetQuery from the RMySQL package, but I get the following error. I don't understand it, because the GROUP BY part seems fine. Does anyone know why I'm getting this error?

    dbGetQuery(connection = connection, statement = condition)
    
    Error in .local(conn, statement, ...) : could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'GROUP BY organisation, Tour_ID HAVING LEAST(COALESCE(A' at line 1
    

    Recommended answer

    First, get the Tour_ID values that meet the condition:

    SELECT Tour_ID
    FROM myTable 
    WHERE organisation LIKE 'Ikea'
    GROUP BY Tour_ID
    HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
           LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01';
    

    Then put this into a query to get the original rows. Here is one way:

    select t.*
    from mytable t join
         (SELECT organisation, Tour_ID
          FROM myTable 
          WHERE organisation LIKE 'Ikea'
          GROUP BY organisation, Tour_ID
          HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
                 LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01'
         ) tt
         ON tt.tour_id = t.tour_id AND
            tt.organisation = t.organisation;
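
To check the join query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The assumptions are mine: SQLite has no LEAST, so its multi-argument scalar min() stands in for it; NA is stored as NULL; dates are ISO-formatted text, which compares correctly as strings. Note that, unlike the ADDED attempt in the question, the query has no trailing AND before GROUP BY and the outer table carries the alias t.

```python
import sqlite3

# Sample rows from the question; NA is represented as NULL (None).
SAMPLE_ROWS = [
    ("Ikea", "a", "2018-04-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "a", "2018-06-01", "2018-05-03", "2018-05-29", None),
    ("Ikea", "a", "2018-04-02", "2018-05-01", "2018-07-08", "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", None, "2018-05-05", "2018-08-02", "2018-06-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-03", None, None),
    ("Ikea", "c", "2018-08-02", "2018-05-09", "2018-07-08", "2018-05-26"),
]

# Earliest date in a group across A..D. In MySQL this would be LEAST(...);
# SQLite's scalar min(x, y, ...) is used here as a stand-in (assumption).
EARLIEST = "min(" + ", ".join(
    f"COALESCE(min({col}), '2019-01-01')" for col in "ABCD"
) + ")"

QUERY = f"""
SELECT t.*
FROM myTable t
JOIN (SELECT organisation, Tour_ID
      FROM myTable
      WHERE organisation = 'Ikea'
      GROUP BY organisation, Tour_ID
      HAVING {EARLIEST} >= '2018-05-01'
         AND {EARLIEST} <  '2018-06-01'
     ) tt
  ON tt.Tour_ID = t.Tour_ID
 AND tt.organisation = t.organisation
ORDER BY t.Tour_ID
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable "
    "(organisation TEXT, Tour_ID TEXT, A TEXT, B TEXT, C TEXT, D TEXT)"
)
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# Every row of each qualifying group is returned, not just the earliest row.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this data the query returns the six rows of groups b and c; group a is excluded because its earliest date, 2018-04-01, falls before the window.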
    
