Complex SQL query suggestions please


Problem description


I have three tables with the schemas below:

Table: Apps

| ID (bigint) | USERID (Bigint)|      START_TIME (datetime) | 
-------------------------------------------------------------
|  1          |        13     |         2013-05-03 04:42:55 | 
|  2          |        13     |         2013-05-12 06:22:45 |
|  3          |        13     |         2013-06-12 08:44:24 |    
|  4          |        13     |         2013-06-24 04:20:56 |       
|  5          |        13     |         2013-06-26 08:20:26 |       
|  6          |        13     |         2013-09-12 05:48:27 | 

Table: Hosts

| ID (bigint) | APPID (Bigint)|         DEVICE_ID (Bigint)  | 
-------------------------------------------------------------
|  1          |        1      |                           1 | 
|  2          |        2      |                           1 |
|  3          |        1      |                           1 |    
|  4          |        3      |                           3 |       
|  5          |        1      |                           4 |      
|  6          |        2      |                           3 |

Table: Usage

| ID (bigint) | APPID (Bigint)|             HOSTID (Bigint) |   Factor (varchar)    |  
-------------------------------------------------------------------------------------
|  1          |        1      |                           1 |               Low     | 
|  2          |        1      |                           3 |               High    | 
|  3          |        2      |                           2 |               Low     | 
|  4          |        3      |                           4 |               Medium  | 
|  5          |        1      |                           5 |               Low     | 
|  6          |        2      |                           2 |               Medium  | 
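
For anyone who wants to reproduce the sample data locally, here is a minimal MySQL setup sketch. The column types follow the table headers above; the `PRIMARY KEY` and `NOT NULL` constraints and the `VARCHAR(16)` length are assumptions, not part of the original schema. `usage` is quoted with backticks because USAGE is a reserved word in MySQL.

```sql
-- Hypothetical setup script: constraints and VARCHAR length are assumed.
CREATE TABLE apps (
  id         BIGINT PRIMARY KEY,
  userid     BIGINT NOT NULL,
  start_time DATETIME NOT NULL
);

CREATE TABLE hosts (
  id        BIGINT PRIMARY KEY,
  appid     BIGINT NOT NULL,
  device_id BIGINT NOT NULL
);

CREATE TABLE `usage` (
  id     BIGINT PRIMARY KEY,
  appid  BIGINT NOT NULL,
  hostid BIGINT NOT NULL,
  factor VARCHAR(16) NOT NULL
);

-- Sample rows copied from the three tables above.
INSERT INTO apps (id, userid, start_time) VALUES
  (1, 13, '2013-05-03 04:42:55'),
  (2, 13, '2013-05-12 06:22:45'),
  (3, 13, '2013-06-12 08:44:24'),
  (4, 13, '2013-06-24 04:20:56'),
  (5, 13, '2013-06-26 08:20:26'),
  (6, 13, '2013-09-12 05:48:27');

INSERT INTO hosts (id, appid, device_id) VALUES
  (1, 1, 1), (2, 2, 1), (3, 1, 1),
  (4, 3, 3), (5, 1, 4), (6, 2, 3);

INSERT INTO `usage` (id, appid, hostid, factor) VALUES
  (1, 1, 1, 'Low'),  (2, 1, 3, 'High'),   (3, 2, 2, 'Low'),
  (4, 3, 4, 'Medium'), (5, 1, 5, 'Low'),  (6, 2, 2, 'Medium');
```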

Given a userid as input, I want to get the count of Usage rows (across all apps) for each "Factor", month by month, for the last 6 months.

If a DEVICE_ID appears more than once in a month (based on START_TIME, after joining Apps and Hosts), only the latest rows of Usage (based on the combination of Apps, Hosts and Usage) should be considered when calculating the count.

For the sample data above, the query output should be as follows (for input userid = 13):

| MONTH       | USAGE_COUNT   |               FACTOR        | 
-------------------------------------------------------------
|  5          |        0      |                 High        | 
|  6          |        0      |                 High        | 
|  7          |        0      |                 High        | 
|  8          |        0      |                 High        |       
|  9          |        0      |                 High        |       
|  10         |        0      |                 High        | 
|  5          |        2      |                 Low         | 
|  6          |        0      |                 Low         | 
|  7          |        0      |                 Low         | 
|  8          |        0      |                 Low         |       
|  9          |        0      |                 Low         |       
|  10         |        0      |                 Low         |
|  5          |        1      |                 Medium      | 
|  6          |        1      |                 Medium      | 
|  7          |        0      |                 Medium      | 
|  8          |        0      |                 Medium      |       
|  9          |        0      |                 Medium      |       
|  10         |        0      |                 Medium      |

How is this calculated?

  1. In May 2013 (05-2013), table Apps has two apps (IDs 1 and 2).
  2. In table Hosts, these apps are associated with device_ids 1, 1, 1, 4, 3.
  3. For this month, the latest start_time for device_id=1 is 2013-05-12 06:22:45 (from joining Hosts and Apps), which belongs to appid=2 and hostid=2. Table Usage has two rows for that combination, one with factor Low and the other Medium.
  4. For this month, following the same procedure for device_id=4 gives one entry, with factor Low.
  5. All other values are calculated the same way.
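
The first part of this procedure (finding the latest start_time per device within one month) can be sketched as a standalone query. This is only an illustration of steps 1 to 3 for May 2013, assuming the tables are named `apps` and `hosts` as in the sample data; the hard-coded date range is a simplification, not part of the final query.

```sql
-- Latest app start_time per device for user 13 in May 2013.
SELECT h.device_id,
       MAX(a.start_time) AS latest_start
  FROM apps a
  JOIN hosts h ON h.appid = a.id
 WHERE a.userid = 13
   AND a.start_time >= '2013-05-01'
   AND a.start_time <  '2013-06-01'
 GROUP BY h.device_id
 ORDER BY h.device_id;

-- With the sample data this returns:
--   device_id = 1  ->  2013-05-12 06:22:45
--   device_id = 3  ->  2013-05-12 06:22:45
--   device_id = 4  ->  2013-05-03 04:42:55
```

Device 3 has a latest start_time in May but no matching Usage row for that app and host, which is why it contributes nothing to the May counts.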

To get the last 6 months in a query, I'm trying the following:

SELECT MONTH(DATE_ADD(NOW(), INTERVAL aInt MONTH)) AS aMonth
    FROM
    (
        SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5
    ) AS LastSix  -- MySQL requires an alias for every derived table

Please check sqlfiddle: http://sqlfiddle.com/#!2/55fc2

Solution

Because the calculation you're doing involves the same join multiple times, I started by creating a view.

CREATE VIEW `app_host_usage`
AS 
SELECT a.id "appid", h.id "hostid", u.id "usageid",
       a.userid, a.start_time, h.device_id, u.factor
  FROM apps a
  LEFT OUTER JOIN hosts h ON h.appid = a.id
  LEFT OUTER JOIN `usage` u ON u.appid = a.id AND u.hostid = h.id
  WHERE a.start_time > DATE_ADD(NOW(), INTERVAL -7 MONTH)

The WHERE condition is there because I made the assumption that you don't want July 2005 and July 2006 to be grouped together in the same count.
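
To see why the year matters, it may help to group on both year and month. The sketch below is an illustration against the `apps` table only; it is an alternative way to keep the same calendar month of different years apart, not the approach the view itself takes.

```sql
-- Count apps per (year, month) so that e.g. July 2005 and July 2006
-- end up in separate groups.
SELECT YEAR(start_time)  AS yr,
       MONTH(start_time) AS mon,
       COUNT(*)          AS app_count
  FROM apps
 WHERE userid = 13
 GROUP BY YEAR(start_time), MONTH(start_time)
 ORDER BY yr, mon;

-- With the sample data this returns:
--   2013, 5, 2
--   2013, 6, 3
--   2013, 9, 1
```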

With that view in place, the query becomes

SELECT months.Month, COUNT(DISTINCT device_id) AS usage_count, factors.factor
FROM
  (
    -- Get the last six months
    SELECT (MONTH(NOW()) + aInt + 11) % 12 + 1 "Month" FROM
      (SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2 UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) LastSix
  ) months
  CROSS JOIN
  ( 
    -- Get all known factors
    SELECT DISTINCT factor FROM `usage` 
  ) factors
  LEFT OUTER JOIN
  (
    -- Get factors for each device... 
    SELECT 
           MONTH(start_time) "Month", 
           device_id,
           factor
      FROM app_host_usage a
      WHERE userid=13 
        AND start_time IN (
          -- ...where the corresponding usage row is connected
          --    to an app row with the highest start time of the
          --    month for that device.
          SELECT MAX(start_time)
            FROM app_host_usage a2
            WHERE a2.device_id = a.device_id
            GROUP BY MONTH(start_time)
        )
     GROUP BY MONTH(start_time), device_id, factor

  ) usageids ON usageids.Month = months.Month 
            AND usageids.factor = factors.factor
GROUP BY factors.factor, months.Month
ORDER BY factors.factor, months.Month

which is insanely complicated, but I've added comments explaining what each part does. See this sqlfiddle: http://sqlfiddle.com/#!2/5c871/1/0
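
The month expression `(MONTH(NOW()) + aInt + 11) % 12 + 1` wraps around the year boundary. As a worked example, the sketch below replaces `MONTH(NOW())` with the fixed value 2 (February, chosen here purely for illustration) so the result does not depend on the current date.

```sql
-- Worked example of the wrap-around arithmetic with the current month fixed at 2.
SELECT aInt,
       (2 + aInt + 11) % 12 + 1 AS month_number
  FROM (SELECT 0 AS aInt UNION SELECT -1 UNION SELECT -2
        UNION SELECT -3 UNION SELECT -4 UNION SELECT -5) AS LastSix
 ORDER BY aInt DESC;

-- Result:
--   aInt =  0  ->  2   (February)
--   aInt = -1  ->  1   (January)
--   aInt = -2  ->  12  (December)
--   aInt = -3  ->  11  (November)
--   aInt = -4  ->  10  (October)
--   aInt = -5  ->  9   (September)
```

Adding 11 before taking the modulus keeps the left operand positive for every offset from 0 to -5, so the sign behaviour of `%` on negative numbers never comes into play.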
