mysql: RIGHT JOIN query speed issues involving calendar table

Question

UPDATE: It seems the problem (as noted by various people) is changing a datetime field into a date field in the query.

Using DATE( all_griefs_tbl.actioned_date ) is too slow. Is there a quicker method that avoids either changing actioned_date into a date field or splitting it into a date field and a time field?

I have 2 tables, one with a load of records that have a status and a datetime field and the other is a calendar table with dates from 2008 to 2015.

What I want to get out is every date in a time period and the number of records that have been "accepted" each day - even if that count is zero - which would look like this:

Date         number_accepted
----------------------------
2012-03-01   723
2012-03-02   723
2012-03-03   1055
2012-03-04   1069
2012-03-05   0
2012-03-06   615
2012-03-07   0
2012-03-08   1072
2012-03-09   664
2012-03-10   859
2012-03-11   0
2012-03-12   778
2012-03-13   987

I've tried the following, but it is only fast enough on a small sample of data (~1000 rows). I need something that works well on at least 600k rows:

SELECT calendar.datefield AS Date, 
       COUNT( all_griefs_tbl.actioned_status ) AS total_griefs
FROM all_griefs_tbl
RIGHT JOIN calendar 
   ON ( DATE( all_griefs_tbl.actioned_date ) = calendar.datefield )
   AND all_griefs_tbl.actioned_status = 'accepted'
WHERE calendar.datefield < CURDATE( )
GROUP BY calendar.datefield

Thanks

Execution plan, as requested:

 id select_type     table           type    possible_keys     key               key_len     ref     rows    Extra
 1  SIMPLE          calendar        range   PRIMARY           PRIMARY           3           NULL    1576    Using where; Using index
 1  SIMPLE          all_griefs_tbl  ref     actioned_status   actioned_status   153         const   294975  

Recommended answer

A few thoughts...

First, although you state that you want days without any values returned in the db query, I would actually do this check on the result set, wherever that is being handled. Every join makes the query more complicated and requires more memory to process. In this case, I wouldn't regard your use of the calendar table as a particularly good use of a relational database.

To clarify, how is the query being called? i.e. is there some program (that you're developing) accessing the database, running the query and presenting the results? If so, I'd suggest getting this program to process the results before presentation.

Second, if you're committed to the 'join', you really should have an index on all_griefs_tbl.actioned_date since this is the column on which you're doing the join. Alternatively, you could specify a foreign key on calendar.datefield.
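As a rough sketch of the indexing suggestion, the statements below add an index and then re-check the plan. The index name idx_status_date is made up for illustration, and the composite column order (status first, then date) is an assumption based on the EXPLAIN output in the question, where the optimizer already chooses the actioned_status index. An index on actioned_date can only be considered when the column is compared directly, so the join here uses a half-open one-day range in place of DATE():

```sql
-- Hypothetical index name; table and column names come from the question.
CREATE INDEX idx_status_date
    ON all_griefs_tbl (actioned_status, actioned_date);

-- Same result as the original query, but actioned_date is compared directly
-- (a half-open one-day range) instead of being wrapped in DATE().
EXPLAIN
SELECT calendar.datefield AS Date,
       COUNT(all_griefs_tbl.actioned_status) AS total_griefs
FROM calendar
LEFT JOIN all_griefs_tbl
       ON all_griefs_tbl.actioned_status = 'accepted'
      AND all_griefs_tbl.actioned_date >= calendar.datefield
      AND all_griefs_tbl.actioned_date <  calendar.datefield + INTERVAL 1 DAY
WHERE calendar.datefield < CURDATE()
GROUP BY calendar.datefield;
```

Whether the optimizer actually uses the new index for the range part depends on the MySQL version and the data distribution, so the EXPLAIN output is the thing to check rather than something to assume.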

Third, do you need to use the function DATE(all_griefs_tbl.actioned_date)? Isn't this already a date? (Not sure of your data types, but if this and calendar.datefield are not the same data type, this looks like bad database design.)

In light of what you say, you may want to split all_griefs_tbl.actioned_date into two columns: a date column all_griefs_tbl.actioned_date and a timestamp column all_griefs_tbl.actioned_time. At the moment, you're running the DATE() function on every row in all_griefs_tbl in order to do the join, and this will very quickly make the query sluggish. Splitting the column would also allow you to add an index on both the date and the time columns, which would further improve the performance of the join. (Given your current db design, I'm not surprised the index on actioned_date didn't help. Because of the DATE() function, I'd expect that if you rerun EXPLAIN with an index on the actioned_date column as it currently stands, it won't show that index being used on all_griefs_tbl.)
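If splitting the column is not an option, one alternative that is not part of the suggestion above is to count per day first and only then join to the calendar, so that DATE() is evaluated in a single pass over the accepted rows rather than inside the join condition. A sketch, assuming the table and column names from the question; the aliases c and g and the column name actioned_day are illustrative:

```sql
SELECT c.datefield AS Date,
       COALESCE(g.total_griefs, 0) AS total_griefs  -- days with no rows show 0
FROM calendar AS c
LEFT JOIN (
    -- One pass over the accepted rows, collapsing them to one row per day.
    SELECT DATE(actioned_date) AS actioned_day,
           COUNT(*) AS total_griefs
    FROM all_griefs_tbl
    WHERE actioned_status = 'accepted'
    GROUP BY DATE(actioned_date)
) AS g
       ON g.actioned_day = c.datefield
WHERE c.datefield < CURDATE()
ORDER BY c.datefield;
```

The outer join then only has to match the calendar rows against at most one pre-counted row per day. How much this helps in practice is worth confirming with EXPLAIN on the real data.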

Fourth, you may want to consider what types of information are stored in all_griefs_tbl.actioned_status. Could this be replaced by a boolean? This would be more efficient in both storing and processing the data. (Although again, this depends on your database design.)

You could consider changing all_griefs_tbl.actioned_status to a smaller datatype. I expect it's currently a varchar, but you could easily change it to a single (or small) char datatype, or even to a number of booleans. However, I don't expect this to be the main performance overhead, and it is really a more involved database design decision that depends on the needs of your project.
