mysql: RIGHT JOIN query speed issues involving calendar table
Problem description
UPDATE: It seems the problem (as various people have noted) is converting a datetime field into a date field in the query. Using DATE( all_griefs_tbl.actioned_date ) is too slow. Is there a quicker method that avoids either changing actioned_date into a date field or splitting it into a date field and a time field?
I have 2 tables: one with a load of records that each have a status and a datetime field, and a calendar table with dates from 2008 to 2015.
What I want to get out is every date in a time period and the number of records that were "accepted" on each day, even if that count is zero. It would look like this:
| Date       | number_accepted |
|------------|-----------------|
| 2012-03-01 | 723             |
| 2012-03-02 | 723             |
| 2012-03-03 | 1055            |
| 2012-03-04 | 1069            |
| 2012-03-05 | 0               |
| 2012-03-06 | 615             |
| 2012-03-07 | 0               |
| 2012-03-08 | 1072            |
| 2012-03-09 | 664             |
| 2012-03-10 | 859             |
| 2012-03-11 | 0               |
| 2012-03-12 | 778             |
| 2012-03-13 | 987             |
I've tried the following, but it is only fast enough on a small sample of data (~1000 rows). I need something that works well on at least 600k rows.
SELECT calendar.datefield AS Date,
COUNT( all_griefs_tbl.actioned_status ) AS total_griefs
FROM all_griefs_tbl
RIGHT JOIN calendar
ON ( DATE( all_griefs_tbl.actioned_date ) = calendar.datefield )
AND all_griefs_tbl.actioned_status = 'accepted'
WHERE calendar.datefield < CURDATE( )
GROUP BY calendar.datefield
Thanks
Execution plan, as requested:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE calendar range PRIMARY PRIMARY 3 NULL 1576 Using where; Using index
1 SIMPLE all_griefs_tbl ref actioned_status actioned_status 153 const 294975
Recommended answer
A few thoughts...
First, although you state that you want days without any values returned by the db query, I would actually do this check on the result set, wherever that is being handled. Every join makes a query more complicated and requires more memory to process. In this case, I wouldn't regard your use of the calendar table as a particularly good use of a relational database.
To clarify, how is the query being called? That is, is there some program (that you're developing) accessing the database, running the query and presenting the results? If so, I'd suggest getting this program to process the results before presentation.
Second, if you're committed to the join, you really should have an index on all_griefs_tbl.actioned_date, since this is the column on which you're doing the join. Alternatively, you could specify a foreign key on calendar.datefield.
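As a rough sketch of the index suggested here (the index name idx_actioned_date is invented for illustration and does not come from the question), the statements below create the index and then re-check the plan. Note the assumption: MySQL can generally only use such an index when the join compares the bare column, so the EXPLAIN output for the original query may not change while DATE() still wraps actioned_date.

```sql
-- Hypothetical index name; the column and table names are from the question.
CREATE INDEX idx_actioned_date
    ON all_griefs_tbl (actioned_date);

-- Re-run the plan for the original query to see whether the new index is picked up.
EXPLAIN
SELECT calendar.datefield AS Date,
       COUNT(all_griefs_tbl.actioned_status) AS total_griefs
FROM all_griefs_tbl
RIGHT JOIN calendar
        ON DATE(all_griefs_tbl.actioned_date) = calendar.datefield
       AND all_griefs_tbl.actioned_status = 'accepted'
WHERE calendar.datefield < CURDATE()
GROUP BY calendar.datefield;
```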
Third, do you need to use the function DATE(all_griefs_tbl.actioned_date)? Isn't this already a date? (I'm not sure of your data types, but if this column and calendar.datefield are not the same data type, that looks like bad database design.)
In light of what you say, you may want to split all_griefs_tbl.actioned_date into two columns: a date column, all_griefs_tbl.actioned_date, and a timestamp column, all_griefs_tbl.actioned_time. At the moment, you're running the DATE() function on every row in all_griefs_tbl in order to do the join, which will very quickly make the query sluggish. Splitting the column would also let you add an index on both the date and time columns, which should also improve the performance of the join. (Given your current db design, I'm not surprised the index on actioned_date didn't help. Because of the DATE() function, I'd expect that if you rerun EXPLAIN with an index on the actioned_date column as it currently stands, it wouldn't show that index being used on all_griefs_tbl.)
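The two options below are sketches under stated assumptions, not a tested fix for this particular database. Option A is a variation on the split described above: rather than repurposing actioned_date, it adds a separate DATE column (hypothetically named actioned_day, with a made-up index name) so the join compares plain columns. Option B is a different technique that the answer does not mention: it leaves the schema alone and aggregates the accepted rows once in a derived table, so DATE() runs once per accepted row instead of inside the join condition.

```sql
-- Option A: store the date part in its own indexed column.
-- actioned_day and idx_status_day are hypothetical names.
ALTER TABLE all_griefs_tbl ADD COLUMN actioned_day DATE NULL;
UPDATE all_griefs_tbl SET actioned_day = DATE(actioned_date);
CREATE INDEX idx_status_day
    ON all_griefs_tbl (actioned_status, actioned_day);

SELECT c.datefield AS Date,
       COUNT(g.actioned_status) AS total_griefs
FROM calendar AS c
LEFT JOIN all_griefs_tbl AS g
       ON g.actioned_day = c.datefield          -- no function call on the joined column
      AND g.actioned_status = 'accepted'
WHERE c.datefield < CURDATE()
GROUP BY c.datefield;

-- Option B: no schema change; count per day first, then join to the calendar.
SELECT c.datefield AS Date,
       COALESCE(d.total_griefs, 0) AS total_griefs  -- days with no rows become 0
FROM calendar AS c
LEFT JOIN (
    SELECT DATE(actioned_date) AS actioned_day,
           COUNT(*)            AS total_griefs
    FROM all_griefs_tbl
    WHERE actioned_status = 'accepted'
    GROUP BY DATE(actioned_date)
) AS d
       ON d.actioned_day = c.datefield
WHERE c.datefield < CURDATE()
ORDER BY c.datefield;
```

With Option A, the new column has to be kept in step with actioned_date whenever rows are inserted or updated. In either case it is worth comparing the EXPLAIN output against the original plan before relying on the change.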
Fourth, you may want to consider what types of information are stored in all_griefs_tbl.actioned_status. Could this be replaced by a boolean? That would be more efficient for both storing and processing the data. (Although again, this depends on your database design.)
You could consider changing all_griefs_tbl.actioned_status to a smaller datatype. I expect it's currently a varchar, but you could easily change it to a single (or small) char datatype, or even to a number of booleans. However, I don't expect this to be the main performance overhead, and it is really a more involved database design decision that depends on the needs of your project.
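To illustrate the boolean idea, here is a minimal sketch that adds a flag column next to the existing status. The column name is_accepted is hypothetical, and the sketch assumes 'accepted' is the only status this report cares about; the other status values are not shown in the question.

```sql
-- Hypothetical flag column; TINYINT(1) is how MySQL stores BOOLEAN.
ALTER TABLE all_griefs_tbl
    ADD COLUMN is_accepted TINYINT(1) NOT NULL DEFAULT 0;

-- <=> is MySQL's NULL-safe equality, so a NULL status yields 0 rather than NULL.
UPDATE all_griefs_tbl
SET is_accepted = (actioned_status <=> 'accepted');
```

The report query could then filter on is_accepted = 1 instead of comparing strings, although, as noted above, this is unlikely to be the main cost.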