如何使用OR使多个LEFT JOIN完全使用复合索引? (第2部分) [英] How to make multiple LEFT JOINs with OR fully use a composite index? (part 2)

查看:132
本文介绍了如何使用OR使多个LEFT JOIN完全使用复合索引? (第2部分)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

该系统用于计算用户进入/离开工作场所时指纹的扫描方式.我不知道英文怎么说.我需要确定用户是否早上很晚,以及用户是否提早下班.

It is for a system that calculates how the users scan their fingerprints when they enter/leave the workplace. I don't know how it is called in English. I need to determine if the user is late in the morning, and if the user leaves work early.

tb_scan表包含用户扫描指纹的日期和时间.

This tb_scan table contains date and time a user scans a fingerprint.

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scyear` varchar(4) DEFAULT NULL,
  `scmonth` varchar(2) DEFAULT NULL,
  `scday` varchar(2) DEFAULT NULL,
  `scscantime` datetime,
  KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

它有100,000+行,类似这样

It has 100,000+ rows, something like this

scpercode scyear scmonth scday     scdateandtime
000001    2010      10     10      2016-01-10 08:02:00
000001    2010      10     10      2016-01-02 17:33:00
000001    2010      10     11      2016-01-11 07:48:00
000001    2010      10     11      2016-01-11 17:29:00
000002    2010      10     10      2016-01-10 17:31:00
000002    2010      10     10      2016-01-02 17:28:00
000002    2010      10     11      2016-01-11 05:35:00
000002    2010      10     11      2016-01-11 05:29:00

tb_workday表包含每个日期

CREATE TABLE `tb_workday` (
  `wdpercode` varchar(6) DEFAULT NULL,
  `wdshift` varchar(1) DEFAULT NULL,
  `wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

它具有按日期顺序排列的行,如下所示:

It has rows with date sequence like this:

wdpercode  wdshift wddate
000001     1       2010-10-10
000001     1       2010-10-11
000001     1       2010-10-12
000001     1       2010-10-13
000002     2       2010-10-10
000002     2       2010-10-11
000002     2       2010-10-12
000002     2       2010-10-13

还有另一个包含班次时间的tb_shift

There is another tb_shift table containing shift time

CREATE TABLE `tb_shift` (
  `shiftcode` varchar(1) DEFAULT NULL,
  `shiftbegin2` varchar(4) DEFAULT NULL,
  `shiftbegin` varchar(4) DEFAULT NULL,
  `shiftmid` varchar(4) DEFAULT NULL,
  `shiftend` varchar(4) DEFAULT NULL,
  `shiftend2` varchar(4) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        1     04:00:00     08:00:00  12:00:00  17:30:00  21:30:00 
        2     12:00:00     17:30:00  21:00:00  05:30:00  09:30:00

我想确定每天是员工上班迟到还是提早下班,以及在什么时候上班.

I want to determine that in each day, is the employee comes to work late or leaves work early, and at what time.

SELECT wdpercode,wddate,shiftbegin,shiftend,time(tlate.scscantime) wdlate,time(tearly.scscantime) wdearly
FROM tb_workday
LEFT JOIN tb_shift
  ON wdshift=shiftcode
LEFT JOIN tb_scan tlate 
  ON wdpercode=tlate.scpercode
  AND tlate.scyear=year(wddate)
  AND tlate.scmonth=month(wddate)
  AND (tlate.scday=day(wddate)
    OR tlate.scday=day(wddate)+1)
  AND tlate.scscantime>=ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)
  AND tlate.scscantime<=ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
LEFT JOIN tb_scan tearly 
  ON wdpercode=tearly.scpercode
  AND tearly.scyear=year(wddate)
  AND tearly.scmonth=month(wddate)
  AND (tearly.scday=day(wddate)
    OR tearly.scday=day(wddate)+1)
  AND tearly.scscantime>ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
  AND tearly.scscantime<ADDDATE(CONCAT(wddate,' ',shiftend),INTERVAL IF(shiftbegin2>shiftend,1,0) DAY)

以下是输出示例:

wdpercode wddate      shiftbegin  shiftend  wdlate    wdearly
000001    2016-01-10  08:00:00    17:30:00  08:02:00  (null)
000001    2016-01-11  08:00:00    17:30:00  (null)    17:29:00
000002    2016-01-11  17:30:00    05:30:00  17:31:00  (null)
000002    2016-01-11  17:30:00    05:30:00  (null)    05:29:00

ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)适用于夜班工作的员工,因此必须将轮班时间增加1天

this ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY) is for employees who work on night shift, so it has to add 1 day into the shift time

问题是如果我为scscantime创建索引,MySQL拒绝将其用于比较(>=<=><).请参见此线程为什么MySQL不这样做使用索引大于比较?

The problem is if I create an index for scscantime, MySQL refuses to use it for comparison (>=,<=,>,<). Please see this thread Why does MySQL not use an index for a greater than comparison?

因此,我创建了scyearscmonthscday字段,并将它们与scpercode一起组合在索引中.而且我必须确保它也能计算夜班工作的工人,因此我必须添加OR scday=day(wddate)+1条件.

Because of this I created the scyear, scmonth, and scday fields and combine them in an index along with scpercode. And I have to make sure it calculates for workers working in night shift too so I have to add it with OR scday=day(wddate)+1 condition.

在添加OR条件之前,EXPLAIN结果为52行.但是,当我添加OR scday=day(wddate)+1条件时,EXPLAIN结果变为364行,这意味着MySQL不使用索引的scday部分.有没有办法使用整个索引,所以EXPLAIN结果变得像52行一样更有效?我还尝试删除了+1部分,结果也是52.

Before I added the OR condition, the EXPLAIN result was 52 rows. But when I added the OR scday=day(wddate)+1 condition, the EXPLAIN result became 364 rows, that means MySQL did not use scday part of the index. Is there any way to use the whole index, so the EXPLAIN result becomes more efficient like 52 rows? I also tried removing the +1 part and the result is also 52.

推荐答案

首先,最好是从其他人那里发布此问题.您获得多条记录的原因是,一个人可能会根据他们的班次在同一天内多次进出时钟.现在,如何解决这个问题.

First, better on posting this question from your other. The reason you were getting multiple records is the possibility of a person clocking in and out multiple times in a same day based on their shifts. Now, how to resolve this.

在MySQL中,可以将"@"变量用作select FROM子句的一部分来进行内联变量声明和赋值.我首先要进行的是从工作日到轮班表的简单连接(并认为我现在已经明白了),并带有一些@variables.

In MySQL, you can do inline variable declaration and assignments using "@" variables as part of the select FROM clause. What I am starting with is a simple join from the work day to the shift table (and think I understand this now), with some @variables.

对于每个加入班次的人,我都在预先计算班次中间发生的位置,例如当天和第二天.而且,begin2和end2对于可能的时钟进场与时钟出场似乎是离群值.示例:人员1正在轮班1.对于任何给定的工作日,轮班1定义为

For each person, joined to the shift, I am pre-computing where the middle of the shift occurs such as same day vs next day. Also, the begin2 and end2 appear to be outliers for a possible clock-in vs clock-out. Example: Person 1 is working shift 1. Shift 1 is defined for any given day of work as

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        1     04:00:00     08:00:00  12:00:00  17:30:00  21:30:00 

因此,我将其解释为好像我在6月28日班1上班.

So, I am interpreting this as if I work on June 28, Shift 1,

June 28 @ 4am Earliest allowed clock-in time
June 28 @ 8am Actual beginning of shift
June 28 @ 12pm (afternoon) is the middle of the work day
June 28 @ 5:30pm is the end of the work day
June 28 @ 9:30pm is the max expected clock-out recognized for the shift

类似地,对于第二个班次,它将整夜住宿

Similarly, for shift 2 which will wrap an over-night

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        2     12:00:00     17:30:00  21:00:00  05:30:00  09:30:00

June 28 @ 12pm (afternoon) Earliest allowed clock-in time
June 28 @ 5:30pm Actual beginning of shift
June 28 @ 9pm is the middle of the shift
June 29 @ 5:30am (day roll-over) is the end of the work day 
June 29 @ 9:30am (day roll-over) is the max expected clock-out for the shift

因此,如果这一切正确,我的内部查询将为每个人预先确定所有这些范围,因此无论通过下面进行多少次扫描,我每个工作日每人只有1条记录.

So, if this is all correct, my inner query is pre-determining all these ranges for each person, so I will only ever have 1 record per person per work day regardless of how many scans via below.

select 
      wd.wdpercode,
      wd.wdshift,
      wd.wddate,
      s.shiftbegin,
      s.shiftend,
      s.shiftbegin2,
      s.shiftmid,
      s.shiftend2,
      @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewMidDay,
      @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewEndDay,
      cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
      cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
      cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
      cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
      cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut
   from
      ( select 
              @endDay := '', 
              @midDay := '' ) sqlvars,
      tb_workday wd
         join tb_shift s
            on wd.wdshift = s.shiftcode

@midDay和@endDay的内联计算是这样,所以我不必担心加入扫描的时钟表,并在考虑的所有其他情况中继续增加1天.因此,在此查询结束时,我将得到类似...的结果. 请注意,在1号人员的正常班次和2号人员的夜间班次之间,计算出的结束日期也显示了过渡日期

The inline computing of the @midDay and @endDay are so I don't have to worry about joining to the scanned time clock table and keep adding 1 day in the midst of all else being considered. So, at the end of this query, I would end up with something like... Notice between person 1 normal shift and person 2 night shift, the computed end date shows the roll-over dates too

wdpercode  wdshift  wddate      shiftbegin  shiftend  shiftbegin2  shiftmid  shiftend2  NewMidDay   NewEndDay   EarliestClockIn   BeginShift        MidShift          EndShift          MaxClockOut
000001     1        2010-10-10  08:00       17:30     04:00        12:00     21:30      2010-10-10  2010-10-10  2010-10-10 04:00  2010-10-10 08:00  2010-10-10 12:00  2010-10-10 17:30  2010-10-10 21:30:00
000001     1        2010-10-11  08:00       17:30     04:00        12:00     21:30      2010-10-11  2010-10-11  2010-10-11 04:00  2010-10-11 08:00  2010-10-11 12:00  2010-10-11 17:30  2010-10-11 21:30:00
000001     1        2010-10-12  08:00       17:30     04:00        12:00     21:30      2010-10-12  2010-10-12  2010-10-12 04:00  2010-10-12 08:00  2010-10-12 12:00  2010-10-12 17:30  2010-10-12 21:30:00
000001     1        2010-10-13  08:00       17:30     04:00        12:00     21:30      2010-10-13  2010-10-13  2010-10-13 04:00  2010-10-13 08:00  2010-10-13 12:00  2010-10-13 17:30  2010-10-13 21:30:00

000002     2        2010-10-10  17:30       05:30     12:00        21:00     09:30      2010-10-10  2010-10-11  2010-10-10 12:00  2010-10-10 17:30  2010-10-10 21:00  2010-10-11 05:30  2010-10-11 09:30:00
000002     2        2010-10-11  17:30       05:30     12:00        21:00     09:30      2010-10-11  2010-10-12  2010-10-11 12:00  2010-10-11 17:30  2010-10-11 21:00  2010-10-12 05:30  2010-10-12 09:30:00
000002     2        2010-10-12  17:30       05:30     12:00        21:00     09:30      2010-10-12  2010-10-13  2010-10-12 12:00  2010-10-12 17:30  2010-10-12 21:00  2010-10-13 05:30  2010-10-13 09:30:00
000002     2        2010-10-13  17:30       05:30     12:00        21:00     09:30      2010-10-13  2010-10-14  2010-10-13 12:00  2010-10-13 17:30  2010-10-13 21:00  2010-10-14 05:30  2010-10-14 09:30:00

您可以从该查询中删除多余的列,但是我包括了所有内容,因此您可以查看/确认要考虑的每行工作日期和预定日期的值.我仍然需要的缩写列表是

You could remove the extra columns from this query, but I included all so you could see / confirm what the values are for consideration of each row and date of work scheduled. The abbreviated list I would still need is

select 
      wd.wdpercode,
      @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewMidDay,
      @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewEndDay,
      cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
      cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
      cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
      cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
      cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut

因此,如果上述方法正确无误,我们现在必须根据此查询计算出的最大范围(每个日期可能有多个记录)来确定每个人的时钟时间

So, if the above is accurate, we now have to get the clock in and out for each person based on the MAXIMUM range computed from this query which COULD more than one record per date

wdpercode  EarliestClockIn    MidShift          MaxClockOut
000001     2010-10-10 04:00   2010-10-10 12:00  2010-10-10 21:30:00
000002     2010-10-10 12:00   2010-10-10 21:00  2010-10-11 09:30:00

因此,在这里,我要对最早进入时钟和最大退出时钟之内的任何日期的扫描时间进行合并,并使用中班时间作为确定他们是否延迟进场还是提前离开的基础.我添加了额外的MIN()和MAX()用于给定人员/班次的到达和离开,只是为了确认您的身份 做和应该看到.

So here, I am doing a join to the scan times for any dates within the earliest clock in and max clock out and using midshift as the basis of determining if they clocked in late vs leaving early. I added the extra MIN() and MAX() for the arrival and departure for a given person / shift just to confirm what you DO AND should be seeing.

MAX(IF())的目的是仅在发生迟/早状态时捕获它们.由于分组依据是每个班次,因此第一条记录(入站时间)可能较晚,您需要该时间,但是第二条用于外出的记录不适用于中班时间,因此将为空白.同样用于检测班次的早退.

The purpose of the MAX( IF() ) is to capture the late / early status ONLY IF they have happened. Since the group by is per shift, the first record (clock in) might be late and you want that time, but the second record for clocking out is not applicable via the mid-shift time and would thus be blank. Similarly for detecting early departure from a shift.

select
      perPerson.wdPerCode,
      perPerson.BeginShift,
      perPerson.EndShift,
      min( TS.scScanTime ) as Arrival,
      max( TS.scScanTime ) as Departure,
      max( IF( TS.scScanTime > perPerson.BeginShift         
           AND TS.scScanTime <= perPerson.MidShift, TS.scScanTime, "" )) as LateArrival,
      max( IF( TS.scScanTime > perPerson.MidShift
           AND TS.scScanTime < perPerson.EndShift, TS.scScanTime, "" )) as EarlyDepart
   from
      ( select
              wd.wdpercode,
              @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, 
                 date_add( wd.wddate, interval 1 day )) as NewMidDay,
              @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, 
                 date_add( wd.wddate, interval 1 day )) as NewEndDay,
              cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
              cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
              cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
              cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
              cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut
           from
              ( select
                      @endDay := '',
                      @midDay := '' ) sqlvars,
              tb_workday wd
                 join tb_shift s
                    on wd.wdshift = s.shiftcode ) perPerson
         JOIN tb_scan TS
            on perPerson.wdpercode = TS.scpercode
            AND TS.scScanTime >= perPerson.EarliestClockIn
            AND TS.scScanTime <= perPerson.MaxClockOut
   group by
      perPerson.wdPerCode,
      perPerson.BeginShift;

我通过您提供的内容创建了表格和示例数据(其中一些数据与示例日期和范围不匹配,因此我进行了调整).

I created tables and sample data from what you provided via (of which some of your data did not match the sample dates and ranges, so I adjusted to do so).

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scscantime` datetime,
  KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_scan 
( scpercode, scscantime ) 
values
( '000001', '2010-10-10 08:02:00' ),
( '000001', '2010-10-10 17:33:00' ),
( '000001', '2010-10-11 07:48:00' ),
( '000001', '2010-10-11 17:29:00' ),
( '000001', '2010-10-12 08:04:00' ),
( '000001', '2010-10-12 17:28:00' ),
( '000002', '2010-10-10 17:31:00' ),
( '000002', '2010-10-11 05:35:00' ),
( '000002', '2010-10-11 17:28:00' ),
( '000002', '2010-10-12 05:29:00' ),
( '000002', '2010-10-12 17:32:00' ),
( '000002', '2010-10-13 05:27:00' );

CREATE TABLE `tb_workday` (
  `wdpercode` varchar(6) DEFAULT NULL,
  `wdshift` varchar(1) DEFAULT NULL,
  `wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_workday 
( wdpercode, wdshift, wddate )
values
( '000001', '1', '2010-10-10' ),
( '000001', '1', '2010-10-11' ),
( '000001', '1', '2010-10-12' ),
( '000001', '1', '2010-10-13' ),
( '000002', '2', '2010-10-10' ),
( '000002', '2', '2010-10-11' ),
( '000002', '2', '2010-10-12' ),
( '000002', '2', '2010-10-13' );


CREATE TABLE `tb_shift` (
  `shiftcode` varchar(1) DEFAULT NULL,
  `shiftbegin2` varchar(8) DEFAULT NULL,
  `shiftbegin` varchar(8) DEFAULT NULL,
  `shiftmid` varchar(8) DEFAULT NULL,
  `shiftend` varchar(8) DEFAULT NULL,
  `shiftend2` varchar(8) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_shift
( shiftcode, shiftbegin2, shiftbegin, shiftmid, shiftend, shiftend2 )
values
( '1', '04:00:00', '08:00:00', '12:00:00', '17:30:00', '21:30:00' ), 
( '2', '12:00:00', '17:30:00', '21:00:00', '05:30:00', '09:30:00' );

样本数据显示每个人都有1:迟到,2:早退,3:3迟到,然后早退.

The sample data shows each person with an 1: arrive late, 2: depart early, 3: arrive late AND depart early.

wdPerCode  BeginShift         EndShift           Arrival            Departure          LateArrival        EarlyDepart
000001     2010-10-10 08:00   2010-10-10 17:30   2010-10-10 08:02   2010-10-10 17:33   2010-10-10 08:02
000001     2010-10-11 08:00   2010-10-11 17:30   2010-10-11 07:48   2010-10-11 17:29                      2010-10-11 17:29
000001     2010-10-12 08:00   2010-10-12 17:30   2010-10-12 08:04   2010-10-12 17:28   2010-10-12 08:04   2010-10-12 17:28

000002     2010-10-10 17:30   2010-10-11 05:30   2010-10-10 17:31   2010-10-11 05:35   2010-10-10 17:31
000002     2010-10-11 17:30   2010-10-12 05:30   2010-10-11 17:28   2010-10-12 05:29                      2010-10-12 05:29
000002     2010-10-12 17:30   2010-10-13 05:30   2010-10-12 17:32   2010-10-13 05:27   2010-10-12 17:32   2010-10-13 05:27

为了优化查询,我将更改您在扫描表上的索引

For optimizing the query, I would change your index on the scan table

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scscantime` datetime,
  KEY `personDate` (`scpercode`, `scscantime` )

这篇关于如何使用OR使多个LEFT JOIN完全使用复合索引? (第2部分)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆