Mysql存储函数和groupwise min [英] Mysql stored functions and groupwise min

查看:161
本文介绍了Mysql存储函数和groupwise min的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

架构



数据库架构简化

事件表

此表存储事件(
`event_id` bigint(20)unsigned NOT NULL AUTO_INCREMENT,$ b $(

$ b

  CREATE TABLE` b'isPublic` tinyint(1)NOT NULL DEFAULT'1',
PRIMARY KEY(`event_id`)
)ENGINE = InnoDB DEFAULT CHARSET = latin1;

地点表

存储地点的简单表格。一个事件可以在多个地方。

  CREATE TABLE`Places`(
`place_id` bigint(20 )unsigned NOT NULL AUTO_INCREMENT,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
PRIMARY KEY(`place_id`),
KEY`latind`(`纬度,经度)
)ENGINE = InnoDB CHARSET = latin1;

规则表

存储事件。一个事件可以有更多的时间表。所有日期均采用unixtimestamp格式。常规意味着此规则有一些重复计划存储在RegularRules表中。

  CREATE TABLE`Rules`(
``rule_id` bigint(20)unsigned NOT NULL AUTO_INCREMENT,
`start_date` int(11)NOT NULL,
`end_date` int(11)NOT NULL,
`regular` tinyint(1 )NOT NULL DEFAULT'0',
PRIMARY KEY(`rule_id`),
KEY`endindx`(`end_date`)
)ENGINE = InnoDB CHARSET = latin1;

常规规则

存储可重复计划的表格格式如下。 day_start / end表示从当天(00:00)开始到事件开始的秒数。例如,事件发生在每个星期一从10:00到18:00。我们将在Rules表中存储 start_date end_date ,这些值表示事件的时间限制。在RegularRules表中,我们在 mon_start 中有36000个,在 mon_end 中有64800个。

  CREATE TABLE`RegularRules`(
`repeatition_id` bigint(11)unsigned NOT NULL AUTO_INCREMENT,
`rule_id` bigint(20)unsigned NOT NULL,
`mon_start` int(11)DEFAULT NULL,
`tue_start` int(11)DEFAULT NULL,
`wed_start` int(11)DEFAULT NULL,
`th_start int(11)DEFAULT NULL,
`fr_start` int(11)DEFAULT NULL,
`sat_start` int(11)DEFAULT NULL,
`sun_start` int(11)DEFAULT NULL,
`mon_end` int(11)DEFAULT NULL,
`tue_end` int(11)DEFAULT NULL,
`wed_end` int(11)DEFAULT NULL,
`th_end` int (11)DEFAULT NULL,
`fr_end` int(11)DEFAULT NULL,
`sat_end` int(11)DEFAULT NULL,
`sun_end` int(11)DEFAULT NULL,
PRIMARY KEY('repetition_id`),
KEY`fk_rule_id_regularrules_idx`(`rule_id`),
CONSTRAINT`fk_rule_id_regularrules` FOREIGN KEY(`rule_id`)
REFERENCES`Rules`(`rule_id `)ON DELE TE CASCADE ON UPDATE NO ACTION
)ENGINE = InnoDB CHARSET = latin1;

活动地点规则

表格连接上面的所有表。

  CREATE TABLE EPR(
`hold_id` bigint(30)NOT NULL AUTO_INCREMENT,
`event_id` bigint(20)unsigned NOT NULL,
`place_id` bigint(20)unsigned NOT NULL,
`rule_id` bigint(20)unsigned NOT NULL,
PRIMARY KEY(`hold_id`),
UNIQUE KEY`compound`(`place_id`,`event_id`,`rule_id`),
KEY`FK_Places-Company Events-Rules_Event_id`(`event_id`),
KEY`FK_Places-Company Events-Rules_Places_place_id`(`place_id`),
KEY`FK_Places-Company Events-Rules_Rules_rule_id`(`rule_id`),
CONSTRAINT`FK_Places-Company Events-Rules_Events_event_id`
FOREIGN KEY(`event_id`)REFERENCES`Events`(`event_id`)ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT`FK_Places-Company Events-Rules_Rules_rule_id`
FOREIGN KEY (`rule_id`)参考`Rules`(`rule_id`)ON DELETE CASCADE ON
UPD ATE CASCADE,
CONSTRAINT`fk_place_id_pcerc` FOREIGN KEY(`place_id`)
REFERENCES`Places`(`place_id`)ON DELETE CASCADE ON UPDATE CASCADE
)ENGINE = InnoDB CHARSET = latin1;



存储函数



功能。 GETBEGINS GETENDS 。参数:rule_id,timestamp,curtimestamp.Timestamp是一天的unixtimestamp,curtimestamp是当天开始的unixtimestamp。
这些功能的工作如下。对于每条规则,他们返回规则的开始(开始)和结束(结束)。如果规则不可重复,则返回存储在Rules表中的 start_date end_date 。如果规则是可重复的,则它们构造RegularRules表的最接近的非空day_start / day_end的开始结束 。例如,有一个事件有2个规则。第一个不可重复的开始 start_timestamp 和结束 end_timestamp 。第二个是可重复的,并且只有两个非空字段: mon_start = 36000 mon_end = 64800 GETBEGINS 将根据当前unixtimetamp和当天开始的当前unixtimestamp,在unixtimestamp中打开 mon_start GETBEGINS 类似地工作。这些功能的代码将在必要时提供。


有问题的查询



这个查询应该返回正在进行的地理上和时间上最接近的事件。地方应该是明确的。所以,查询应该为每个地方返回按时间顺序最接近的事件,并在最后根据时间和距离排序结果值与一些系数(我认为排序部分将被移动到像PHP这样的服务器端语言。这种排序我可以接受任何解决方案)。例如,10个电影院附近有5部电影。每个电影院有100个时间表。查询应该为每个电影返回按时间顺序最接近的电影,然后根据两个值的时间和距离对电影和电影院进行分类。

预期查询 >
latpoin,longpoint,r - 传递给脚本的坐标和半径,
curstamp - 当天开始时的unixtimestamp,
时间戳 - 当前unixtimestamp

 选择
epr.event_id,
epr.place_id,
epr.rule_id,
(6371 * ACOS(COS(RADIANS(latpoint))* COS(RADIANS(latitude))*
COS(RADIANS(经度) - RADIANS(长点))+ SIN(RADIANS(latpoint))*
SIN(RADIANS (纬度))))AS距离,
p.latitude,
p.longitude,
GETBEGINS(r.rule_id,curstamp,timestamp)AS开始,
GETENDS(r。 (ABS)(GETBEGINS(r.rule_id,curstamp,timestamp) - timestamp))AS
time_min
FROM
事件e
INNER JOIN
EPR epr ON e.event_id = epr.event_id
INNER JOIN
地点p ON epr.place_id = p.place_id
INNER JOIN
规则r ON epr.rule_id = r.rule_id
WHERE
r.end_date> =时间戳
和latitude BETWEEN latpoint - (r / 111.045)AND latpoint +(r /
111.045)
和经度BETWEEN longpoint - (r /(111.045 *
COS(RADIANS(latpoint))))AND longpoint +(r /(111.045 *
COS(RADIANS(latpoint))))
AND e.isPublic = 1
GROUP BY epr.place_id

正如在主题this查询混合返回值。更具体地说,它匹配错误的rule_id,开始,结束于place_id组。
此外,这个查询性能很差。表格的大小:事件 - 3000行,地点 - 8000行,规则18000行,EPR-15000行。这些查询在使用索引提示(使用索引化合物)和1.2没有提示时使用大约1.8秒。不使用索引提示查询进行全表扫描。
我已阅读官方mysql文档有关此主题。但是,由于用户计算的值( GETBEGINS GETENDS ),因此它们的解决方案并不可靠。


$ b $ h






$ =http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html =nofollow> mysql处理组。所以可能的解决方案是使函数GETBEGINS和GETENDS用户定义的聚合函数以这种方式mysql可能会返回适当的结果?这个解决方案合乎逻辑吗是否会使函数 GETBEGINS GETENDS 汇总帮助?在这种情况下,mysql是否会返回适当的数据? 结论 解决方案,有关索引和数据库体系结构的评论,我们非常感谢并欢迎。 解决方案

groupwise max不能保证能正常工作。事实上,MariaDB破坏了它,但提供了一个让它恢复的设置。这就是我所指的:

$ $ $
$ SELECT $ ORB BY ...)
GROUP BY ...

您想要第一个(或最后一个)来自内部查询的每个组中。问题是SQL可以自由地优化这个意图。



文档中的groupwise max代码非常低效。



为了加快查询速度,可能需要帮助的是将 Rules Places 部分WHERE子句并将其转换为仅返回相应表的PRIMARY KEY的子查询。然后将其与所有表(包括JOIN回到同一个表)进行联接。您已经为该子查询设置了覆盖索引,以便它可以是使用索引(使用EXPLAIN使用的术语)。

innodb_buffer_pool_size设置为大约70可用内存的百分比?



BIGINT需要8个字节;你可以住在MEDIUMINT UNSIGNED(0..16M)。更小 - >更多可缓存 - >更少的I / O - >更快。



lat / lng的DOUBLE对需要16个字节。 FLOAT对需要8个字节并具有6英尺/ 2米的分辨率。或者十进制(6,4)为纬度,(7,4)为经度为7字节和52英尺/ 16米分辨率。对于商店来说足够好了,特别是因为您使用的是方形而不是圆形作为距离。



代码为find the nearest ...很难优化。这里是我所想到的最好的: http://mysql.rjweb.org/doc.php / latlng


Schema

Database schema is simplified
Events table
This table stores events.

CREATE TABLE `Events` (
`event_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`isPublic` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`event_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Places table
Simple table that stores places. One event can be in more than one place.

CREATE TABLE `Places` (
`place_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
PRIMARY KEY (`place_id`),
KEY `latind` (`latitude`,`longitude`)
) ENGINE=InnoDB CHARSET=latin1;

Rules table
Table that stores schedules of events. One event can have more that one schedule. All dates are in unixtimestamp format. Regular means that this rule has some repeating schedule that is stored in RegularRules table.

CREATE TABLE `Rules` (
`rule_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_date` int(11) NOT NULL,
`end_date` int(11) NOT NULL,
`regular` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`rule_id`),
KEY `endindx` (`end_date`)
) ENGINE=InnoDB CHARSET=latin1;

RegularRules
Table that stores repeatable schedules in the following format. day_start/end means number of seconds from the beggining of the day (00:00) to the starting of the event. For example, event takes place every monday from 10:00 to 18:00. We will store start_date and end_date in Rules table, these values represent time limits of the event. In the RegularRules table we will have 36000 in mon_start and 64800 in mon_end.

CREATE TABLE `RegularRules` (
`repetition_id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
`rule_id` bigint(20) unsigned NOT NULL,
`mon_start` int(11) DEFAULT NULL,
`tue_start` int(11) DEFAULT NULL,
`wed_start` int(11) DEFAULT NULL,
`th_start` int(11) DEFAULT NULL,
`fr_start` int(11) DEFAULT NULL,
`sat_start` int(11) DEFAULT NULL,
`sun_start` int(11) DEFAULT NULL,
`mon_end` int(11) DEFAULT NULL,
`tue_end` int(11) DEFAULT NULL,
`wed_end` int(11) DEFAULT NULL,
`th_end` int(11) DEFAULT NULL,
`fr_end` int(11) DEFAULT NULL,
`sat_end` int(11) DEFAULT NULL,
`sun_end` int(11) DEFAULT NULL,
PRIMARY KEY (`repetition_id`),
KEY `fk_rule_id_regularrules_idx` (`rule_id`),
CONSTRAINT `fk_rule_id_regularrules` FOREIGN KEY (`rule_id`) 
REFERENCES `Rules` (`rule_id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB  CHARSET=latin1;

Events-Places-Rules
Table that connects all of the above tables.

CREATE TABLE EPR (
`holding_id` bigint(30) NOT NULL AUTO_INCREMENT,
`event_id` bigint(20) unsigned NOT NULL,
`place_id` bigint(20) unsigned NOT NULL,
`rule_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`holding_id`),
UNIQUE KEY `compound` (`place_id`,`event_id`,`rule_id`),
KEY `FK_Places-Company Events-Rules_Events_event_id` (`event_id`),
KEY `FK_Places-Company Events-Rules_Places_place_id` (`place_id`),
KEY `FK_Places-Company Events-Rules_Rules_rule_id` (`rule_id`),
CONSTRAINT `FK_Places-Company Events-Rules_Events_event_id` 
FOREIGN KEY  (`event_id`) REFERENCES `Events` (`event_id`) ON DELETE CASCADE 
ON UPDATE  CASCADE,
CONSTRAINT `FK_Places-Company Events-Rules_Rules_rule_id` 
FOREIGN KEY  (`rule_id`) REFERENCES `Rules` (`rule_id`) ON DELETE CASCADE ON  
UPDATE CASCADE,
CONSTRAINT `fk_place_id_pcerc` FOREIGN KEY (`place_id`) 
REFERENCES `Places` (`place_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB CHARSET=latin1;

Stored functions

There are two stored functions. GETBEGINS and GETENDS. Parameters: rule_id, timestamp,curtimestamp.Timestamp is unixtimestamp of the day, curtimestamp is the unixtimestamp of the beggining of the current day. These functions work as follows. For each rule they are returning the beggining of the rule(begins) and the ending(ends). If the rule is not repeatable, they return start_date and end_date that are stored in the Rules table. If the rule is repeatable, they construct begins and ends of the closest non-null day_start/day_end of the RegularRules table. For instance, there is an event that has 2 rules. The first one is not repeatable with begins start_timestamp and ends end_timestamp. The second one is repeatable and has just two non-null fields: mon_start = 36000 and mon_end = 64800. GETBEGINS will turn mon_start in unixtimestamp based on current unixtimetamp and current unixtimestamp of the beggining of the day. GETBEGINS works simmilarly. Code of these functions will be provided if necessary.

Problematic query

This query should return ongoing geographically- and chronologically-closest events. Places should be distinct. So query should for each place return chronologically closest event and at the end sort resulting values depending on time and distance with some coefficients (I think sorting part will be moved to the server-side language like PHP. If you have suggestions about this sorting I am open to any solution). For example, there are 5 movies in 10 cinemas near by. Each cinema has 100 schedules. Query should return for each cinema the chronologically closest movie and then sort movies and cinemas depending on two values time and distance.

Intended query
latpoin,longpoint,r - are coordinates and radius that are passed to the script, curstamp - unixtimestamp of the beggining of the day, timestamp - current unixtimestamp

SELECT 
    epr.event_id,
    epr.place_id,
    epr.rule_id,
    (6371 * ACOS(COS(RADIANS(latpoint)) * COS(RADIANS(latitude)) *  
    COS(RADIANS(longitude) - RADIANS(longpoint)) + SIN(RADIANS(latpoint)) * 
    SIN(RADIANS(latitude)))) AS distance,
    p.latitude,
    p.longitude,
    GETBEGINS(r.rule_id, curstamp, timestamp) AS begins,
    GETENDS(r.rule_id, curstamp, timestamp) AS ends,
    MIN(ABS(GETBEGINS(r.rule_id, curstamp, timestamp) - timestamp)) AS   
    time_min
FROM
    Events e
        INNER JOIN
    EPR epr ON e.event_id = epr.event_id
        INNER JOIN
    Places p ON epr.place_id = p.place_id
        INNER JOIN
    Rules r ON epr.rule_id = r.rule_id
WHERE
    r.end_date >= timestamp
        AND latitude BETWEEN latpoint - (r / 111.045) AND latpoint + (r /   
        111.045)
        AND longitude BETWEEN longpoint - (r / (111.045 *  
        COS(RADIANS(latpoint)))) AND longpoint + (r / (111.045 * 
        COS(RADIANS(latpoint))))
        AND e.isPublic = 1
GROUP BY epr.place_id

As stated in the topic this query mixes returning values. To be more specific it matches wrong rule_id,begins,ends to the place_id group. Moreover this query performs quite poorly. Table's size: Events - 3000rows, Places- 8000rows, Rules 18000rows, EPR-15000rows. These query works approximately 1.8 second when using index hint (use index compound) and 1.2 without one. Without using index hint query makes full table scan. I have read official mysql docs regarding this subject. However their solution is not sutable because of user-calculated values (GETBEGINS and GETENDS).

Question

Query provided in the Intended query section has groupwise min problem because of the way mysql handles group by. So possible solution is to make functions GETBEGINS and GETENDS user-defined aggregated functions in this way mysql possibly will return appropiate result? Is this solution logical? Will making functions GETBEGINS and GETENDS aggregated help? Will mysql return appropiate data in that case?

Conclusion

Comments about provided solutions, new solutions, comments about indexing and about database architecture are appreciated and welcomed.

解决方案

The groupwise max is not guaranteed to work. In fact, MariaDB broke it, but provided a setting to get it back. This is what I am referring to:

SELECT  *
    FROM  
      ( SELECT  ...  ORDER BY ... )
    GROUP BY ...

where you want the first (or last) in each group from the inner query. The problem is that SQL is free to optimize away that intent.

The groupwise max code in the docs is terribly inefficient.

To speed up the query, a likely bit of help is to isolate the Rules or Places part of the WHERE clause and make that into a subquery which returns just the PRIMARY KEY of the corresponding table. Then put that into a JOIN with all the tables (including a JOIN back to the same table). You already have a "covering index" for that subquery so that it can be "Using index" (in the jargon used by EXPLAIN).

Is innodb_buffer_pool_size set to about 70% of available RAM?

BIGINT takes 8 bytes; you could probably live with MEDIUMINT UNSIGNED (0..16M). Smaller --> more cacheable --> less I/O --> faster.

The pair of DOUBLEs for lat/lng take 16 bytes. A FLOAT pair would take 8 bytes and have 6-foot / 2m resolution. Or DECIMAL(6,4) for latitude and (7,4) for longitude for 7 bytes and a 52 foot / 16m resolution. Good enough for "stores", especially since you are using a 'square' instead of a 'circle' for distance.

Code for "find the nearest ..." is hard to optimize. Here is the best I have come up with: http://mysql.rjweb.org/doc.php/latlng

这篇关于Mysql存储函数和groupwise min的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆