MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列 [英] MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

查看:132
本文介绍了MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们的MySQL Web分析数据库包含一个摘要表,该表在导入新活动时全天进行更新.我们使用ON DUPLICATE KEY UPDATE来使摘要覆盖早期的计算,但是由于摘要表的UNIQUE KEY中的列之一是可选的FK,并且包含NULL值,因此存在困难.

Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE in order that the summarization overwrites earlier calculations, but are having difficulty because one of the columns in the summary table's UNIQUE KEY is an optional FK, and contains NULL values.

这些NULL旨在表示不存在,并且所有此类情况都是等效的".当然,MySQL通常将NULL视为未知,并且所有这种情况都不相同".

These NULLs are intended to mean "not present, and all such cases are equivalent". Of course, MySQL usually treats NULLs as meaning "unknown, and all such cases are not equivalent".

基本结构如下:

活动"表包含每个会话的条目,每个会话都属于一个广告系列,并为某些条目提供了可选的过滤器和交易ID.

An "Activity" table containing an entry for each session, each belonging to a campaign, with optional filter and transaction IDs for some entries.

CREATE TABLE `Activity` (
    `session_id` INTEGER AUTO_INCREMENT
    , `campaign_id` INTEGER NOT NULL
    , `filter_id` INTEGER DEFAULT NULL
    , `transaction_id` INTEGER DEFAULT NULL
    , PRIMARY KEY (`session_id`)
);

一个摘要"表,其中包含活动表中会话总数的每日汇总,以及包含事务ID的那些会话总数.这些摘要被分解,每个广告系列和(可选)过滤器组合都有一个.这是使用MyISAM的非事务表.

A "Summary" table containing daily rollups of total number of sessions in activity table, an d the total number of those sessions which contain a transaction ID. These summaries are split up, with one for every combination of campaign and (optional) filter. This is a non-transactional table using MyISAM.

CREATE TABLE `Summary` (
    `day` DATE NOT NULL
    , `campaign_id` INTEGER NOT NULL
    , `filter_id` INTEGER DEFAULT NULL
    , `sessions` INTEGER UNSIGNED DEFAULT NULL
    , `transactions` INTEGER UNSIGNED DEFAULT NULL
    , UNIQUE KEY (`day`, `campaign_id`, `filter_id`)
) ENGINE=MyISAM;

实际的汇总查询类似于以下内容,计算会话和交易的数量,然后按广告系列和(可选)过滤器分组.

The actual summarization query is something like the following, counting up the number of sessions and transactions, then grouping by campaign and (optional) filter.

INSERT INTO `Summary` 
    (`day`, `campaign_id`, `filter_id`, `sessions`, `transactions`)
    SELECT `day`, `campaign_id`, `filter_id
        , COUNT(`session_id`) AS `sessions`
        , COUNT(`transaction_id` IS NOT NULL) AS `transactions`
    FROM Activity
    GROUP BY `day`, `campaign_id`, `filter_id`
ON DUPLICATE KEY UPDATE
    `sessions` = VALUES(`sessions`)
    , `transactions` = VALUES(`transactions`)
;

除了filter_id为NULL的情况的摘要之外,一切都很好.在这些情况下,ON DUPLICATE KEY UPDATE子句与现有行不匹配,并且每次都会写入新行.这是由于"NULL!= NULL"这一事实.但是,在比较唯一键时,我们需要的是"NULL = NULL".

Everything works great, except for the summary of cases where the filter_id is NULL. In these cases, the ON DUPLICATE KEY UPDATE clause does not match the existing row, and a new row is written every time. This is due to the fact that "NULL != NULL". What we need, however, is "NULL = NULL" when comparing the unique keys.

我正在寻找解决方法的想法或对到目前为止我们提出的解决方案的反馈.到目前为止,我们已经想到了解决方法.

I am looking for ideas for workarounds or feedback on those we have come up with so far. Workarounds we have thought of so far follow.

  1. 在运行摘要之前,删除所有包含NULL键值的摘要条目. (这就是我们现在正在做的) 如果在汇总过程中执行了查询,则具有返回缺少数据的结果的负面影响.

  1. Delete all summary entries containing a NULL key value prior to running the summarization. (This is what we are doing now) This has the negative side effect of returning results with missing data if a query is executed during the summarization process.

将DEFAULT NULL列更改为DEFAULT 0,这将使UNIQUE KEY保持一致. 这具有使针对汇总表的查询开发过于复杂的负面影响.这迫使我们使用大量的"CASE filter_id = 0 THEN NULL ELSE filter_id END",并且由于其他所有表的filter_id都为NULL,因此造成了尴尬的连接.

Change the DEFAULT NULL column to DEFAULT 0, which allows the UNIQUE KEY to be matched consistently. This has the negative side effect of overly complicating the development of queries against the summary table. It forces us to use a lot of "CASE filter_id = 0 THEN NULL ELSE filter_id END", and makes for awkward joining since all of the other tables have actual NULLs for the filter_id.

创建一个返回"CASE filter_id = 0 THEN NULL ELSE filter_id END"的视图,并直接使用此视图代替表. 摘要表包含几十万行,并且有人告诉我视图性能很差.

Create a view which returns "CASE filter_id = 0 THEN NULL ELSE filter_id END", and using this view instead of the table directly. The summary table contains a few hundred thousand rows, and I've been told view performance is quite poor.

允许创建重复的条目,并在汇总完成后删除旧的条目. 与提前删除它们有类似的问题.

Allow the duplicate entries to be created, and delete the old entries after summarization completes. Has similar problems to deleting them ahead of time.

添加一个包含0的NULL替代列,并在UNIQUE KEY中使用该替代(实际上,如果所有列都不为NULL,我们可以使用PRIMARY KEY).
该解决方案似乎是合理的,只是上面的示例仅是一个示例.实际的数据库包含六个摘要表,其中一个包含UNIQUE KEY中的四个可为空的列.有人担心开销太大.

Add a surrogate column which contains 0 for NULL, and use that surrogate in the UNIQUE KEY (actually we could use PRIMARY KEY if all columns are NOT NULL).
This solution seems reasonable, except that the example above is only an example; the actual database contains half a dozen summary tables, one of which contains four nullable columns in the UNIQUE KEY. There is concern by some that the overhead is too much.

您是否有更好的解决方法,表结构,更新过程或MySQL最佳实践可以提供帮助?

Do you have a better workaround, table structure, update process or MySQL best practice which can help?

澄清空值的含义"

仅在以下情况下才认为包含NULL列的摘要行中的数据属于在一起:摘要报告中的数据是单个全部捕获"行,汇总了该数据点不存在或未知的那些项.因此,在汇总表本身的上下文中,含义是这些值未知的条目的总和".另一方面,在关系表中,这些结果实际上是NULL.

The data in the summary rows containing NULL columns are considered to belong together only in the sense that of being a single "catch-all" row in summary reports, summarizing those items for which that data point does not exist or is unknown. So within the context of the summary table itself, the meaning is "the sum of those entries for which no value is known". Within the relational tables, on the other hand, these truly are NULL results.

将它们放入摘要表中的唯一键的唯一原因是在重新计算摘要报告时允许自动更新(通过ON DUPLICATE KEY UPDATE).

The only reason for putting them into a unique key on the summary table is to allow for automatic update (by ON DUPLICATE KEY UPDATE) when re-calculating the summary reports.

也许通过以下特定示例可以更好地描述该结果:摘要表之一通过响应者给定的公司地址的邮政编码前缀在地理上对结果进行分组.并非所有受访者都提供公司地址,因此交易和地址表之间的关系完全正确为NULL.在此数据的摘要表中,将为每个邮政编码前缀生成一行,其中包含该区域内的数据摘要.会生成另一行以显示没有邮政编码前缀的数据摘要.

Maybe a better way to describe it is by the specific example that one of the summary tables groups results geographically by the zip code prefix of the business address given by the respondent. Not all respondents provide a business address, so the relationship between the transaction and addresses table is quite correctly NULL. In the summary table for this data, a row is generated for each zip code prefix, containing the summary of data within that area. An additional row is generated to show the summary of data for which no zip code prefix is known.

更改其余数据表以使其具有显式的"THERE_IS_NO_ZIP_CODE" 0值,并在ZipCodePrefix表中放置一个表示该值的特殊记录是不正确的-该关系确实为NULL.

Altering the rest of the data tables to have an explicit "THERE_IS_NO_ZIP_CODE" 0-value, and placing a special record in the ZipCodePrefix table representing this value, is improper--that relationship truly is NULL.

推荐答案

我认为(2)的观点确实是最好的选择,或者至少是从头开始的时候.在SQL中,NULL表示未知.如果您想要其他含义,则确实应该为此使用一个特殊值,并且0当然是一个不错的选择.

I think something along the lines of (2) is really the best bet — or, at least, it would be if you were starting from scratch. In SQL, NULL means unknown. If you want some other meaning, you really ought to use a special value for that, and 0 is certainly an OK choice.

您应该跨整个数据库执行此操作,而不仅限于此一个表.然后,您不应该结束怪异的特殊情况.实际上,您应该能够摆脱许多当前的情况(例如:当前,如果您希望没有过滤器的汇总行,则有特殊情况过滤器为空",而不是正常情况) "filter =?".)

You should do this across the entire database, not just this one table. Then you shouldn't wind up with weird special cases. In fact, you should be able to get rid of a lot of your current ones (example: currently, if you want the summary row where there is no filter, you have the special case "filter is null" as opposed to the normal case "filter = ?".)

您还应该继续在引用表中创建一个不存在"条目,以保持FK约束有效(并避免特殊情况).

You should also go ahead and create a "not present" entry in the referred-to table as well, to keep the FK constraint valid (and avoid special cases).

PS:没有主键的表不是关系表,应避免使用.

PS: Tables w/o a primary key are not relational tables and should really be avoided.

在这种情况下,您实际上是否需要对重复键进行更新?如果您正在执行INSERT ... SELECT,那么您可能会这样做.但是,如果您的应用程序正在提供数据,则只需手动进行即可-进行更新(将zip = null映射为zip is null),检查更改了多少行(MySQL返回此行),如果为0,则执行插入操作.

Hmmm, in that case, do you actually need the on duplicate key update? If you're doing a INSERT ... SELECT, then you probably do. But if your app is supplying the data, just do it by hand — do the update (mapping zip = null to zip is null), check how many rows were changed (MySQL returns this), if 0 do an insert.

这篇关于MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆