MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列 [英] MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

查看：132 发布时间：2020/7/10 3:39:16 mysql nullable summarization

本文介绍了MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我们的MySQL Web分析数据库包含一个摘要表，该表在导入新活动时全天进行更新.我们使用ON DUPLICATE KEY UPDATE来使摘要覆盖早期的计算，但是由于摘要表的UNIQUE KEY中的列之一是可选的FK，并且包含NULL值，因此存在困难.

Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE in order that the summarization overwrites earlier calculations, but are having difficulty because one of the columns in the summary table's UNIQUE KEY is an optional FK, and contains NULL values.

这些NULL旨在表示不存在，并且所有此类情况都是等效的".当然，MySQL通常将NULL视为未知，并且所有这种情况都不相同".

These NULLs are intended to mean "not present, and all such cases are equivalent". Of course, MySQL usually treats NULLs as meaning "unknown, and all such cases are not equivalent".

基本结构如下:

活动"表包含每个会话的条目，每个会话都属于一个广告系列，并为某些条目提供了可选的过滤器和交易ID.

An "Activity" table containing an entry for each session, each belonging to a campaign, with optional filter and transaction IDs for some entries.

CREATE TABLE `Activity` (
    `session_id` INTEGER AUTO_INCREMENT
    , `campaign_id` INTEGER NOT NULL
    , `filter_id` INTEGER DEFAULT NULL
    , `transaction_id` INTEGER DEFAULT NULL
    , PRIMARY KEY (`session_id`)
);

一个摘要"表，其中包含活动表中会话总数的每日汇总，以及包含事务ID的那些会话总数.这些摘要被分解，每个广告系列和(可选)过滤器组合都有一个.这是使用MyISAM的非事务表.

A "Summary" table containing daily rollups of total number of sessions in activity table, an d the total number of those sessions which contain a transaction ID. These summaries are split up, with one for every combination of campaign and (optional) filter. This is a non-transactional table using MyISAM.

CREATE TABLE `Summary` (
    `day` DATE NOT NULL
    , `campaign_id` INTEGER NOT NULL
    , `filter_id` INTEGER DEFAULT NULL
    , `sessions` INTEGER UNSIGNED DEFAULT NULL
    , `transactions` INTEGER UNSIGNED DEFAULT NULL
    , UNIQUE KEY (`day`, `campaign_id`, `filter_id`)
) ENGINE=MyISAM;

实际的汇总查询类似于以下内容，计算会话和交易的数量，然后按广告系列和(可选)过滤器分组.

The actual summarization query is something like the following, counting up the number of sessions and transactions, then grouping by campaign and (optional) filter.

INSERT INTO `Summary` 
    (`day`, `campaign_id`, `filter_id`, `sessions`, `transactions`)
    SELECT `day`, `campaign_id`, `filter_id
        , COUNT(`session_id`) AS `sessions`
        , COUNT(`transaction_id` IS NOT NULL) AS `transactions`
    FROM Activity
    GROUP BY `day`, `campaign_id`, `filter_id`
ON DUPLICATE KEY UPDATE
    `sessions` = VALUES(`sessions`)
    , `transactions` = VALUES(`transactions`)
;

除了filter_id为NULL的情况的摘要之外，一切都很好.在这些情况下，ON DUPLICATE KEY UPDATE子句与现有行不匹配，并且每次都会写入新行.这是由于"NULL！= NULL"这一事实.但是，在比较唯一键时，我们需要的是"NULL = NULL".

Everything works great, except for the summary of cases where the filter_id is NULL. In these cases, the ON DUPLICATE KEY UPDATE clause does not match the existing row, and a new row is written every time. This is due to the fact that "NULL != NULL". What we need, however, is "NULL = NULL" when comparing the unique keys.

我正在寻找解决方法的想法或对到目前为止我们提出的解决方案的反馈.到目前为止，我们已经想到了解决方法.

I am looking for ideas for workarounds or feedback on those we have come up with so far. Workarounds we have thought of so far follow.

在运行摘要之前，删除所有包含NULL键值的摘要条目. (这就是我们现在正在做的) 如果在汇总过程中执行了查询，则具有返回缺少数据的结果的负面影响.

Delete all summary entries containing a NULL key value prior to running the summarization. (This is what we are doing now) This has the negative side effect of returning results with missing data if a query is executed during the summarization process.

将DEFAULT NULL列更改为DEFAULT 0，这将使UNIQUE KEY保持一致. 这具有使针对汇总表的查询开发过于复杂的负面影响.这迫使我们使用大量的"CASE filter_id = 0 THEN NULL ELSE filter_id END"，并且由于其他所有表的filter_id都为NULL，因此造成了尴尬的连接.

Change the DEFAULT NULL column to DEFAULT 0, which allows the UNIQUE KEY to be matched consistently. This has the negative side effect of overly complicating the development of queries against the summary table. It forces us to use a lot of "CASE filter_id = 0 THEN NULL ELSE filter_id END", and makes for awkward joining since all of the other tables have actual NULLs for the filter_id.

创建一个返回"CASE filter_id = 0 THEN NULL ELSE filter_id END"的视图，并直接使用此视图代替表. 摘要表包含几十万行，并且有人告诉我视图性能很差.

Create a view which returns "CASE filter_id = 0 THEN NULL ELSE filter_id END", and using this view instead of the table directly. The summary table contains a few hundred thousand rows, and I've been told view performance is quite poor.

允许创建重复的条目，并在汇总完成后删除旧的条目. 与提前删除它们有类似的问题.

Allow the duplicate entries to be created, and delete the old entries after summarization completes. Has similar problems to deleting them ahead of time.

添加一个包含0的NULL替代列，并在UNIQUE KEY中使用该替代(实际上，如果所有列都不为NULL，我们可以使用PRIMARY KEY).
该解决方案似乎是合理的，只是上面的示例仅是一个示例.实际的数据库包含六个摘要表，其中一个包含UNIQUE KEY中的四个可为空的列.有人担心开销太大.

Add a surrogate column which contains 0 for NULL, and use that surrogate in the UNIQUE KEY (actually we could use PRIMARY KEY if all columns are NOT NULL).
This solution seems reasonable, except that the example above is only an example; the actual database contains half a dozen summary tables, one of which contains four nullable columns in the UNIQUE KEY. There is concern by some that the overhead is too much.

您是否有更好的解决方法，表结构，更新过程或MySQL最佳实践可以提供帮助?

Do you have a better workaround, table structure, update process or MySQL best practice which can help?

澄清空值的含义"

仅在以下情况下才认为包含NULL列的摘要行中的数据属于在一起:摘要报告中的数据是单个全部捕获"行，汇总了该数据点不存在或未知的那些项.因此，在汇总表本身的上下文中，含义是这些值未知的条目的总和".另一方面，在关系表中，这些结果实际上是NULL.

The data in the summary rows containing NULL columns are considered to belong together only in the sense that of being a single "catch-all" row in summary reports, summarizing those items for which that data point does not exist or is unknown. So within the context of the summary table itself, the meaning is "the sum of those entries for which no value is known". Within the relational tables, on the other hand, these truly are NULL results.

将它们放入摘要表中的唯一键的唯一原因是在重新计算摘要报告时允许自动更新(通过ON DUPLICATE KEY UPDATE).

The only reason for putting them into a unique key on the summary table is to allow for automatic update (by ON DUPLICATE KEY UPDATE) when re-calculating the summary reports.

也许通过以下特定示例可以更好地描述该结果:摘要表之一通过响应者给定的公司地址的邮政编码前缀在地理上对结果进行分组.并非所有受访者都提供公司地址，因此交易和地址表之间的关系完全正确为NULL.在此数据的摘要表中，将为每个邮政编码前缀生成一行，其中包含该区域内的数据摘要.会生成另一行以显示没有邮政编码前缀的数据摘要.

Maybe a better way to describe it is by the specific example that one of the summary tables groups results geographically by the zip code prefix of the business address given by the respondent. Not all respondents provide a business address, so the relationship between the transaction and addresses table is quite correctly NULL. In the summary table for this data, a row is generated for each zip code prefix, containing the summary of data within that area. An additional row is generated to show the summary of data for which no zip code prefix is known.

更改其余数据表以使其具有显式的"THERE_IS_NO_ZIP_CODE" 0值，并在ZipCodePrefix表中放置一个表示该值的特殊记录是不正确的-该关系确实为NULL.

Altering the rest of the data tables to have an explicit "THERE_IS_NO_ZIP_CODE" 0-value, and placing a special record in the ZipCodePrefix table representing this value, is improper--that relationship truly is NULL.

MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列 [英] MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

MySQL ON DUPLICATE KEY UPDATE在唯一键中具有可空列 [英] MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭