Update statement using a WHERE clause that contains columns with null Values


Problem Description


I am updating a column on one table using data from another table. The WHERE clause is based on multiple columns, and some of those columns are null. As far as I can tell, these nulls are what is throwing off the standard UPDATE TABLE SET X=Y WHERE A=B statement.


See this SQL Fiddle of the two tables, where I am trying to update table_one based on data from table_two. My query currently looks like this:

UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
table_one.invoice_number = table_two.invoice_number AND
table_one.submitted_by = table_two.submitted_by AND
table_one.passport_number = table_two.passport_number AND
table_one.driving_license_number = table_two.driving_license_number AND
table_one.national_id_number = table_two.national_id_number AND
table_one.tax_pin_identification_number = table_two.tax_pin_identification_number AND
table_one.vat_number = table_two.vat_number AND
table_one.ggcg_number = table_two.ggcg_number AND
table_one.national_association_number = table_two.national_association_number


The query fails for some rows: table_one.x isn't updated when any of the compared columns in either table is null, i.e. a row only gets updated when all of its columns contain data.
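A minimal reproduction of the problem, using Python's built-in sqlite3 module with SQLite standing in for PostgreSQL (both follow the standard rule that comparing anything with NULL yields NULL, not true). The two key columns and the sample values are made up for illustration, and because older SQLite versions lack UPDATE ... FROM, the update is written with correlated subqueries instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (invoice_number TEXT, submitted_by TEXT, x TEXT);
    CREATE TABLE table_two (invoice_number TEXT, submitted_by TEXT, y TEXT);
    -- hypothetical sample data: INV-2 has no submitted_by in either table
    INSERT INTO table_one VALUES ('INV-1', 'alice', NULL), ('INV-2', NULL, NULL);
    INSERT INTO table_two VALUES ('INV-1', 'alice', 'A'), ('INV-2', NULL, 'B');
""")

# NULL = NULL is not true: it evaluates to NULL, which Python receives as None.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Plain equality on every key column, as in the original query.
MATCH = """table_one.invoice_number = table_two.invoice_number
       AND table_one.submitted_by = table_two.submitted_by"""
conn.execute(f"""
    UPDATE table_one
    SET x = (SELECT y FROM table_two WHERE {MATCH})
    WHERE EXISTS (SELECT 1 FROM table_two WHERE {MATCH})
""")

rows = conn.execute(
    "SELECT invoice_number, x FROM table_one ORDER BY invoice_number"
).fetchall()
print(null_eq)  # None
print(rows)     # [('INV-1', 'A'), ('INV-2', None)]
```

The INV-2 row is skipped because its submitted_by comparison is NULL, which the WHERE clause treats as "not a match".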


This question is related to my earlier one here on SO, where I was getting distinct values from a large data set using DISTINCT ON. What I now want is to populate the large data set with a value from the table that has the unique fields.

UPDATE


I used the first update statement provided by @binotenary. For small tables it runs in a flash: for example, one table had 20,000 records and the update completed in about 20 seconds. But another table with more than 9 million records has been running for 20 hours so far! See below for the output of EXPLAIN:

Update on table_one  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
  ->  Nested Loop  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
    Join Filter: ((((my_update_statement_here))))
    ->  Seq Scan on table_one  (cost=0.00..610872.62 rows=9661262 width=1986)
    ->  Seq Scan on table_two  (cost=0.00..6051.98 rows=299998 width=148)
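For a sense of scale, using the planner's own row estimates from the plan above (estimates, not measured counts): a Nested Loop whose inner side is a Seq Scan evaluates the Join Filter once for every pair of outer and inner rows, so the filter runs roughly

$$9\,661\,262 \times 299\,998 = 2\,898\,359\,277\,476 \approx 2.9 \times 10^{12}$$

times, each evaluation comparing all nine key columns.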


The EXPLAIN ANALYZE option also took forever, so I canceled it.


Any ideas on how to make this type of update faster, even if it means using a different update statement or a custom function that loops through the rows and does the update?

Recommended Answer


Since null = null evaluates to null (unknown) rather than true, and WHERE only keeps rows whose condition is true, you need to check whether both fields are null in addition to the equality check:

UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
    (table_one.invoice_number = table_two.invoice_number 
        OR (table_one.invoice_number is null AND table_two.invoice_number is null))
    AND
    (table_one.submitted_by = table_two.submitted_by 
        OR (table_one.submitted_by is null AND table_two.submitted_by is null))
    AND 
    -- etc
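The "-- etc" above stands for the same pattern repeated for every key column. A sketch of generating that pattern instead of writing it out by hand, again using SQLite in place of PostgreSQL with hypothetical two-column tables; null_safe_match is an illustrative helper, not part of any library, and the update uses correlated subqueries rather than UPDATE ... FROM:

```python
import sqlite3

def null_safe_match(columns):
    """Build '(a = b OR (a IS NULL AND b IS NULL))' for each key column, joined with AND.

    Hypothetical helper. Column names are interpolated directly into the SQL,
    so only pass trusted identifiers (e.g. the nine key columns from the question).
    """
    return "\n    AND ".join(
        f"(table_one.{c} = table_two.{c}"
        f" OR (table_one.{c} IS NULL AND table_two.{c} IS NULL))"
        for c in columns
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (invoice_number TEXT, submitted_by TEXT, x TEXT);
    CREATE TABLE table_two (invoice_number TEXT, submitted_by TEXT, y TEXT);
    INSERT INTO table_one VALUES ('INV-1', 'alice', NULL), ('INV-2', NULL, NULL);
    INSERT INTO table_two VALUES ('INV-1', 'alice', 'A'), ('INV-2', NULL, 'B');
""")

match = null_safe_match(["invoice_number", "submitted_by"])
conn.execute(f"""
    UPDATE table_one
    SET x = (SELECT y FROM table_two WHERE {match})
    WHERE EXISTS (SELECT 1 FROM table_two WHERE {match})
""")

rows = conn.execute(
    "SELECT invoice_number, x FROM table_one ORDER BY invoice_number"
).fetchall()
print(rows)  # [('INV-1', 'A'), ('INV-2', 'B')]
```

With the null-aware condition, the INV-2 row whose submitted_by is NULL on both sides now matches and gets updated.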


You could also use the coalesce function, which is more readable:

UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
    coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')
    AND coalesce(table_one.submitted_by, '') = coalesce(table_two.submitted_by, '')
    AND -- etc
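The same sketch with the coalesce form, under the same assumptions (SQLite as a stand-in, hypothetical sample data, correlated subqueries instead of UPDATE ... FROM). The empty string '' works as the default here only because both key columns are text; coalesce_match is an illustrative helper:

```python
import sqlite3

def coalesce_match(columns, sentinel="''"):
    """AND together coalesce(a, sentinel) = coalesce(b, sentinel) for each key column.

    Hypothetical helper. `sentinel` is an SQL literal that must never occur in the
    real data; column names are interpolated directly, so pass trusted names only.
    """
    return "\n    AND ".join(
        f"coalesce(table_one.{c}, {sentinel}) = coalesce(table_two.{c}, {sentinel})"
        for c in columns
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (invoice_number TEXT, submitted_by TEXT, x TEXT);
    CREATE TABLE table_two (invoice_number TEXT, submitted_by TEXT, y TEXT);
    INSERT INTO table_one VALUES ('INV-1', 'alice', NULL), ('INV-2', NULL, NULL);
    INSERT INTO table_two VALUES ('INV-1', 'alice', 'A'), ('INV-2', NULL, 'B');
""")

match = coalesce_match(["invoice_number", "submitted_by"])
conn.execute(f"""
    UPDATE table_one
    SET x = (SELECT y FROM table_two WHERE {match})
    WHERE EXISTS (SELECT 1 FROM table_two WHERE {match})
""")

rows = conn.execute(
    "SELECT invoice_number, x FROM table_one ORDER BY invoice_number"
).fetchall()
print(rows)  # [('INV-1', 'A'), ('INV-2', 'B')]
```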


But you need to be careful about the default value (the last argument to coalesce).
Its data type should match the column type (so that you don't end up comparing dates with numbers, for example), and the default should be a value that never appears in the data.
For example, coalesce(null, 1) = coalesce(1, 1) is a situation you'd want to avoid.
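A quick check of that pitfall, evaluated in SQLite as a stand-in for PostgreSQL. The sentinel -1 is only "safe" under the assumption that it never occurs in the real data; the last comparison shows the same collision for text columns when an empty string is stored in one table and NULL in the other:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
false_match, safe_sentinel, empty_vs_null = conn.execute("""
    SELECT coalesce(NULL, 1) = coalesce(1, 1),    -- NULL and 1 compare as equal
           coalesce(NULL, -1) = coalesce(1, -1),  -- a sentinel absent from the data avoids it
           coalesce('', '') = coalesce(NULL, '')  -- an empty string and NULL compare as equal
""").fetchone()
print(false_match, safe_sentinel, empty_vs_null)  # 1 0 1
```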


Seq Scan on table_two: this suggests that you don't have any indexes on table_two.
So when you update a row in table_one, the database basically has to scan through the rows of table_two one by one to find a matching row.
The matching rows could be found much faster if the relevant columns were indexed.
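A sketch of the effect of such an index, shown with SQLite's EXPLAIN QUERY PLAN rather than PostgreSQL's EXPLAIN (the plan wording differs between the two). The index name table_two_keys is hypothetical, and only two key columns are used. The lookup uses plain equality; whether PostgreSQL can use a plain index for the null-aware OR conditions or the coalesce expressions is not shown here and would need checking with EXPLAIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_two (invoice_number TEXT, submitted_by TEXT, y TEXT)")

LOOKUP = "SELECT y FROM table_two WHERE invoice_number = ? AND submitted_by = ?"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("INV-1", "alice"))
    return " ".join(row[-1] for row in rows)

plan_before = plan(LOOKUP)  # full table scan
conn.execute(
    "CREATE INDEX table_two_keys ON table_two (invoice_number, submitted_by)"
)
plan_after = plan(LOOKUP)   # index search
print(plan_before)
print(plan_after)
```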


On the flip side, if table_one has any indexes, they slow down the update.
According to this performance guide:



Table constraints and indexes heavily delay every write. If possible, you should drop all the indexes, triggers and foreign keys while the update runs and recreate them at the end.
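A sketch of the drop, update, recreate sequence, using SQLite as a stand-in. The index table_one_x and the helper update_with_index_dropped are hypothetical; the index definition is read back from sqlite_master so it can be recreated exactly (in PostgreSQL the equivalent definitions can be read from the pg_indexes view). There is no error handling: a real run would need to make sure the index is recreated even if the update fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (id INTEGER PRIMARY KEY, invoice_number TEXT, x TEXT);
    CREATE TABLE table_two (invoice_number TEXT, y TEXT);
    CREATE INDEX table_one_x ON table_one (x);  -- hypothetical index slowing the writes
    INSERT INTO table_one (invoice_number) VALUES ('INV-1'), ('INV-2');
    INSERT INTO table_two VALUES ('INV-1', 'A'), ('INV-2', 'B');
""")

def update_with_index_dropped(conn, index_name):
    """Drop an index on table_one, run the update, then rebuild the index from its saved SQL."""
    (index_sql,) = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'index' AND name = ?",
        (index_name,),
    ).fetchone()
    conn.execute(f"DROP INDEX {index_name}")
    conn.execute("""
        UPDATE table_one
        SET x = (SELECT y FROM table_two
                 WHERE table_two.invoice_number = table_one.invoice_number)
        WHERE EXISTS (SELECT 1 FROM table_two
                      WHERE table_two.invoice_number = table_one.invoice_number)
    """)
    conn.execute(index_sql)  # rebuild the index once, after all rows are written
    conn.commit()

update_with_index_dropped(conn, "table_one_x")
rows = conn.execute("SELECT invoice_number, x FROM table_one ORDER BY id").fetchall()
index_count = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'index' AND name = 'table_one_x'"
).fetchone()[0]
print(rows, index_count)  # [('INV-1', 'A'), ('INV-2', 'B')] 1
```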

Another suggestion:

If you can segment your data using, for example, sequential IDs, you can update rows incrementally in batches.


So, for example, if table_one has an id column, you could add something like

and table_one.id between x and y


to the WHERE condition and run the query several times, changing the values of x and y so that all rows are covered.
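A sketch of that batching loop in Python, with SQLite standing in for PostgreSQL and a single hypothetical key column. The helper update_in_batches and the batch size are illustrative; it assumes integer ids, and gaps in the ids just make some batches smaller:

```python
import sqlite3

def update_in_batches(conn, batch_size):
    """Update table_one one id range at a time, committing after each batch; returns the batch count."""
    lo, hi = conn.execute("SELECT min(id), max(id) FROM table_one").fetchone()
    if lo is None:
        return 0  # empty table, nothing to do
    batches = 0
    for start in range(lo, hi + 1, batch_size):
        end = start + batch_size - 1  # BETWEEN includes both ends, so ranges must not overlap
        conn.execute("""
            UPDATE table_one
            SET x = (SELECT y FROM table_two
                     WHERE table_two.invoice_number = table_one.invoice_number)
            WHERE table_one.id BETWEEN ? AND ?
              AND EXISTS (SELECT 1 FROM table_two
                          WHERE table_two.invoice_number = table_one.invoice_number)
        """, (start, end))
        conn.commit()  # each batch is its own short transaction
        batches += 1
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_one (id INTEGER PRIMARY KEY, invoice_number TEXT, x TEXT)")
conn.execute("CREATE TABLE table_two (invoice_number TEXT, y TEXT)")
conn.executemany("INSERT INTO table_one (id, invoice_number) VALUES (?, ?)",
                 [(i, f"INV-{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO table_two VALUES (?, ?)",
                 [(f"INV-{i}", f"Y-{i}") for i in range(1, 11)])
conn.commit()

batches = update_in_batches(conn, batch_size=3)
updated = conn.execute("SELECT count(*) FROM table_one WHERE x = 'Y-' || id").fetchone()[0]
print(batches, updated)  # 4 10
```

Committing per batch keeps each transaction short, and if the run is interrupted, the batches already committed do not have to be redone.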



The EXPLAIN ANALYZE option took also forever


You might want to be careful when using the ANALYZE option with EXPLAIN on statements that have side effects. According to the documentation:



Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual.
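The PostgreSQL EXPLAIN documentation suggests wrapping such a statement in a transaction (BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;) so its changes are discarded. SQLite has no EXPLAIN ANALYZE, so this sketch applies the same idea with SQLite as a stand-in: it times the update inside an explicit transaction and rolls it back. The helper time_then_rollback and the sample data are hypothetical:

```python
import sqlite3
import time

# isolation_level=None: autocommit mode, so BEGIN/ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE table_one (invoice_number TEXT, x TEXT);
    CREATE TABLE table_two (invoice_number TEXT, y TEXT);
    INSERT INTO table_one VALUES ('INV-1', NULL), ('INV-2', NULL);
    INSERT INTO table_two VALUES ('INV-1', 'A'), ('INV-2', 'B');
""")

UPDATE_SQL = """
    UPDATE table_one
    SET x = (SELECT y FROM table_two
             WHERE table_two.invoice_number = table_one.invoice_number)
    WHERE EXISTS (SELECT 1 FROM table_two
                  WHERE table_two.invoice_number = table_one.invoice_number)
"""

def time_then_rollback(conn, sql):
    """Run a data-changing statement, measure it, then undo its effects."""
    conn.execute("BEGIN")
    try:
        started = time.perf_counter()
        changed = conn.execute(sql).rowcount
        elapsed = time.perf_counter() - started
    finally:
        conn.execute("ROLLBACK")  # the update's side effects are discarded
    return changed, elapsed

changed, elapsed = time_then_rollback(conn, UPDATE_SQL)
remaining = conn.execute("SELECT count(*) FROM table_one WHERE x IS NOT NULL").fetchone()[0]
print(changed, remaining)  # 2 0
```

The rollback only protects the data; it does not make a slow statement finish any sooner.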
