更新语句中的冗余数据 [英] Redundant data in update statements

查看:134
本文介绍了更新语句中的冗余数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Hibernate生成包含所有列的 UPDATE 语句,而不管我是否更改列中的值,例如:

  tx.begin(); 
Item i = em.find(Item.class,12345);
i.setA(a-value);
tx.commit();

发行这个 UPDATE 语句:

  update Item set A = $ 1,B = $ 2,C = $ 3,D = $ 4其中id = $ 5 

所以列B,C,D会更新,而我没有更改它们。
$ b说,项目经常更新和所有列索引。
问题是:优化Hibernate部分是否有意义:

  tx.begin(); 
em.createQuery(update item i set ia =:a where i.id =:id)
.setParameter(a,a-value)
.setParameter( id,12345)
.executeUpdate();
tx.commit();

最让我困惑的是 EXPLAIN '未优化'和'优化'查询版本的计划是相同的! 解决方案

由于 PostgreSQL MVCC UPDATE 几乎是有效的像一个 DELETE 加上一个 INSERT - 除了值得注意的值之外。见:



准确地说,对于在提交删除之后开始的任何事务,已删除行都是不可见的,并在稍后进行抽真空。因此,在数据库方面,包括索引操作,两个语句之间实际上是没有区别。 (有例外,请继续阅读。)它会增加网络流量(取决于您的数据)并需要一些解析。



araqnid的输入并运行了一些测试。只要HOT更新涉及,实际上没有改变值的列的更新没有任何区别。我的答案成立。请参阅下面的详细信息。

这也适用于烘烤的属性,因为除非值实际更改,否则这些属性也不会触及。



但是,如果您使用每列触发器(引入第9.0页),则可能会产生不良副作用!



我引用手册on触发器


...诸如 UPDATE ... SET x = x ... 将在
x 上触发一个触发器,即使该列的值没有改变

大胆强调我的想法。

抽象层是为了方便。对于不识字的开发人员或者应用程序需要在不同RDBMS之间移植时,它们非常有用。不利的一面是,他们可以屠杀业绩并引入额外的失败点。



关于HOT(堆只有元组)更新



堆只元组是与 Postgres 8.3 一起推出,并有重要改进在 8.3.4 8.4.9

Postgres 8.3发行说明:


UPDATE s和 DELETE s如果失败 INSERT s,则会留下无用的元组。
以前只有 VACUUM 可以回收死元组占用的空间。使用
HOT死元组空间可以在
INSERT UPDATE 时自动回收。 如果没有对索引列进行更改。这
允许更一致的表现。此外,HOT可避免添加
重复索引条目。


强调我的功能。而且无变化包括列已更新为与已有值相同的值的情况。我刚实际测试,因为我不确定。



敬酒的专栏也不妨碍HOT更新。 HOT更新的元组链接到关系的吐司分支中的相同的,未改变的元组。 HOT更新甚至可以使用目标列表中的烘烤值(实际已更改或未更改)。如果更改了烘烤值,显然,它需要写入toast关系分支。我也测试了所有这些。



您不必为此接受我的话。请亲自看看,Postgres提供了几个函数来检查统计信息。运行带有和不带所有列的 UPDATE 并检查它是否有任何区别。

  SELECT pg_stat_get_tuples_hot_updated('table_name':: regclass :: oid)
$ b $ - 行数在表中热更新,在当前事务中:
SELECT pg_stat_get_xact_tuples_hot_updated('table_name':: regclass :: oid)

或使用 pgAdmin 。选择你的表并检查主窗口中的统计选项卡。



请注意,HOT更新只有在新元组版本的空间相同时才可能主要关系分叉的页面。一种简单的方法来强制这种情况是用一个只有几行的小表进行测试。页面大小通常为8k,因此页面上必须有可用空间。


Hibernate generates UPDATE statements, which include all columns, regardless of whether I'm changing the value in that columns, eg:

tx.begin();
Item i = em.find(Item.class, 12345);
i.setA("a-value");
tx.commit();

issues this UPDATE statement:

update Item set A = $1, B = $2, C = $3, D = $4 where id = $5

so columns B, C, D are updated, while I didn't change them.

Say, Items are updated frequently and all columns are indexed. The question is: does it make sense to optimize the Hibernate part to something like this:

tx.begin();
em.createQuery("update Item i set i.a = :a where i.id = :id")
    .setParameter("a", "a-value")
    .setParameter("id", 12345)
    .executeUpdate();
tx.commit();

What confuses me most is that the EXPLAIN plans of the 'unoptimized' and the 'optimized' query version are identical!

解决方案

Due to PostgreSQL MVCC, an UPDATE is effectively almost like a DELETE plus an INSERT - with the notable exception of toasted values. See:

To be precise, the "deleted" row is just invisible to any transaction starting after the delete has been committed, and vacuumed later. Therefore, on the database side, including index manipulation, there is in effect no difference between the two statements. (Exceptions apply, keep reading.) It increases network traffic a bit (depending on your data) and needs a bit of parsing.

I studied HOT updates some more after @araqnid's input and ran some tests. Updates on columns that don't actually change the value make no difference whatsoever as far as HOT updates are concerned. My answer holds. See details below.

This also applies to toasted attributes, since those are also not touched unless the values actually change.

However, if you use per-column triggers (introduced with pg 9.0), this may have undesired side effects!

I quote the manual on triggers:

... a command such as UPDATE ... SET x = x ... will fire a trigger on column x, even though the column's value did not change.

Bold emphasis mine.

Abstraction layers are for convenience. They are useful for SQL-illiterate developers or if the application needs to be portable between different RDBMS. On the downside, they can butcher performance and introduce additional points of failure. I avoid them wherever possible.

Concerning HOT (Heap-only tuple) updates

Heap-Only Tuples were introduced with Postgres 8.3, with important improvements in 8.3.4 and 8.4.9.
The release notes for Postgres 8.3:

UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples. With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries.

Emphasis mine. And "no changes" includes cases where columns are updated with the same value as they already hold. I actually tested that just now, as I wasn't sure.

Toasted columns also don't stand in the way of HOT updates. The HOT-updated tuple just links to the same, unchanged tuple(s) in the toast fork of the relation. HOT updates even work with toasted values in the target list (actually changed or not). If toasted values are changed, it entails writes to the toast relation fork, obviously. I tested that all of that, too.

You don't have to take my word for it. See for yourself, Postgres provides a couple of functions to check statistics. Run your UPDATE with and without all columns and check if it makes any difference.

-- Number of rows HOT-updated in table:
SELECT pg_stat_get_tuples_hot_updated('table_name'::regclass::oid)

-- Number of rows HOT-updated in table, in the current transaction:
SELECT pg_stat_get_xact_tuples_hot_updated('table_name'::regclass::oid)

Or use pgAdmin. Select your table and inspect the "Statistics" tab in the main window.

Be aware that HOT updates are only possible when there is room for the new tuple version on the same page of the main relation fork. One simple way to force that condition is to test with a small table that holds only a few rows. Page size is typically 8k, so there must be free space on the page.

这篇关于更新语句中的冗余数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆