Copying timestamp columns within a Postgres table


Problem description

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, 2 text, one boolean, and two timestamp columns. There are indexes on one of the text columns and, obviously, on the primary key.

I want to copy the values in the first timestamp column, call it timestamp_a, into the second timestamp column, call it timestamp_b. To do this, I ran the following query:

UPDATE my_table SET timestamp_b = timestamp_a;

This worked, but it took an hour and 15 minutes to complete, which seems like a really long time to me considering that, as far as I know, it's just copying values from one column to the next.

I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file; most notably it increased shared_buffers, work_mem, and maintenance_work_mem.

I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 minutes).

What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into Postgres? For context, I'm running this on a MacBook Pro (OS X, quad-core, 16 GB of RAM).

Recommended answer

The reason this is slow is that internally PostgreSQL doesn't update the field in place. It actually writes new rows with the new values, so this usually takes a similar amount of time to inserting that many rows.
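One way to see this row-rewrite behaviour for yourself is to compare the table's on-disk size before and after the update. A minimal sketch, assuming the table and column names from the question:

```sql
-- Because the UPDATE writes a new version of every row rather than
-- modifying rows in place, the table's on-disk size roughly doubles
-- until VACUUM makes the old row versions reusable.
SELECT pg_size_pretty(pg_table_size('my_table')) AS size_before;

UPDATE my_table SET timestamp_b = timestamp_a;

SELECT pg_size_pretty(pg_table_size('my_table')) AS size_after;  -- roughly 2x

VACUUM my_table;  -- reclaims the dead (old) row versions
```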

If there are indexes on any column, this can slow the update down further, even when they're not on the columns being updated, because PostgreSQL has to write a new row and write new index entries pointing to that row. HOT updates can help and will happen automatically where possible, but they generally only help if the table is subject to lots of small updates. HOT is also ruled out if any of the fields being updated are indexed.
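You can check how many of a table's updates have been HOT from the statistics views. A sketch, again assuming the question's table name:

```sql
-- n_tup_hot_upd counts updates that avoided writing new index
-- entries; a low percentage on a heavily indexed table suggests
-- index maintenance is dominating update cost.
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```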

Since you're basically rewriting the table anyway, if you don't mind locking out all concurrent users while you do it, you can do it faster with:


  • BEGIN
  • DROP all indexes
  • UPDATE the table
  • CREATE all indexes again
  • COMMIT
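The steps above can be sketched as SQL. The index and column names here are hypothetical stand-ins, since the question doesn't give them; substitute whatever indexes actually exist on the table:

```sql
BEGIN;

-- With the indexes gone, the UPDATE only has to rewrite the heap,
-- not maintain index entries for every new row version.
DROP INDEX idx_my_table_text;                        -- hypothetical index name
ALTER TABLE my_table DROP CONSTRAINT my_table_pkey;  -- drops the PK's index too

UPDATE my_table SET timestamp_b = timestamp_a;

-- Rebuilding the indexes once, in bulk, is cheaper than maintaining
-- them row by row during the update.
ALTER TABLE my_table ADD PRIMARY KEY (id);
CREATE INDEX idx_my_table_text ON my_table (text_col);  -- hypothetical column
COMMIT;
```

Because this all happens in one transaction, readers that would use the indexes are blocked until COMMIT, which is the "locking out all concurrent users" trade-off mentioned above.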

PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE the original and copy it back. So there's no benefit here.

