PostgreSQL delete all but the oldest records

Views: 153 | Published: 2017/7/21 1:05:26 | Tags: sql, postgresql, duplicate-removal


Question

I have a PostgreSQL database that has multiple entries for the objectid, on multiple devicenames, but there is a unique timestamp for each entry. The table looks something like this:       
address | devicename | objectid      |  timestamp       
--------+------------+---------------+------------------------------
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-03 15:37:09.06065+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-03 15:48:33.93128+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-05 16:01:59.266779+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-05 16:13:46.843113+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-06 01:11:45.853361+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-06 01:23:21.204324+00
I want to delete all but the oldest entry for each objectid and devicename. In this case I want to delete all but:
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
Is there a way to do this? Or is it possible to select the oldest entry for each combination of objectid and devicename into a temp table?
Solution
To distill the described result, this would probably be the simplest and fastest:
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;  -- ascending ts: the row kept per group is the oldest
Details and explanation in this related answer.
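To see how DISTINCT ON picks one row per group, here is a small self-contained sketch. It loads the question's sample rows into a temporary table (demo_tbl and demo_oldest are hypothetical stand-in names, and the column is assumed to be called ts already) and keeps the first row of each (devicename, objectid) group in ascending ts order, which is the oldest one.

```sql
-- Hypothetical stand-in for tbl, with the question's sample rows
CREATE TEMP TABLE demo_tbl (
  address    text
, devicename text
, objectid   text
, ts         timestamptz
);

INSERT INTO demo_tbl (address, devicename, objectid, ts) VALUES
  ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-02 17:36:41.011629+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-02 17:48:01.755559+00')
, ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-03 15:37:09.06065+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-03 15:48:33.93128+00')
, ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-05 16:01:59.266779+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-05 16:13:46.843113+00')
, ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-06 01:11:45.853361+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-06 01:23:21.204324+00');

-- DISTINCT ON keeps the first row of each group in ORDER BY order,
-- so sorting ts ascending keeps the oldest row per (devicename, objectid)
CREATE TEMP TABLE demo_oldest AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   demo_tbl
ORDER  BY devicename, objectid, ts;

SELECT * FROM demo_oldest;
```

On the sample data, demo_oldest ends up with exactly the two 2012-10-02 rows the question wants to keep. Sorting with ts DESC instead would keep the newest row of each group.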

From your sample data, I conclude that you are going to delete large portions of the original table. It is probably faster to just TRUNCATE the table (or DROP & recreate it, since you should add a surrogate pk column anyway) and write the remaining rows back to it. This also gives you a pristine table, implicitly clustered (ordered) the way that is best for your queries, and saves the work that VACUUM would otherwise have to do. And it's probably still faster overall:
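For comparison, the in-place route the question asks about can be written as a single DELETE. This is a sketch on a hypothetical temporary stand-in (dedup_demo) holding four of the sample rows: it removes every row for which an older row with the same devicename and objectid exists. It relies on the question's statement that each timestamp is unique, since rows tied on ts would all survive. When most of the table goes away, expect it to lose to the TRUNCATE route for the reasons given above.

```sql
-- Hypothetical stand-in for tbl with a subset of the sample rows
CREATE TEMP TABLE dedup_demo (
  address text, devicename text, objectid text, ts timestamptz
);

INSERT INTO dedup_demo VALUES
  ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-02 17:36:41.011629+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-02 17:48:01.755559+00')
, ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-03 15:37:09.06065+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-03 15:48:33.93128+00');

-- Delete each row that has an older sibling in the same (devicename, objectid) group;
-- only the oldest row of every group has no such sibling, so it survives
DELETE FROM dedup_demo d
USING  dedup_demo older
WHERE  older.devicename = d.devicename
AND    older.objectid   = d.objectid
AND    older.ts         < d.ts;
```

Afterwards dedup_demo holds only the two 2012-10-02 rows. The deleted rows still occupy space until VACUUM reclaims it.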

I would also strongly advise adding a surrogate primary key to your table, preferably a serial column.
BEGIN;

-- assumes the original column "timestamp" has been renamed to ts
CREATE TEMP TABLE tmp_tbl ON COMMIT DROP AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;  -- ascending: keep the oldest row per group

TRUNCATE tbl;
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;

-- or, if you can afford to drop & recreate:
-- DROP TABLE tbl;
-- CREATE TABLE tbl (
--   tbl_id serial PRIMARY KEY
-- , address text
-- , devicename text
-- , objectid text
-- , ts timestamptz);  -- the sample values carry a +00 offset, so timestamptz

INSERT INTO tbl (address, devicename, objectid, ts)
SELECT address, devicename, objectid, ts
FROM   tmp_tbl;

COMMIT;
Do it all within a single transaction, so that a failure halfway through rolls everything back instead of leaving you with a half-finished table.

This is fast as long as your setting for temp_buffers is big enough to hold the temporary table. Otherwise the system starts writing temporary data to disk and performance takes a dive. You can set temp_buffers just for the current session like this:
SET temp_buffers = '1000MB';
Setting it per session means you don't waste RAM on temp_buffers in sessions that normally don't need it. This must happen before the first use of temporary objects in the session. More information in this related answer.

Also, since the INSERT follows a TRUNCATE inside the same transaction, it can be easy on the Write-Ahead Log (WAL), improving performance. Such WAL savings only apply with wal_level = minimal.

Consider CREATE TABLE AS for the alternative route; see the related question "What causes large INSERT to slow down and disk usage to explode?"
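A sketch of that alternative route, assuming no views, foreign keys or other objects depend on the table (DROP TABLE would fail otherwise, or need CASCADE) and that indexes, constraints and privileges are recreated by hand afterwards. It runs on a hypothetical temporary stand-in (ctas_demo) so it can be tried safely; against the real table you would use tbl and a plain, non-temporary CREATE TABLE.

```sql
-- Hypothetical stand-in for tbl with a subset of the sample rows
CREATE TEMP TABLE ctas_demo (
  address text, devicename text, objectid text, ts timestamptz
);

INSERT INTO ctas_demo VALUES
  ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-02 17:36:41.011629+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-02 17:48:01.755559+00')
, ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-03 15:37:09.06065+00')
, ('1.1.1.2', 'device2', 'vs_hub.ch1_25', '2012-10-03 15:48:33.93128+00');

BEGIN;

-- Write only the surviving (oldest) rows into a brand-new table
CREATE TEMP TABLE ctas_demo_new AS
SELECT DISTINCT ON (devicename, objectid) address, devicename, objectid, ts
FROM   ctas_demo
ORDER  BY devicename, objectid, ts;

-- Swap the new table in for the old one
DROP TABLE ctas_demo;
ALTER TABLE ctas_demo_new RENAME TO ctas_demo;

-- Add the surrogate key recommended above
ALTER TABLE ctas_demo ADD COLUMN ctas_demo_id serial PRIMARY KEY;

COMMIT;
```

The trade-off against the TRUNCATE route is that the old table object disappears, so everything attached to it has to be rebuilt on the new one.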

The only downside: TRUNCATE (like DROP TABLE) takes an exclusive lock on the table. This may be a problem in databases with heavy concurrent load.

Finally, never use timestamp as a column name. It's a reserved word in every SQL standard and a type name in PostgreSQL. As you may have noticed, I renamed the column to ts.
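All code in this answer assumes the column is already called ts. A minimal sketch of the rename, on a hypothetical temporary stand-in (rename_demo) that still uses the question's column name; on the real table the statement would be ALTER TABLE tbl RENAME COLUMN "timestamp" TO ts. Application queries that refer to the old name would need updating as well.

```sql
-- Hypothetical stand-in that still uses the question's column name
CREATE TEMP TABLE rename_demo (
  address      text
, devicename   text
, objectid     text
, "timestamp"  timestamptz  -- quoted to sidestep the keyword
);

INSERT INTO rename_demo VALUES
  ('1.1.1.1', 'device1', 'vs_hub.ch1_25', '2012-10-02 17:36:41.011629+00');

-- Rename once; data is untouched, only the column name changes
ALTER TABLE rename_demo RENAME COLUMN "timestamp" TO ts;

SELECT devicename, objectid, ts FROM rename_demo;
```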
