PostgreSQL delete all but the oldest records

Question

I have a PostgreSQL database that has multiple entries for the same objectid on multiple devicenames, but each entry has a unique timestamp. The table looks something like this:

address | devicename | objectid      |  timestamp       
--------+------------+---------------+------------------------------
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-03 15:37:09.06065+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-03 15:48:33.93128+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-05 16:01:59.266779+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-05 16:13:46.843113+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-06 01:11:45.853361+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-06 01:23:21.204324+00

I want to delete all but the oldest entry for each combination of objectid and devicename. In this case, I want to delete everything except:

1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00

Is there a way to do this? Or is it possible to select the oldest entry for each (objectid, devicename) pair into a temp table?

Solution

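To distill the described result (the oldest row for each devicename and objectid), this is probably the simplest and fastest query:

SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;  -- ascending ts: the oldest row per group comes first and is kept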

Details and explanation in this related answer.
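The direction of the sort is easy to get backwards. DISTINCT ON sorts by the ORDER BY list and keeps only the first row of each (devicename, objectid) group. Ascending ts therefore keeps the oldest row, and ts DESC would keep the newest. The Python sketch below only mimics that rule in memory on the question's sample rows, to make the direction visible. distinct_on is a hypothetical helper, not a PostgreSQL API. The timestamps are compared as strings, which orders them correctly for this sample data.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (address, devicename, objectid, ts).
ROWS = [
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-02 17:36:41.011629+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-02 17:48:01.755559+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-03 15:37:09.06065+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-03 15:48:33.93128+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-05 16:01:59.266779+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-05 16:13:46.843113+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-06 01:11:45.853361+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-06 01:23:21.204324+00"),
]

group_key = itemgetter(1, 2)  # (devicename, objectid)
ts_key = itemgetter(3)


def distinct_on(rows, ts_desc=False):
    """Hypothetical helper mimicking
    SELECT DISTINCT ON (devicename, objectid) * ... ORDER BY devicename, objectid, ts [DESC]
    """
    # Two stable sorts: by ts first, then by the group key, so rows inside
    # each group stay in ts order (newest first when ts_desc is True).
    by_ts = sorted(rows, key=ts_key, reverse=ts_desc)
    ordered = sorted(by_ts, key=group_key)
    # Keep only the first row of every (devicename, objectid) group.
    return [next(group) for _, group in groupby(ordered, key=group_key)]


oldest = distinct_on(ROWS)                # like ORDER BY ..., ts
newest = distinct_on(ROWS, ts_desc=True)  # like ORDER BY ..., ts DESC
for row in oldest:
    print(row)
```

Only the rows dated 2012-10-02 survive in `oldest`, matching the rows the question wants to keep. `newest` holds the 2012-10-06 rows instead.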

From your sample data, I conclude that you are going to delete large portions of the original table. In that case it is probably faster to TRUNCATE the table (or DROP and recreate it, since you should add a surrogate PK column anyway) and write the remaining rows back into it. This also leaves you with a pristine table, implicitly clustered (ordered) in the way that is best for your queries, and it saves the work that VACUUM would otherwise have to do. Even counting the extra steps, this is probably faster overall.
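For comparison, the straightforward in-place alternative is a single DELETE. It removes every row for which an older row with the same (devicename, objectid) exists. The statement below is plain SQL that PostgreSQL also accepts. It is executed here through Python's built-in sqlite3 module, against a simplified stand-in table with ts stored as text, purely so the sketch runs without a server. On a large table where most rows go, this is the variant the TRUNCATE approach is expected to beat. In PostgreSQL, every deleted row leaves a dead tuple behind for VACUUM.

```python
import sqlite3

# Simplified stand-in for the question's table; ts is stored as text, which
# sorts chronologically for these sample values.
ROWS = [
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-02 17:36:41.011629+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-02 17:48:01.755559+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-03 15:37:09.06065+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-03 15:48:33.93128+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-05 16:01:59.266779+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-05 16:13:46.843113+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-06 01:11:45.853361+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-06 01:23:21.204324+00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (address TEXT, devicename TEXT, objectid TEXT, ts TEXT)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)
conn.commit()

# Delete every row that has an older row with the same (devicename, objectid).
# The oldest row of each pair has no older sibling, so it is the only survivor.
DELETE_ALL_BUT_OLDEST = """
DELETE FROM tbl
WHERE EXISTS (
    SELECT 1
    FROM   tbl AS older
    WHERE  older.devicename = tbl.devicename
    AND    older.objectid   = tbl.objectid
    AND    older.ts         < tbl.ts
)
"""

with conn:  # commits on success, rolls back if the statement raises
    deleted = conn.execute(DELETE_ALL_BUT_OLDEST).rowcount

remaining = conn.execute(
    "SELECT devicename, ts FROM tbl ORDER BY devicename"
).fetchall()
print(deleted, remaining)
```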

I would also strongly advise adding a surrogate primary key to your table, preferably a serial column.

BEGIN;

CREATE TEMP TABLE tmp_tbl ON COMMIT DROP AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;  -- ascending ts: keep the oldest row per group

TRUNCATE tbl;
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;

-- or, if you can afford to drop & recreate:
-- DROP TABLE tbl;
-- CREATE TABLE tbl (
--   tbl_id serial PRIMARY KEY
-- , address text
-- , devicename text
-- , objectid text
-- , ts timestamptz);  -- the sample values carry a UTC offset (+00)

INSERT INTO tbl (address, devicename, objectid, ts)
SELECT address, devicename, objectid, ts
FROM   tmp_tbl
ORDER  BY devicename, objectid;  -- write rows back in clustered order

COMMIT;

Do it all within a single transaction, so that a failure can never leave the table half-rebuilt: either every step takes effect or none does.
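The following minimal sketch makes the all-or-nothing point concrete. It runs the same copy, empty and refill sequence through Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no DISTINCT ON, TRUNCATE or ON COMMIT DROP, so the sketch substitutes a NOT EXISTS filter, DELETE FROM and an explicit DROP TABLE. rebuild_keep_oldest and its fail_midway switch are hypothetical names, used only to simulate an error after the table has been emptied.

```python
import sqlite3

ROWS = [
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-02 17:36:41.011629+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-02 17:48:01.755559+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-03 15:37:09.06065+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-03 15:48:33.93128+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-05 16:01:59.266779+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-05 16:13:46.843113+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-06 01:11:45.853361+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-06 01:23:21.204324+00"),
]

# Oldest row per (devicename, objectid); stands in for DISTINCT ON.
KEEP_OLDEST = """
    SELECT * FROM tbl AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl AS older
        WHERE  older.devicename = t.devicename
        AND    older.objectid   = t.objectid
        AND    older.ts         < t.ts)
"""


def make_db():
    # isolation_level=None: the sketch issues BEGIN/COMMIT/ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE tbl (address TEXT, devicename TEXT, objectid TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)
    return conn


def rebuild_keep_oldest(conn, fail_midway=False):
    """Copy survivors aside, empty tbl, write them back, in ONE transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("CREATE TEMP TABLE tmp_tbl AS" + KEEP_OLDEST)
        conn.execute("DELETE FROM tbl")      # SQLite has no TRUNCATE
        if fail_midway:                      # hypothetical: simulate a crash
            raise RuntimeError("simulated failure after emptying tbl")
        conn.execute(
            "INSERT INTO tbl SELECT * FROM tmp_tbl ORDER BY devicename, objectid"
        )
        conn.execute("DROP TABLE tmp_tbl")   # what ON COMMIT DROP does in PostgreSQL
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")             # undo everything, including the DELETE
        raise


ok = make_db()
rebuild_keep_oldest(ok)
kept = ok.execute("SELECT devicename, ts FROM tbl ORDER BY devicename").fetchall()

broken = make_db()
try:
    rebuild_keep_oldest(broken, fail_midway=True)
except RuntimeError:
    pass
rows_after_failure = broken.execute("SELECT count(*) FROM tbl").fetchone()[0]
```

The simulated failure happens inside the transaction. The ROLLBACK therefore restores all eight original rows instead of leaving an empty table behind.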

This is fast as long as your temp_buffers setting is big enough to hold the temporary table. Otherwise the system starts spilling data to disk and performance takes a dive. You can set temp_buffers for the current session only, like this:

SET temp_buffers = 1000MB;

That way you don't waste RAM on a large temp_buffers setting in sessions that don't need it. The setting has to be changed before the first use of temporary objects in the session. More information in this related answer.

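Also, since the INSERT follows a TRUNCATE inside the same transaction, it can be easy on the Write-Ahead Log, which improves performance. This WAL-skipping optimization only applies when wal_level is set to minimal, which has not been the default since PostgreSQL 10.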

Consider CREATE TABLE AS for the alternative route.

The only downside: You need an exclusive lock on the table. This may be a problem in databases with heavy concurrent load.
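The following minimal sketch shows the CREATE TABLE AS route. It builds a new table from the surviving rows, drops the old one and renames the new one into place, all in one transaction. It runs through Python's built-in sqlite3 module only so it is self-contained. tbl_new is a hypothetical name, and a NOT EXISTS filter stands in for PostgreSQL's DISTINCT ON. A table created this way does not inherit indexes, constraints or defaults from the old one, so in PostgreSQL you would recreate those (for example the surrogate primary key) afterwards.

```python
import sqlite3

ROWS = [
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-02 17:36:41.011629+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-02 17:48:01.755559+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-03 15:37:09.06065+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-03 15:48:33.93128+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-05 16:01:59.266779+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-05 16:13:46.843113+00"),
    ("1.1.1.1", "device1", "vs_hub.ch1_25", "2012-10-06 01:11:45.853361+00"),
    ("1.1.1.2", "device2", "vs_hub.ch1_25", "2012-10-06 01:23:21.204324+00"),
]

# isolation_level=None: BEGIN/COMMIT below are sent to SQLite as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tbl (address TEXT, devicename TEXT, objectid TEXT, ts TEXT)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)

conn.execute("BEGIN")
# Build the replacement table straight from the surviving rows. tbl_new is a
# hypothetical name; the NOT EXISTS filter stands in for DISTINCT ON.
conn.execute("""
    CREATE TABLE tbl_new AS
    SELECT * FROM tbl AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl AS older
        WHERE  older.devicename = t.devicename
        AND    older.objectid   = t.objectid
        AND    older.ts         < t.ts)
    ORDER BY devicename, objectid
""")
# Swap the tables. In PostgreSQL, DROP TABLE and ALTER TABLE ... RENAME both
# take an exclusive lock on the table: the downside mentioned above.
conn.execute("DROP TABLE tbl")
conn.execute("ALTER TABLE tbl_new RENAME TO tbl")
conn.execute("COMMIT")

survivors = conn.execute(
    "SELECT devicename, ts FROM tbl ORDER BY devicename"
).fetchall()
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```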

Finally, never use timestamp as a column name. It is a reserved word in every SQL standard and a type name in PostgreSQL. As you may have noticed, I renamed the column to ts.
