PostgreSQL delete all but the oldest records
I have a PostgreSQL database that has multiple entries for the objectid, on multiple devicenames, but there is a unique timestamp for each entry. The table looks something like this:
address | devicename | objectid | timestamp
--------+------------+---------------+------------------------------
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-03 15:37:09.06065+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-03 15:48:33.93128+00
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-05 16:01:59.266779+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-05 16:13:46.843113+00
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-06 01:11:45.853361+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-06 01:23:21.204324+00
I want to delete all but the oldest entry for each objectid and devicename. In this case I want to delete all but:
1.1.1.1 | device1 | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2 | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
Is there a way to do this? Or is it possible to select the oldest entries for both objectid and devicename into a temp table?
To distill the described result, this would probably be simplest and fastest:
SELECT DISTINCT ON (devicename, objectid) *
FROM tbl
ORDER BY devicename, objectid, ts;  -- ascending: the oldest row per group sorts first and is the one kept
Details and explanation in this related answer.
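If the table has to stay in place and only a modest share of the rows goes away, a plain DELETE is another option. This is a minimal sketch, assuming the table is called tbl and the timestamp column ts (the names used in the query above), and assuming ts is unique within each (devicename, objectid) group, as the question states:

```sql
-- Delete every row for which an older row with the same
-- (devicename, objectid) exists; only the oldest row per group survives.
DELETE FROM tbl t
WHERE  EXISTS (
   SELECT 1
   FROM   tbl o
   WHERE  o.devicename = t.devicename
   AND    o.objectid   = t.objectid
   AND    o.ts         < t.ts      -- strictly older
   );
```

If two rows of a group shared the same oldest ts, both would be kept, because neither is strictly older than the other.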
From your sample data, I conclude that you are going to delete large portions of the original table. It is probably faster to just TRUNCATE the table (or DROP & recreate, since you should add a surrogate pk column anyway) and write the remaining rows back to it. This also gives you a pristine table, implicitly clustered (ordered) the way that is best for your queries, and saves the work that VACUUM would otherwise have to do. And it's probably still faster overall.
I would also strongly advise adding a surrogate primary key to your table, preferably a serial column:
BEGIN;
CREATE TEMP TABLE tmp_tbl ON COMMIT DROP AS
SELECT DISTINCT ON (devicename, objectid) *
FROM tbl
ORDER BY devicename, objectid, ts;  -- ascending: keep the oldest row per group
TRUNCATE tbl;
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;
-- or, if you can afford to drop & recreate:
-- DROP TABLE tbl;
-- CREATE TABLE tbl (
-- tbl_id serial PRIMARY KEY
-- , address text
-- , devicename text
-- , objectid text
-- , ts timestamp);
INSERT INTO tbl (address, devicename, objectid, ts)
SELECT address, devicename, objectid, ts
FROM tmp_tbl;
COMMIT;
Do it all within a transaction to make sure you are not going to fail half way through.
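Once the transaction has committed, a quick sanity check can confirm that only one row per group is left. A small sketch, assuming the table and column names from the statements above:

```sql
-- Expect zero rows: every (devicename, objectid) pair
-- should now occur exactly once.
SELECT devicename, objectid, count(*) AS row_count
FROM   tbl
GROUP  BY devicename, objectid
HAVING count(*) > 1;
```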
The rewrite itself is fast as long as your setting for temp_buffers is big enough to hold the temporary table. Otherwise the system starts to swap data to disk and performance takes a dive. You can set temp_buffers just for the current session like this:
SET temp_buffers = '1000MB';
That way you don't waste RAM that you don't normally need for temp_buffers. The setting has to be changed before the first use of temporary objects in the session. More information in this related answer.
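To judge whether the setting is large enough, one could compare it with the size of the data involved. A rough sketch, assuming the table is called tbl; the temporary table holds only one row per group, so the size of the full table is a generous upper bound:

```sql
-- Current value for this session
SHOW temp_buffers;

-- On-disk size of the original table, excluding indexes
SELECT pg_size_pretty(pg_table_size('tbl')) AS table_size;
```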
Also, as the INSERT follows a TRUNCATE inside the same transaction, it will be easy on the Write Ahead Log, which improves performance.
Consider CREATE TABLE AS for the alternative route.
The only downside: You need an exclusive lock on the table. This may be a problem in databases with heavy concurrent load.
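The CREATE TABLE AS route mentioned above could look roughly like this. It is only a sketch: tbl_new is a hypothetical name, and the swap assumes nothing else (views, foreign keys) depends on tbl, since DROP TABLE would fail otherwise.

```sql
BEGIN;

-- Build the reduced copy; tbl_new is a hypothetical name.
CREATE TABLE tbl_new AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;   -- oldest row per group first

-- Swap the tables.
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;

-- CREATE TABLE AS copies only column names, types and data,
-- so keys and indexes have to be added afterwards.
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;

COMMIT;
```

Privileges, defaults and constraints of the old table are not carried over either and would need to be recreated.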
Finally, never use timestamp as a column name. It's a reserved word in every SQL standard and a type name in PostgreSQL. I renamed the column to ts, as you may have noticed.
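For an existing table, the rename could be done in place. A one-line sketch, assuming the table is called tbl and the column is still named timestamp as in the question:

```sql
-- Double quotes are a precaution, since timestamp is also a type name.
ALTER TABLE tbl RENAME COLUMN "timestamp" TO ts;
```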