Postgres: Surprising performance on updates using cursor

Problem description

Consider the following two Python code examples, which achieve the same thing but with a significant and surprising difference in performance.

import psycopg2, time

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor('cursor_unique_name')  
cur2 = conn.cursor()
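# cur is a named (server-side) cursor; WHERE CURRENT OF below refers to it by name.
# cur2 is an ordinary client-side cursor used to issue the UPDATE statements.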

startTime = time.clock()
cur.execute("SELECT * FROM test for update;")
print ("Finished: SELECT * FROM test for update;: " + str(time.clock() - startTime));
for i in range (100000):
    cur.fetchone()
    cur2.execute("update test set num = num + 1 where current of cursor_unique_name;")
print ("Finished: update starting commit: " + str(time.clock() - startTime));
conn.commit()
print ("Finished: update : " + str(time.clock() - startTime));

cur2.close()
conn.close()

And:

import psycopg2, time

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor('cursor_unique_name')  
cur2 = conn.cursor()
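# (the named cursor is created here as well, but this variant never uses it)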

startTime = time.clock()
for i in range (100000):
    cur2.execute("update test set num = num + 1 where id = " + str(i) + ";")
print ("Finished: update starting commit: " + str(time.clock() - startTime));
conn.commit()
print ("Finished: update : " + str(time.clock() - startTime));

cur2.close()
conn.close()

The CREATE statement for the test table is:

CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);

The table contains 100,000 rows, and VACUUM ANALYZE test; has been run.
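For completeness, here is a minimal setup sketch that produces a comparable table. The question does not show how the rows were loaded, so the generate_series population below is an assumption:

import psycopg2

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor()

cur.execute("DROP TABLE IF EXISTS test;")
cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);")
# Assumed population: 100,000 rows with num = 0 and a short data value.
cur.execute("INSERT INTO test (num, data) SELECT 0, 'x' FROM generate_series(1, 100000);")
conn.commit()

conn.autocommit = True          # VACUUM cannot run inside a transaction block
cur.execute("VACUUM ANALYZE test;")

cur.close()
conn.close()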

First code example:

Finished: SELECT * FROM test for update;: 0.00609304950429
Finished: update starting commit: 37.3272754429
Finished: update : 37.4449708474

Second code example:

Finished: update starting commit: 24.574401185
Finished committing: 24.7331461431


This is very surprising to me, because I would have thought it should be exactly the other way around, meaning that the update using a cursor should be significantly faster, according to this answer.

Recommended answer

I don't think the test is balanced: your first code fetches the data from the cursor and then updates, whereas the second blindly updates by ID without fetching the data at all. I assume the first code sequence translates to a FETCH command followed by an UPDATE, so that's two client/server command round trips instead of one.
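To illustrate that point, here is a rough sketch (not from the original post) of what a more balanced version of the second test might look like: each iteration first fetches and locks the row, then updates it, so both variants pay two client/server round trips per row. The id range mirrors the original loop and is an assumption about which ids exist:

import psycopg2, time

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur2 = conn.cursor()

startTime = time.perf_counter()   # time.clock() from the original code was removed in Python 3.8
for i in range(100000):
    # First round trip: fetch (and lock) the row, as the cursor version does implicitly.
    cur2.execute("SELECT * FROM test WHERE id = %s FOR UPDATE;", (i,))
    cur2.fetchone()
    # Second round trip: the actual update.
    cur2.execute("UPDATE test SET num = num + 1 WHERE id = %s;", (i,))
print("Finished: update starting commit: " + str(time.perf_counter() - startTime))
conn.commit()
print("Finished: update : " + str(time.perf_counter() - startTime))

cur2.close()
conn.close()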

(Also, the first code starts by locking every row in the table, which pulls the entire table into the buffer cache; although, thinking about it, I doubt this actually impacts performance, but you didn't mention it.)

Also, to be honest, I think that for a simple table there won't be much difference between updating by ctid (which I assume is how WHERE CURRENT OF ... works) and updating through a primary key: the pkey update is an extra index lookup, but unless the index is huge it's not much of a degradation.
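If you want to see how small that extra index lookup is on this table, one quick sketch (my addition, not part of the original answer; id = 42 is an arbitrary existing row) is to look at the plan of a single primary-key update:

import psycopg2

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor()

# EXPLAIN ANALYZE actually executes the update, so roll it back afterwards.
cur.execute("EXPLAIN ANALYZE UPDATE test SET num = num + 1 WHERE id = 42;")
for (line,) in cur.fetchall():
    print(line)     # expect an Index Scan using test_pkey feeding the Update node
conn.rollback()

cur.close()
conn.close()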

For updating 100,000 rows like this, I suspect that most of the time is taken up generating the extra tuples and inserting or appending them to the table, rather than locating the previous tuple in order to mark it as deleted.
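One way to see that tuple-generation cost, as a rough sketch of my own (the function and view names are standard PostgreSQL, but the exact numbers will vary): compare the table size before and after a set-based update and look at the dead-tuple count:

import psycopg2

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor()

cur.execute("SELECT pg_relation_size('test');")
size_before = cur.fetchone()[0]

cur.execute("UPDATE test SET num = num + 1;")   # rewrites all 100,000 rows in one statement
conn.commit()

cur.execute("SELECT pg_relation_size('test');")
size_after = cur.fetchone()[0]

# The statistics view is updated asynchronously, so n_dead_tup may lag slightly.
cur.execute("SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test';")
dead_tuples = cur.fetchone()[0]

print("table grew by", size_after - size_before, "bytes;", dead_tuples, "dead tuples")

cur.close()
conn.close()

As a side note, the single set-based UPDATE above is far faster than either per-row loop, simply because it does the same tuple work without 100,000 client/server round trips.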
