Is Python DBM really fast?
Question
I assumed that Python's native DBM would be considerably faster than NoSQL databases such as Tokyo Cabinet and MongoDB, since Python DBM has fewer features and options (i.e., it is a simpler system). I tested this with a very simple write/read example:
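#!/usr/bin/python
import time
t = time.time()
import anydbm
count = 0
while (count < 1000):
    # open (creating if needed), write one value, close
    db = anydbm.open("dbm2", "c")
    db["1"] = "something"
    db.close()
    # reopen the same file read-only and read the value back
    db = anydbm.open("dbm2", "r")
    print "dict['Name']: ", db['1']
    # cumulative elapsed time since the start
    print "%.3f" % (time.time()-t)
    db.close()
    count = count + 1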
Timings: read/write 1.3 s, read 0.3 s, write 1.0 s.
MongoDB is at least 5 times faster than these values. Is this really how fast Python DBM is?
Recommended answer
Python doesn't have a fast DBM engine of its own (apart from the slow pure-Python fallback, dumbdbm). The anydbm module is a generic front end that delegates to whichever DBM-style library is available, such as Berkeley DB, GNU dbm (gdbm) or Unix ndbm, so the performance you measure is largely that of the underlying library.
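To see which library is actually doing the work, you can ask Python which backend handled a file. A minimal sketch, assuming Python 3, where anydbm was renamed `dbm` and `dbm.whichdb()` reports the module that can read a given database; the helper name `backend_for_new_db` and the file name "probe" are hypothetical:

```python
import dbm
import os
import tempfile


def backend_for_new_db():
    """Create a throwaway DBM file and report which backend module handled it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe")
        # "c" creates the database if it does not already exist
        with dbm.open(path, "c") as db:
            db["1"] = "something"
        # whichdb inspects the file(s) on disk and names the module that can read them,
        # e.g. "dbm.gnu", "dbm.ndbm" or "dbm.dumb", depending on what is installed
        return dbm.whichdb(path)


if __name__ == "__main__":
    print(backend_for_new_db())
```

The printed name varies between machines, which is one reason DBM timings are hard to compare across setups.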
Python's dictionary implementation is very fast for key-value storage, but it is not persistent. If you need high-performance key-value lookups at runtime, a dictionary may serve you better, and you can handle persistence separately with something like cPickle or shelve (note that shelve itself stores pickled values in a DBM file). If startup time matters more to you than runtime access speed (and, if you're modifying the data, shutdown time as well), then something like DBM would be the better choice.
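The dictionary-plus-pickle approach trades a full load at startup and a full write at shutdown for plain dict speed in between. A minimal sketch, assuming Python 3 (where `pickle` uses its C implementation automatically, so cPickle is no longer imported separately); `load_store` and `save_store` are hypothetical helper names:

```python
import os
import pickle
import tempfile


def load_store(path):
    """Startup: read the whole dict from disk, or start empty if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_store(store, path):
    """Termination: write the whole dict back to disk in one go."""
    with open(path, "wb") as f:
        pickle.dump(store, f, protocol=pickle.HIGHEST_PROTOCOL)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.pickle")
        store = load_store(path)        # full load at startup
        for i in range(1000):
            store[str(i)] = "x"         # runtime access is ordinary dict speed
        save_store(store, path)         # full write at shutdown
        print(len(load_store(path)))    # → 1000
```

The cost grows with the size of the whole dataset at every start and stop, which is exactly the case where DBM's incremental on-disk access wins.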
In your evaluation, the main loop includes both the DBM open calls and the key lookups. Opening a DBM to store a single value, then closing and reopening it before looking that value up, is a very unrealistic use case. Each iteration pays for two opens and two closes on top of one write and one read, so you're seeing the slow performance you would expect from managing a persistent data store this way (it's quite inefficient).
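One way to see how much of the question's timing is open/close overhead is to run the same writes and reads under both patterns. A minimal sketch, assuming Python 3's `dbm` module and `time.perf_counter()`; the function names are hypothetical and the absolute numbers depend on the backend and disk:

```python
import dbm
import os
import tempfile
import time


def reopen_each_time(path, n):
    """Mimic the question's loop: open, write, close, reopen read-only, read, close."""
    start = time.perf_counter()
    for _ in range(n):
        with dbm.open(path, "c") as db:
            db["1"] = "something"
        with dbm.open(path, "r") as db:
            value = db["1"]
    return time.perf_counter() - start


def open_once(path, n):
    """Same writes and reads, but the database is opened only once."""
    start = time.perf_counter()
    with dbm.open(path, "c") as db:
        for _ in range(n):
            db["1"] = "something"
            value = db["1"]
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n = 200
        slow = reopen_each_time(os.path.join(tmp, "reopen"), n)
        fast = open_once(os.path.join(tmp, "once"), n)
        print(f"reopen each time: {slow:.3f} s, open once: {fast:.3f} s")
```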
Depending on your requirements, if you need fast lookups and don't care too much about startup time, DBM might be a solution. To benchmark it, though, include only the writes and reads in the timed loop. Something like the following might be suitable:
import anydbm
from random import random
import time

# open DBM outside of the timed loops
db = anydbm.open("dbm2", "c")
max_records = 100000

# only time read and write operations
t = time.time()

# create some records
for i in range(max_records):
    db[str(i)] = 'x'

# do some random reads
for i in range(max_records):
    x = db[str(int(random() * max_records))]

# the total covers one write plus one random read per record
time_taken = time.time() - t
print "Took %0.3f seconds, %0.5f microseconds / record" % (time_taken, (time_taken * 1000000) / max_records)
db.close()
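For readers on Python 3, here is a minimal port of the benchmark above, assuming the `dbm` module (Python 3's name for anydbm), `time.perf_counter()` for timing and `random.randrange()` for picking keys. Unlike the original, it times writes and reads separately. The helper name `benchmark` is hypothetical, and which backend `dbm.open()` picks depends on the Python version and the libraries installed, so results are only comparable on the same setup.

```python
import dbm
import os
import random
import tempfile
import time


def benchmark(path, max_records):
    """Time writes and random reads separately; open/close happen outside both timers."""
    with dbm.open(path, "c") as db:
        start = time.perf_counter()
        for i in range(max_records):
            db[str(i)] = "x"
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(max_records):
            # randrange(n) returns an int in [0, n), so every key looked up exists
            _ = db[str(random.randrange(max_records))]
        read_s = time.perf_counter() - start
    return write_s, read_s


if __name__ == "__main__":
    # smaller than the original 100000 so slow backends finish quickly; scale up as needed
    n = 10_000
    with tempfile.TemporaryDirectory() as tmp:
        w, r = benchmark(os.path.join(tmp, "bench"), n)
        print(f"writes: {w:.3f} s ({w * 1e6 / n:.2f} us/record)")
        print(f"reads:  {r:.3f} s ({r * 1e6 / n:.2f} us/record)")
```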