SQLite Performance Benchmark -- why is :memory: so slow...only 1.5X as fast as disk?


Question

I've been trying to see if there are any performance improvements gained by using in-memory sqlite vs. disk based sqlite. Basically I'd like to trade startup time and memory to get extremely rapid queries which do not hit disk during the course of the application.

However, the following benchmark gives me only a factor of 1.5X in improved speed. Here, I'm generating 1M rows of random data and loading it into both a disk and memory based version of the same table. I then run random queries on both dbs, returning sets of size approx 300k. I expected the memory based version to be considerably faster, but as mentioned I'm only getting 1.5X speedups.

I experimented with several other sizes of dbs and query sets; the advantage of :memory: does seem to go up as the number of rows in the db increases. I'm not sure why the advantage is so small, though I had a few hypotheses:

  • the table used isn't big enough (in rows) to make :memory: a huge winner
  • more joins/tables would make the :memory: advantage more apparent
  • there is some kind of caching going on at the connection or OS level such that the previous results are accessible somehow, corrupting the benchmark
  • there is some kind of hidden disk access going on that I'm not seeing (I haven't tried lsof yet, but I did turn off the PRAGMAs for journaling)
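On the last point, one quick sanity check (a sketch; the connection here is illustrative) is to read the PRAGMA values back after setting them -- SQLite silently ignores unrecognized PRAGMAs, so it is worth confirming they actually took effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA temp_store=MEMORY;")
conn.execute("PRAGMA journal_mode=MEMORY;")

# temp_store reports 2 for MEMORY (0=DEFAULT, 1=FILE, 2=MEMORY);
# journal_mode reports the active mode as a lowercase string.
print(conn.execute("PRAGMA temp_store;").fetchone()[0])    # 2
print(conn.execute("PRAGMA journal_mode;").fetchone()[0])  # memory
```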

Am I doing something wrong here? Any thoughts on why :memory: isn't producing nearly instant lookups? Here's the benchmark:

==> sqlite_memory_vs_disk_benchmark.py <==

#!/usr/bin/env python
"""Attempt to see whether :memory: offers significant performance benefits.

"""
import os
import time
import sqlite3
import numpy as np

def load_mat(conn,mat):
    c = conn.cursor()

    #Try to avoid hitting disk, trading safety for speed.
    #http://stackoverflow.com/questions/304393
    c.execute('PRAGMA temp_store=MEMORY;')
    c.execute('PRAGMA journal_mode=MEMORY;')

    # Make a demo table
    c.execute('create table if not exists demo (id1 int, id2 int, val real);')
    c.execute('create index id1_index on demo (id1);')
    c.execute('create index id2_index on demo (id2);')
    for row in mat:
        c.execute('insert into demo values(?,?,?);', (row[0],row[1],row[2]))
    conn.commit()

def querytime(conn,query):
    start = time.time()
    foo = conn.execute(query).fetchall()
    diff = time.time() - start
    return diff

#1) Build some fake data with 3 columns: int, int, float
nn   = 1000000 #numrows
cmax = 700    #num uniques in 1st col
gmax = 5000   #num uniques in 2nd col

mat = np.zeros((nn,3),dtype='object')
mat[:,0] = np.random.randint(0,cmax,nn)
mat[:,1] = np.random.randint(0,gmax,nn)
mat[:,2] = np.random.uniform(0,1,nn)

#2) Load it into both dbs & build indices
try: os.unlink('foo.sqlite')
except OSError: pass

conn_mem = sqlite3.connect(":memory:")
conn_disk = sqlite3.connect('foo.sqlite')
load_mat(conn_mem,mat)
load_mat(conn_disk,mat)
del mat

#3) Execute a series of random queries and see how long it takes each of these
numqs = 10
numqrows = 300000 #max number of ids of each kind
results = np.zeros((numqs,3))
for qq in range(numqs):
    qsize = np.random.randint(1, numqrows)  # scalar, not a 1-element array
    # Ensure uniqueness of the ids queried; note the slice is capped at
    # cmax (resp. gmax) elements, since a permutation can't be longer
    # than the array it permutes.
    id1a = np.sort(np.random.permutation(np.arange(cmax))[0:qsize])
    id2a = np.sort(np.random.permutation(np.arange(gmax))[0:qsize])
    id1s = ','.join(str(xx) for xx in id1a)
    id2s = ','.join(str(xx) for xx in id2a)
    query = 'select * from demo where id1 in (%s) AND id2 in (%s);' % (id1s, id2s)

    results[qq,0] = round(querytime(conn_disk, query), 4)
    results[qq,1] = round(querytime(conn_mem, query), 4)
    results[qq,2] = qsize

#4) Now look at the results
print("  disk | memory | qsize")
print("-----------------------")
for row in results:
    print("%.4f | %.4f | %d" % (row[0], row[1], row[2]))

Here are the results. Note that disk takes about 1.5X as long as memory for a fairly wide range of query sizes.

[ramanujan:~]$python -OO sqlite_memory_vs_disk_clean.py
  disk | memory | qsize
-----------------------
9.0332 | 6.8100 | 12630
9.0905 | 6.6953 | 5894
9.0078 | 6.8384 | 17798
9.1179 | 6.7673 | 60850
9.0629 | 6.8355 | 94854
8.9688 | 6.8093 | 17940
9.0785 | 6.6993 | 58003
9.0309 | 6.8257 | 85663
9.1423 | 6.7411 | 66047
9.1814 | 6.9794 | 11345

Shouldn't RAM be almost instant relative to disk? What's going wrong here?

Some good suggestions here.

I guess the main take-home point for me is that there's probably no way to make :memory: absolutely faster, but there is a way to make disk access relatively slower.

In other words, the benchmark is adequately measuring the realistic performance of memory, but not the realistic performance of disk (e.g. because the cache_size pragma is too big or because I'm not doing writes). I'll mess around with those parameters and post my findings when I get a chance.

That said, if there is anyone who thinks I can squeeze some more speed out of the in-memory db (other than by jacking up the cache_size and default_cache_size, which I will do), I'm all ears...
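For reference, a hedged sketch of those tweaks (the cache figure is illustrative, not tuned). One thing that often helps more than any PRAGMA when loading 1M rows is pushing the insert loop into C with executemany:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cache_size is in pages (or KiB if negative); 100000 here is arbitrary.
conn.execute("PRAGMA cache_size=100000;")
conn.execute("PRAGMA temp_store=MEMORY;")

conn.execute("create table demo (id1 int, id2 int, val real);")
rows = [(1, 2, 0.5), (3, 4, 0.25)]  # stand-in for the real 1M-row matrix
conn.executemany("insert into demo values (?,?,?);", rows)
conn.commit()
print(conn.execute("select count(*) from demo;").fetchone()[0])  # 2
```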

Answer

It has to do with the fact that SQLite has a page cache. According to the documentation, the default page cache is 2000 1K pages, or about 2 MB. Since this is about 75% to 90% of your data, it isn't surprising that the two numbers are very similar. My guess is that in addition to the SQLite page cache, the rest of the data is still in the OS disk cache. If you got SQLite to flush the page cache (and the disk cache) you would see some really significant differences.
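One way to test this hypothesis (a sketch; the value of 10 pages is deliberately extreme) is to shrink the disk connection's page cache so repeated queries can't be served from SQLite's own cache. The OS disk cache is outside SQLite's control; on Linux you would additionally have to drop it via /proc/sys/vm/drop_caches.

```python
import sqlite3

conn_disk = sqlite3.connect("foo.sqlite")
# Restrict SQLite's page cache to just 10 pages, forcing reads
# through to the OS for anything larger.
conn_disk.execute("PRAGMA cache_size=10;")
print(conn_disk.execute("PRAGMA cache_size;").fetchone()[0])  # 10
```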
