Share a dict with multiple Python scripts


Question

I'd like a unique dict (key/value) database to be accessible from multiple Python scripts running at the same time.

If script1.py updates d[2839], then script2.py should see the modified value when querying d[2839] a few seconds later.

  • I thought about using SQLite, but it seems that concurrent writes/reads from multiple processes are not SQLite's strength (say script1.py has just modified d[2839]: how would script2.py's SQLite connection know it has to reload this specific part of the database?)

I also thought about locking the file when I want to flush modifications (but that's rather tricky to do), using json.dump to serialize, then trying to detect modifications and using json.load to reload if there are any, etc. ... oh no, I'd be reinventing the wheel, and reinventing a particularly inefficient key/value database!

redis looked like a solution, but it does not officially support Windows; the same applies to leveldb.

Multiple scripts might want to write at exactly the same time (even if this is a very rare event). Is there a way to let the DB system handle this, perhaps via a locking parameter? It seems that by default SQLite can't, because "SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time."
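(For reference, a minimal sketch of the two locking knobs the sqlite3 module itself exposes, which the answer below relies on: the connect-time timeout parameter makes a writer wait for a busy lock instead of failing immediately, and WAL journal mode lets readers run alongside the single active writer.)

import sqlite3

# timeout=60: if another process holds the write lock, block and retry
# for up to 60 seconds instead of raising "database is locked" right away
conn = sqlite3.connect('kv.db', timeout=60)

# WAL mode: readers are not blocked by the single active writer;
# concurrent writers queue behind one another
conn.execute('pragma journal_mode=wal')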

What would be a Pythonic solution?

Note: I'm on Windows, and the dict should hold at most 1M items (keys and values are both integers).

Answer

Most embedded datastores other than SQLite lack optimizations for concurrent access. I was also curious about SQLite's concurrent performance, so I ran a benchmark:

import time
import sqlite3
import random
import multiprocessing


class Store():

    def __init__(self, filename='kv.db'):
        # timeout=60: wait up to 60 s for locks held by other processes
        self.conn = sqlite3.connect(filename, timeout=60)
        # WAL mode: readers don't block on the single active writer
        self.conn.execute('pragma journal_mode=wal')
        self.conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute('select value from "kv" where key=?', (key,)).fetchone()
        if row:
            return row[0]

    def set(self, key, value):
        self.conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        self.conn.commit()


def worker(n):
    # each worker process writes n random keys through its own connection,
    # then reads them back in shuffled order
    d = [random.randint(0, 1 << 31) for _ in range(n)]
    s = Store()
    for i in d:
        s.set(i, i)
    random.shuffle(d)
    for i in d:
        s.get(i)


def test(c):
    n = 5000
    start = time.time()
    ps = []
    for _ in range(c):
        p = multiprocessing.Process(target=worker, args=(n,))
        p.start()
        ps.append(p)
    while any(p.is_alive() for p in ps):
        time.sleep(0.01)
    cost = time.time() - start
    print(f'{c:<10d}\t{cost:<7.2f}\t{n/cost:<20.2f}\t{n*c/cost:<14.2f}')


def main():
    print('concurrency\ttime(s)\tper-process TPS(r/s)\ttotal TPS(r/s)')
    for c in range(1, 9):
        test(c)


if __name__ == '__main__':
    main()

Result on my 4-core macOS box, SSD volume:

concurrency time(s) per-process TPS(r/s)   total TPS(r/s)
1           0.65    7638.43                 7638.43
2           1.30    3854.69                 7709.38
3           1.83    2729.32                 8187.97
4           2.43    2055.25                 8221.01
5           3.07    1629.35                 8146.74
6           3.87    1290.63                 7743.78
7           4.80    1041.73                 7292.13
8           5.37    931.27                  7450.15

Result on an 8-core Windows Server 2012 cloud server, SSD volume:

concurrency     time(s) per-process TPS(r/s)   total TPS(r/s)
1               4.12    1212.14                 1212.14
2               7.87    634.93                  1269.87
3               14.06   355.56                  1066.69
4               15.84   315.59                  1262.35
5               20.19   247.68                  1238.41
6               24.52   203.96                  1223.73
7               29.94   167.02                  1169.12
8               34.98   142.92                  1143.39

It turns out that overall throughput is consistent regardless of concurrency, and that SQLite is slower on Windows than on macOS. Hope this is helpful.
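To tie this back to the question: with the Store class above, two independently launched scripts see each other's committed writes, because set commits immediately and get runs a fresh SELECT each time. A minimal sketch, assuming the class is saved in a hypothetical store.py (the script names and key are the question's own):

# script1.py
from store import Store

s = Store('kv.db')   # both scripts open the same database file
s.set(2839, 42)      # Store.set commits immediately

# script2.py
import time
from store import Store

s = Store('kv.db')
time.sleep(2)          # "a few seconds after", as in the question
print(s.get(2839))     # prints 42: get() issues a fresh SELECT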

Since SQLite's write lock is per database file, you can get more total TPS by partitioning the data across multiple database files:

import sqlite3


class MultiDBStore():

    def __init__(self, buckets=5):
        self.buckets = buckets
        self.conns = []
        for n in range(buckets):
            conn = sqlite3.connect(f'kv_{n}.db', timeout=60)
            conn.execute('pragma journal_mode=wal')
            conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
            conn.commit()
            self.conns.append(conn)

    def _get_conn(self, key):
        # route each key to a fixed partition so every process agrees
        # on which file holds it
        assert isinstance(key, int)
        return self.conns[key % self.buckets]

    def get(self, key):
        row = self._get_conn(key).execute('select value from "kv" where key=?', (key,)).fetchone()
        if row:
            return row[0]

    def set(self, key, value):
        conn = self._get_conn(key)
        conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        conn.commit()
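A quick usage sketch (hypothetical values; note that every process must use the same bucket count, since it determines which file a given key lands in):

store = MultiDBStore(buckets=20)  # 20 partitions, as in the benchmark below
store.set(2839, 123)              # 2839 % 20 == 19, so this row lives in kv_19.db
assert store.get(2839) == 123     # any process using buckets=20 reads the same file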

Result on my Mac with 20 partitions:

concurrency time(s) per-process TPS(r/s)   total TPS(r/s)
1           2.07    4837.17                 4837.17
2           2.51    3980.58                 7961.17
3           3.28    3047.68                 9143.03
4           4.02    2486.76                 9947.04
5           4.44    2249.94                 11249.71
6           4.76    2101.26                 12607.58
7           5.25    1903.69                 13325.82
8           5.71    1752.46                 14019.70

Total TPS is higher than with a single database file.
