Persistent in-memory Python object for nginx/uwsgi server


Problem description


I doubt this is even possible, but here is the problem and proposed solution (the feasibility of the proposed solution is the object of this question):


I have some "global data" that needs to be available for all requests. I'm persisting this data to Riak and using Redis as a caching layer for access speed (for now...). The data is split into about 30 logical chunks, each about 8 KB.

Each request is required to read 4 of these 8KB chunks, resulting in 32KB of data read in from Redis or Riak. This is in ADDITION to any request-specific data which would also need to be read (which is quite a bit).

Assuming even 3000 requests per second (this isn't a live server so I don't have real numbers, but 3000/sec is a reasonable assumption, could be more), this means 96MBps of transfer from Redis or Riak in ADDITION to the already not-insignificant other calls being made from the application logic. Also, Python is parsing the JSON of these 8KB objects 3000 times every second.


All of this - especially Python having to repeatedly deserialize the data - seems like an utter waste, and a perfectly elegant solution would be to just have the deserialized data cached in an in-memory native object in Python, which I can refresh periodically as and when all this "static" data becomes stale. Once in a few minutes (or hours), instead of 3000 times per second.
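As a rough illustration of what I have in mind (the function names and the 5-minute interval are hypothetical, and the backing-store read is stubbed out), the per-process refresh pattern would look something like:

```python
import json
import time

CACHE_TTL = 300  # hypothetical refresh interval: 5 minutes

_cache = {"data": None, "loaded_at": 0.0}

def fetch_chunks_from_store():
    # Stand-in for the real Redis/Riak read; returns raw JSON strings.
    return ['{"chunk": %d}' % i for i in range(30)]

def get_global_data():
    """Return the deserialized chunks, re-reading the backing store
    only when the cached copy is older than CACHE_TTL seconds."""
    now = time.time()
    if _cache["data"] is None or now - _cache["loaded_at"] > CACHE_TTL:
        _cache["data"] = [json.loads(s) for s in fetch_chunks_from_store()]
        _cache["loaded_at"] = now
    return _cache["data"]
```

Of course this only caches per process, which is exactly the part I'm unsure about under uwsgi.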

But I don't know if this is even possible. You'd realistically need an "always running" application for it to cache any data in its memory. And I know this is not the case in the nginx+uwsgi+python combination (versus something like node) - python in-memory data will NOT be persisted across all requests to my knowledge, unless I'm terribly mistaken.

Unfortunately this is a system I have "inherited" and therefore can't make too many changes in terms of the base technology, nor am I knowledgeable enough of how the nginx+uwsgi+python combination works in terms of starting up Python processes and persisting Python in-memory data - which means I COULD be terribly mistaken with my assumption above!


So, direct advice on whether this solution would work + references to material that could help me understand how the nginx+uwsgi+python would work in terms of starting new processes and memory allocation, would help greatly.

P.S:

  1. Have gone through some of the documentation for nginx, uwsgi etc but haven't fully understood the ramifications per my use-case yet. Hope to make some progress on that going forward now

  2. If the in-memory thing COULD work out, I would chuck Redis, since I'm caching ONLY the static data I mentioned above, in it. This makes an in-process persistent in-memory Python cache even more attractive for me, reducing one moving part in the system and at least FOUR network round-trips per request.

Solution

What you're suggesting isn't directly feasible. Since new processes can be spun up and down outside of your control, there's no way to keep native Python data in memory.

However, there are a few ways around this.

Often, one level of key-value storage is all you need. And sometimes, having fixed-size buffers for values (which you can use directly as str/bytes/bytearray objects; anything else you need to pack in with struct or otherwise serialize) is all you need. In that case, uWSGI's built-in caching framework will take care of everything you need.

If you need more precise control, you can look at how the cache is implemented on top of SharedArea and do something customize. However, I wouldn't recommend that. It basically gives you the same kind of API you get with a file, and the only real advantages over just using a file are that the server will manage the file's lifetime; it works in all uWSGI-supported languages, even those that don't allow files; and it makes it easier to migrate your custom cache to a distributed (multi-computer) cache if you later need to. I don't think any of those are relevant to you.

Another way to get flat key-value storage, but without the fixed-size buffers, is with Python's stdlib anydbm (merged into dbm in Python 3). The key-value lookup is as Pythonic as it gets: it looks just like a dict, except that it's backed by an on-disk BDB (or similar) database, cached as appropriate in memory, instead of being stored in an in-memory hash table.
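For example, a minimal sketch using Python 3's dbm (the file path and key are made up for illustration):

```python
import dbm
import os
import tempfile

# Open (creating if necessary) a key-value database backed by a disk file.
path = os.path.join(tempfile.mkdtemp(), "staticdata")
with dbm.open(path, "c") as db:
    db[b"chunk:0"] = b'{"hello": "world"}'  # keys and values are bytes

# Reopen and read: lookups work just like a dict.
with dbm.open(path, "r") as db:
    raw = db[b"chunk:0"]
```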

If you need to handle a few other simple types—anything that's blazingly fast to un/pickle, like ints—you may want to consider shelve.
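A small sketch of the same idea with shelve, where values can be arbitrary picklable objects (the keys and values here are invented):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shelf")

# Values are pickled/unpickled transparently, so plain Python objects work.
with shelve.open(path) as shelf:
    shelf["counts"] = {"a": 1, "b": 2}
    shelf["answer"] = 42

with shelve.open(path) as shelf:
    counts = shelf["counts"]
    answer = shelf["answer"]
```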

If your structure is rigid enough, you can use a key-value database for the top level, but access the values through a ctypes.Structure, or de/serialize them with struct. But usually, if you can do that, you can also eliminate the top level, at which point your whole thing is just one big Structure or Array.
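For illustration, a possible fixed record layout with struct (the field layout is invented for this example):

```python
import struct

# Hypothetical fixed layout: uint32 chunk id + two float64 values,
# little-endian with no padding.
RECORD = struct.Struct("<Idd")

packed = RECORD.pack(7, 3.5, -1.25)        # fixed-size bytes, len == RECORD.size
chunk_id, x, y = RECORD.unpack(packed)     # no JSON parsing on the read path
```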

At that point, you can just use a plain file for storage—either mmap it (for ctypes), or just open and read it (for struct).
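A sketch of the plain-file approach combining struct with mmap (the record layout and file path are illustrative):

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<Id")  # hypothetical: uint32 id + one float64

# Write a few fixed-size records, then map the file for random access.
path = os.path.join(tempfile.mkdtemp(), "chunks.bin")
with open(path, "wb") as f:
    for i in range(4):
        f.write(RECORD.pack(i, i * 0.5))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Seek straight to record #2 without reading the whole file.
    rec_id, value = RECORD.unpack_from(mm, 2 * RECORD.size)
    mm.close()
```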

Or use multiprocessing's Shared ctypes Objects to access your Structure directly out of a shared memory area.

Meanwhile, if you don't actually need all of the cache data all the time, just bits and pieces every once in a while, that's exactly what databases are for. Again, anydbm, etc. may be all you need, but if you've got complex structure, draw up an ER diagram, turn it into a set of tables, and use something like MySQL.
