Caching huge data in Process memory


Problem Description



I am working in the finance industry. Database hits for data processing are very costly, so we are planning to go for on-demand cache logic [runtime insert & runtime lookup].

Has anyone implemented caching logic for more than 10 million records? Each record is about 160-200 bytes.

I ran into the following disadvantages with different approaches:

  1. Cannot use the STL std::map to implement a key-based cache registry; insert and lookup become very slow after 200,000 records.
  2. Shared memory or memory-mapped files add overhead for caching, because this data is not shared across processes.
  3. Using sqlite3 as an in-memory & flat-file application database could be worthwhile, but it too has slow lookups after 2-3 million records.
  4. Process memory might have some limit on its own kernel memory consumption; my assumption is 2 GB on a 32-bit machine & 4 GB on a 64-bit machine.

Please suggest something if you have come across this problem and solved it by any means.

Thanks

Solution

If your cache is a simple key-value store, you should not be using std::map, which has O(log n) lookup, but std::unordered_map, which has O(1) lookup. You should only use std::map if you require sorting.

It sounds like performance is what you're after, so you might want to look at Boost Intrusive. You can easily combine unordered_map and list to create a high-efficiency LRU.

