How to cache objects created from MySQL database


Question


I'm writing a server application which has to serve a lot of MySQL queries per second, triggered by connected clients, and I'm wondering what would be the best way to improve performance apart from a well-structured database.

My idea was to cache some of the data and just send the cached data instead of performing a new query. Just like a database has different tables, the cache would have to deal with these different "structures" (objects). Since the clients connected to the server are indirectly able to alter the data, I must be able to edit the cached data (and at some point be able to update my database, but this shouldn't be too difficult). Here is a short list of what I need to be able to do:

  • edit the cached data/objects
  • delete old data/objects and add new data/objects
  • order the data by some kind of priority (last used)
  • identify the data by some sort of id

I thought either a vector or a queue/priority_queue would be a good idea. This way I could create a queue or vector for every table/object I want to cache (I didn't perform any tests since I wanted to get more opinions before possibly wasting my time). The largest object stored in these caching structures would be around 1 kilobyte (most likely smaller) and the smallest maybe 96 bytes.

Per caching structure I would not have to store more than 50,000 objects, and I think I could work with 10 different structures (each for a different object type).

The most important part is speed, otherwise I could just perform the queries. It's not just that I would have to make the queries but also create a new object afterwards instead of just reusing or resending the old object.

So here is my question:

  • What would be the best way to cache the data/objects based on the provided information AND WHY?

Edit: Oh, and when I say structures I don't mean struct; I just didn't know how I should refer to vector, queue, map, etc. at the same time. Maybe container would have been better :).

Solution

There are many things to consider, but in general I would base relational mapping in your case on the Row Data Gateway pattern (RDG). If you don't have too many different object types, this approach to architecture should scale well enough. RDG should facilitate your caching implementation if you constrain cache book-keeping to the Finder class.

If you have the time and will, check out Patterns of Enterprise Application Architecture by Martin Fowler. It's a wealth of good information.

Now to the specifics...

  • identify the data by some sort of id

Typically you would use some auto-incremented integer column in the database for this. You can use unordered_map to pull those objects from the cache quickly. Since you have all the objects in your cache, for the sake of optimization, you could also implement some of the find* functions to search the cache first. You can use unordered_map/unordered_multimap to 'index' some of the data, if your search time is highly restricted, or just stick to the good old map/multimap. However, this doubles the work, and the database already gives you these kinds of queries for free.
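
A minimal sketch of that idea in C++ — an unordered_map keyed on the auto-incremented id, with a find that checks the cache before falling back to the database. The `Record` type and the in-memory map standing in for the MySQL table are hypothetical placeholders:

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical record type; the real objects would carry a row's columns.
struct Record {
    int id;
    std::string payload;
};

// Minimal sketch: the "database" here is just another map standing in for MySQL.
class RecordCache {
public:
    explicit RecordCache(std::unordered_map<int, Record> db) : db_(std::move(db)) {}

    // find() checks the cache first and only falls back to the database on a miss.
    std::shared_ptr<Record> find(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;          // cache hit
        auto dbIt = db_.find(id);
        if (dbIt == db_.end()) return nullptr;              // not in the DB either
        auto rec = std::make_shared<Record>(dbIt->second);  // load and cache
        cache_[id] = rec;
        return rec;
    }

    bool cached(int id) const { return cache_.count(id) != 0; }

private:
    std::unordered_map<int, std::shared_ptr<Record>> cache_;
    std::unordered_map<int, Record> db_;  // stand-in for the real MySQL table
};
```

The first lookup of an id pays the "database" cost; every later lookup is a hash-map hit.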

  • edit the cached data/objects

Dirty data shouldn't be visible to the rest of the system until you actually write it to the database. Once you kick the update, and if all goes as intended, you can either replace the object in cache with the one you used for update, or simply delete the object in cache and let other readers pick it up from the database (which will result in caching the object again). You can implement this by cloning the original Gateway object, but the bottom-line is that you should have some locking strategy implemented.
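
A minimal sketch of the clone-and-replace update described above, assuming a hypothetical `Row` type and a caller-supplied write function standing in for the real UPDATE statement. Readers never see the dirty clone; it only replaces the cached entry after the write succeeds:

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical row object; stands in for a Row Data Gateway instance.
struct Row { int id; std::string value; };

class EditableCache {
public:
    void put(std::shared_ptr<const Row> row) {
        std::lock_guard<std::mutex> lk(m_);
        cache_[row->id] = std::move(row);
    }

    std::shared_ptr<const Row> get(int id) {
        std::lock_guard<std::mutex> lk(m_);
        auto it = cache_.find(id);
        return it == cache_.end() ? nullptr : it->second;
    }

    // Update by cloning: readers keep seeing the old row until the "database"
    // write succeeds, then the clone replaces the cached entry.
    bool update(int id, const std::string& newValue,
                bool (*writeToDb)(const Row&)) {
        auto current = get(id);
        if (!current) return false;
        auto clone = std::make_shared<Row>(*current);  // dirty copy, not yet visible
        clone->value = newValue;
        if (!writeToDb(*clone)) return false;          // DB rejected: cache untouched
        put(clone);                                    // success: publish the clone
        return true;
    }

private:
    std::mutex m_;
    std::unordered_map<int, std::shared_ptr<const Row>> cache_;
};
```

The single mutex here is the simplest possible locking strategy; a real server would likely want something finer-grained.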

  • delete old data/objects and add new data/objects

Here you simply delete the object from the cache, and try to delete it from the database. If deletion fails in the database, other readers will cache it again. Just make sure that no client can access the same record while you're in the process of deletion. When adding new records, you simply instantiate a Gateway object, pass it to the domain level object, and when you're done with the changes, call insert on the Gateway object. You can either put the new Gateway object into the cache, or simply let the first reader put it into the cache.
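
The delete and add ordering above can be sketched like this, again with an in-memory map standing in for the MySQL table and plain strings standing in for the Gateway objects (all hypothetical names):

```cpp
#include <string>
#include <unordered_map>

class Store {
public:
    // Delete from the cache first, then from the database. If the DB delete
    // fails, the record is simply re-cached by the next reader.
    bool remove(int id) {
        cache_.erase(id);            // evict first so no stale copy survives
        return db_.erase(id) != 0;   // models the DELETE statement
    }

    // Insert into the database only; the first reader pulls it into the cache.
    void insert(int id, const std::string& value) {
        db_[id] = value;             // models the INSERT statement
    }

    const std::string* read(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return &it->second;
        auto d = db_.find(id);
        if (d == db_.end()) return nullptr;
        return &(cache_[id] = d->second);  // first reader caches the row
    }

private:
    std::unordered_map<int, std::string> cache_;
    std::unordered_map<int, std::string> db_;  // stand-in for the MySQL table
};
```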

  • order the data by some kind of priority (last used)
  • What would be the best way to cache the data/objects based on the provided information AND WHY?

This is a matter of selecting the best caching algorithm. This is not an easy question to answer, but LRU should work just fine. Without actual metrics there is no right answer, but LRU is simple to implement, and if it doesn't stand up to your requirements, just do the metrics and decide on a new algorithm. Make sure that you can do that seamlessly by having a good interface to the cache. One other thing to keep in mind is that your domain level objects should never depend on the limits of your cache. If you need 100k objects but only have a 50k cache, you still have all 100k objects in memory, but 50k of them are in the cache. In other words, your objects should not depend on the state of your cache, and should not care whether you have caching at all.
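
A minimal sketch of the classic LRU implementation, assuming a capacity of at least one: a doubly linked list keeps entries in recency order, and an unordered_map gives O(1) lookup into the list. The int/string key and value types are placeholders for the real objects:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::string* get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);  // mark most recent
        return &it->second->second;
    }

    void put(int key, std::string value) {
        if (auto* v = get(key)) { *v = std::move(value); return; }
        if (order_.size() >= capacity_ && !order_.empty()) {  // evict least recent
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    bool contains(int key) const { return index_.count(key) != 0; }

private:
    std::size_t capacity_;
    std::list<std::pair<int, std::string>> order_;  // front = most recently used
    std::unordered_map<int, std::list<std::pair<int, std::string>>::iterator> index_;
};
```

`std::list::splice` moves a node to the front without invalidating the iterators stored in the map, which is what makes the O(1) recency update possible.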

Next, if you're still on board with the idea of RDG, you are simply caching Gateway objects in your cache. You can keep instances of the Gateway objects in your cache by means of shared_ptr, but you should also consider your locking strategy (optimistic vs pessimistic) if you want to avoid dirty writes. Also, all your Gateways (one for every table) can inherit the same interface, so you can generalize your save/load strategies, and you would also be able to use a single pool while keeping things simple. (Check out boost::pool. Maybe it can help you with the cache implementation.)
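
A sketch of that shared interface: every table-specific gateway implements the same contract, so the cache and save/load logic only ever deal with the base class through shared_ptr. The `RowGateway`/`UserGateway` names and their fields are hypothetical:

```cpp
#include <memory>
#include <string>
#include <vector>

// Common interface for all table gateways (one derived class per table).
struct RowGateway {
    virtual ~RowGateway() = default;
    virtual void insert() = 0;
    virtual void update() = 0;
    virtual std::string table() const = 0;
};

struct UserGateway : RowGateway {
    int id = 0;
    std::string name;
    bool inserted = false;
    void insert() override { inserted = true; }  // would run INSERT INTO users ...
    void update() override {}                    // would run UPDATE users ...
    std::string table() const override { return "users"; }
};

// A generic save routine that works for any table's gateway.
void saveAll(const std::vector<std::shared_ptr<RowGateway>>& rows) {
    for (const auto& r : rows) r->insert();
}
```

Because every gateway goes through `RowGateway`, the cache can store `shared_ptr<RowGateway>` uniformly regardless of which table a row belongs to.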

One final point:

The cake is a lie! :D No matter what you decide to do, make sure that it is based on a decent amount of performance metrics. If you improve performance by 20% and you spent 2 months doing it, maybe it is worthwhile to think about putting a few more gigs of RAM into your hardware instead. Build some easily verifiable proof of concept, which will give you enough info on whether implementing your cache pays off, and if not, try some of the tested and reliable solutions off the shelf (memcached or such, as @Layne already commented).

