组合缓存方法 - 基于内存缓存/磁盘 [英] Combining cache methods - memcache/disk based

查看:178
本文介绍了组合缓存方法 - 基于内存缓存/磁盘的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是协议。我们将采取完整的静态html的道路来解决性能问题,但由于网站将是部分动态,这不会为我们工作。
我们想到的是使用memcache + eAccelerator来加速PHP并为最常用的数据缓存缓存。

Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data.

这两种方法我们现在想了:

Here's our two approaches that we have thought of right now:


  • 使用memcache on >> all<主要查询,并让它单独做它做的最好的。

  • Using memcache on >>all<< major queries and leaving it alone to do what it does best.

为大多数常用的检索数据使用内存缓存,并与标准的硬盘驱动器存储缓存结合使用。

Usinc memcache for most commonly retrieved data, and combining with a standard harddrive-stored cache for further usage.

仅使用内存缓存的主要优点当然是性能,但随着用户的增加,内存使用量变得很重。结合这两种声音就像对我们更自然的方法,即使理论在性能上有所妥协。
Memcached似乎也有一些复制功能可用,当增加节点时可能会很方便。

The major advantage of only using memcache is of course the performance, but as users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though the theoretical compromize in performance. Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes.

我们应该使用什么方法?
- 是否愚蠢的妥协和结合这两种方法?我们应该集中在使用内存缓存,而是专注于随着负载随着用户数量的增加而升级内存。

What approach should we use? - Is it stupid to compromize and combine the two methods? Should we insted be focusing on utilizing memcache and instead focusing on upgrading the memory as the load increases with the number of users?

非常感谢!

推荐答案

妥协和结合这两种方法是一种非常聪明的方式。

Compromize and combine this two method is a very clever way, I think.

最明显的缓存管理规则是延迟vs大小规则,也用于CPU缓存。在多级高速缓存中,每个下一级应具有更大的大小以补偿更高的延迟。我们有更高的延迟,但更高的缓存命中率。所以,我不建议你把基于磁盘的缓存在memcache前面。反之,它应该放在memcache后面。唯一的例外是如果您将目录安装在内存中( tmpfs )。在这种情况下,基于文件的缓存可以补偿内存缓存的高负载,并且还可以具有延迟利益(因为数据本地化)。

The most obvious cache management rule is latency v.s. size rule, which is used in CPU cached also. In multi level caches each next level should have more size for compensating higher latency. We have higher latency but higher cache hit ratio. So, I didn't recommend you to place disk based cache in front of memcache. Сonversely it's should be place behind memcache. The only exception is if you cache directory mounted in memory (tmpfs). In this case file based cache could compensate high load on memcache, and also could have latency profits (because of data locality).

这两个存储(基于文件的memcache)不仅是方便缓存的存储器。你也可以使用几乎任何KV数据库,因为他们在并发控制非常好。

This two storages (file based, memcache) are not only storages that are convenient for cache. You also could use almost any KV database as they are very good at concurrency control.

缓存失效是一个单独的问题,可以吸引你的注意。有几个技巧,你可以用来提供更多微妙的缓存更新缓存未命中。其中之一是狗桩效应预测。如果几个并发线程同时遇到缓存缺失,则所有这些线程都会到达后端(数据库)。应用程序应该只允许其中一个进行,其余的应该等待缓存。二是后台缓存更新。很高兴更新缓存不是在Web请求线程,而是在后台。在后台,你可以更好地控制并发级别和更新超时。

Cache invalidation is separate question which can engage your attention. There are several tricks you could use to provide more subtle cache update on cache misses. One of them is dog pile effect prediction. If several concurrent threads got cache miss simultaneously all of them go to backend (database). Application should allow only one of them to proceed and rest of them should wait on cache. Second is background cache update. It's nice to update cache not in web request thread but in background. In background you can control concurrency level and update timeouts more gracefully.

实际上有一个很酷的方法,允许你做基于标记的缓存跟踪( memcached-tag )。这是非常简单的引擎盖下。对于每个缓存条目,您保存它所属的标签版本的向量(例如: {directory#5:1,user#8:2} )。当你读取缓存行时,你还从memcached中读取所有实际的向量数(这可以用 multiget 有效地执行)。如果至少一个实际标签版本大于高速缓存行中保存的标签版本,则高速缓存无效。当您更改对象(例如目录)时,适当的标签版本应该增加。这是非常简单和强大的方法,但有自己的缺点,虽然。在此方案中,您无法执行有效的缓存无效。

Actually there is one cool method which allows you to do tag based cache tracking (memcached-tag for example). It's very simple under the hood. With every cache entry you save a vector of tags versions which it is belongs to (for example: {directory#5: 1, user#8: 2}). When you reading cache line you also read all actual vector numbers from memcached (this could be effectively performed with multiget). If at least one actual tag version is greater than tag version saved in cache line then cache is invalidated. And when you change objects (for example directory) appropriate tag version should be incremented. It's very simple and powerful method, but have it's own disadvantages, though. In this scheme you couldn't perform efficient cache invalidation. Memcached could easily drop out live entries and keep old entries.

当然你应该记住:计算机科学只有两个难点:缓存无效和命名的东西 - Phil Karlton。

And of course you should remember: "There are only two hard things in Computer Science: cache invalidation and naming things" - Phil Karlton.

这篇关于组合缓存方法 - 基于内存缓存/磁盘的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆