Best way to synchronize cache data between two servers


Problem description


I want to synchronize the cache data between two servers. Both servers share the same database, but for faster access I have cached the data into a HashMap at startup. Thus I want to synchronize the cached data without restarting the servers. (Both servers start at the same time.)


Please suggest the best and most efficient way to do this.
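The setup the question describes can be sketched as follows. This is a minimal illustration, not a recommendation; the constructor argument and all names are hypothetical stand-ins for whatever JDBC query each server actually runs at startup:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the setup in the question: each server loads rows from
// the shared database into its own in-memory HashMap at startup. The
// constructor argument stands in for a real JDBC query; names are illustrative.
class StartupCache {
    private final Map<String, String> cache = new HashMap<>();

    StartupCache(Map<String, String> rowsFromDb) {
        cache.putAll(rowsFromDb); // loaded once at startup
    }

    String get(String key) {
        // Served from local memory; a change made in the other server's
        // copy is invisible here -- this is the synchronization problem.
        return cache.get(key);
    }
}
```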

Recommended answer


Instead of trying to synchronize the cached data between two server instances, why not centralize the caching instead, using something like memcached/couchbase or Redis? Using distributed caching with something like ehcache is far more complicated and error-prone, IMO, than centralizing the cached data with a caching server like those mentioned.


As an addendum to my original answer, when deciding what caching approach to use (in memory, centralized), one thing to take into account is the volatility of the data that is being cached.


If the data is stored in the DB, but does not change after the servers load it, then you don't even need synchronization between the servers. Just let them each load this static data into memory from the source and then go about their merry ways doing whatever it is they do. The data won't be changing, so no need to introduce a complicated pattern for keeping the data in sync between the servers.


If there is indeed a level of volatility in the data (say you are caching looked-up entity data from the DB in order to save hits to the DB), then I still think centralized caching is a better approach than in-memory distributed and synchronized caching. You just need to make sure that you use an appropriate expiration on the cached data to allow a natural refresh of the data from time to time. Also, you might want to just drop the cached data from the centralized store when in the update path for a particular entity, and then let it be reloaded into the cache on the next request for that data. This is, IMO, better than trying to do a true write-through cache where you write to the underlying store as well as the cache. The DB itself might make tweaks to the data (via defaulting unsupplied values, for example), and your cached data in that case might not match what's in the DB.
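The expire-and-drop approach described above is the cache-aside pattern, which can be sketched as follows. The `HashMap`-backed store below is just a stand-in for a centralized cache like Redis or memcached, and `db` stands in for the real database; all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Cache-aside with a TTL, as described above: reads go through the cache and
// repopulate it on a miss; updates write the DB directly and simply drop the
// cached copy rather than writing through. The HashMap stands in for a
// centralized store such as Redis; `db` stands in for the real database.
class CacheAside {
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Map<String, String> db;
    private final long ttlMillis;

    CacheAside(Map<String, String> db, long ttlMillis) {
        this.db = db;
        this.ttlMillis = ttlMillis;
    }

    String get(String key) {
        Entry e = cache.get(key);
        long now = System.currentTimeMillis();
        if (e != null && e.expiresAt > now) {
            return e.value; // hit, still fresh
        }
        String fresh = db.get(key); // miss or expired: reload from the source
        cache.put(key, new Entry(fresh, now + ttlMillis));
        return fresh;
    }

    void update(String key, String value) {
        db.put(key, value);  // write the underlying store...
        cache.remove(key);   // ...and drop the cached copy; next read reloads it
    }
}
```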

Edit


A question was asked in the comments about the advantages of a centralized cache (I'm guessing against something like an in-memory distributed cache). I'll provide my opinion on that, but first a standard disclaimer. Centralized caching is not a cure-all. It aims to solve specific issues related to in-jvm-memory caching. Before evaluating whether or not to switch to it, you should understand what your problems are first and see if they fit the benefits of centralized caching. Centralized caching is an architectural change and can come with issues/caveats of its own. Don't switch to it simply because someone says it's better than what you are doing. Make sure the reason fits the problem.


Okay, now onto my opinion for what kinds of problems centralized caching can solve vs in-jvm-memory (and possibly distributed) caching. I'm going to list two things although I'm sure there are a few more. My two big ones are: Overall Memory Footprint and Data Synchronization Issues.


Let's start with Overall Memory Footprint. Say you are doing standard entity caching to protect your relational DB from undue stress. Let's also say that you have a lot of data to cache in order to really protect your DB; say in the range of many GBs. If you are doing in-jvm-memory caching, and say you had 10 app server boxes, you would need to get that additional memory ($$$) times 10, once for each of the boxes that would be doing the caching in JVM memory. In addition, you would then have to allocate a larger heap to your JVM in order to accommodate the cached data. I'm of the opinion that the JVM heap should be small and streamlined in order to ease the garbage collection burden. If you have large chunks of Old Gen that can't be collected, then you're going to stress your garbage collector when it goes into a full GC and tries to reap something back from that bloated Old Gen space. You want to avoid long GC pause times, and bloating your Old Gen is not going to help with that. Plus, if your memory requirement is above a certain threshold, and you happen to be running 32-bit machines for your app layer, you'll have to upgrade to 64-bit machines, and that can be another prohibitive cost.


Now if you decide to centralize the cached data instead (using something like Redis or Memcached), you can significantly reduce the overall memory footprint of the cached data, because you can have it on a couple of boxes instead of on all of the app server boxes in the app layer. You probably want to use a clustered approach (both technologies support it) and at least two servers, to give you high availability and avoid a single point of failure in your caching layer (more on that in a sec). By having just a couple of machines supporting the needed memory requirement for caching, you can save some considerable $$$. Also, you can now tune the app boxes and the cache boxes differently, as they serve distinct purposes. The app boxes can be tuned for high throughput and a low heap, and the cache boxes can be tuned for large memory. Having smaller heaps will definitely help out with the overall throughput of the app-layer boxes.
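As an illustration of tuning the two tiers differently, a sketch under made-up numbers (the sizes below are purely hypothetical and depend entirely on your workload and hardware):

```shell
# Hypothetical sizing only -- tune to your own workload.

# App box: small, fixed heap to keep GC pauses short.
java -Xms2g -Xmx2g -jar app.jar

# Cache box: give most of the machine's RAM to the cache process
# instead (a line from redis.conf):
#   maxmemory 24gb
```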


Now one quick point about centralized caching in general. You should set up your application in such a way that it can survive without the cache, in case the cache goes completely down for a period of time. In traditional entity caching, this means that when the cache becomes completely unavailable, you are just hitting your DB directly for every request. Not awesome, but also not the end of the world.
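That fallback can be sketched as follows. `RemoteCache` is a hypothetical interface standing in for a real cache client (e.g. a Redis client), and `db` for the real database:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "survive without the cache": if the centralized cache is down,
// reads fall through to the database. `RemoteCache` is a hypothetical
// stand-in for a real client library; `db` stands in for the real DB.
interface RemoteCache {
    String get(String key); // may throw if the cache layer is unreachable
}

class ResilientReader {
    private final RemoteCache cache;
    private final Map<String, String> db;

    ResilientReader(RemoteCache cache, Map<String, String> db) {
        this.cache = cache;
        this.db = db;
    }

    String read(String key) {
        try {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException cacheDown) {
            // Cache unavailable: not awesome, but not the end of the world.
        }
        return db.get(key); // hit the DB directly for this request
    }
}
```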


Okay, now for Data Synchronization Issues. With distributed in-jvm-memory caching, you need to keep the cache in sync. A change to cached data in one node needs to replicate to the other nodes and be sync'd into their cached data. This approach is a little scary in that if, for some reason (network failure, for example), one of the nodes falls out of sync, then when a request goes to that node, the data the user sees will not be accurate against what's currently in the DB. Even worse, if they make another request and that hits a different node, they will see different data, and that will be confusing to the user. By centralizing the data, you eliminate this issue. Now, one could argue that the centralized cache needs concurrency control around updates to the same cached data key. If two concurrent updates come in for the same key, how do you make sure the two updates don't stomp on each other? My thought here is to not even worry about this: when an update happens, drop the item from the cache (and write directly through to the DB) and let it be reloaded on the next read. It's safer and easier this way. If you don't want to do that, then you can use CAS (Check-And-Set) functionality instead, for optimistic concurrency control, if you really want to update both the cache and the DB on updates.
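If you do want optimistic concurrency rather than drop-and-reload, the CAS idea can be sketched in the style of memcached's `gets`/`cas` commands. The synchronized `Map`-backed store below is just a stand-in for a real cache server:

```java
import java.util.HashMap;
import java.util.Map;

// Check-And-Set sketch in the style of memcached's gets/cas: a read returns a
// version token, and a write is applied only if the version is still the one
// that was read. A synchronized Map-backed store stands in for the server.
class CasStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized String get(String key) {
        return values.get(key);
    }

    synchronized long gets(String key) {
        return versions.getOrDefault(key, 0L); // version token for a later cas()
    }

    // Returns false if someone else wrote the key since `expectedVersion`
    // was read -- the caller should re-read and retry.
    synchronized boolean cas(String key, String newValue, long expectedVersion) {
        long current = versions.getOrDefault(key, 0L);
        if (current != expectedVersion) {
            return false; // lost the race
        }
        values.put(key, newValue);
        versions.put(key, current + 1);
        return true;
    }
}
```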


So to summarize, you can save money and better tune your app layer machines if you centralize the data they cache. You also can get better accuracy of that data as you have less data synchronization issues to deal with. I hope this helps.

