在 Java 中同步 String 对象 [英] Synchronizing on String objects in Java

查看:24
本文介绍了在 Java 中同步 String 对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 web 应用程序,我正在对它进行一些负载/性能测试,特别是在我们预计有几百个用户访问同一页面并在此页面上大约每 10 秒刷新一次的功能上.我们发现可以使用此函数进行改进的一个方面是将来自 Web 服务的响应缓存一段时间,因为数据没有改变.

在实现了这个基本的缓存之后,在一些进一步的测试中我发现我没有考虑并发线程如何同时访问缓存.我发现在大约 100 毫秒内,大约有 50 个线程试图从缓存中获取对象,发现它已过期,点击 Web 服务获取数据,然后将对象放回缓存中.

原始代码如下所示:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {final String key = "Data-";+ 电子邮件;SomeData[] data = (SomeData[]) StaticCache.get(key);如果(数据==空){数据 = service.getSomeDataForEmail(email);StaticCache.set(key, data, CACHE_TIME);}别的 {logger.debug(getSomeDataForEmail: using cached object");}返回数据;}

因此,为了确保在 key 处的对象过期时只有一个线程正在调用 Web 服务,我认为我需要同步 Cache get/set 操作,并且似乎使用了缓存键将是同步对象的一个​​很好的候选者(这样,对电子邮件 b@b.com 的此方法调用将不会被对 a@a.com 的方法调用阻止).

我将方法更新为如下所示:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {SomeData[] 数据 = null;final String key = "Data-";+ 电子邮件;同步(键){数据 =(SomeData[]) StaticCache.get(key);如果(数据==空){数据 = service.getSomeDataForEmail(email);StaticCache.set(key, data, CACHE_TIME);}别的 {logger.debug(getSomeDataForEmail: using cached object");}}返回数据;}

我还为诸如同步块之前"、同步块内部"、即将离开同步块"和同步块之后"之类的内容添加了日志记录行,因此我可以确定我是否有效同步获取/设置操作.

然而,这似乎并没有奏效.我的测试日志输出如下:

(日志输出为 'threadname' 'logger name' 'message')http-80-Processor253 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor253 jsp.view-page - getSomeDataForEmail:内部同步块http-80-Processor253 cache.StaticCache - get: object at key [SomeData-test@test.com] 已过期http-80-Processor253 cache.StaticCache - get: key [SomeData-test@test.com] 返回值 [null]http-80-Processor263 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor263 jsp.view-page - getSomeDataForEmail:内部同步块http-80-Processor263 cache.StaticCache - get: object at key [SomeData-test@test.com] 已过期http-80-Processor263 cache.StaticCache - get: key [SomeData-test@test.com] 返回值 [null]http-80-Processor131 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor131 jsp.view-page - getSomeDataForEmail:内部同步块http-80-Processor131 cache.StaticCache - get: object at key [SomeData-test@test.com] 已过期http-80-Processor131 cache.StaticCache - get: key [SomeData-test@test.com] 返回值 [null]http-80-Processor104 jsp.view-page - getSomeDataForEmail:内部同步块http-80-Processor104 cache.StaticCache - get: object at key [SomeData-test@test.com] 已过期http-80-Processor104 cache.StaticCache - get: key [SomeData-test@test.com] 返回值 [null]http-80-Processor252 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor283 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor2 jsp.view-page - getSomeDataForEmail:即将进入同步块http-80-Processor2 jsp.view-page - getSomeDataForEmail:内部同步块

我希望一次只看到一个线程围绕 get/set 操作进入/退出同步块.

同步 String 对象有问题吗?我认为缓存键是一个不错的选择,因为它是操作所独有的,即使 final String key 是在方法中声明的,我认为每个线程都会得到一个引用同一个对象,因此会在这个单个对象上同步.

我在这里做错了什么?

更新:进一步查看日志后,似乎具有相同同步逻辑的方法,其中密钥始终相同,例如

final String key = "blah";...同步(键){ ...

不要表现出相同的并发问题 - 一次只有一个线程进入块.

更新 2:感谢大家的帮助!我接受了关于 intern()ing Strings 的第一个答案,它解决了我最初的问题——多个线程正在进入我认为不应该进入的同步块,因为 key具有相同的值.

正如其他人指出的那样,将 intern() 用于这样的目的并在这些字符串上进行同步确实证明是一个坏主意 - 当针对 webapp 运行 JMeter 测试以模拟预期的加载,我看到使用的堆大小在不到 20 分钟的时间内增长到近 1GB.

目前我使用的只是同步整个方法的简单解决方案 - 但我真的喜欢 martinprobst 和 MBCook 提供的代码示例,但因为我有大约 7 个类似的 getData() 目前这个类中的方法(因为它需要来自 Web 服务的大约 7 条不同的数据),我不想为每个方法添加关于获取和释放锁的几乎重复的逻辑.但这对于未来的使用绝对是非常非常有价值的信息.我认为这些最终是关于如何最好地使这种线程安全的操作的正确答案,如果可以的话,我会为这些答案投更多票!

解决方案

无需全神贯注,快速浏览一下您所说的内容,看起来好像您需要实习()您的字符串:

final String firstkey = "Data-" + email;final String key = firstkey.intern();

具有相同值的两个字符串不一定是同一个对象.

请注意,这可能会引入一个新的争用点,因为在 VM 深处,intern() 可能必须获取锁.我不知道这方面的现代虚拟机是什么样子,但有人希望它们能被彻底优化.

我假设您知道 StaticCache 仍然需要是线程安全的.但是,与在调用 getSomeDataForEmail 时锁定缓存而不是仅锁定键相比,那里的争用应该很小.

对问题更新的回复:

我认为这是因为字符串文字总是产生相同的对象.Dave Costa 在评论中指出,它甚至比这更好:文字总是产生规范表示.因此,程序中任何地方具有相同值的所有字符串文字都会产生相同的对象.

编辑

其他人指出,同步实习字符串实际上是一个非常糟糕的主意 - 部分原因是允许创建实习字符串使它们永久存在,部分原因是如果超过一点程序中任何地方的代码都在实习字符串上同步,这些代码位之间存在依赖关系,并且可能无法防止死锁或其他错误.

在我输入的其他答案中,正在开发通过为每个键字符串存储一个锁定对象来避免这种情况的策略.

这是另一种选择 - 它仍然使用单一锁,但我们知道无论如何我们都需要其中一个用于缓存,并且您说的是 50 个线程,而不是 5000 个,因此这可能不是致命的.我还假设这里的性能瓶颈是 DoSlowThing() 中的缓慢阻塞 I/O,因此这将大大受益于不被序列化.如果这不是瓶颈,那么:

  • 如果 CPU 很忙,那么这种方法可能不够用,您需要另一种方法.
  • 如果CPU不忙,访问服务器也不是瓶颈,那么这种方式就大材小用了,你不妨把这个和per-key锁都忘记,在整个操作周围放一个大的synchronized(StaticCache),并以简单的方式完成.

显然,这种方法在使用前需要对可扩展性进行浸泡测试——我不保证什么.

此代码不需要 StaticCache 是同步的或线程安全的.如果任何其他代码(例如旧数据的预定清理)曾触及缓存,则需要重新访问.

IN_PROGRESS 是一个虚拟值 - 不完全干净,但代码很简单,它节省了两个哈希表.它不处理 InterruptedException 因为我不知道在这种情况下您的应用程序想要做什么.此外,如果 DoSlowThing() 对于给定的键始终失败,则此代码并不完全优雅,因为每个线程都会重试.由于我不知道失败标准是什么,以及它们是临时的还是永久的,我也不处理这个问题,我只是确保线程不会永远阻塞.在实践中,您可能希望在缓存中放置一个指示不可用"的数据值,这可能是有原因的,以及重试时间的超时.

//不要在这里尝试双重检查锁定.我是认真的.同步(静态对象){数据 = StaticCache.get(key);而(数据== IN_PROGRESS){//另一个线程正在获取数据静态对象.wait();数据 = StaticCache.get(key);}如果(数据==空){//我们必须得到数据StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);}}如果(数据==空){//我们必须得到数据尝试 {数据 = server.DoSlowThing(key);} 最后 {同步(静态对象){//警告:这里的失败是致命的,必须允许终止//应用程序,否则服务员将永远离开.选择合适的//保证替换键值的集合类型.StaticCache.put(key, data, CURRENT_TIME);静态对象.notifyAll();}}}

每次向缓存中添加任何内容时,所有线程都会唤醒并检查缓存(无论它们使用的是什么键),因此可以使用争议较少的算法获得更好的性能.但是,大部分工作将在大量空闲 CPU 时间阻塞 I/O 期间进行,因此这可能不是问题.

如果您为缓存及其关联的锁、它返回的数据、IN_PROGRESS 哑元和要执行的慢速操作定义了合适的抽象,则此代码可以通用以用于多个缓存.将整个事情滚动到缓存上的一个方法中可能不是一个坏主意.

I have a webapp that I am in the middle of doing some load/performance testing on, particularily on a feature where we expect a few hundred users to be accessing the same page and hitting refresh about every 10 seconds on this page. One area of improvement that we found we could make with this function was to cache the responses from the web service for some period of time, since the data is not changing.

After implementing this basic caching, in some further testing I found out that I didn't consider how concurrent threads could access the Cache at the same time. I found that within the matter of ~100ms, about 50 threads were trying to fetch the object from the Cache, finding that it had expired, hitting the web service to fetch the data, and then putting the object back in the cache.

The original code looked something like this:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {

  final String key = "Data-" + email;
  SomeData[] data = (SomeData[]) StaticCache.get(key);

  if (data == null) {
      data = service.getSomeDataForEmail(email);

      StaticCache.set(key, data, CACHE_TIME);
  }
  else {
      logger.debug("getSomeDataForEmail: using cached object");
  }

  return data;
}

So, to make sure that only one thread was calling the web service when the object at key expired, I thought I needed to synchronize the Cache get/set operation, and it seemed like using the cache key would be a good candidate for an object to synchronize on (this way, calls to this method for email b@b.com would not be blocked by method calls to a@a.com).

I updated the method to look like this:

private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {

  
  SomeData[] data = null;
  final String key = "Data-" + email;
  
  synchronized(key) {      
    data =(SomeData[]) StaticCache.get(key);

    if (data == null) {
        data = service.getSomeDataForEmail(email);
        StaticCache.set(key, data, CACHE_TIME);
    }
    else {
      logger.debug("getSomeDataForEmail: using cached object");
    }
  }

  return data;
}

I also added logging lines for things like "before synchronization block", "inside synchronization block", "about to leave synchronization block", and "after synchronization block", so I could determine if I was effectively synchronizing the get/set operation.

However it doesn't seem like this has worked. My test logs have output like:

(log output is 'threadname' 'logger name' 'message')  
http-80-Processor253 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor253 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor253 cache.StaticCache - get: object at key [SomeData-test@test.com] has expired  
http-80-Processor253 cache.StaticCache - get: key [SomeData-test@test.com] returning value [null]  
http-80-Processor263 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor263 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor263 cache.StaticCache - get: object at key [SomeData-test@test.com] has expired  
http-80-Processor263 cache.StaticCache - get: key [SomeData-test@test.com] returning value [null]  
http-80-Processor131 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor131 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor131 cache.StaticCache - get: object at key [SomeData-test@test.com] has expired  
http-80-Processor131 cache.StaticCache - get: key [SomeData-test@test.com] returning value [null]  
http-80-Processor104 jsp.view-page - getSomeDataForEmail: inside synchronization block  
http-80-Processor104 cache.StaticCache - get: object at key [SomeData-test@test.com] has expired  
http-80-Processor104 cache.StaticCache - get: key [SomeData-test@test.com] returning value [null]  
http-80-Processor252 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor283 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor2 jsp.view-page - getSomeDataForEmail: about to enter synchronization block  
http-80-Processor2 jsp.view-page - getSomeDataForEmail: inside synchronization block  

I wanted to see only one thread at a time entering/exiting the synchronization block around the get/set operations.

Is there an issue in synchronizing on String objects? I thought the cache-key would be a good choice as it is unique to the operation, and even though the final String key is declared within the method, I was thinking that each thread would be getting a reference to the same object and therefore would synchronization on this single object.

What am I doing wrong here?

Update: after looking further at the logs, it seems like methods with the same synchronization logic where the key is always the same, such as

final String key = "blah";
...
synchronized(key) { ...

do not exhibit the same concurrency problem - only one thread at a time is entering the block.

Update 2: Thanks to everyone for the help! I accepted the first answer about intern()ing Strings, which solved my initial problem - where multiple threads were entering synchronized blocks where I thought they shouldn't, because the key's had the same value.

As others have pointed out, using intern() for such a purpose and synchronizing on those Strings does indeed turn out to be a bad idea - when running JMeter tests against the webapp to simulate the expected load, I saw the used heap size grow to almost 1GB in just under 20 minutes.

Currently I'm using the simple solution of just synchronizing the entire method - but I really like the code samples provided by martinprobst and MBCook, but since I have about 7 similar getData() methods in this class currently (since it needs about 7 different pieces of data from a web service), I didn't want to add almost-duplicate logic about getting and releasing locks to each method. But this is definitely very, very valuable info for future usage. I think these are ultimately the correct answers on how best to make an operation like this thread-safe, and I'd give out more votes to these answers if I could!

解决方案

Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:

final String firstkey = "Data-" + email;
final String key = firstkey.intern();

Two Strings with the same value are otherwise not necessarily the same object.

Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.

I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.

Response to question update:

I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.

Edit

Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.

Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.

Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:

  • If the CPU is busy then this approach may not be sufficient and you need another approach.
  • If the CPU is not busy, and access to server is not a bottleneck, then this approach is overkill, and you might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.

Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.

This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.

IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.

// do not attempt double-check locking here. I mean it.
synchronized(StaticObject) {
    data = StaticCache.get(key);
    while (data == IN_PROGRESS) {
        // another thread is getting the data
        StaticObject.wait();
        data = StaticCache.get(key);
    }
    if (data == null) {
        // we must get the data
        StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);
    }
}
if (data == null) {
    // we must get the data
    try {
        data = server.DoSlowThing(key);
    } finally {
        synchronized(StaticObject) {
            // WARNING: failure here is fatal, and must be allowed to terminate
            // the app or else waiters will be left forever. Choose a suitable
            // collection type in which replacing the value for a key is guaranteed.
            StaticCache.put(key, data, CURRENT_TIME);
            StaticObject.notifyAll();
        }
    }
}

Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.

This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.

这篇关于在 Java 中同步 String 对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆