Using threadsafe initialization in a JRuby gem


Question


Wanting to be sure we're using the correct synchronization (and no more than necessary) when writing threadsafe code in JRuby; specifically, in a Puma instantiated Rails app.

UPDATE: Extensively re-edited this question to be very clear and to use the latest code we are implementing. This code uses the atomic gem written by @headius (Charles Nutter) for JRuby, but we're not sure it is totally necessary, or in which ways it's necessary, for what we're trying to do here.

Here's what we've got. Is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?

ourgem.rb:

require 'atomic'  # gem from @headius

SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze

module Foo

  def self.included(cls)
    cls.extend(ClassMethods)
    cls.send :__setup
  end

  module ClassMethods
    def get(service_name, method_name, *args)
      __cached_client(service_name).send(method_name.to_sym, *args)
      # we also capture exceptions here, but leaving those out for brevity
    end

    private

    def __client(service_name)
      # obtain and return a client handle for the given service_name
      # we definitely want to cache the value returned from this method
      # **AND**
      # it is a requirement that this method ONLY be called *once PER service_name*.
    end

    def __cached_client(service_name)
      @@_clients.value[service_name]
    end

    def __setup
      @@_clients = Atomic.new({})
    @@_clients.update do |current_services|
        SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
          if current_services[service_name]
            current_services[service_name]
          else
            memo.merge({service_name => __client(service_name)})
          end
        end
      end
    end
  end
end

client.rb:

require 'ourgem'

class GetStuffFromServiceABC
  include Foo

  def self.get_some_stuff
    result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
    puts result
  end
end

Summary of the above: we have @@_clients (a mutable class variable holding a Hash of clients) which we only want to populate ONCE for all available services, which are keyed on service_name.

Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?

Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?

If not (meaning: you think we do need the Atomic.new at least, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new({}) and the @@_clients.update do ..., meaning the Atomic.new calls from EACH thread would EACH create separate objects?

Thanks for any thread-safety advice; we don't see any questions on SO that directly address this issue.

Solution

Just a friendly piece of advice, before I attempt to tackle the issues you raise here:

This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?

If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.

That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:

When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).

Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".

Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.

It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.

Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).

Even if __setup were to be called concurrently by multiple threads, your atomic code won't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it so you just get back the Hash. It's a no-op. You could just write {} instead.
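A quick illustration of that no-op (Atomic#value is the atomic gem's reader for the wrapped object):

ref = Atomic.new({})   # wrap an empty Hash in an atomic reference
ref.value              # => {}   (immediately unwraps it again)

Atomic.new({}).value   # same thing in one expression: a long-winded {}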

Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.

Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:

 i += 1

We all know what can go wrong here. You may end up with the following sequence of events:

  1. Thread A reads i and increments it.
  2. Thread B reads i and increments it.
  3. Thread A writes its incremented value back to i.
  4. Thread B writes its incremented value back to i.

So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:

  1. Thread A reads i and increments it.
  2. Thread B reads i and increments it.
  3. Thread A tries to write its incremented value back to i, and succeeds.
  4. Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
  5. Thread B reads i again and increments it.
  6. Thread B tries to write its incremented value back to i again, and succeeds this time.

Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
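Here is that counter as a minimal, runnable sketch against the atomic gem (the thread and iteration counts are arbitrary):

require 'atomic'

counter = Atomic.new(0)

threads = 4.times.map do
  Thread.new do
    1_000.times do
      # The block may execute more than once: if another thread changed the
      # value between our read and our write, #update just retries the block.
      counter.update { |current| current + 1 }
    end
  end
end
threads.each(&:join)

puts counter.value   # => 4000, no lost updates (but plenty of re-run blocks)

Note that the block has no side effects, so re-running it is harmless; that is exactly the property your __setup block lacks.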

If your goal is to ensure that your initialization code will only ever run once, using Atomic is a very inappropriate choice. If anything, it could make it run more times, rather than less (due to retries).
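If you did need a run-once guarantee in the face of concurrent callers (which, per the above, you don't if loading happens on a single thread), a plain Mutex is the usual tool. A minimal sketch, where build_client is a hypothetical stand-in for your slow per-service constructor:

module ClientCache
  SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze
  SETUP_LOCK = Mutex.new   # Mutex is core Ruby; no gem needed

  # Builds the client hash at most once, even if several threads race here:
  # whoever loses the race blocks on the lock, then finds @clients already
  # populated and skips the ||= body.
  def self.clients
    SETUP_LOCK.synchronize do
      @clients ||= SUPPORTED_SERVICES.each_with_object({}) do |name, memo|
        memo[name] = build_client(name)   # hypothetical slow constructor
      end
    end
  end

  def self.build_client(name)
    Object.new   # stand-in for real client construction
  end
end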

So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.

If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread[] is your friend there.
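And for the per-thread variant, a minimal sketch using Thread#[] (per-thread storage; build_client is again a hypothetical stand-in):

# Each thread lazily builds and keeps its own client per service, so no
# client instance is ever shared across threads (and so never needs locking).
def client_for(service_name)
  cache = (Thread.current[:service_clients] ||= {})
  cache[service_name] ||= build_client(service_name)   # hypothetical constructor
end

The trade-off is memory and warm-up cost: each request-serving thread pays for its own set of clients.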
