Improving Rails.cache.write by setting key-value pairs asynchronously


Question

    I am currently thinking about improving the performance of Rails.cache.write when using dalli to write items to the memcachier cloud.

    The stack, as it relates to caching, is currently:

    heroku, memcachier heroku addon, dalli 2.6.4, rails 3.0.19

    I am using newrelic for performance monitoring.

    I am currently fetching "active students" for a given logged in user, represented by a BusinessUser instance, when its active_students method is called from a controller handling a request that requires a list of "active students":

    class BusinessUser < ActiveRecord::Base
      ...
      def active_students
        Rails.cache.fetch("/studio/#{self.id}/students") do
          customer_users.active_by_name
        end
      end
      ...
    end
    

    After looking at newrelic, I've basically narrowed down one big performance hit for the app in setting key values on memcachier. It takes an average of 225ms every time. Further, it looks like setting memcache key values blocks the main thread and eventually disrupts the request queue. Obviously this is undesirable, especially when the whole point of the caching strategy is to reduce performance bottlenecks.

    In addition, I've benchmarked the cache storage with plain dalli, and Rails.cache.write for 1000 cache sets of the same value:

    heroku run console -a {app-name-redacted}
    irb(main):001:0> require 'dalli'
    => false
    irb(main):002:0> cache = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"].split(","),
    irb(main):003:1*                     {:username => ENV["MEMCACHIER_USERNAME"],
    irb(main):004:2*                      :password => ENV["MEMCACHIER_PASSWORD"],
    irb(main):005:2*                      :failover => true,
    irb(main):006:2*                      :socket_timeout => 1.5,
    irb(main):007:2*                      :socket_failure_delay => 0.2
    irb(main):008:2>                     })
    => #<Dalli::Client:0x00000006686ce8 @servers=["server-redacted:11211"], @options={:username=>"username-redacted", :password=>"password-redacted", :failover=>true, :socket_timeout=>1.5, :socket_failure_delay=>0.2}, @ring=nil>
    irb(main):009:0> require 'benchmark'
    => false
    irb(main):010:0> n = 1000
    => 1000
    irb(main):011:0> Benchmark.bm do |x|
    irb(main):012:1*   x.report { n.times do ; cache.set("foo", "bar") ; end }
    irb(main):013:1>   x.report { n.times do ; Rails.cache.write("foo", "bar") ; end }
    irb(main):014:1> end
           user     system      total        real
     Dalli::Server#connect server-redacted:11211
    Dalli/SASL authenticating as username-redacted
    Dalli/SASL: username-redacted
      0.090000   0.050000   0.140000 (  2.066113)
    
    Dalli::Server#connect server-redacted:11211
    Dalli/SASL authenticating as username-redacted
    Dalli/SASL: username-redacted
    
      0.100000   0.070000   0.170000 (  2.108364)
    

    With plain dalli cache.set, we are using 2.066113s to write 1000 entries into the cache, for an average cache.set time of 2.06ms.

    With Rails.cache.write, we are using 2.108364s to write 1000 entries into the cache, for an average Rails.cache.write time of 2.11ms.

    ⇒ It seems like the problem is not with memcachier, but simply with the amount of data that we are attempting to store.

    According to the docs for the #fetch method, it looks like it would not be the way I want to go, if I want to throw cache sets into a separate thread or a worker, because I can't split out the write from the read - and self-evidently, I don't want to be reading asynchronously.
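One way to split the fetch into an explicit read plus an asynchronous write is sketched below with plain Ruby threads. A Mutex-guarded Hash stands in for Rails.cache so the sketch is self-contained, and `compute_students` is a hypothetical placeholder for `customer_users.active_by_name`; in the app, the reads and writes would be `Rails.cache.read` and `Rails.cache.write`:

```ruby
require 'thread'

# Stand-in for Rails.cache, so the sketch runs anywhere.
CACHE = {}
CACHE_LOCK = Mutex.new

def cache_read(key)
  CACHE_LOCK.synchronize { CACHE[key] }
end

def cache_write_async(key, value)
  # The caller does not wait for the set to complete.
  Thread.new { CACHE_LOCK.synchronize { CACHE[key] = value } }
end

def compute_students(_id)
  [1, 2, 3] # placeholder for the real customer_users.active_by_name query
end

def active_students(id)
  key = "/studio/#{id}/students"
  cached = cache_read(key)
  return cached if cached

  value = compute_students(id)
  cache_write_async(key, value) # request thread is not blocked on the set
  value
end
```

The cost is that a cache miss still pays for the database query in the request thread; only the cache *set* is moved off the critical path.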

    Is it possible to reduce the bottleneck by throwing Rails.cache.write into a worker, when setting key values? Or, more generally, is there a better pattern to do this, so that I am not blocking the main thread every time I want to perform a Rails.cache.write?

    Solution

    There are two factors that would contribute to overall latency under normal circumstances: client side marshalling/compression and network bandwidth.

    Dalli marshals and optionally compresses the data, which could be quite expensive. Here are some benchmarks of marshalling and compressing a list of random characters (a kind of artificial list of user ids or something like that). In both cases the resulting value is around 200KB. Both benchmarks were run on a Heroku dyno - performance will obviously depend on the CPU and load of the machine:

    irb> val = (1..50000).to_a.map! {rand(255).chr}; nil
    # a list of 50000 single character strings
    irb> Marshal.dump(val).size
    275832
    # OK, so roughly 200K. How long does it take to perform this operation
    # before even starting to talk to MemCachier?
    irb> Benchmark.measure { Marshal.dump(val) }
    =>   0.040000   0.000000   0.040000 (  0.044568)
    # so about 45ms, and this scales roughly linearly with the length of the list.
    
    
    irb> val = (1..100000).to_a; nil # a list of 100000 integers
    irb> Zlib::Deflate.deflate(Marshal.dump(val)).size
    177535
    # OK, so roughly 200K. How long does it take to perform this operation
    irb>  Benchmark.measure { Zlib::Deflate.deflate(Marshal.dump(val)) }
    =>   0.140000   0.000000   0.140000 (  0.145672)
    

    So we're basically seeing anywhere from 40ms to 150ms of performance hit just for marshalling and/or zipping data. Marshalling a String will be much cheaper, while marshalling something like a complex object will be more expensive. Zipping depends on the size of the data, but also on the redundancy of the data. For example, zipping a 1MB string of all "a" characters takes only about 10ms.

    Network bandwidth will play some role here, but not a very significant one. MemCachier has a 1MB limit on values, which would take approximately 20ms to transfer to/from MemCachier:

    irb(main):036:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
    =>   0.250000  11.620000  11.870000 ( 21.284664)
    

    This amounts to about 400Mbps (1MB * 8Mb/MB * (1000ms/s / 20ms)), which makes sense. However, even for a relatively large, but still smaller, value of 200KB, we'd expect a 5x speedup:

    irb(main):039:0> val = "a" * (1024 * 200); val.size
    => 204800
    irb(main):040:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
    =>   0.160000   2.890000   3.050000 (  5.954258)
    

    So, there are several things you might be able to do to get some speedup:

    1. Use a faster marshalling mechanism. For example, using Array#pack("L*") to encode a list of 50,000 32-bit unsigned integers (like in the very first benchmark) into a string of length 200,000 (4 bytes for each integer), takes only 2ms rather than 40ms. Using compression with the same marshalling scheme, to get a similar sized value is also very fast (about 2ms as well), but the compression doesn't do anything useful on random data anymore (Ruby's Marshal produces a fairly redundant String even on a list of random integers).
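A minimal sketch of the `Array#pack` approach on the same 50,000-integer list as the first benchmark (timings aside, it shows the encoding lands at the expected 4 bytes per integer and round-trips exactly):

```ruby
# Encode 50,000 32-bit unsigned integers with Array#pack instead of Marshal.
ids = (1..50_000).to_a

packed = ids.pack("L*")     # "L" = 32-bit unsigned, so 4 bytes per integer
packed.bytesize             # 50_000 * 4 = 200_000 bytes

# Decoding is the mirror operation and recovers the original list.
unpacked = packed.unpack("L*")
```

Note that `pack("L*")` only works if the values really fit in 32 bits; wider ids would need a different directive such as `"Q*"`.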

    2. Use smaller values. This would probably require deep application changes, but if you don't really need the whole list, you shouldn't be setting the whole list. For example, the memcache protocol has append and prepend operations. If you are only ever adding new things to a long list, you could use those operations instead.
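To illustrate why append helps, here is a sketch using raw packed bytes only (no memcache client involved; with Dalli, the value would need to be stored with `:raw => true` for append/prepend to apply): only the bytes of the new element need to cross the wire, no matter how long the cached list already is.

```ruby
# The cached value, stored as raw packed bytes (4 bytes per 32-bit id).
list = [10, 20, 30].pack("L*")

# Adding one new student id: these 4 bytes are all an append operation
# would have to send, independent of the existing list's length.
new_id = 40
delta = [new_id].pack("L*")
list << delta

# The full list is still recoverable from the raw bytes.
list.unpack("L*")
```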

    3. Finally, as suggested, removing the set/gets from the critical path would prevent any delays from affecting HTTP request latency. You still have to get the data to the worker, so if you're using something like a work queue, the message you send to the worker should only contain instructions on which data to construct, rather than the data itself (or you're in the same hole again, just with a different system). A very lightweight approach (in terms of coding effort) would be to simply fork a process:

    mylist = Student.where(...).all.map!(&:id)
    # ...I need to update memcache with the new list of students...
    fork do
      # Have to create a new Dalli client
      client = Dalli::Client.new
      client.set("mylistkey", mylist)
      # this will block for the same time as before, but is running in a separate process
    end
    

    I haven't benchmarked a full example, but since you're not execing, and Linux fork is copy-on-write, the overhead of the fork call itself should be minimal. On my machine, it's about 500us (that's micro-seconds not milliseconds).
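One small housekeeping detail with the fork approach: the child should be reaped (for example via `Process.detach`), or it will linger as a zombie until the parent exits. A sketch, with the `client.set` call replaced by a placeholder so it runs without a memcache server:

```ruby
# Fire-and-forget writer process. The sleep stands in for the real
# client.set("mylistkey", mylist) call.
pid = fork do
  sleep 0.01
  exit!(0)            # skip at_exit hooks inherited from the parent
end
Process.detach(pid)   # a background thread reaps the child when it exits
```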
