High-concurrency counters without sharding


Problem Description



This question concerns two implementations of counters which are intended to scale without sharding (with a tradeoff that they might under-count in some situations):

1. http://appengine-cookbook.appspot.com/recipe/high-concurrency-counters-without-sharding/ (the code in the comments)
2. http://blog.notdot.net/2010/04/High-concurrency-counters-without-sharding

My questions:

• With respect to #1: Running memcache.decr() in a deferred, transactional task seems like overkill. If memcache.decr() is done outside the transaction, I think the worst case is that the transaction fails and we miss counting whatever we decremented. Am I overlooking some other problem that could occur by doing this?
• What are the significant tradeoffs between the two implementations?

Here are the tradeoffs I see:

• #2 does not require datastore transactions.
• To get the counter's value, #2 requires a datastore fetch, while #1 typically only needs a memcache.get() and memcache.add().
• When incrementing a counter, both call memcache.incr(). Periodically, #2 adds a task to the task queue, while #1 transactionally performs a datastore get and put. #1 also always performs a memcache.add() (to test whether it is time to persist the counter to the datastore).
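For concreteness, #1's increment path can be modeled roughly as below. This is a sketch, not the recipe's actual code: plain dicts stand in for memcache and the datastore, the `:flush-marker` key name and `FLUSH_INTERVAL` are assumptions, and key expiry is not modeled (in real memcache the marker would simply vanish after its timeout, allowing the next flush).

```python
# Rough model of recipe #1's increment path. Plain dicts stand in for
# memcache and the datastore; marker expiry is NOT modeled, so the
# marker never disappears the way a real memcache entry would.

FLUSH_INTERVAL = 10  # assumed marker lifetime (seconds) in the real recipe

cache = {}      # stand-in for memcache
datastore = {}  # stand-in for the per-counter datastore entity

def cache_add(key, value, expiry=None):
    # memcache.add() semantics: set only if the key is absent, and
    # report whether the set happened. Expiry is ignored in this sketch.
    if key in cache:
        return False
    cache[key] = value
    return True

def increment(key):
    # Every increment bumps the memcache value...
    cache[key] = cache.get(key, 0) + 1
    # ...and probes a marker key. The add() only succeeds when the
    # marker is gone, i.e. when it is time to persist the count.
    if cache_add(key + ":flush-marker", True, expiry=FLUSH_INTERVAL):
        pending = cache.get(key, 0)
        # In the real recipe this get+put runs in a datastore
        # transaction, with memcache.decr() deferred to a task.
        datastore[key] = datastore.get(key, 0) + pending
        cache[key] -= pending
```

The point of the always-performed memcache.add() is visible here: it is a cheap test that elects exactly one request per interval to do the expensive datastore write, while every other request only touches memcache.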

Conclusions

(without actually running any performance tests):

• #1 should typically be faster at retrieving a counter (#1 memcache vs. #2 datastore), though #1 also has to perform an extra memcache.add().
• However, #2 should be faster when updating counters (#1 datastore get+put vs. #2 enqueueing a task).
• On the other hand, with #2 you have to be a bit more careful with the update interval, since the task queue quota is almost 100x smaller than either the datastore or memcache APIs.

Solution

Going to the datastore is likely to be more expensive than going through memcache. Otherwise memcache wouldn't be all that useful in the first place :-)

I'd recommend the first option.

If you have a reasonable request rate, you can actually implement it even more simply:

1) update the value in memcache
2) if the returned updated value is evenly divisible by N
2.1) add N to the datastore counter
2.2) decrement memcache by N
      

      This assumes you can set a long enough timeout on your memcache to live between successive events, but if events are so sparse that your memcache times out, chances are you wouldn't need a "high concurrency" counter :-)

For larger sites, relying on a single memcache to do things like count total page hits may get you in trouble; in that case, you really do want to shard your memcaches, and update a random counter instance; the aggregation of counters will happen by the database update.
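One way to read the sharding suggestion: keep M independent counter keys, increment a random one on each hit, and let the periodic database update sum them. A rough sketch under stated assumptions (the shard count and key scheme are invented here, and a dict stands in for the memcache client):

```python
import random

# Sketch of sharded memcache counters: NUM_SHARDS independent keys per
# counter, each hit incrementing a randomly chosen one.

NUM_SHARDS = 20  # assumed shard count; tune to the expected write rate

cache = {}  # stand-in for a memcache client

def increment_hit(name):
    # A random shard key spreads concurrent writers across keys,
    # reducing contention on any single counter entry.
    shard_key = "%s-%d" % (name, random.randrange(NUM_SHARDS))
    cache[shard_key] = cache.get(shard_key, 0) + 1

def aggregate(name):
    # The periodic database update would sum every shard.
    return sum(cache.get("%s-%d" % (name, i), 0)
               for i in range(NUM_SHARDS))
```

Reads become M fetches instead of one, which is why aggregation is left to the periodic database update rather than done on every page view.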

When using memcache, though, beware that some client APIs will assume that a one-second timeout means the value isn't there. If the TCP SYN packet to the memcache instance gets dropped, this means that your request will erroneously assume the data isn't there. (Similar problems can happen with UDP for memcache.)
