How to handle AllServersUnavailable Exception

Question

I wanted to do a simple write operation to a Cassandra instance (v1.1.10) on a single node. I just wanted to see how it handles constant writes and if it can keep up with the write speed.

import random
import string
import sys
import uuid

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('testdb')
test_cf = ColumnFamily(pool, 'test')
test2_cf = ColumnFamily(pool, 'test2')
test3_cf = ColumnFamily(pool, 'test3')

# Mutations queue client-side and are flushed automatically
# once queue_size of them are buffered
test_batch = test_cf.batch(queue_size=1000)
test2_batch = test2_cf.batch(queue_size=1000)
test3_batch = test3_cf.batch(queue_size=1000)

chars = string.ascii_uppercase
counter = 0
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    test_batch.insert(uid, {'junk': junk})
    test2_batch.insert(uid, {'junk': junk})
    test3_batch.insert(uid, {'junk': junk})
    sys.stdout.write(str(counter) + '\n')

pool.dispose()  # unreachable while the loop above runs forever

The code keeps crashing after a long run of writes (when the counter is around 10M+) with the following message:

pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

I set queue_size=100, which didn't help. I also fired up the cqlsh -3 console to truncate the table after the script crashed, and got the following error:

Unable to complete request: one or more nodes were unavailable.

Tailing /var/log/cassandra/system.log shows no sign of errors, only INFO lines on Compaction, FlushWriter and so on. What am I doing wrong?

Answer

I've had this problem too - as @tyler-hobbs suggested in his comment, the node is likely overloaded (it was for me). A simple fix that I've used is to back off and let the node catch up. I've rewritten your loop above to catch the error, sleep a while, and try again. I've run this against a single-node cluster and it works a treat - pausing (for a minute) and backing off periodically (no more than 5 times in a row). No data is missed using this script unless the error throws five times in a row (in which case you probably want to fail hard rather than return to the loop).

import time
import pycassa.pool  # for pycassa.pool.AllServersUnavailable

# Continues from the setup above (pool, batches, chars and counter)
while True:
  counter += 1
  uid = uuid.uuid1()
  junk = ''.join(random.choice(chars) for x in range(50))
  tryCount = 5 # 5 is probably unnecessarily high
  while tryCount > 0:
    try:
      test_batch.insert(uid, {'junk':junk})
      test2_batch.insert(uid, {'junk':junk})
      test3_batch.insert(uid, {'junk':junk})
      tryCount = -1 # success, so drop out of the retry loop
    except pycassa.pool.AllServersUnavailable as e:
      print "Trying to insert [" + str(uid) + "] but got error " + str(e) + " (attempt " + str(tryCount) + "). Backing off for a minute to let Cassandra settle down"
      time.sleep(60) # A delay of 60s is probably unnecessarily high
      tryCount = tryCount - 1
  sys.stdout.write(str(counter)+'\n')

I've added the full gist here.
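
If you'd rather fail hard once the retry budget is spent, as suggested above, here is one way the loop body could be factored. A sketch only: the insert_with_backoff helper, its name, and its defaults are mine, not part of the original answer.

import time
import pycassa.pool

def insert_with_backoff(batches, uid, junk, tries=5, pause=60):
    # Queue the row on every batch, backing off between failed attempts
    for attempt in range(tries):
        try:
            for b in batches:
                b.insert(uid, {'junk': junk})
            return
        except pycassa.pool.AllServersUnavailable:
            if attempt == tries - 1:
                raise  # retry budget spent: surface the original error
            time.sleep(pause)

# Usage inside the main loop:
# insert_with_backoff([test_batch, test2_batch, test3_batch], uid, junk)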
