Cassandra hangs on arbitrary commands

Problem description

We're hosting a Cassandra 2.0.2 cluster on AWS. We recently started upgrading from regular drives to SSDs by bootstrapping new nodes and decommissioning old ones. It went fairly well, aside from two nodes hanging forever on decommission. Now that the 6 new nodes are operational, we've noticed that some of our old tools that use phpcassa have stopped working. Nothing has changed in the security groups, all TCP/UDP ports are open, telnet can connect on 9160, and cqlsh can 'use' a cluster and select data. However, 'describe cluster' fails, and in the cli 'show keyspaces' also fails. By fail, I mean it never returns to the prompt and never returns any results. The queries work perfectly from the new nodes, but even the old nodes waiting to be decommissioned cannot run them. The production system, which also uses phpcassa, makes normal data requests and works fine.

All Cassandra nodes have the same config and the same version, and were installed from the same package. All nodes were recently restarted because of a seed node change.

Versions:

Connected to ### at ####.compute-1.amazonaws.com:9160.
[cqlsh 4.1.0 | Cassandra 2.0.2 | CQL spec 3.1.1 | Thrift protocol 19.38.0]

I've run out of ideas. Any hints would be greatly appreciated.

Update:

After a bit of random investigating, here's a bit more detailed description.

If I cassandra-cli to any machine, and do "show keyspaces", it works.

If I cassandra-cli to a remote machine, and do "show keyspaces", it hangs indefinitely.

If I cqlsh to a remote cassandra, and do a describe keyspaces, it hangs. If I press Ctrl+C and repeat the same query, it responds instantly.

If I cqlsh to a local cassandra, and do a describe keyspaces, it works.

If I cqlsh to a local cassandra, and do a select * from Keyspace limit x, it returns data only up to a certain limit: limit 760 returned data, but limit 761 failed.

If I do a consistency all, and select the same, it hangs.

If I do a trace, different machines return the data, though sometimes source_elapsed is "null"

Not to forget, applications querying the cluster sometimes do get results, after several attempts.

Update 2

Further experimenting led to two nodes failing to bootstrap: one hung on bootstrap for 4 days and eventually failed, possibly because of a rolling restart, and the other simply failed after 2 days. Repairs wouldn't run and produced "Stream failed" errors, as well as "Exception in thread Thread[StorageServiceShutdownHook,5,main] java.lang.NullPointerException". Also, after running repair, I started getting "Read an invalid frame size of 0. Are you using tframedtransport on the client side?", so...

Solution

Switching rpc_server_type from hsha to sync made all of the problems go away. We had been running with hsha for months without issues.

If someone also stumbles here: http://planetcassandra.org/blog/post/hsha-thrift-server-corruption-cassandra-2-0-2-5/

Answer

In cassandra.yaml:

Switch rpc_server_type from hsha to sync.
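
As a minimal sketch of that change, the relevant line in cassandra.yaml would look roughly like the fragment below. Only the rpc_server_type setting and its two values come from the post; the comments are added here for illustration, and the rest of the file is assumed to stay untouched.

```yaml
# cassandra.yaml (fragment; all other settings left as they are)
# Previously: rpc_server_type: hsha
rpc_server_type: sync
```

The file is read when Cassandra starts, so each node presumably needs a restart before the new value takes effect. Restarting one node at a time is an assumption about how you would roll this out, not something the post spells out.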