Thrift TSimpleServer becomes unresponsive after several successful requests

Problem description

I have a Thrift API served from a Java application running on Linux. I'm using a .NET client to connect to the API and execute operations.

The first few calls to the service work fine without errors, but then (seemingly at random) a call will "hang." If I force-quit my client and try to reconnect, the service either hangs again, or my client has the following error:

Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at Thrift.Transport.TStreamTransport.Read(Byte[] buf, Int32 off, Int32 len) 
   (etc.)

When I use JConsole to get a thread dump, the server thread is sitting in accept():

"Thread-1" prio=10 tid=0x00002aaad457a800 nid=0x79c7 runnable [0x00000000434af000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
    - locked <0x00000005c0fef470> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:462)
    at java.net.ServerSocket.accept(ServerSocket.java:430)
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
    at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:63)

netstat on the server shows connections to the service port in the TIME_WAIT state, which eventually disappear several minutes after I force-quit the client (as would be expected).

The code that sets up the Thrift service is as follows:

        // Host and port come from the service configuration.
        int port = thriftServicePort;
        String host = thriftServiceHost;

        // Bind the listening socket to that explicit host/port pair.
        InetAddress adr = InetAddress.getByName(host);
        InetSocketAddress address = new InetSocketAddress(adr, port);
        TServerTransport serverTransport = new TServerSocket(address);

        TServer server = new TSimpleServer(
                new TServer.Args(serverTransport).processor((org.apache.thrift.TProcessor) processor));

        // Blocks the calling thread until the server is stopped.
        server.serve();

Note that we're using the TServerSocket constructor that takes an explicit hostname or IP address. I suspect that I should switch to the constructor that takes only a port (which ultimately binds to InetAddress.anyLocalAddress()). Alternatively, I suppose I could configure the service to bind to the "wildcard" address ("0.0.0.0").
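To make the binding options above concrete, here is a small plain-JDK sketch that compares the socket addresses involved. It does not touch Thrift at all; the class name BindAddressDemo and port 9090 are made up for illustration, and it assumes that a port-only server socket constructor ends up with an address equivalent to new InetSocketAddress(port).

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper class, not part of Thrift or the original code.
class BindAddressDemo {

    /** True when the address is the wildcard ("any local") address. */
    public static boolean isWildcard(InetSocketAddress address) {
        InetAddress ip = address.getAddress();
        return ip != null && ip.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        int port = 9090; // illustrative port only

        // Port-only form: the JDK fills in the wildcard address.
        InetSocketAddress portOnly = new InetSocketAddress(port);

        // Explicit address form: here, the loopback interface only.
        InetSocketAddress loopbackOnly =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port);

        System.out.println("port only -> wildcard? " + isWildcard(portOnly));
        System.out.println("loopback  -> wildcard? " + isWildcard(loopbackOnly));
    }
}
```

A listener bound to the wildcard address accepts connections arriving on any local interface, whereas one bound to a specific address only accepts connections addressed to that interface. Whether this matters for the hang described here is not established by the sketch.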

I should mention that the service is not hosted on the open Internet. It is hosted in a private network and I am using SSH tunneling to reach it. Hence, the hostname that the service is bound to does not resolve in my local network (although I can make the initial connection via tunneling). I wonder if this is something similar to the RMI TCP callback problem?

Is there a technical explanation for what's going on (if this is a common issue), or are there additional troubleshooting steps that I can take?

Update

Had the same problem today, but this time jstack shows that the Thrift server is blocking forever reading from the input stream:

"Thread-1" prio=10 tid=0x00002aaad43fc000 nid=0x60b3 runnable [0x0000000041741000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
    at org.apache.thrift.server.TSimpleServer.serve(TSimpleServer.java:70)

So we need to set a "client timeout" in the TServerSocket constructor. But why would that cause the application to also refuse connections when blocking on accept()?
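The blocking read in that stack trace can be reproduced with nothing but java.net, which may help when reasoning about what a client timeout does. The sketch below is not Thrift code: the class name ReadTimeoutDemo and the 200 ms value are invented for illustration, and it only assumes that a client timeout on the server transport boils down to a socket read timeout (SO_TIMEOUT) on each accepted connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class using only the JDK.
class ReadTimeoutDemo {

    /**
     * Accepts one connection from a client that never sends anything and
     * reports whether the server-side read gave up after timeoutMillis.
     */
    public static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             // The client connects but stays silent, like a dead or stalled peer.
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            // Without this line the read below would block indefinitely.
            accepted.setSoTimeout(timeoutMillis);
            try {
                accepted.getInputStream().read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true; // the read was abandoned after the timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

With a timeout in place, a single-threaded loop would eventually get control back and return to accept(); without one, it stays inside the read for as long as the peer keeps the connection open and silent.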

Recommended answer

From your stack trace it seems you are using TSimpleServer, whose Javadoc says:

Simple singlethreaded server for testing.

You probably want to use TThreadPoolServer instead.

Most likely, the single thread of TSimpleServer is blocked waiting for the dead client to respond or time out. And because TSimpleServer is single-threaded, no thread is available to process other requests.
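To illustrate the difference, here is a rough plain-JDK sketch of the thread-per-connection model, in which a stalled client occupies one worker while other clients are still served. It is not Thrift code and not how TThreadPoolServer is literally implemented; PooledAcceptLoopDemo, the echo protocol, and the 5-second timeout are all invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class using only the JDK.
// With Thrift's Java library the analogous change would look roughly like
// (clientTimeoutMillis is a made-up variable name):
//   TServerTransport t = new TServerSocket(address, clientTimeoutMillis);
//   TServer server = new TThreadPoolServer(new TThreadPoolServer.Args(t).processor(processor));
class PooledAcceptLoopDemo {

    // Reads one line and echoes it back; blocks while the client stays silent.
    static void handle(Socket conn) {
        try (Socket c = conn;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String line = in.readLine();
            if (line != null) {
                out.println("echo:" + line);
            }
        } catch (IOException ignored) {
            // Client went away or the read timed out; the socket is closed either way.
        }
    }

    /** Returns what an active client receives while another client is stalled. */
    public static String replyWhileAnotherClientStalls() {
        ExecutorService workers = Executors.newCachedThreadPool();
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket conn = server.accept();
                        conn.setSoTimeout(5000); // bound how long a worker waits on a silent client
                        workers.submit(() -> handle(conn)); // hand off, then go back to accept()
                    }
                } catch (IOException closed) {
                    // Server socket was closed; stop accepting.
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            try (Socket stalled = new Socket(server.getInetAddress(), server.getLocalPort());
                 Socket active = new Socket(server.getInetAddress(), server.getLocalPort())) {
                // "stalled" never sends anything; "active" sends one request.
                active.setSoTimeout(5000);
                PrintWriter out = new PrintWriter(active.getOutputStream(), true);
                out.println("ping");
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(active.getInputStream(), StandardCharsets.UTF_8));
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(replyWhileAnotherClientStalls());
    }
}
```

In a single-threaded loop, the stalled connection would be read first and the active client would have to wait for it to finish or time out. Combining a worker pool with a read timeout keeps dead clients from pinning workers forever.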
