GRPC: make high-throughput client in Java/Scala
Question
I have a service that transfers messages at quite a high rate.

Currently it is served by akka-tcp and it handles 3.5M messages per minute. I decided to give gRPC a try. Unfortunately it resulted in much lower throughput: ~500k messages per minute, or even less.
Could you please recommend how to optimize it?
My setup
Hardware: 32 cores, 24 GB heap.
gRPC version: 1.25.0
Message format and endpoint
A message is basically a binary blob. The client streams 100K–1M and more messages into the same request (asynchronously); the server doesn't respond with anything, and the client uses a no-op observer:
service MyService {
    rpc send (stream MyMessage) returns (stream DummyResponse);
}

message MyMessage {
    int64 someField = 1;
    bytes payload = 2; // not huge
}

message DummyResponse {
}
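The client side described above can be sketched as follows. This is a sketch, not the exact production code: MyServiceGrpc, MyMessage and DummyResponse are the classes protoc would generate from the .proto above, and the host/port are placeholders.

```java
// Sketch of the async streaming client described in the question:
// one long-lived call, messages pushed via onNext(), no-op response observer.
import com.google.protobuf.ByteString;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;

public class SenderSketch {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051)   // placeholder destination
                .usePlaintext()
                .build();
        MyServiceGrpc.MyServiceStub stub = MyServiceGrpc.newStub(channel);

        // No-op observer: the server never sends a meaningful response.
        StreamObserver<DummyResponse> noOp = new StreamObserver<DummyResponse>() {
            @Override public void onNext(DummyResponse r) { }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onCompleted() { }
        };

        // All messages go into the same streaming call.
        StreamObserver<MyMessage> requests = stub.send(noOp);
        for (long i = 0; i < 1_000_000; i++) {
            requests.onNext(MyMessage.newBuilder()
                    .setSomeField(i)
                    .setPayload(ByteString.copyFromUtf8("payload-" + i))
                    .build());
        }
        requests.onCompleted();
    }
}
```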
Problems:

Message rate is low compared to the akka implementation.

I observe low CPU usage, so I suspect that the gRPC call is actually blocking internally despite claiming otherwise. Calling onNext() indeed doesn't return immediately, but there is also GC on the table.
I tried to spawn more senders to mitigate this issue but didn't get much improvement.
My findings

gRPC actually allocates an 8KB byte buffer for each message when serializing it. See the stacktrace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at com.google.common.io.ByteStreams.createBuffer(ByteStreams.java:58)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:105)
    at io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:274)
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:230)
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:53)
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:252)
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:346)
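For reference, the observation that onNext() doesn't return immediately points at gRPC's internal flow control. gRPC does expose a manual flow-control hook for exactly this situation: a ClientResponseObserver receives the request stream before the call starts, so the sender can write only while isReady() is true instead of pushing blindly. A hedged sketch (the generated classes and the messageSource iterator are assumptions from the context above, not the question's actual code):

```java
// Sketch: manual client-side flow control so the sender only writes while
// the transport is ready, instead of blocking inside onNext().
// ClientResponseObserver / ClientCallStreamObserver are gRPC-Java APIs;
// MyMessage, DummyResponse and messageSource are assumed from context.
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;

import java.util.Iterator;

public class FlowControlledSender
        implements ClientResponseObserver<MyMessage, DummyResponse> {

    private final Iterator<MyMessage> messageSource;
    private boolean done = false;

    public FlowControlledSender(Iterator<MyMessage> messageSource) {
        this.messageSource = messageSource;
    }

    @Override
    public void beforeStart(ClientCallStreamObserver<MyMessage> requestStream) {
        // The handler fires each time the transport drains its send buffer.
        requestStream.setOnReadyHandler(() -> {
            while (requestStream.isReady() && messageSource.hasNext()) {
                requestStream.onNext(messageSource.next());
            }
            if (!done && !messageSource.hasNext()) {
                done = true;               // guard: complete the call only once
                requestStream.onCompleted();
            }
        });
    }

    @Override public void onNext(DummyResponse response) { }
    @Override public void onError(Throwable t) { t.printStackTrace(); }
    @Override public void onCompleted() { }
}
```

Passing such an observer to stub.send(...) lets gRPC invoke beforeStart before the call begins, so the sender never queues more than the transport can absorb.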
Any help with best practices on building high-throughput gRPC clients is appreciated.
Answer
I solved the issue by creating several ManagedChannel instances per destination. Although articles say that a ManagedChannel can spawn enough connections by itself, so one instance is enough, that wasn't true in my case.
Performance is on par with the akka-tcp implementation.
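The fix can be sketched as a small round-robin pool of channels. The pool class below is plain Java and illustrative only (RoundRobinPool is my own name, not a gRPC API); the gRPC wiring it would carry is shown in the comments:

```java
// Hypothetical round-robin pool (the class name and API are illustrative,
// not part of gRPC). Each call to next() rotates through the instances,
// so several ManagedChannels per destination share the sending load:
//
//   List<ManagedChannel> channels = ...;  // e.g. 4x ManagedChannelBuilder
//                                         //     .forAddress(host, port)
//                                         //     .usePlaintext().build()
//   RoundRobinPool<ManagedChannel> pool = new RoundRobinPool<>(channels);
//   MyServiceGrpc.MyServiceStub stub = MyServiceGrpc.newStub(pool.next());
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinPool(List<T> items) {
        if (items.isEmpty()) throw new IllegalArgumentException("empty pool");
        this.items = List.copyOf(items);   // defensive, immutable copy
    }

    /** Returns the next item, cycling through the pool thread-safely. */
    public T next() {
        return items.get(
                (int) Math.floorMod(counter.getAndIncrement(), items.size()));
    }
}
```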