Why is the producer in pykafka so slow?
Question
I wrote a simple producer using pykafka but can't seem to get it to perform. The basic producer and the call to produce are below. When I call this 100 times with a small message and add some timing/profiling code, it takes about 14 seconds. I understand this to be an asynchronous sending of messages, so I would expect it to be incredibly fast. Is there some setting I'm missing? I've also tried it with min_queued_messages=1, and that takes about 2 seconds longer.
from pykafka import KafkaClient
import time

client = KafkaClient(hosts="kafka1.mydomain.com:9092", exclude_internal_topics=False)
topic = client.topics['mytopic']

start = time.time()
for x in xrange(100):
    with topic.get_producer(delivery_reports=False,
                            sync=True,
                            linger_ms=0) as producer:
        producer.produce("This is a message")
end = time.time()
print "Execution Time (ms): %s" % round((end - start) * 1000)
I profiled this in PyCharm, and it says that 78.8% of the time is spent in "time.sleep"! Why would it be sleeping?
Recommended answer
The topic.get_producer call is meant to be made once, at the beginning of the producer's lifespan. Calling it in a tight loop, as your example code does, causes the initialization sequence to run repeatedly, which is unnecessary and adds a lot of overhead. Your code would work faster if it were changed to the following:
with topic.get_producer(delivery_reports=False,
                        sync=True,
                        linger_ms=0) as producer:
    for x in xrange(100):
        producer.produce("This is a message")
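To get a feel for how much a per-message setup cost can dominate, here is a minimal, self-contained sketch that needs no Kafka broker. It is written for Python 3, and FakeProducer is a hypothetical stand-in, not part of pykafka: it simply sleeps for an assumed 10 ms on construction to imitate a fixed startup cost. The two functions mirror the loop in the question and the loop in the answer.

```python
import time


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer (NOT the pykafka API)."""

    created = 0  # how many producers have been constructed so far

    def __init__(self, setup_seconds=0.01):
        # Assumed fixed startup cost, imitated with a short sleep.
        time.sleep(setup_seconds)
        FakeProducer.created += 1
        self.sent = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def produce(self, message):
        # Sending is treated as free here so only setup cost is measured.
        self.sent.append(message)


def producer_per_message(n):
    """Pattern from the question: build a new producer for every message."""
    start = time.perf_counter()
    for _ in range(n):
        with FakeProducer() as producer:
            producer.produce(b"This is a message")
    return time.perf_counter() - start


def one_shared_producer(n):
    """Pattern from the answer: build one producer and reuse it."""
    start = time.perf_counter()
    with FakeProducer() as producer:
        for _ in range(n):
            producer.produce(b"This is a message")
    return time.perf_counter() - start


if __name__ == "__main__":
    slow = producer_per_message(20)
    fast = one_shared_producer(20)
    print("per-message producer: %.0f ms" % (slow * 1000))
    print("shared producer:      %.0f ms" % (fast * 1000))
```

The absolute numbers depend entirely on the assumed setup cost, so they say nothing about pykafka's real timings. The point is only that the first pattern pays the startup cost n times while the second pays it once.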