Efficiency when inserting into mongodb (pymongo)
Problem description
Updated for clarity: I need advice on performance when inserting/appending to a capped collection. I have two Python scripts running:
(1) Tailing the cursor.
while WSHandler.cursor.alive:
    try:
        doc = WSHandler.cursor.next()
        self.render(doc)
    except StopIteration:
        time.sleep(1)  # no new document yet; wait before retrying
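The tail loop in script (1) can be sketched in a self-contained way. This is a minimal illustration of the pattern, using a hypothetical `FakeCursor` class as a stand-in for a live pymongo tailable cursor (it only mimics the `.alive` attribute and `.next()` method used above; the `tail` helper name is also an assumption, not part of the original code):

```python
import time

class FakeCursor:
    """Hypothetical stand-in for a pymongo tailable cursor:
    exposes .alive and .next() like the real thing."""
    def __init__(self, docs):
        self._docs = list(docs)
        self.alive = True

    def next(self):
        if not self._docs:
            # A real tailable cursor dies when the capped collection is dropped;
            # here we simply die once the canned documents run out.
            self.alive = False
            raise StopIteration
        return self._docs.pop(0)

def tail(cursor, render, idle_wait=0.0):
    """Drain a tailable cursor, rendering each document; back off briefly
    when no document is ready instead of spinning at 100% CPU."""
    while cursor.alive:
        try:
            render(cursor.next())
        except StopIteration:
            time.sleep(idle_wait)  # avoid a busy-wait on an empty cursor

seen = []
tail(FakeCursor([{"n": 1}, {"n": 2}]), seen.append)
```

The `except StopIteration` branch is the important part: without some form of back-off, an empty tailable cursor turns the loop into a busy-wait, which matches the CPU symptoms described below.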
(2) Inserting like so:
def on_data(self, data):  # Tweepy callback
    if len(data) > 5:
        data = json.loads(data)
        coll.insert(data)  # insert into mongodb
        #print(coll.count())
        #print(data)
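If per-document round trips are part of the bottleneck in script (2), buffering the parsed tweets and writing them one batch at a time reduces overhead. A sketch of that idea, under stated assumptions: `BatchingInserter` and `ListSink` are hypothetical names, and `ListSink` stands in for the real collection so the snippet runs without a MongoDB server (with pymongo 3+, the sink would be a collection and `insert_many` its real batch-write method):

```python
import json

class ListSink:
    """Test double that records batches instead of talking to MongoDB."""
    def __init__(self):
        self.batches = []

    def insert_many(self, docs):
        self.batches.append(list(docs))

class BatchingInserter:
    """Buffer parsed tweets and write them one batch at a time."""
    def __init__(self, sink, batch_size=100):
        self.sink = sink          # anything with insert_many(list_of_docs)
        self.batch_size = batch_size
        self.buffer = []

    def on_data(self, data):      # same shape as the Tweepy callback above
        if len(data) > 5:
            self.buffer.append(json.loads(data))
            if len(self.buffer) >= self.batch_size:
                self.flush()

    def flush(self):
        if self.buffer:
            self.sink.insert_many(self.buffer)
            self.buffer = []

sink = ListSink()
inserter = BatchingInserter(sink, batch_size=2)
for i in range(5):
    inserter.on_data(json.dumps({"tweet": i}))
inserter.flush()  # push the trailing partial batch
```

At 50 inserts/second a batch size of 50-100 would turn one network round trip per tweet into roughly one per second, at the cost of slightly delayed visibility to the tailing reader.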
and it runs fine for a while (at 50 inserts/second). Then, after 20-60 seconds, it stumbles, hits the CPU roof (though it was running at 20% before), and never recovers. My mongostat numbers take a dive (shown below).
Mongostat output: (screenshot not reproduced in this transcription.)
The CPU is now choked by the processes doing the insertion (at least according to htop).
When I run the Tweepy lines above with print(data) instead of the db insert (coll.insert(data)), everything runs along fine at 15% CPU use.
What I see in mongostat:
- res keeps climbing. (Though clogs may happen at 40m, it can also run fine at 100m.)
- flushes do not seem to interfere.
- locked % is stable at 0.1%. Would this lead to clogging eventually?
(I'm running AWS microinstance; pymongo.)
Solution
I would suggest running mongostat while running your tests. There are many things that could be wrong, but mongostat will give you a good indication.
http://docs.mongodb.org/manual/reference/mongostat/
The first two things I would look at are the lock percentage and the data throughput. With reasonable throughput on dedicated machines, I typically reach 1000-2000 updates/inserts per second before suffering any degradation. This has been the case for several large production deployments I have worked with.
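To put a client-side number next to mongostat's insert column, a rough rate check can help. The `measure_ops_per_sec` helper below is a hypothetical sketch (not part of the answer above); in practice the `op` argument would be a closure performing one real insert, while here a no-op stand-in keeps the snippet self-contained:

```python
import time

def measure_ops_per_sec(op, n=1000):
    """Call op() n times and return the rough rate in operations/second,
    for comparison against mongostat's insert column."""
    start = time.perf_counter()
    for _ in range(n):
        op()
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")

# No-op stand-in; against a live server op might be, e.g.,
# lambda: coll.insert(doc)
calls = []
rate = measure_ops_per_sec(lambda: calls.append(1), n=100)
```

If the client-side rate and mongostat's reported rate diverge sharply, the bottleneck is more likely on the client or network side than inside mongod.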