Why does MongoDB *client* use more memory than the server in this case?
Problem description
I'm evaluating MongoDB. I have a small 20GB subset of documents. Each is essentially a request log for a social game along with some captured state of the game the user was playing at that moment.
I thought I'd try finding game cheaters. So I wrote a function that runs server side. It calls find() on an indexed collection and sorts according to the existing index. Using a cursor, it goes through all documents in indexed order. The index is {user_id, time}. So I'm going through each user's history, checking whether certain values (money/health/etc.) increase faster than is possible in the game. The script returns the first violation found. It does not collect violations.
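The per-user check might look something like the sketch below in plain JavaScript. The field names (money, time) and the maxRate threshold are hypothetical assumptions; the real server-side function would walk a cursor inside db.eval() rather than an in-memory array:

```javascript
// Sketch of the cheat check over one user's history, sorted by time.
// Field names (money, time) and maxRate are hypothetical assumptions.
function findFirstViolation(history, maxRate) {
  for (var i = 1; i < history.length; i++) {
    var prev = history[i - 1];
    var cur = history[i];
    var dt = cur.time - prev.time;       // elapsed game time
    var gain = cur.money - prev.money;   // money earned in that window
    if (dt > 0 && gain / dt > maxRate) {
      return cur;                        // first suspicious log entry
    }
  }
  return null;                           // no violation found
}

var history = [
  { logId: 1, time: 0,  money: 100 },
  { logId: 2, time: 10, money: 150 },   // +5 per time unit: plausible
  { logId: 3, time: 11, money: 9000 }   // +8850 per time unit: impossible
];
console.log(findFirstViolation(history, 100).logId); // 3
```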
The ONLY thing this script does on the client is define the function and call mymongodb.eval(myscript) on a mongod instance on another box.
The box that mongod is running on does fine. The one that the script is launched from starts losing memory and eating into swap. Hours later: 8GB of RAM and 6GB of swap are in use on the client machine, which did nothing more than launch a script on another box and wait for a return value.
Is the mongo client really that flaky? Have I done something wrong or made an incorrect assumption about mongo/mongod?
Recommended answer
From the docs:
Use map/reduce instead of db.eval() for long running jobs. db.eval blocks other operations!
eval is a function that blocks the entire server unless you use a special flag. Again, from the docs:
If you don't use the "nolock" flag, db.eval() blocks the entire mongod process while running [...]
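At the time, the nolock option could be passed by invoking the eval command directly through runCommand (a sketch, not runnable without a live mongod; myscript stands for the server-side function from the question):

```javascript
// Sketch: run the script via the eval command with nolock set,
// so it does not block other operations on the mongod instance.
// myscript is the server-side function defined on the client.
db.runCommand({
  eval: myscript,
  nolock: true
});
```

Note that even with nolock, a long-running eval still ties up one server-side JavaScript execution context, which is part of why map/reduce is recommended instead.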
You are kind of abusing MongoDB here. Your current routine is strange, because it returns the first violation found, but it will have to re-check everything when run the next time (unless your user ids are ordered and you store the last evaluated user id).
Map/Reduce generally is the better option for a long-running task, but aggregating your data does not seem trivial. However, a map/reduce based solution would also solve the re-evaluation problem.
I'd probably return something like this from map/reduce:
user id -> suspicious actions, e.g.
------
2525454 -> [{logId: 235345435, t: ISODate("...")}]
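One way to get that shape out of map/reduce: have map emit a suspicious-actions entry per flagged document and have reduce concatenate them per user. The sketch below uses a hypothetical per-document predicate isSuspicious() (detecting cheats across *consecutive* documents is the non-trivial part the answer alludes to); on the server, emit() and `this` are provided by mongod, so the little harness here only simulates that environment:

```javascript
// In-memory simulation of mongod's map/reduce environment, for illustration.
var groups = {};
function emit(key, value) {
  (groups[key] = groups[key] || []).push(value);
}
function isSuspicious(doc) {
  return doc.money > 1000;  // hypothetical placeholder rule
}

// map/reduce functions as they would be passed to mongod:
function map() {
  if (isSuspicious(this)) {
    emit(this.user_id, { actions: [{ logId: this.logId, t: this.time }] });
  }
}
function reduce(userId, values) {
  var merged = { actions: [] };
  values.forEach(function (v) {
    merged.actions = merged.actions.concat(v.actions);
  });
  return merged;
}

// Drive the simulation over some hypothetical request-log documents:
var docs = [
  { user_id: 2525454, logId: 235345435, time: "2012-01-01", money: 9000 },
  { user_id: 2525454, logId: 235345436, time: "2012-01-02", money: 5000 },
  { user_id: 111,     logId: 42,        time: "2012-01-01", money: 10 }
];
docs.forEach(function (doc) { map.call(doc); });
var result = {};
for (var k in groups) result[k] = reduce(k, groups[k]);
console.log(result["2525454"].actions.length); // 2
```

Because the output is keyed by user id, a later run can also skip or re-check only the users already flagged, which addresses the re-evaluation problem mentioned above.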