MongoDB map-reduce on a multicore server

Question

I have a MongoDB database with thousands of records holding very long vectors. I am looking for correlations between an input vector and my MDB data set using a certain algorithm.

Pseudocode:

function find_best_correlation(input_vector)
    max_correlation = 0
    return_vector = []
    foreach reference_vector in dataset:
        correlation = calculateCorrelation(input_vector, reference_vector)
        if correlation > max_correlation then:
            max_correlation = correlation
            return_vector = reference_vector
    return return_vector
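A runnable Python version of the pseudocode follows. The question does not say which measure calculateCorrelation computes, so this sketch assumes Pearson's correlation coefficient; `pearson_correlation` is a hypothetical stand-in, and `dataset` is assumed to be an in-memory list of vectors rather than a MongoDB cursor.

```python
import math
from typing import Sequence


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for calculateCorrelation: Pearson's r."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # r is undefined for a constant vector; treat it as 0
    return cov / math.sqrt(var_a * var_b)


def find_best_correlation(input_vector, dataset):
    """Serial scan returning the reference vector with the highest correlation.

    Like the pseudocode, the threshold starts at 0, so only positive
    correlations can win; an empty list is returned if none is positive.
    """
    max_correlation = 0.0
    return_vector = []
    for reference_vector in dataset:
        correlation = pearson_correlation(input_vector, reference_vector)
        if correlation > max_correlation:
            max_correlation = correlation  # the update the original omitted
            return_vector = reference_vector
    return return_vector
```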

This is a very good candidate for the map-reduce pattern, as I don't care about the order in which the calculations are run.

The issue is that my database is on one node. I would like to run many mappings simultaneously (I have an 8-core machine).

From what I understand, MongoDB only uses one thread of execution per node, so in practice I am processing my data set serially. Is this correct?

If so, can I configure the number of processes/threads per map-reduce run? If I manage multiple threads running map-reduce in parallel and then aggregate the results, will I get a substantial performance increase (has anybody tried this)? If not, can I have multiple replicas of my DB on the same node and "trick" MongoDB into running on 2 of them?

Thanks!

Answer

Map-reduce in MongoDB uses SpiderMonkey, a single-threaded JavaScript engine, so it is not possible to configure multiple processes (and there are no "tricks"). There is a JIRA ticket to use a multi-threaded JS engine, which you can follow here: https://jira.mongodb.org/browse/SERVER-2407

If possible, I would consider looking into the new aggregation framework (available in MongoDB version 2.2), which is written in C++ instead of JavaScript and may offer performance improvements: http://docs.mongodb.org/manual/applications/aggregation/
