Mongodb slow update loop

Problem description

I am just getting familiar with Mongodb, which is why I did something stupid. Each of my dataset's entries include a timestamp (they're Tweets). Instead of converting the timestamp from a string to an actual date format before inserting, I inserted it simply as a string.

Now, my dataset is becoming huge (3+ million Tweets), and I want to begin sorting/ranging my entries. Since my timestamp is still a string ("Wed Apr 29 09:52:22 +0000 2015"), I want to convert this to a date format.

I found the following code in this answer: How do I convert a property in MongoDB from text to date type?

var cursor = db.ClockTime.find();
while (cursor.hasNext()) {
    var doc = cursor.next();
    db.ClockTime.update(
        {_id : doc._id},
        {$set : {ClockInTime : new Date(doc.ClockInTime)}}
    );
}

And it works great. However, it is incredibly slow. According to the MongoHub app, it only processes 4 queries per second. With a dataset of 3+ million tweets, this will take approximately 8.6 days to convert. I really hope there is a way to speed this up, as my deadline is in 8 days :P

Any ideas?

Answer

By default, updates block until the database has sent back an acknowledgment that it performed the update successfully. When you work in a mongo shell on your local workstation and connect to a remote database, each update therefore takes at least as long as a ping to the database.

If you are allowed to do so, you could SSH into the database server (the primary of a replica set) and run the script there. This reduces the network latency to almost zero. If you have a cluster, the result would likely still be an improvement, but not by as much, because you would need to log into the mongos server, which still has to wait for an acknowledgment from the replica set(s) it routes your updates to.

Another option is to perform the update with no write concern (w: 0). Program execution then continues immediately, which improves speed drastically. Keep in mind, however, that any errors are silently ignored this way.

db.ClockTime.update(
    {_id : doc._id}, 
    {$set : {ClockInTime : new Date(doc.ClockInTime)}},
    {writeConcern: {w: 0}}
)

A third option which would be even faster would be to use mongoexport to get a file export of your whole collection in JSON format, convert it with a local script, and then use mongoimport to re-import the converted data. The drawback is that you won't be able to do this without a short downtime between export and import, because any data in between will be lost.
