Mongoose find one and push to array of documents


Question

I'm new to MongoDB and Mongoose, and I'm trying to use them to save stock ticks for day-trading analysis. So I came up with this schema:

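var mongoose = require("mongoose"),
    Schema = mongoose.Schema;

var symbolSchema = new Schema({
    name: String,
    code: String
});

var quoteSchema = new Schema({
    // Date.now is called whenever a quote is created without a date
    date: { type: Date, default: Date.now },
    open: Number,
    high: Number,
    low: Number,
    close: Number,
    volume: Number
});

var intradayQuotesSchema = new Schema({
    id_symbol: { type: Schema.Types.ObjectId, ref: "symbol" },
    day: Date,
    quotes: [quoteSchema]
});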

From my link I receive information like this every minute:

date | symbol | open | high | low | close | volume

2015-03-09 13:23:00|AAPL|127,14|127,17|127,12|127,15|19734

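I have to:

  1. Find the ObjectId of the symbol (AAPL).
  2. Discover whether the intradayQuote document for this symbol already exists (the combination of symbol and day).
  3. Discover whether the minute's OHLCV data for this symbol already exists in the quotes array (because it can be repeated).
  4. Update or create the document, and update or create the quote inside the array.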

I'm able to accomplish this task without verifying whether the quote already exists, but this method can create repeated entries inside the quotes array:

symbol.find({ "code": mySymbol }, function(err, stock) {
    intradayQuote.findOneAndUpdate(
        { id_symbol: stock[0]._id, day: myDay },
        { $push: { quotes: myQuotes } },
        { upsert: true },
        myCallback
    );
});

I've already tried:

  • $addToSet instead of $push, but unfortunately this doesn't seem to work with an array of documents.
  • Adding { id_symbol:stock[0]._id, day: myDay, 'quotes["date"]': myDate } to the conditions of findOneAndUpdate, but unfortunately if mongo doesn't find a match, it creates a new document for the minute instead of appending to the quotes array.

Is there a way to get this working without adding one more query (I'm already using 2)? Should I rethink my schema to make this job easier? Any help will be appreciated. Thanks!

Answer

Basically put, the $addToSet operator cannot work for you because your data is not a true "set", which by definition is a collection of "completely distinct" objects.
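To make that concrete, $addToSet only skips a value when an identical element is already in the array. The following is a minimal in-memory sketch of that rule, not MongoDB's implementation: the helper name addToSet is hypothetical, and Node's util.isDeepStrictEqual is used as a stand-in for the server's document comparison. Note also that Mongoose gives each subdocument its own _id by default, which on its own is enough to make otherwise identical quotes distinct.

```javascript
const { isDeepStrictEqual } = require("node:util");

// Hypothetical stand-in for $addToSet: append only if no element is deeply equal.
function addToSet(array, value) {
    const exists = array.some((item) => isDeepStrictEqual(item, value));
    if (!exists) array.push(value);
    return array;
}

const minute = new Date("2015-03-09T13:23:00.000Z");
const quotes = [];

addToSet(quotes, { date: minute, close: 127.15, volume: 19734 });
// Exact duplicate of the first quote: skipped.
addToSet(quotes, { date: minute, close: 127.15, volume: 19734 });
// Same minute but revised values: treated as a new member, so the minute repeats.
addToSet(quotes, { date: minute, close: 127.16, volume: 19800 });

console.log(quotes.length); // 2
```

What is really wanted is uniqueness on the "date" field alone, which a whole-document comparison cannot express.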

The other piece of logical sense here is that you would be working on the data as it arrives, either as a single object or as a feed. I'll presume it's a feed of many items in some form, and that you can use some sort of stream processor to arrive at this structure for each document received:

{
    "date": new Date("2015-03-09T13:23:00.000Z"),
    "symbol": "AAPL",
    "open": 127.14,
    "high": 127.17,
    "low": 127.12,
    "close": 127.15,
    "volume": 19734
}

This converts the values to a standard decimal format and the timestamp to a UTC date, since any locale settings really should be the domain of your application once the data is retrieved from the datastore.
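As an illustration of that conversion, here is a minimal sketch of a parser for one pipe-delimited line of the feed shown in the question. The function name parseTick is hypothetical, and it assumes the feed's timestamps are already expressed in UTC; if they are in exchange-local time, the offset would need to be applied first.

```javascript
// Hypothetical parser for one line of the feed.
// Assumption: the timestamp in the feed is already UTC.
function parseTick(line) {
    const [date, symbol, open, high, low, close, volume] = line.trim().split("|");
    // "127,14" uses a decimal comma; convert it to a standard decimal number.
    const toNumber = (text) => Number(text.replace(",", "."));
    return {
        // "2015-03-09 13:23:00" becomes "2015-03-09T13:23:00Z" (ISO 8601, UTC)
        date: new Date(date.replace(" ", "T") + "Z"),
        symbol: symbol,
        open: toNumber(open),
        high: toNumber(high),
        low: toNumber(low),
        close: toNumber(close),
        volume: parseInt(volume, 10)
    };
}

const tick = parseTick("2015-03-09 13:23:00|AAPL|127,14|127,17|127,12|127,15|19734");
console.log(tick.date.toISOString(), tick.close); // 2015-03-09T13:23:00.000Z 127.15
```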

I would also at least flatten out your "intradayQuotesSchema" a little by removing the reference to the other collection and just putting the data in there. You would still need a lookup on insertion, but the overhead of the additional populate on read would seem to be more costly than the storage overhead:

var intradayQuotesSchema = new Schema({
    symbol: {
        name: String,
        code: String
    },
    day: Date,
    quotes: [quoteSchema]
});

It depends on your usage patterns, but it's likely to be more efficient that way.

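The rest really comes down to what is acceptable to you. With the quotes embedded in the "day" document, each item from the stream can be processed with a sequence of three updates: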

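stream.on("data", function(data) {

    // Keep the code in its own variable so it does not shadow the "symbol" model
    var code = data.symbol,
        msPerDay = 1000 * 60 * 60 * 24,
        // Truncate the timestamp to midnight UTC to get the discrete "day"
        myDay = new Date(
            data.date.valueOf() -
                ( data.date.valueOf() % msPerDay ));
    delete data.symbol;

    symbol.findOne({ "code": code }, function(err, stock) {
        if (err || !stock) return;  // lookup failed or unknown symbol

        // 1. Create the "day" document if it does not exist yet
        intraDayQuote.findOneAndUpdate(
            { "symbol.code": code, "day": myDay },
            { "$setOnInsert": {
                "symbol.name": stock.name,
                "quotes": [data]
            }},
            { "upsert": true },
            function(err, doc) {
                // 2. Overwrite the quote if this minute is already in the array
                intraDayQuote.findOneAndUpdate(
                    {
                        "symbol.code": code,
                        "day": myDay,
                        "quotes.date": data.date
                    },
                    { "$set": { "quotes.$": data } },
                    function(err, doc) {
                        // 3. Otherwise append the quote to the array
                        intraDayQuote.findOneAndUpdate(
                            {
                                "symbol.code": code,
                                "day": myDay,
                                "quotes.date": { "$ne": data.date }
                            },
                            { "$push": { "quotes": data } },
                            function(err, doc) {
                                // all three steps have run
                            }
                        );
                    }
                );
            }
        );
    });
});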

If you don't actually need the modified document in the response, then you would get some benefit from using the Bulk Operations API here and sending all of the updates in this package within a single database request:

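stream.on("data", function(data) {

    // Keep the code in its own variable so it does not shadow the "symbol" model
    var code = data.symbol,
        msPerDay = 1000 * 60 * 60 * 24,
        // Truncate the timestamp to midnight UTC to get the discrete "day"
        myDay = new Date(
            data.date.valueOf() -
                ( data.date.valueOf() % msPerDay ));
    delete data.symbol;

    symbol.findOne({ "code": code }, function(err, stock) {
        if (err || !stock) return;  // lookup failed or unknown symbol

        // Ordered, so the statements run in sequence on the server
        var bulk = intraDayQuote.collection.initializeOrderedBulkOp();

        // 1. Create the "day" document if it does not exist yet
        bulk.find({ "symbol.code": code, "day": myDay })
            .upsert().updateOne({
                "$setOnInsert": {
                    "symbol.name": stock.name,
                    "quotes": [data]
                }
            });

        // 2. Overwrite the quote if this minute is already in the array
        bulk.find({
            "symbol.code": code,
            "day": myDay,
            "quotes.date": data.date
        }).updateOne({
            "$set": { "quotes.$": data }
        });

        // 3. Otherwise append the quote to the array
        bulk.find({
            "symbol.code": code,
            "day": myDay,
            "quotes.date": { "$ne": data.date }
        }).updateOne({
            "$push": { "quotes": data }
        });

        bulk.execute(function(err, result) {
            // maybe do something with the response
        });
    });
});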

The point is that only one of the statements there will actually modify data, and since they are all sent in the same request, there is less back and forth between the application and the server.
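The same three statements can also be written as an operations array for Mongoose's Model.bulkWrite(), which likewise sends them in one request. The sketch below only builds that array: the helper name buildQuoteOps and the sample values (including the "Apple Inc." name) are assumptions made for illustration, while the updateOne operation shape is the one bulkWrite() accepts.

```javascript
// Hypothetical helper: builds the three ordered operations for one incoming quote.
function buildQuoteOps(code, name, day, quote) {
    return [
        // 1. Create the "day" document if it does not exist yet.
        { updateOne: {
            filter: { "symbol.code": code, "day": day },
            update: { "$setOnInsert": { "symbol.name": name, "quotes": [quote] } },
            upsert: true
        }},
        // 2. Overwrite the quote if this minute is already in the array.
        { updateOne: {
            filter: { "symbol.code": code, "day": day, "quotes.date": quote.date },
            update: { "$set": { "quotes.$": quote } }
        }},
        // 3. Otherwise append the quote to the array.
        { updateOne: {
            filter: { "symbol.code": code, "day": day, "quotes.date": { "$ne": quote.date } },
            update: { "$push": { "quotes": quote } }
        }}
    ];
}

// Sample values for illustration only.
const quote = {
    date: new Date("2015-03-09T13:23:00.000Z"),
    open: 127.14, high: 127.17, low: 127.12, close: 127.15, volume: 19734
};
const ops = buildQuoteOps("AAPL", "Apple Inc.", new Date("2015-03-09T00:00:00.000Z"), quote);
console.log(ops.length); // 3

// With a connected model this would be sent as:
// intraDayQuote.bulkWrite(ops, { ordered: true })
```

Keeping the operations ordered matters, because statements 2 and 3 rely on the "day" document created by statement 1.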

The alternative is that it might just be simpler in this case to have the actual data referenced in another collection. This then becomes a simple matter of processing upserts:

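var intradayQuotesSchema = new Schema({
    symbol: {
        name: String,
        code: String
    },
    day: Date,
    quotes: [{ type: Schema.Types.ObjectId, ref: "quote" }]
});


// and in the stream processor

stream.on("data", function(data) {

    // Keep the code in its own variable so it does not shadow the "symbol" model
    var code = data.symbol,
        msPerDay = 1000 * 60 * 60 * 24,
        // Truncate the timestamp to midnight UTC to get the discrete "day"
        myDay = new Date(
            data.date.valueOf() -
                ( data.date.valueOf() % msPerDay ));
    delete data.symbol;

    symbol.findOne({ "code": code }, function(err, stock) {
        if (err || !stock) return;  // lookup failed or unknown symbol

        // Upsert the quote and return the stored document, so that its _id
        // can be referenced from the "day" document.
        // Note: matching on "date" alone assumes quotes of different symbols
        // never share this collection; otherwise add the code to the filter.
        quote.findOneAndUpdate(
            { "date": data.date },
            { "$setOnInsert": data },
            { "upsert": true, "new": true },
            function(err, doc) {
                if (err || !doc) return;
                // ObjectId values form a true set, so $addToSet is safe here
                intraDayQuote.update(
                    { "symbol.code": code, "day": myDay },
                    {
                        "$setOnInsert": {
                            "symbol.name": stock.name
                        },
                        "$addToSet": { "quotes": doc._id }
                    },
                    { "upsert": true },
                    function(err, num, raw) {

                    }
                );
            }
        );
    });
});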

It really comes down to how important it is to you to have the quote data nested within the "day" document. The main distinction is whether you want to query those documents based on some of the "quote" fields, or can otherwise live with the overhead of using .populate() to pull in the "quotes" from the other collection.

Of course, if the quotes are referenced and the quote data is important to your query filtering, then you can always query that collection for the _id values that match, and use an $in query on the "day" documents to match only the days that contain those matched "quote" documents.
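That two-step lookup can be sketched as follows. The helper name findDaysMatchingQuotes is hypothetical, and its Quote and IntraDayQuote parameters are assumed to be Mongoose models compiled from the referenced variant of the schema; distinct() and find() are the only model methods it relies on.

```javascript
// Hypothetical helper: Quote and IntraDayQuote are assumed to be Mongoose models.
async function findDaysMatchingQuotes(Quote, IntraDayQuote, quoteFilter) {
    // Step 1: collect the _id values of the quotes that satisfy the filter.
    const ids = await Quote.distinct("_id", quoteFilter);
    // Step 2: match only the "day" documents that reference any of those quotes.
    return IntraDayQuote.find({ "quotes": { "$in": ids } });
}

// Example call with connected models (the filter is illustrative):
// findDaysMatchingQuotes(quote, intraDayQuote, { "volume": { "$gt": 10000 } })
//     .then(function(days) { /* use the matching "day" documents */ });
```

This costs two queries, but the matched "day" documents only need .populate() if the quote fields themselves are required in the result.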

It's a big decision, and which path you take depends mostly on how your application uses the data. Hopefully this guides you on the general concepts behind what you want to achieve.

P.S. Unless you are "sure" that your source data is always a date rounded to an exact "minute", you probably want to employ the same kind of date rounding math as used to get the discrete "day" as well.
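A minimal sketch of that rounding, usable for both the "minute" and the "day": the helper name floorDate and the constants are hypothetical, and the arithmetic works on milliseconds since the Unix epoch, so the boundaries are UTC boundaries.

```javascript
const MINUTE_MS = 1000 * 60;
const DAY_MS = MINUTE_MS * 60 * 24;

// Round a date down to a multiple of unitMs since the Unix epoch (UTC).
function floorDate(date, unitMs) {
    const ms = date.valueOf();
    // The modulus must be taken against the whole unit, hence a single value.
    return new Date(ms - (ms % unitMs));
}

const received = new Date("2015-03-09T13:23:41.250Z");
console.log(floorDate(received, MINUTE_MS).toISOString()); // 2015-03-09T13:23:00.000Z
console.log(floorDate(received, DAY_MS).toISOString());    // 2015-03-09T00:00:00.000Z
```

Precomputing the unit avoids a precedence trap: in JavaScript, % and * have equal precedence and associate left to right, so ms % 1000 * 60 is evaluated as (ms % 1000) * 60.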
