如何使用新ID来mongoexport和mongoimport集合,但保持关系 [英] How to mongoexport and mongoimport collections with new ids but keeping the relation

查看:95
本文介绍了如何使用新ID来mongoexport和mongoimport集合,但保持关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用mongoexport导出2个集合:

I export 2 collections using mongoexport:

mongoexport -h -u -p --db dev -c parent -q '{_id: ObjectId("5ae1b4b93b131f57b33b8d11")}' -o parent.json

mongoexport -h -u -p --db dev -c children -q '{parentId: ObjectId("5ae1b4b93b131f57b33b8d11")}' -o children.json

从第一个记录中我得到一个记录(由ID选择),从第二个记录中我得到很多记录(由parentId选择,它是第一个记录的ID.

From the first one I got one record (selected by ID) and from the second I got many record (selected by parentId which is ID from the first one.

如何使用mongoimport导入具有新ID的对象,但保持关系-> parent.id = child.parentId

How can I use mongoimport to import those with new IDs but keeping the relation -> parent.id = child.parentId

?

推荐答案

TLDR;跳到部分内容以解决此问题,或者如果想清除误解,请实际阅读.

TLDR; Skip to the parts to solve this, or actually read if you want to clear up the misconception.

让我们澄清一下基本的误解,因为MongoDB 自身" 没有 没有关系的概念 .您所指的关系"仅是记录在一个集合中的属性中的一个值,而引用" 表示存在于不同集合中的另一个值.这些外键" 并没有在MongoDB中强制执行任何类型的约束,该约束定义了数据与相关" ,而它们只是"values" em>.

Let's clear up the basic misconception here in that MongoDB "itself" has no concept of relations, whatsoever. A "relation" as you are referring to it merely a value recorded in the property in one collection which "refers" to another value present in a different collection. Nothing about these "foreign keys" enforces any kind on constraint in MongoDB which defines that the data is "related", and they are just "values".

一旦您接受了这一事实,那么 mongoexport 甚至是基本的查询操作,例如 真的不知道仅仅是因为您将一个值放在打算" 去的某个地方,然后从其他地方获取该数据.

Once you accept that fact then it should be self apparent why any tool such as mongoexport or even basic query operations like find() really have no idea that just because you put a value in somewhere that it is "meant to" go and get that data from somewhere else.

从前有(在较小程度上)一个叫做 DBRef ,它不仅存储值",而且还存储有关该值在参考"术语中的位置的一些详细信息.这可以是集合,也可以是数据库和集合.但是,该概念寿命相对较短,在现代环境中没有用.即使存储了此元数据",数据库本身也没有涵盖与数据的关系"的概念.检索仍然是一个客户端概念",其中某些驱动程序能够查看 DBRef ,然后通过向服务器发出另一个查询以检索相关"数据来解析"它.

Once upon a time there was ( and it sill is to a lesser extent ) a concept called DBRef, which stored not just a "value" but also some detail as to where that value resided in a "referential" term. This could be collection or database and collection. The concept however remains relatively short lived and not useful in modern context. Even with this "metadata" stored, the database itself covered no concept of "relation" to the data. Retrieval still remained a "client concept" where some drivers had the ability to see a DBRef and then "resolve" it by issuing another query to the server to retrieve the "related" data.

自然地,这是充满漏洞",并且为了更现代的概念而放弃了该概念,尤其是 $lookup .

Naturally this is "full of holes" and the concept was abandoned in favor of more modern concepts, notably $lookup.

这实际上归结为,就MongoDB本身而言,关系"实际上是 客户概念" ,在某种意义上,外部实体"实际上是通过您定义为相等引用"的数据来确定"this"与"that"相关的.

What this all really comes down to is that as far as MongoDB itself is concerned a "relation" is actually "client concept", in the sense that an "external entity" actually makes the decision that "this" is related to "that" via what data you define as "equal references".

没有强制执行此操作的数据库规则",因此与各种传统的RDBMS解决方案相比,MongoDB的对象存储"性质本质上说"……这不是我的工作,委托给其他人" ,这通常意味着您在用于访问数据库的内容的客户端"逻辑中进行定义.

There are no "database rules" that enforce this, so in contrast to various traditional RDBMS solutions, the "object store" nature of MongoDB essentially says "...that's not my job, delegate to someone else" and that typically means you define that within the "client" logic of what is used to access the database.

但是,有一些工具" 允许服务器"根据该逻辑进行实际操作.这些基本上围绕 $lookup 进行,实际上,他是现代使用MongoDB时在服务器上执行加入"的方法.

However, there are "some tools" that allow the "server" to actually act on that logic. These essentially revolve around $lookup which is essentially he "modern method" to perform "joins" on the server when working with MongoDB.

因此,如果您要导出"相关数据",那么基本上可以有两种选择:

So if you have "related data" you want to "export", then you basically have a couple of options:

  • 使用$ lookup创建一个视图"

    MongoDB在3.2版本中引入了视图" .这些本质上是集合流水线",它们将化装" 作为一个集合.出于.find()甚至mongoexport之类的正常操作的所有目的,该管道看起来就像一个集合,可以这样访问.

  • Create a "view" using $lookup

    MongoDB introduced "views" with the 3.2 release. These are essentially "aggregation pipelines" which "masquerade" as a collection. For all purposes of normal operations like .find() or even mongoexport this pipeline looks just like a collection and can be accessed as such.

db.createView("related_view", "parent", [
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }}
])

有了该视图",您可以使用为视图"定义的名称简单地调用mongoexport:

With that "view" in place you can simply call mongoexport using the name defined for the "view":

mongoexport -c related_view -o output.json

就像 $lookup 一样,每个父"项现在将包含一个带有外键的子项"中相关"内容的数组.

Just as $lookup will do, each "parent" item will now contain an array with the "related" content from "children" by the foreign key.

由于 $lookup 产生的输出为BSON在文档中,与所有MongoDB一样,存在相同的约束,因为在任何文档中生成的"join"不得超过16MB.因此,如果数组导致父文档超出此限制,则不能将输出用作嵌入文档中的数组".

As $lookup produces the output as a BSON Document, the same constraints apply as with all MongoDB in that the resulting "join" cannot exceed 16MB within any document. So if the array causes the parent document to grow beyond this limit then using output as an "array" embedded within the document is not an option.

在这种情况下,通常会使用 $unwind 为了使输出内容反规范化",就像在典型的SQL连接中一样.在这种情况下,将为每个相关"成员复制父"文档,并且输出文档与相关子文档中的文档匹配,但所有父信息和子文档"都是唯一的嵌入属性.

For this case you would generally use $unwind in order to "de-normalize" the output content just as it would appear with a typical SQL join. In this case the "parent" document would be copied for each "related" member, and output documents match those from the related children but with all the parent information and the "children" as a singular embedded property.

这仅意味着将 $unwind 添加到这样的视图":

This just means adding $unwind to such a "view":

db.createView("related_view", "parent", [
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }},
  { "$unwind": "$children" }
])

由于我们基本上只为每个相关孩子"输出一个文档,所以不太可能违反BSON限制.尽管父母和子女都有非常大的文件,但这种可能性仍然存在,尽管很少见.对于这种情况,将有不同的处理方式,我们将在后面提到.

Since we are just outputting essentially one document for each "related child" then it's unlikely the BSON Limit is breached. With very large documents for both parent and child it is still a possibility, though rare. For that case there would be different handling as we can mention later.

如果您没有支持视图"的MongoDB版本,但仍然有

If you don't have a MongoDB version supporting "views" but you still have $lookup and no restriction of the BSON Limit, you could still essentially "script" the invocation of the aggregation pipeline with the mongo shell and output as JSON.

过程与之类似,但是我们没有使用"view"和mongoexport,而是手动包装了一些命令,这些命令可以从命令行从shell中调用:

The process is similar yet instead of using a "view" and mongoexport, we manually wrap a few commands which can be invoked into the shell from the command line:

mongo --quiet --eval '
    db.parent.aggregate([ 
      { "$lookup": {
        "from": "children",
        "localField": "_id",
        "foreignField": "parentId",
        "as": "children"
      }}
    ]).forEach(p => printjson(p))'

同样,该过程与您可以选择 ,如果这就是您想要的,也请在管道中

And again much the same as the process before you can optionally $unwind as well in the pipeline if that is what you are after

如果您在没有 $lookup 支持(并且您不应该支持,因为低于3.0的版本不再有正式支持),或者实际上您实际上存在一种情况,其中联接"将为每个父"文档创建超出BSON限制的数据,然后是另一种选择通过执行查询以获取相关"数据并将其输出来编写"整个联接过程的脚本.

If you are running on a MongoDB instance without $lookup support ( and you should not be, since lower than 3.0 has no more official support ) or indeed you actually have a scenario where the "join" would create data per "parent" document which exceeded the BSON limit, then the other option is to "script" the entire join process by executing queries to obtain the "related" data and output it.

mongo --quiet --eval '
    db.parent.find().forEach(p =>
      printjson(
        Object.assign(p, {
        children: db.children.find({ parentId: p._id }).toArray()
        })
      )
    )'

或者甚至以展开"或非规范化"形式:

Or even in the "unwound" or "de-normalized" form:

mongo --quiet --eval '
    db.parent.find().forEach(p =>
      db.children.find({ parentId: p._id }).forEach(child =>
        printjson(Object.assign(p,{ child }))
      )
    )'

最底线是"MongoDB本身"不了解关系",确实需要您提供该细节.可以以视图"的形式访问它,也可以通过其他方式定义必要的代码"来明确声明此关系"的术语",因为就数据库本身而言,这根本就不会存在任何其他形式.

Bottom line is "MongoDB itself" does not know about "relations", and it really is up to you to provide that detail. Be it in the form of a "view" you can access or by other means of defining the "code" necessary to explicitly state the "terms" of this "relation", because as far as the database itself is concerned this simply does not exist in any other form.

也只是为了解决一个评论问题,如果您对导出"的意图只是创建新集合",则只需创建视图"或使用

Also just to address a point in comment, if your intention on "export" is just to create a "new collection" then either simply create the "view" or use the $out aggregation operator:

db.parent.aggregate([
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }},
  { "$out": "related_collection" }
])

如果您想更改父项"和嵌入"数据,则循环并使用

And if you wanted to "alter the parent" with "embedded" data, then loop and use bulkWrite():

var batch = [];

db.parent.aggregate([
  { "$lookup": {
    "from": "children",
    "localField": "_id",
    "foreignField": "parentId",
    "as": "children"
  }}
]).forEach(p => {
  batch.push({
    "updateOne": {
      "filter": { "_id": p._id },
      "update": {
        "$push": { "children": { "$each": p.children } }
      }
    }
  });

  if (batch.length > 1000) {
    db.parent.bulkWrite(batch);
    batch = [];
  })
});

if (batch.length > 0) {
  db.parent.bulkWrite(batch);
  batch = [];
}

仅创建一个新集合或更改一个现有集合就无需导出".当然,当您想将集合实际保留为嵌入式"数据并且不需要

There simply is no need to "export" merely to create a new collection or alter an existing one. You would do this of course when you want to actually keep the collection as "embedded" data and not need the overhead of $lookup on every request. But the decision of whether to "embed or reference" is a whole other story.

这篇关于如何使用新ID来mongoexport和mongoimport集合,但保持关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆