How to save a JSON file using GridFS
Problem description
I have a huge dataset; I am using mongoose schemas, and each data element looks like this:
{
  field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
  field2: "GAA…..GAATG"
}
Source: reading a FASTA file
As you can see, the individual elements are simple and small, but they are huge in number! Together, they exceed 200MB.
The problem is: I cannot save this to Mongo, because it is too big (>200MB).
I have found GridFS; nonetheless:

- All the materials I have found so far talk about image and video uploads;
- They do not say how I could still use the mongoose schema capability;
- The examples I have seen so far do not save the data into paths defined by the user, as we do with mongoose.
In the simplest scenario: how can I save a JSON file using GridFS, or any similar solution, as I do with small JSON files? What are the pros and cons of this approach compared to other approaches, if any? Do you consider my approach valid? I mean the one I have mentioned here, using a tree of JSON files and populate later; it works!
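For reference, since the question asks about GridFS specifically: storing one large JSON document in GridFS with the official MongoDB Node.js driver could look roughly like the sketch below. The connection URI, database name, bucket name, and file name are placeholders, not taken from the original post.

```javascript
// GridFS stores raw bytes, so the records must be serialized first.
function serializeRecords(records) {
  return Buffer.from(JSON.stringify(records));
}

// Sketch: stream a serialized JSON document into a GridFS bucket.
// All names here (URI, "fastaDb", "fastaFiles", "fasta.json") are
// illustrative placeholders.
async function saveJsonToGridFS(records) {
  // Lazy require so the sketch can be read without the driver installed.
  const { MongoClient, GridFSBucket } = require("mongodb");
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const bucket = new GridFSBucket(client.db("fastaDb"), {
    bucketName: "fastaFiles",
  });
  const payload = serializeRecords(records);
  await new Promise((resolve, reject) => {
    bucket
      .openUploadStream("fasta.json") // stored under this filename
      .on("finish", resolve)
      .on("error", reject)
      .end(payload);
  });
  await client.close();
}
```

The trade-off is the one implied in the question: GridFS happily stores payloads beyond the 16MB document limit, but the data becomes an opaque byte stream, so mongoose schema validation and populate no longer apply to its contents.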
As an example of saving a JSON file using mongoose:
Model.create([
  {
    field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
    field2: "GAA…..GAATG"
  },
  {
    field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
    field2: "GAA…..GAATG"
  }
]);
Here I have just saved a two-element JSON file; I cannot do that with a huge one. I need to break it into smaller pieces (chunks of, say, 1%) and create the tree just mentioned; at least, that was my solution.
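The chunking step described above can be sketched as a plain helper: split the huge array of records into roughly 1% slices so that each slice stays well under MongoDB's 16MB document limit. The function name and chunk count are illustrative, not from the original post.

```javascript
// Split an array of records into `parts` roughly equal chunks,
// e.g. parts = 100 gives ~1% slices as described in the post.
function chunkRecords(records, parts = 100) {
  const size = Math.max(1, Math.ceil(records.length / parts));
  const chunks = [];
  for (let i = 0; i < records.length; i += size) {
    chunks.push(records.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk could then be saved as its own document, with a parent document keeping the list of chunk ids, which is the "tree" idea mentioned above.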
I am afraid I may be reinventing the wheel. I could save those files independently, and it works, but I need to keep them correlated, because they belong to the same file, just as the smaller chunks of an image belong to the same image.
Current solution
This is my current solution, using my own insights! Note that I am mentioning it here just for curiosity; it does not use GridFS, so I am still open to suggestions using GridFS. It uses just JSON files, breaking the document into smaller ones in a level-like hierarchy. It is a tree, and I just want the leaves in the solution.
I have solved the problem using this diagram; nonetheless, for learning purposes, I want to see whether it is possible to do the same using GridFS.
Discussion
My first approach was to keep them as subdocuments: it failed! Then I tried to keep just their ids, but the ids alone correspond to 35% of the whole chunk, and that is bigger than 16MB: failed! Then I decided to create dummy documents just to hold the ids, and to store only the ids of the dummy documents: success!
Answer
I have found a better way to solve this problem than the one I implemented, the one in the question description: I just need to use mongoose virtuals!
At first I thought that using forEach to add an extra element to the Fasta file would be slow; it is not, it is pretty fast!
I can do something like this for each Fasta element:
{
  Parentid: { type: mongoose.Schema.Types.ObjectId, ref: "Fasta" }, // add this new line with its parent id
  field1: ">HWI-ST700660_96:2:1101:1455:2154#5@0/1",
  field2: "GAA…..GAATG"
}
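The forEach step mentioned above, stamping every element with the id of its parent Fasta document before insertion, can be sketched as follows. The helper name and the parent id value are illustrative, not from the original post.

```javascript
// Tag each Fasta element with its parent document's id so the
// virtual can later join them back together. Mutates in place,
// which is what a forEach over a large in-memory array does cheaply.
function tagWithParent(records, parentId) {
  records.forEach((rec) => {
    rec.Parentid = parentId; // field name as used in the post
  });
  return records;
}
```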
And then:
FastaSchema.virtual("healthy", {
ref: "FastaElement",
localField: "_id",
foreignField: "Parentid",
justOne: false,
});
Finally, populate:
Fasta.find({ _id: ObjectId("5e93b9b504e75e5310a43f46") })
  .populate("healthy")
  .exec(function (error, result) {
    res.json(result);
  });
And the magic is done: no problem with subdocument overload! Populate applied to a virtual is pretty fast and causes no overload! I have not done that comparison yet, but it would be interesting to measure it against conventional populate; in any case, this approach has the advantage of not needing to create hidden documents just to store the ids.
I am speechless at this simple solution, which came up while I was answering another question here; it just appeared!
Thank you, mongoose!