Storing a deep directory tree in a database


Question

I am working on a desktop application that is much like WinDirStat or voidtools' Everything - it maps hard drives, i.e. creates a deeply nested dictionary out of the directory tree.

The desktop application should then store the directory trees in some kind of database, so that a web application can be used to browse them from root, depth level by depth level.

Assume both applications run locally on the same machine for the time being.

The question that comes to mind is how the data should be structured and which database should be used, considering that: 1) RAM consumption should be reasonable; 2) the time it takes for a directory to be ready for viewing in the web application should be minimal.

P.S. - My initial approach was serializing each file system node to JSON separately and inserting each into Mongo, with object references linking them to their children. That way the web application could easily load the data based on user demand. However, I am worried that making so many (a million, on average) independent inserts into Mongo will take a lot of time; if I make bulk inserts, that means I have to keep each bulk in memory.
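The memory concern about bulk inserts can be softened by chunking the node stream into fixed-size batches, so only one batch is ever held in memory. A minimal sketch, where `walkNodes` and `collection` are placeholders for your tree walker and a Mongo collection, not a real API:

```javascript
// Chunk any iterable into arrays of at most `size` elements.
// Only the current batch lives in memory; earlier batches are
// handed off (e.g. to insertMany) and can be garbage-collected.
function* toBatches(iterable, size) {
  let batch = [];
  for (const item of iterable) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Against a real driver, usage would look roughly like:
// for (const batch of toBatches(walkNodes(root), 1000)) {
//   await collection.insertMany(batch, { ordered: false });
// }
```

With a batch size of ~1000, a million nodes becomes ~1000 round trips instead of a million, while memory stays bounded by one batch.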

I also considered dumping the entire tree as one deeply nested JSON, but the data is too large to be a Mongo document. GridFS could be used to store it, but then I would have to load the entire tree in the web application even though the deep nodes may not be of interest.

Answer

Given your requirements:

  • A) Low RAM usage
  • B) Staying within Mongo's document size limit
  • C) A responsive UI

I would consider the following.

Using this example directory:

C:\
C:\X\
C:\X\B\
C:\X\file.txt
C:\Y\
C:\Y\file.pdf
C:\Y\R\
C:\Y\R\file.js

In JSON it could possibly be represented as:

{
    "C:" : {
        "X" : {
            "B" : { },
            "file.txt" : "C:\\X\\file.txt"
        },
        "Y" : {
            "file.pdf" : "C:\\Y\\file.pdf",
            "R" : {
                "file.js" : "C:\\Y\\R\\file.js"
            }
        }
    }
}

The latter, as you pointed out, does not scale well with large directory structures (I can tell you first-hand that browsers will not appreciate a JSON blob representing even a modest directory with a few thousand files/folders). The former, though akin to some actual filesystems and efficient in the right context, is a pain to convert to and from JSON.

My proposal is to break each directory into a separate JSON document, as this will address all three issues. However, nothing is free: it will increase code complexity, the number of requests per session, etc.

The above structure could be broken into the following documents:

{
    "id" : "CCCCCCCC",
    "type" : "p",
    "name" : "C:",
    "children" : [
        { "name" : "X", "type" : "p", "id" : "XXXXXXXX" },
        { "name" : "Y", "type" : "p", "id" : "YYYYYYYY" }
    ]
}

{
    "id" : "XXXXXXXX",
    "type" : "p",
    "name" : "X",
    "children" : [
        { "name" : "B", "type" : "p", "id" : "BBBBBBBB" },
        { "name" : "file.txt", "type" : "f", "path" : "C:\\X\\file.txt", "size" : "1024" }
    ]
}

{
    "id" : "YYYYYYYY",
    "type" : "p",
    "name" : "Y",
    "children" : [
        { "name" : "R", "type" : "p", "id" : "RRRRRRRR" },
        { "name" : "file.pdf", "type" : "f", "path" : "C:\\Y\\file.pdf", "size" : "2048" }
    ]
}

{
    "id" : "BBBBBBBB",
    "type" : "p",
    "name" : "B",
    "children" : [ ]
}

{
    "id" : "RRRRRRRR",
    "type" : "p",
    "name" : "R",
    "children" : [
        { "name" : "file.js", "type" : "f", "path" : "C:\\Y\\R\\file.js", "size" : "2048" }
    ]
}
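Producing these per-folder documents from the nested representation is a straightforward recursive walk. A sketch under the conventions above (folders are objects, files are path strings); `makeId` is a stand-in for whatever id scheme you actually use (ObjectId, hash of the path, ...):

```javascript
// Hypothetical id generator: a name plus a counter. In practice you
// would use ObjectIds or a hash of the full path.
let nextId = 0;
const makeId = (name) => `${name}-${nextId++}`;

// Split a nested tree into one flat document per folder, collecting
// the documents into `docs`. Files are embedded in their parent;
// subfolders are linked by id only.
function splitTree(name, node, docs) {
  const doc = { id: makeId(name), type: "p", name, children: [] };
  docs.push(doc);
  for (const [childName, value] of Object.entries(node)) {
    if (typeof value === "string") {
      doc.children.push({ name: childName, type: "f", path: value });
    } else {
      const childDoc = splitTree(childName, value, docs);
      doc.children.push({ name: childName, type: "p", id: childDoc.id });
    }
  }
  return doc;
}

const docs = [];
splitTree("C:", {
  X: { B: {}, "file.txt": "C:\\X\\file.txt" },
  Y: { "file.pdf": "C:\\Y\\file.pdf", R: { "file.js": "C:\\Y\\R\\file.js" } },
}, docs);
// docs now holds five folder documents: C:, X, B, Y, R
```

Because `docs` is a flat array, it also chunks naturally into bulk inserts of bounded size.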

Each document represents a folder and its immediate children only. Child folders can be lazy-loaded using their ids and appended to their parent in the UI. Well-implemented lazy loading can preload child nodes to a desired depth, creating a very responsive UI. RAM usage is minimal, as your server only has to handle small payloads per request. The number of requests does go up considerably versus a single-document approach, but again, some clever lazy loading can cluster requests and reduce the total.
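The lazy-loading idea can be sketched as follows. `store`, `fetchFolder`, and `loadToDepth` are illustrative names; in a real app `fetchFolder` would be an async HTTP call backed by a `findOne` on the id rather than a synchronous Map lookup:

```javascript
// An in-memory stand-in for the document collection, keyed by id.
const store = new Map([
  ["CCCCCCCC", { id: "CCCCCCCC", name: "C:", children: [
    { name: "X", type: "p", id: "XXXXXXXX" },
    { name: "Y", type: "p", id: "YYYYYYYY" },
  ]}],
  ["XXXXXXXX", { id: "XXXXXXXX", name: "X", children: [] }],
  ["YYYYYYYY", { id: "YYYYYYYY", name: "Y", children: [] }],
]);

// Placeholder for a network/database round trip.
function fetchFolder(id) {
  return store.get(id);
}

// Load a folder and prefetch its subfolders down to `depth` levels,
// attaching each fetched document to its parent entry as `.doc`.
function loadToDepth(id, depth) {
  const doc = fetchFolder(id);
  if (depth > 0) {
    for (const child of doc.children) {
      if (child.type === "p") child.doc = loadToDepth(child.id, depth - 1);
    }
  }
  return doc;
}
```

Calling `loadToDepth(rootId, 1)` is enough to render the root folder with expandable children; deeper levels are fetched only when the user drills down.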

UPDATE 1: I somehow overlooked your second-to-last paragraph before answering, so this is probably more or less what you had in mind. To address the issue of too many documents, some clustering of nodes within documents may be in order. I have to head off now, but I'll give it some thought.

UPDATE 2: I've created a gist of a simplified version of the clustering concept I mentioned. It doesn't take files into account, just folders, and it doesn't include any code to update the documents. Hopefully it'll give you some ideas; I'll continue to update it for my own purposes.

Gist: tree_docs_cluster.js
