从文件路径列表构建目录树 [英] Building a directory tree from a list of file paths

查看:354
本文介绍了从文件路径列表构建目录树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种时间有效的方法来将文件列表解析成树。可以有数亿个文件路径。

I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.

强力解决方案是在目录分隔符发生时拆分每个路径,并遍历目录中添加的树,文件条目通过进行字符串比较,但这将非常慢。

The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.

输入数据通常按字母顺序排序,因此列表将类似于:

The input data is usually sorted alphabetically, so the list would be something like:


C:\Users\Aaron\AppData\Amarok\Afile

C:\Users\Aaron\AppData\Amarok\Afile

C: \Users\Aaron\AppData\Amarok\Afile2

C:\Users\Aaron\AppData\Amarok\Afile2

C:\Users\Aaron\AppData\Amarok\Afile3

C:\Users\Aaron\AppData\Amarok\Afile3

C:\Users\Aaron\AppData\Blender\alibrary.dll

C:\Users\Aaron\AppData\Blender\alibrary.dll

C:\ Users \Aaron\AppData\Blender\and_so_on.txt

C:\Users\Aaron\AppData\Blender\and_so_on.txt

从这个顺序我的自然反应是分区目录列表进入群体...不知何故...在做slo之前w字符串比较。我真的不确定我会感谢任何想法。

From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.

编辑:如果可能的话,这棵树从上到下懒惰加载,这会更好。

It would be better if this tree were lazy loaded from the top down if possible.

推荐答案

除了做完整的字符串比较,你别无选择,因为不能保证字符串可能不同。有一些技巧可能会加快一点:

You have no choice but to do full string comparisons since you can't guarantee where the strings might differ. There are a couple tricks that might speed things up a little:


  • 正如David所说,形成树,但搜索新的插入点从前一个(可能借助某种 matchingPrefix 例程,将告诉你新的不同之处)。

  • 对于树的每个级别使用哈希表,如果内部可能有很多文件,并且需要重复数据。 (否则,附加到堆栈就可以了。)

  • As David said, form a tree, but search for the new insertion point from the previous one (perhaps with the aid of some sort of matchingPrefix routine that will tell you where the new one differs).
  • Use a hash table for each level of the tree if there may be very many files within and you need to count duplicates. (Otherwise, appending to a stack is fine.)

这篇关于从文件路径列表构建目录树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆