从文件路径列表构建目录树 [英] Building a directory tree from a list of file paths
问题描述
我正在寻找一种时间有效的方法来将文件列表解析成树。可以有数亿个文件路径。
I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.
强力解决方案是在目录分隔符发生时拆分每个路径,并遍历目录中添加的树,文件条目通过进行字符串比较,但这将非常慢。
The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.
输入数据通常按字母顺序排序,因此列表将类似于:
The input data is usually sorted alphabetically, so the list would be something like:
C:\Users\Aaron\AppData\Amarok\Afile
C:\Users\Aaron\AppData\Amarok\Afile
C: \Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\ Users \Aaron\AppData\Blender\and_so_on.txt
C:\Users\Aaron\AppData\Blender\and_so_on.txt
从这个顺序我的自然反应是分区目录列表进入群体...不知何故...在做slo之前w字符串比较。我真的不确定我会感谢任何想法。
From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.
编辑:如果可能的话,这棵树从上到下懒惰加载,这会更好。
It would be better if this tree were lazy loaded from the top down if possible.
推荐答案
除了做完整的字符串比较,你别无选择,因为不能保证字符串可能不同。有一些技巧可能会加快一点:
You have no choice but to do full string comparisons since you can't guarantee where the strings might differ. There are a couple tricks that might speed things up a little:
- 正如David所说,形成树,但搜索新的插入点从前一个(可能借助某种
matchingPrefix
例程,将告诉你新的不同之处)。 - 对于树的每个级别使用哈希表,如果内部可能有很多文件,并且需要重复数据。 (否则,附加到堆栈就可以了。)
- As David said, form a tree, but search for the new insertion point from the previous one (perhaps with the aid of some sort of
matchingPrefix
routine that will tell you where the new one differs). - Use a hash table for each level of the tree if there may be very many files within and you need to count duplicates. (Otherwise, appending to a stack is fine.)
这篇关于从文件路径列表构建目录树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!