Construct a tree from a list of file paths (Python) - performance dependent


Problem description

Hey, I am working on a very high-performance file-managing/analyzing toolkit written in Python. I want to create a function that takes a list of paths and gives me a list (or something similar) in a tree format. Something like in this question (Java-related).

From:

dir/file
dir/dir2/file2
dir/file3
dir3/file4
dir3/file5

Note: the list of paths is not sorted.

To:

dir/
    file
    dir2/
        file2
    file3
dir3/
    file4
    file5

[[dir, [file, [dir2, [file2]], file3]], [dir3, [file4, file5]]]

Something along those lines. I've been playing around with some ideas, but none of them gave me the speed I would like to have.

Note: I already have the list of paths, so no need to worry about that. The function takes the list of paths and returns the tree list.

Thanks in advance.

Recommended answer

Now that you have clarified the question a bit more, I guess the following is what you want:

from collections import defaultdict

input_ = '''dir/file
dir/dir2/file2
dir/file3
dir2/alpha/beta/gamma/delta
dir2/alpha/beta/gamma/delta/
dir3/file4
dir3/file5'''

# Key under which each directory stores the list of its files.
FILE_MARKER = '<files>'

def attach(branch, trunk):
    '''
    Insert a branch of directories on its trunk.
    '''
    # Split off only the first path component; the rest is handled recursively.
    parts = branch.split('/', 1)
    if len(parts) == 1:  # branch is a file
        trunk[FILE_MARKER].append(parts[0])
    else:
        node, others = parts
        if node not in trunk:
            # New directory: start it with an empty file list.
            trunk[node] = defaultdict(dict, ((FILE_MARKER, []),))
        attach(others, trunk[node])

def prettify(d, indent=0):
    '''
    Print the file tree structure with proper indentation.
    '''
    for key, value in d.items():
        if key == FILE_MARKER:
            if value:  # skip empty file lists
                print('  ' * indent + str(value))
        else:
            print('  ' * indent + str(key))
            if isinstance(value, dict):
                prettify(value, indent+1)
            else:
                print('  ' * (indent+1) + str(value))


main_dict = defaultdict(dict, ((FILE_MARKER, []),))
for line in input_.split('\n'):
    attach(line, main_dict)

prettify(main_dict)

It outputs (Python 3.7+, where dictionaries keep insertion order):

dir
  ['file', 'file3']
  dir2
    ['file2']
dir2
  alpha
    beta
      gamma
        ['delta']
        delta
          ['']
dir3
  ['file4', 'file5']
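
The question asked for a nested list rather than a dictionary. Here is a minimal sketch of one way to get there, assuming Python 3.7+ (so dictionaries keep insertion order) and '/'-separated paths. The names build_tree and to_nested_list are hypothetical helpers for illustration, not part of the code above, and a name seen both as a file and as a directory is treated as a directory.

```python
def build_tree(paths):
    """Build a dict-of-dicts: directories map to dicts, files map to None."""
    root = {}
    for path in paths:
        *dirs, leaf = path.split('/')
        node = root
        for name in dirs:
            child = node.get(name)
            if child is None:
                # First time we see this directory (or it was a file so far).
                child = node[name] = {}
            node = child
        if leaf:  # a trailing '/' leaves an empty leaf: directory only
            node.setdefault(leaf, None)
    return root


def to_nested_list(tree):
    """Flatten the dict tree into [name, [children...]] lists."""
    result = []
    for name, child in tree.items():
        if child is None:
            result.append(name)  # file
        else:
            result.append([name, to_nested_list(child)])  # directory
    return result


paths = ['dir/file', 'dir/dir2/file2', 'dir/file3', 'dir3/file4', 'dir3/file5']
print(to_nested_list(build_tree(paths)))
# [['dir', ['file', ['dir2', ['file2']], 'file3']], ['dir3', ['file4', 'file5']]]
```

The input does not need to be sorted: entries appear in the order in which each name is first seen.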

A few points to note:

  • The script makes heavy use of defaultdicts, each pre-seeded with an empty file list, so attach can append to trunk[FILE_MARKER] without first checking whether that key exists.
  • Directory names are mapped to dictionary keys. I thought this might be a good feature for you: keys are hashed, so you will be able to retrieve information much faster this way than with lists. You can access the hierarchy in the form main_dict['dir2']['alpha']['beta']...
  • Note the difference between .../delta and .../delta/: the first is recorded as a file in its parent's file list, the second as a directory of its own. I thought this would help you quickly differentiate between a leaf that is a directory and one that is a file.
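
Since speed matters to you, here is a rough sketch of an iterative variant that keeps the same layout (directory names as keys, files in a list under FILE_MARKER) but avoids recursion and splits each path only once. The names new_dir, build and get_dir are hypothetical, and whether it is actually faster on your data is an assumption you would want to check with timeit.

```python
FILE_MARKER = '<files>'  # same marker key as in the answer


def new_dir():
    """An empty directory node: sub-directories by name plus a file list."""
    return {FILE_MARKER: []}


def build(paths):
    """Iterative counterpart of attach(): one split and one walk per path."""
    root = new_dir()
    for path in paths:
        *dirs, leaf = path.split('/')
        node = root
        for name in dirs:
            if name not in node:
                node[name] = new_dir()
            node = node[name]
        # A trailing '/' yields an empty leaf name, as in the original script.
        node[FILE_MARKER].append(leaf)
    return root


def get_dir(tree, path):
    """Return the node for a directory path such as 'dir2/alpha', or None."""
    node = tree
    for name in path.strip('/').split('/'):
        # The marker key holds a file list, not a directory.
        node = node.get(name) if name != FILE_MARKER else None
        if node is None:
            return None
    return node


paths = ['dir/file', 'dir/dir2/file2', 'dir/file3',
         'dir2/alpha/beta/gamma/delta', 'dir2/alpha/beta/gamma/delta/',
         'dir3/file4', 'dir3/file5']
tree = build(paths)
gamma = get_dir(tree, 'dir2/alpha/beta/gamma')
print(gamma[FILE_MARKER])  # files directly inside gamma
print('delta' in gamma)    # whether delta was also seen as a directory
```

Each lookup step is a dictionary access, so reaching a directory costs one hash lookup per path component regardless of how many siblings it has.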

I hope this answers your question. If anything is unclear, post a comment.
