Construct a tree from a list of file paths (Python) - Performance dependent

Question

Hey, I am working on a very high-performance file-managing/analyzing toolkit written in Python. I want to create a function that takes a list of paths and returns them in a tree format, something like in this question (Java-related).

From:

dir/file
dir/dir2/file2
dir/file3
dir3/file4
dir3/file5

Note: the list of paths is not sorted.

To:

dir/
    file
    dir2/
        file2
    file3
dir3/
    file4
    file5

[[dir, [file, [dir2, [file2]], file3]], [dir3, [file4, file5]]]

Something along those lines. I've been playing around with some ideas, but none of them gave me the speed I would like to have.

Note: I already have the list of paths, so no need to worry about that. The function takes the list of paths and returns the tree list.

Thanks in advance.

Recommended answer

Now that you have clarified the question a bit more, I guess the following is what you want:

from collections import defaultdict

input_ = '''dir/file
dir/dir2/file2
dir/file3
dir2/alpha/beta/gamma/delta
dir2/alpha/beta/gamma/delta/
dir3/file4
dir3/file5'''

FILE_MARKER = '<files>'

def attach(branch, trunk):
    '''
    Insert a branch of directories on its trunk.
    '''
    parts = branch.split('/', 1)
    if len(parts) == 1:  # branch is a file
        trunk[FILE_MARKER].append(parts[0])
    else:
        node, others = parts
        if node not in trunk:
            trunk[node] = defaultdict(dict, ((FILE_MARKER, []),))
        attach(others, trunk[node])

def prettify(d, indent=0):
    '''
    Print the file tree structure with proper indentation.
    '''
    for key, value in d.items():  # Python 3: items() replaces iteritems()
        if key == FILE_MARKER:
            if value:  # skip empty file lists
                print('  ' * indent + str(value))
        else:
            print('  ' * indent + str(key))
            if isinstance(value, dict):
                prettify(value, indent + 1)
            else:
                print('  ' * (indent + 1) + str(value))


main_dict = defaultdict(dict, ((FILE_MARKER, []),))
for line in input_.split('\n'):
    attach(line, main_dict)

prettify(main_dict)

On Python 3.7+ (where dicts preserve insertion order), it outputs:

dir
  ['file', 'file3']
  dir2
    ['file2']
dir2
  alpha
    beta
      gamma
        ['delta']
        delta
          ['']
dir3
  ['file4', 'file5']
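
Since the question stresses speed, here is a minimal sketch of a non-recursive variant of the same idea. It is an illustration, not part of the original answer: the name build_tree is hypothetical, it uses plain dicts instead of defaultdict, and no speedup has been measured, so profile it against your own data before relying on it.

```python
FILE_MARKER = '<files>'

def build_tree(paths):
    """Build a nested dict of directories; files are collected under FILE_MARKER.

    Hypothetical helper: an iterative take on the recursive attach() approach.
    """
    root = {FILE_MARKER: []}
    for path in paths:
        # Everything before the last '/' is a directory chain; the rest is the leaf.
        *dirs, leaf = path.split('/')
        node = root
        for name in dirs:
            child = node.get(name)
            if child is None:
                # First time this directory is seen: create its node.
                child = node[name] = {FILE_MARKER: []}
            node = child
        node[FILE_MARKER].append(leaf)
    return root


tree = build_tree(['dir/file', 'dir/dir2/file2', 'dir/file3',
                   'dir3/file4', 'dir3/file5'])
print(tree['dir'][FILE_MARKER])
```

As in the recursive version, a path with a trailing slash ends up as a directory node whose file list holds an empty string.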

Some notes:

  • The script makes heavy use of defaultdict, which lets you skip checking whether a key exists and initialising it when it does not.
  • Directory names are mapped to dictionary keys. I thought this might be a good feature for you: keys are hashed, so you can retrieve information much faster this way than with lists. You can access the hierarchy in the form main_dict['dir2']['alpha']['beta']...
  • Note the difference between .../delta and .../delta/. A path ending in a slash creates a directory node (holding an empty file name), while a path without it is recorded as a file. I thought this would help you quickly differentiate between a leaf being a directory or a file.
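
The notes above describe dictionary-style access, while the question asked for a nested list. The sketch below shows one possible conversion, assuming the same '<files>' marker layout. The helper to_nested_list and the hand-written tree are hypothetical, and the ordering (files first, then subdirectories, both sorted by name) is a choice made here, not something the original answer specifies.

```python
FILE_MARKER = '<files>'

# Hand-written tree in the same layout that attach() produces.
tree = {
    FILE_MARKER: [],
    'dir': {
        FILE_MARKER: ['file', 'file3'],
        'dir2': {FILE_MARKER: ['file2']},
    },
    'dir3': {FILE_MARKER: ['file4', 'file5']},
}

def to_nested_list(node):
    """Return the files of a node, followed by [dirname, children] pairs."""
    result = sorted(node[FILE_MARKER])
    for name in sorted(key for key in node if key != FILE_MARKER):
        result.append([name, to_nested_list(node[name])])
    return result

# Direct lookup through the hierarchy via dictionary keys.
print(tree['dir']['dir2'][FILE_MARKER])

# Nested-list form, close to the shape requested in the question.
print(to_nested_list(tree))
```

Sorting makes the result independent of the input order, which matters here because the question says the path list is unsorted.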

I hope this answers your question. If anything is unclear, post a comment.
