Most efficient/fastest way to get a single file from a directory


Question

What is the most efficient and fastest way to get a single file from a directory using Python?

More details on my specific problem:
I have a directory containing a lot of pregenerated files, and I just want to pick a random one. As far as I know, there is no really efficient way to pick a random file from a directory other than listing all the files first. My files are generated with already-random names, so they are effectively randomly ordered, and I just need to pick the first file from the folder.

So my question is: how can I pick the first file from my folder without having to load the whole list of files from the directory, and without having the OS do that either? Ideally, I would force the OS to return a single file and then stop.

Note: I have a lot of files in my directory, which is why I would like to avoid listing all of them just to pick one.

Note 2: each file is picked only once and then deleted, so that only new files are picked the next time (thus ensuring some kind of randomness).
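As an illustration of what "return one entry and then stop" can look like in Python, here is a minimal sketch using `os.scandir`, which returns an iterator rather than a full list. The helper name `first_file` is hypothetical and not from the original post. Note that the order in which entries come back is arbitrary and filesystem-dependent, so "first" here only means "the first entry the OS happens to yield".

```python
import os

def first_file(directory):
    """Return the path of the first regular file yielded by os.scandir.

    Returns None if the directory contains no regular files.
    Iteration stops as soon as one file is found.
    """
    # os.scandir yields entries lazily; the context manager closes the
    # underlying directory handle once we return.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                return entry.path
    return None
```

A call such as `first_file("/path/to/pregenerated")` (the path is a placeholder) returns one path without building a list of every name in the directory.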

SOLUTION

I finally chose to use an index file that stores:

  • the index of the current file to be picked (e.g. 1 for file1.ext, 2 for file2.ext, etc.)
  • the index of the last file generated (e.g. 1999 for file1999.ext)

Of course, this means that my files are no longer generated with random names, but with a deterministic, incrementing pattern (e.g. "file%s.ext" % ID).

Thus I have near-constant time for my two main operations:

  • Accessing the next file in the folder
  • Counting the number of files that are left (so that I can generate new files in a background thread when needed).
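The index-file idea above can be sketched as follows. This is only an illustration under assumptions: the original post does not say how the index file is laid out, so the JSON format, the key names `current` and `last`, and the function names are all hypothetical. Only the `"file%s.ext"` naming pattern comes from the post.

```python
import json
import os

def load_index(index_path):
    """Read the index file (assumed layout: {"current": int, "last": int})."""
    with open(index_path) as f:
        return json.load(f)

def save_index(index_path, index):
    """Write the index via a temporary file so a crash cannot leave it half-written."""
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(index, f)
    os.replace(tmp_path, index_path)

def next_file(index_path, directory):
    """Return the path of the next file to pick and advance the index.

    Returns None when every generated file has already been picked.
    """
    index = load_index(index_path)
    if index["current"] > index["last"]:
        return None
    path = os.path.join(directory, "file%s.ext" % index["current"])
    index["current"] += 1
    save_index(index_path, index)
    return path

def files_left(index_path):
    """Count the files not yet picked, without touching the directory."""
    index = load_index(index_path)
    return index["last"] - index["current"] + 1
```

Neither operation lists the directory, so the cost does not grow with the number of files. If a background thread generates files and updates `last` while another thread picks files, access to the index file would need a lock, which this sketch leaves out.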

This is a specific solution for my problem. For more generic solutions, please read the accepted answer.

Also, you might be interested in these two other solutions I've found to optimize file access and directory walking using Python:

Answer

When creating the files, add the name of the newest file to a list stored in a text file. When you want to read/process/delete a file:

  1. Open the text file
  2. Set filename to the name on the top of the list.
  3. Delete the name from the top of the list
  4. Close the text file
  5. Process filename.
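The steps above can be sketched as below, assuming the list is a plain text file with one filename per line. The function names `append_name` and `pop_first_name` are hypothetical. Removing the top line means rewriting the rest of the list, so each pop costs time proportional to the size of the list file, though it never touches the directory itself.

```python
def append_name(list_path, name):
    """Record a newly created file by appending its name to the list."""
    with open(list_path, "a") as f:
        f.write(name + "\n")

def pop_first_name(list_path):
    """Remove and return the name at the top of the list, or None if it is empty."""
    with open(list_path, "r+") as f:       # 1. open the text file
        names = f.read().splitlines()
        if not names:
            return None
        filename = names[0]                # 2. take the name at the top
        f.seek(0)                          # 3. rewrite the list without it
        f.write("".join(n + "\n" for n in names[1:]))
        f.truncate()
    # 4. the file is closed on leaving the with-block
    return filename                        # 5. the caller processes this name
```

The generator would call `append_name` after writing each file, and the consumer would call `pop_first_name` and then open, process, and delete the returned file.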
