How to process files from one subfolder to another in each directory using Python?
Problem description
I have a basic file/folder structure on the Desktop where the "Test" folder contains "Folder 1", which in turn contains 2 subfolders:
- An "Original files" subfolder, which contains shapefiles (.shp).
- A "Processed files" subfolder, which is empty.
I am attempting to write a script which looks into each parent folder (Folder 1, Folder 2 etc) and if it finds an Original Files subfolder, it will run a function and output the results into the Processed files subfolder.
I made a simple diagram to showcase this where if Folder 1 contains the relevant subfolders then the function will run; if Folder 2 does not contain the subfolders then it's simply ignored:
I looked into the following posts but am still having some trouble:
The following is the script, which seems to run happily. The annoying thing is that it doesn't produce an error, so this real noob can't see where the problem is:
import os, sys
from os.path import expanduser
home = expanduser("~")
for subFolders, files in os.walk(home + "\Test\\" + "\*Original\\"):
    if filename.endswith('.shp'):
        output = home + "\Test\\" + "\*Processed\\" + filename
        # do_some_function, output
Recommended answer
I guess you mixed something up in your os.walk() loop.
I just created a simple structure as shown in your question and used this code to get what you're looking for:
import os

root_dir = '/path/to/your/test_dir'
original_dir = 'Original files'
processed_dir = 'Processed files'

for path, subdirs, files in os.walk(root_dir):
    # Only look inside folders whose path contains the original-files folder name
    if original_dir in path:
        for file in files:
            if file.endswith('.shp'):
                # The processed folder is a sibling of the original folder
                parent = os.path.sep.join(path.split(os.path.sep)[:-1])
                print('original dir: \t' + path)
                print('original file: \t' + path + os.path.sep + file)
                print('processed dir: \t' + parent + os.path.sep + processed_dir)
                print('processed file: ' + parent + os.path.sep + processed_dir + os.path.sep + file)
                print('')
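To go from printing paths to actually processing files, the loop above could be turned into something like the following. This is only a sketch under assumptions: find_shapefile_jobs and process are hypothetical names, process just copies the file as a stand-in for the real shapefile function, and the demo builds a throwaway folder tree in a temporary directory instead of touching your Desktop.

```python
import os
import shutil
import tempfile

ORIGINAL_DIR = 'Original files'
PROCESSED_DIR = 'Processed files'

def find_shapefile_jobs(root_dir):
    """Return (source, destination) pairs for every .shp file found."""
    jobs = []
    for path, subdirs, files in os.walk(root_dir):
        # Match the folder name exactly rather than as a substring of the path
        if os.path.basename(path) != ORIGINAL_DIR:
            continue
        processed_path = os.path.join(os.path.dirname(path), PROCESSED_DIR)
        # Skip parent folders that have no "Processed files" sibling
        if not os.path.isdir(processed_path):
            continue
        for name in sorted(files):
            if name.lower().endswith('.shp'):
                jobs.append((os.path.join(path, name),
                             os.path.join(processed_path, name)))
    return jobs

def process(source, destination):
    # Hypothetical placeholder: a real script would do its shapefile work here
    shutil.copyfile(source, destination)

# Small demo in a temporary directory (assumed layout, mirroring the question)
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'Folder 1', ORIGINAL_DIR))
os.makedirs(os.path.join(demo_root, 'Folder 1', PROCESSED_DIR))
os.makedirs(os.path.join(demo_root, 'Folder 2'))
open(os.path.join(demo_root, 'Folder 1', ORIGINAL_DIR, 'roads.shp'), 'w').close()

jobs = find_shapefile_jobs(demo_root)
for source, destination in jobs:
    process(source, destination)
```

Folder 2 has neither subfolder, so it produces no jobs and is simply ignored.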
I'd suggest using wildcards in a directory-crawling script only if you are REALLY sure what your directory tree looks like. I'd rather search for the full names of the folders, as in my script.
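One likely reason the original script ran without any error is that os.walk() does not expand wildcards: it treats * as a literal character, and by default it silently skips a start folder it cannot read. The sketch below illustrates this in a temporary folder (the layout is an assumption that mirrors the question) and shows glob.glob() as one way to expand a pattern if you really want one.

```python
import glob
import os
import tempfile

demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'Folder 1', 'Original files'))

# os.walk() treats '*' as a literal character, so this folder does not exist
pattern = os.path.join(demo_root, '*', '*Original*')
walked = list(os.walk(pattern))

# glob.glob() expands the same pattern into real paths
matched = glob.glob(pattern)

print(walked)   # empty list, and no error is raised
print(matched)
```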
Whenever you use paths, take care of your path separators - the slashes.
On Windows systems, the backslash is used for that:
C:\any\path\you\name
Most other systems use a normal, forward slash:
/the/path/you/want
In Python, a forward slash can be used directly, without any problem:
path_var = '/the/path/you/want'
...as opposed to backslashes. A backslash is a special character in Python strings. For example, it starts the newline escape sequence: \n
To make clear that you want a literal backslash rather than a special character, you either have to "escape" it with another backslash: '\\'. That makes a Windows path look like this:
path_var = 'C:\\any\\path\\you\\name'
...or you can mark the string as a "raw" string (or "literal string") by prefixing it with r. Note that escape sequences such as \n are then no longer interpreted in that string.
path_var = r'C:\any\path\you\name'
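As a quick check, the escaped form and the raw form describe exactly the same string. The variable names below are only illustrative.

```python
escaped = 'C:\\any\\path\\you\\name'
raw = r'C:\any\path\you\name'

# Both spellings produce the same characters
print(escaped == raw)
print(escaped)
print(escaped.count('\\'))
```

One caveat with raw strings: they cannot end in a single backslash, so a trailing separator still has to be written the escaped way or added with os.path.join().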
In your comment, you used the example root_dir = home + "\Test\\". The single backslash in this string is treated as the start of an escape sequence, so Python tries to make sense of the backslash and the following character: \T. That pair is not a recognized escape sequence, so Python leaves the backslash in place (newer versions warn about it), but \t would be converted to a tab. Either way, an unescaped backslash is not a reliable way to build the path you want.
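A small sketch of how a lowercase folder name can silently corrupt a path. The path C:\temp\new is made up for illustration.

```python
wrong = "C:\temp\new"       # \t becomes a tab, \n becomes a newline
right = "C:\\temp\\new"     # every backslash is escaped

print('\t' in wrong, '\n' in wrong)   # the string contains a tab and a newline
print('\\' in wrong)                  # and no backslash at all
print(len(wrong), len(right))
```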
I'm wondering why your other example works. In "C:\Users\me\Test\\", the \U and \m should lead to similar problems. You also mixed single and double backslashes.
That said...
When you take care of your OS path separators and try out new paths, also note that Python does a lot of path handling for you. For example, if your script reads a directory, as os.walk() does, on my Windows system the separators are already processed as double backslashes. There's no need for me to check that - it's usually just hardcoded strings where you'll have to take care.
And finally: the Python os.path module provides a lot of functions for handling paths, separators and so on. For example, os.path.sep (and os.sep, too) holds the correct separator for the system Python is running on. You can also build paths using os.path.join().
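For example, the paths from the question could be built without typing a single separator. This is a sketch that assumes the folder names "Test", "Folder 1", "Original files" and "Processed files" from the question.

```python
import os

home = os.path.expanduser('~')

# os.path.join() inserts the right separator for the current system
original_path = os.path.join(home, 'Test', 'Folder 1', 'Original files')
# The processed folder sits next to the original folder
processed_path = os.path.join(os.path.dirname(original_path), 'Processed files')

print(os.sep)
print(original_path)
print(processed_path)
```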
You use expanduser("~") to get the home path of the current user. That should work fine, but if you're using an old Python version, there could be a bug - see: expanduser("~") on Windows looks for HOME first
So check that the home path resolves correctly, and then build your paths using the power of the os module :-)
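A minimal way to run that check before walking anything, assuming the "Test" folder sits directly under the home path as in the question's script:

```python
import os

home = os.path.expanduser('~')
test_dir = os.path.join(home, 'Test')

# Print what was actually resolved before relying on it
print('home:', home)
print('home exists:', os.path.isdir(home))
print('Test folder exists:', os.path.isdir(test_dir))
```

If either check prints False, the walk would find nothing, which matches a script that finishes without any error or output.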
Hope that helps!