Batch rename part of a filename from a lookup file


Problem description

Edit: see the bottom for my eventual solution.

I have a directory of ~12,700 text files.

They have names like this:

1 - Re / Report Novenator public call for bury - by Lizbett on Thu, Sep 10, 2009.

The leading number increments with each file (e.g. the last file in the directory begins with "12,700 - ").

Unfortunately, the files are not time-sorted, and I need them to be. Luckily, I have a separate CSV file that maps the ID numbers: the 1 in the example above should really be 25 (since there are 24 messages before it), 2 should really be 8, 3 should be 1, and so forth, like so:

OLD_FILEID  TIMESORT_FILEID
21      0
23      1
24      2
25      3

I don't need to change anything in the file name except this single leading number, which I need to swap with its associated value. In my head, the way this would work is to take a file name, check the digits that appear before the dash, look them up in the CSV, replace them with the associated value, rename the file accordingly, and go on to the next file.

What would be the best way to go about doing something like this? I'm a Python newbie but have played around enough to feel comfortable following most directions or suggestions. Thanks :)

e: following the instructions below as best I could, I wrote this, which doesn't work, and I'm not sure why:

import os
import csv
import sys

#open and store the csv file
with open('timesortmap.csv','rb') as csvfile:
    timeReader = csv.reader(csvfile, delimiter = ',', quotechar='"')

#get the list of files
for filename in os.listdir('DiggOutput-TIMESORT/'):
    oldID = filename.split(' - ')[0]
    newFilename = filename.replace(oldID, timeReader[oldID],1)
    os.rename(oldID, newFilename)

The error I get is:

TypeError: '_csv.reader' object is not subscriptable

I am not using DictReader, but that's because when I use csv.reader and print the rows, it looks like this:

['12740', '12738']
['12742', '12739']
['12738', '12740']
['12737', '12741']
['12739', '12742']

And when I use DictReader, it looks like this:

{'FILEID-TS': '12738', 'FILEID-OLD': '12740'}
{'FILEID-TS': '12739', 'FILEID-OLD': '12742'}
{'FILEID-TS': '12740', 'FILEID-OLD': '12738'}
{'FILEID-TS': '12741', 'FILEID-OLD': '12737'}
{'FILEID-TS': '12742', 'FILEID-OLD': '12739'}

The error in the terminal:

  File "TimeSorter.py", line 16, in <module>
    newFilename = filename.replace(oldID, timeReader[oldID],1)
AttributeError: DictReader instance has no attribute '__getitem__'


Recommended answer

This should be very simple to do in Python using just the csv and os modules.

Python has a built-in dictionary type called dict that can hold the contents of the CSV file in memory while you are processing. Basically, you would read the CSV file using the csv module and convert each row into a dictionary entry, probably using the OLD_FILEID field as the key and the TIMESORT_FILEID field as the value.
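A minimal sketch of that lookup table, written for Python 3 (the question's code looks like Python 2, so this is an adaptation, not the asker's code). The function name load_mapping is hypothetical. It assumes the old ID is in the first column and the time-sorted ID in the second, as in the csv.reader output shown in the question, and it treats any row whose first cell is not purely digits as a header.

```python
import csv


def load_mapping(csv_path):
    """Return a dict of {old_id: new_id}, both kept as strings."""
    mapping = {}
    # newline="" is the mode the csv module documentation recommends in Python 3.
    with open(csv_path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            old_id, new_id = row[0].strip(), row[1].strip()
            if not old_id.isdigit():
                continue  # assumed to be the header row
            mapping[old_id] = new_id
    return mapping
```

Keeping the IDs as strings means they can be compared directly with the text taken from each file name. Building the dict while the file is still open also sidesteps the errors in the question: a reader object can only be iterated, not indexed by ID.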

You can then use os.listdir() to get the list of files and use a loop to get each file name in turn. (If you need to filter the list of file names to exclude some files, take a look at the glob module.) Inside your loop, you just need to extract the number associated with the file, which can be done using something like this:

file_number = filename.split(' - ')[0] 

Then call os.rename(), passing in the old file name and the new file name. The new file name can be found using something like:

new_filename = filename.replace(file_number, file_mapping[file_number], 1)

Here file_mapping is the dictionary created from the CSV file. This replaces the first occurrence of file_number with the number from your mapping file.
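Put together, the loop might look like the sketch below. The function name plan_renames is hypothetical, and the choice to skip files whose leading number is missing from the mapping is an assumption rather than something the question specifies. The function only computes the new names; it does not touch the disk.

```python
def plan_renames(filenames, file_mapping):
    """Return (old_name, new_name) pairs for names whose leading ID is mapped."""
    plan = []
    for filename in filenames:
        # Everything before the first ' - ' is treated as the file's ID.
        file_number = filename.split(' - ')[0]
        if file_number not in file_mapping:
            continue  # assumption: leave unmapped files alone
        # The count of 1 limits the replacement to the leading occurrence.
        new_filename = filename.replace(file_number, file_mapping[file_number], 1)
        plan.append((filename, new_filename))
    return plan
```

Each pair can then be passed to os.rename(). Since os.listdir() returns bare names, both arguments need the directory joined on, e.g. os.rename(os.path.join(directory, old_name), os.path.join(directory, new_name)), unless the script runs from inside that directory. Note also that the first argument must be the full old file name, not just the ID.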

Edit

As TheodrosZelleke points out, there is the potential to overwrite an existing file by literally following what I laid out above. Several possible strategies:


  1. Use os.rename() to move the renamed versions of the files into a different directory (e.g. a subdirectory of the current directory or, even better, a temporary directory created using tempfile.mkdtemp()). Once all the files have been renamed, use os.rename() to move the files from the temporary directory back to the current directory.
  2. Add an extension to the new file name, e.g. .tmp, assuming that the extension chosen will not cause other conflicts. Once all the renames are done, use a second loop to rename the files to remove the .tmp extension.
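The second strategy could be sketched as follows. The function name rename_in_two_passes and its plan argument (a list of (old_name, new_name) pairs) are hypothetical, and the sketch assumes that no existing file already ends in the chosen suffix and that no final name clashes with a file outside the plan.

```python
import os


def rename_in_two_passes(directory, plan, suffix=".tmp"):
    """Apply (old_name, new_name) pairs without clobbering files mid-run."""
    # Pass 1: give every file its new name plus a temporary suffix, so a new
    # name can never land on a file that is still waiting to be renamed.
    for old_name, new_name in plan:
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name + suffix))
    # Pass 2: all the old names are gone now, so the suffix can be stripped.
    for _, new_name in plan:
        os.rename(os.path.join(directory, new_name + suffix),
                  os.path.join(directory, new_name))
```

Trying this on a copy of the directory first is a cheap safeguard, since a run that stops halfway leaves a mix of old, suffixed, and final names.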
