Multiple regex substitution in multiple files using Python

Views: 58 Published: 2021/9/1 18:40:08 python regex substitution


Problem description

I have a project where I need to apply a dozen or so regexes to about 100 files using Python. After 4+ hours of searching the web for various combinations, including "(merge|concatenate|stack|join|compile) multiple regex in python", I haven't found any posts that address my need.

This is a mid-sized project for me. There are several smaller regex projects that I need, which take only 5-6 regex patterns applied over only a dozen or so files. While these will be a great aid in my work, the granddaddy project is applying a file of 100+ search-and-replace strings to any new file I get. (Spelling conventions in certain languages are not standardized, and being able to quickly process files will increase productivity.)

Ideally, the regex strings need to be updatable by a non-programmer, but that may be outside the scope of this post.

Here is what I have so far:

import os, re, sys # Is "sys" necessary?

path = "/Users/mypath/testData"
myfiles = os.listdir(path)

for f in myfiles:

    # split the filename and file extension for use in renaming the output file
    file_name, file_extension = os.path.splitext(f)
    generated_output_file = file_name + "_regex" + file_extension

    # Only process certain types of files.
    if re.search("txt|doc|odt|htm|html")

    # Declare input and output files, open them, and start working on each line.
        input_file = os.path.join(path, f)
        output_file = os.path.join(path, generated_output_file)

        with open(input_file, "r") as fi, open(output_file, "w") as fo:
            for line in fi:

    # I realize that the examples are not regex, but they are in my real data.
    # The important thing, is that each of these is a substitution.
                line = re.sub(r"dog","cat" , line)
                line = re.sub(r"123", "789" , line)
                # Etc.

    # Obviously this doesn't work, because it is only writing the last instance of line.
                fo.write(line)
                fo.close()

Recommended answer

Is this what you are looking for?

Unfortunately, you didn't specify how you know which regexes are supposed to be applied, so I put them into a list of tuples (the first element is the regex, the second is the replacement text).

import os, os.path, re

path = "/Users/mypath/testData"
myfiles = os.listdir(path)
# It's faster to compile your regexes once, before you
# actually use them in a loop
REGEXES = [(re.compile(r'dog'), 'cat'),
           (re.compile(r'123'), '789')]
for f in myfiles:
    # split the filename and file extension for use in
    # renaming the output file
    file_name, file_extension = os.path.splitext(f)
    generated_output_file = file_name + "_regex" + file_extension

    # As l4mpi said ... if odt is zipped, you'd need to unzip it first
    # re.search is slower than a simple if statement
    if file_extension in ('.txt', '.doc', '.odt', '.htm', '.html'):

        # Declare input and output files, open them,
        # and start working on each line.
        input_file = os.path.join(path, f)
        output_file = os.path.join(path, generated_output_file)

        with open(input_file, "r") as fi, open(output_file, "w") as fo:
            for line in fi:
                for search, replace in REGEXES:
                    line = search.sub(replace, line)
                fo.write(line)
        # both the input and output files are closed automatically
        # after the with statement closes
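The question also asks for regex strings that a non-programmer can update. One possible approach, sketched here under assumptions that are not part of the original answer, is to keep the rules in a tab-separated text file (one `pattern<TAB>replacement` pair per line) and build the list of tuples from it. The function names `load_rules` and `apply_rules`, the file layout, and the `#` comment convention are all hypothetical choices for illustration.

```python
import csv
import io
import re


def load_rules(lines):
    """Parse tab-separated 'pattern<TAB>replacement' rows into compiled rules.

    Blank rows and rows whose first cell starts with '#' are skipped
    (the '#' comment convention is an assumption of this sketch).
    """
    rules = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"expected pattern and replacement, got: {row!r}")
        pattern, replacement = row
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text, rules):
    """Apply each (compiled_regex, replacement) pair to text, in order."""
    for search, replace in rules:
        text = search.sub(replace, text)
    return text


# Hypothetical rules content. In practice this could be a real file, e.g.
# open("rules.tsv", newline="", encoding="utf-8"); that file name is an assumption.
sample_rules = io.StringIO("# pattern\treplacement\ndog\tcat\n\\bcolou?r\\b\tcolor\n")
rules = load_rules(sample_rules)
print(apply_rules("My dog likes the colour red.", rules))
# prints: My cat likes the color red.
```

The rules are applied in file order, so a later rule sees the output of an earlier one. In the loop above, `line = apply_rules(line, rules)` could stand in for the inner `for search, replace in REGEXES` loop.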
