Multiple regex replacements based on lists in multiple files

Question

I have a folder containing multiple text files that I need to process and format using multiple replacement lists that look like this:

old string1~new string1
old string2~new string2
etc~blah

I run each replacement pair from the replacement lists on every line of those text files. I currently have a set of Python scripts that do this. Would switching to sed or awk make the code simpler and more maintainable, or should I improve my Python code instead? I ask because new text files arrive regularly and often differ slightly in structure from earlier ones: they contain mistakes, misspellings, and multiple spaces, since they are created by humans. So I constantly have to tweak my code and replacement lists to make everything work properly. Thanks.

Recommended answer

Unless your Python code is really bad, switching to awk is unlikely to make it more maintainable. That said, it's pretty simple in awk, but it does not scale well:

cat replacement-list-files* | awk 'FILENAME == "-" { 
  split( $0, a, "~" ); repl[ a[1] ] = a[2]; next }
  { for( i in repl ) gsub( i, repl[i] ) }1' - input-file

Note that this works on one file at a time. To work on multiple files, replace the trailing 1 with something like { print > ( FILENAME ".new" ) }, but then you have to deal with closing the files if you want to process a large number of them, and it quickly becomes an unmaintainable mess. Stick with Python if you already have a working solution.
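
For comparison, here is a minimal Python sketch of the same idea. It is not the asker's actual code, which is not shown. The function names, the `*.txt` glob and the `.new` output suffix are assumptions made for illustration. It also assumes that the left side of each `old~new` pair is a regular expression and that the right side should be inserted literally.

```python
import re
from pathlib import Path


def load_replacements(list_paths, sep="~"):
    """Read 'old~new' pairs from each list file, keeping their order."""
    pairs = []
    for path in list_paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if sep not in line:
                continue  # skip blank or malformed lines
            old, new = line.split(sep, 1)
            pairs.append((re.compile(old), new))
    return pairs


def apply_replacements(line, pairs):
    """Apply every pair to one line, in list order."""
    for pattern, new in pairs:
        # A function replacement inserts `new` literally
        # (no backslash or group-reference processing).
        line = pattern.sub(lambda match: new, line)
    return line


def process_folder(folder, pairs, glob="*.txt", suffix=".new"):
    """Write a processed copy of each matching file next to the original."""
    for path in sorted(Path(folder).glob(glob)):
        lines = path.read_text(encoding="utf-8").splitlines()
        result = [apply_replacements(line, pairs) for line in lines]
        out = path.with_name(path.name + suffix)
        out.write_text("\n".join(result) + "\n", encoding="utf-8")
```

Because the pairs are applied in order, a later pattern sees the output of earlier ones. Putting a whitespace-normalizing pair such as `\s+~ ` near the top of a list is one way to absorb the stray spaces mentioned in the question.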
