How to replace only certain words in a file

Problem description

This program checks whether two specific words (e.g. 'ஒன்று' and 'கோடி') occur consecutively and, if so, replaces the first word with a particular word (e.g. 'ஒரு'). I have to read the contents from one file and write them into another. I use a flag variable, initialized to 2, and write to the file only when flag % 2 == 0, so that words are not repeated in the new file. The program works only when the words are in even positions; otherwise it doesn't work. How should I change the checking and printing? Here is the code:

filename = raw_input("enter file:")
ff = open(filename+'.rep_out','w')
with open(filename, 'r') as f: 
    for line in f:
        words = line.strip().split() 
        flag = 2
        for word1, word2 in zip(words, words[1:]): 
            if flag%2 == 0:
                if word1 == 'ஒன்று' or word1 == '1':
                    if word2 == 'கோடி'  or word2 == 'லட்சம்' or word2 == 'ஆயிரம்' :
                        #word1=word1.replace(word1,'ஒரு')
                        word1='ஒரு'
                        #ff.write(word1+" ")
                ff.write(word1+" ")
                ff.write(word2+" ")             
            flag=flag+1
f.close()
ff.close()  

Recommended answer

When your string manipulations get this complex, there's a good chance you can do better with a more powerful tool than Python's string methods. In this case, it is quite a bit easier to use regular expressions:

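import re

# filename is the input path, as read in the question's code
with open(filename) as f:
    # open the output file for writing; open() defaults to read mode
    with open(filename + '.rep_out', 'w') as ff:
        for line in f:
            ff.write(re.sub("ஒன்று (கோடி|லட்சம்|ஆயிரம்)", r"ஒரு \1", line))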

Explanation:

The regex "ஒன்று (கோடி|லட்சம்|ஆயிரம்)" matches any occurrence of ஒன்று followed by a single space and then any of கோடி, லட்சம் or ஆயிரம். You can extend the alternation to as many candidate second words as you need.

re.sub replaces each match with the new first word (ஒரு), followed by the same second word that it found. The \1 tells it "put the bit that matched the first set of () back in here". That \1 is why the replacement needs to be a raw string: you want it parsed by re.sub, not by Python's string literal rules.
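
To see the backreference at work without touching any files, here is a minimal sketch that applies the same substitution to an in-memory string. It assumes Python 3; the names FIRST, REPLACEMENT, SECOND_WORDS and replace_first_word are made up for this illustration and do not come from the original answer.

```python
import re

# Illustrative constants (hypothetical names, values taken from the question).
FIRST = "ஒன்று"                               # word to be replaced
REPLACEMENT = "ஒரு"                           # word it is replaced with
SECOND_WORDS = ["கோடி", "லட்சம்", "ஆயிரம்"]     # words that trigger the replacement

# Builds the same pattern as in the answer: FIRST, one space, then one
# capturing group listing the allowed second words.
PATTERN = re.compile(
    FIRST + " (" + "|".join(re.escape(w) for w in SECOND_WORDS) + ")"
)


def replace_first_word(line):
    # \1 re-inserts whichever second word the group matched.
    return PATTERN.sub(REPLACEMENT + r" \1", line)


if __name__ == "__main__":
    sample = " ".join([FIRST, SECOND_WORDS[0], FIRST, "x", FIRST, SECOND_WORDS[2]])
    print(replace_first_word(sample))
```

Only the occurrences of the first word that are directly followed by one of the listed second words are changed; the occurrence followed by "x" is left alone.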

As written, the above code assumes words on each line are separated by exactly one space, which is different from your original code that allows them to be separated by any amount of whitespace, but would output them separated by a single space. To match that behaviour, you can modify the regex above like so:

re.sub(r"ஒன்று\s+(கோடி|லட்சம்|ஆயிரம்)", r"ஒரு \1", line)

The \s matches any whitespace character, and the + means "match multiples of that in a row, as long as there is at least one".
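
The question's code also accepted '1' as the first word and compared whole words rather than substrings. One possible way to get closer to that behaviour with a regex is sketched below, assuming Python 3. The non-capturing group (?:...) keeps \1 pointing at the second word, and the lookarounds (?<!\S) and (?!\S) require whitespace or a line boundary on both sides of the match, so that "11" is not treated as "1". The constant and function names are hypothetical.

```python
import re

# Illustrative constants (hypothetical names, values taken from the question).
FIRST_WORDS = ["ஒன்று", "1"]
SECOND_WORDS = ["கோடி", "லட்சம்", "ஆயிரம்"]
REPLACEMENT = "ஒரு"

PATTERN = re.compile(
    # Not preceded by a non-space character, then one of the first words
    # in a NON-capturing group, so the second word stays group 1.
    r"(?<!\S)(?:" + "|".join(map(re.escape, FIRST_WORDS)) + r")"
    # One or more whitespace characters between the two words.
    r"\s+"
    # The second word (group 1), not followed by a non-space character.
    r"(" + "|".join(map(re.escape, SECOND_WORDS)) + r")(?!\S)"
)


def replace_first_word(line):
    # As in the answer, the run of whitespace is collapsed to one space.
    return PATTERN.sub(REPLACEMENT + r" \1", line)


if __name__ == "__main__":
    print(replace_first_word("1   " + SECOND_WORDS[1]))   # replaced
    print(replace_first_word("11 " + SECOND_WORDS[0]))    # left unchanged
```

The lookarounds are used here instead of \b on the assumption that whitespace-delimited tokens are what matters, which mirrors the split() in the question.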

Note that when you use with open(...) as f:, you don't need to call f.close() afterwards - that happens automatically when you exit the with block.
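
Putting the pieces together, the sketch below opens both files in a single with statement so that neither needs an explicit close(). It assumes Python 3 and UTF-8 encoded input, neither of which is stated in the question (which uses Python 2's raw_input); rewrite_file and the constant names are hypothetical.

```python
import re

# Illustrative constants (hypothetical names, values taken from the question).
FIRST = "ஒன்று"
REPLACEMENT = "ஒரு"
SECOND_WORDS = ["கோடி", "லட்சம்", "ஆயிரம்"]

# Whitespace-tolerant pattern, equivalent to the \s+ variant above.
PATTERN = re.compile(
    FIRST + r"\s+(" + "|".join(re.escape(w) for w in SECOND_WORDS) + ")"
)


def rewrite_file(src_path, dst_path):
    # Both files are managed by one with statement; the encoding is an
    # assumption about the input data, not something the question states.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(PATTERN.sub(REPLACEMENT + r" \1", line))
    # Leaving the with block has already closed both files.
    return src.closed and dst.closed
```

Calling rewrite_file(filename, filename + '.rep_out') would then reproduce the file naming used in the question.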
