Python 2.7 - find and replace from text file, using dictionary, to new text file

Question

I am a newbie to programming and have been studying Python in my spare time for the past few months. I decided to try to create a little script that converts American spellings to English spellings in a text file.

I have been trying all sorts of things for the past 5 hours, but eventually came up with something that got me somewhat closer to my goal, but not quite there!

#imported dictionary contains 1800 english:american spelling key:value pairs. 
from english_american_dictionary import dict


def replace_all(text, dict):
    for english, american in dict.iteritems():
        text = text.replace(american, english)
    return text


my_text = open('test_file.txt', 'r')

for line in my_text:
    new_line = replace_all(line, dict)
    output = open('output_test_file.txt', 'a')
    print >> output, new_line

output.close()

I am sure there is a considerably better way to go about things, but for this script, here are the issues I am having:

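  • In the output file, the lines are written on every other line, with a blank line in between, but the original test_file.txt does not have this. The contents of test_file.txt are shown at the bottom of this post.
  • Only the first instance of an American spelling on a line gets converted to English.
  • I did not really want to open the output file in append mode, but could not figure out 'r' in this code structure.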

Any help appreciated for this eager newb!

The contents of the test_file.txt are:

I am sample file.
I contain an english spelling: colour.
3 american spellings on 1 line: color, analyze, utilize.
1 american spelling on 1 line: familiarize.

Recommended answer

The extra blank line you are seeing is because you are using print to write out a line that already includes a newline character at the end. Since print writes its own newline too, your output becomes double spaced. An easy fix is to use outfile.write(new_line) instead.
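
The doubling can be reproduced without touching any files. The following is a minimal sketch that uses in-memory buffers (io.StringIO) in place of the output file and the print function form rather than the question's `print >> output, new_line` statement. The sample lines and the variable names are made up for illustration.

```python
from __future__ import print_function  # has no effect on Python 3
import io

# Lines as they arrive when iterating over a file: each keeps its trailing '\n'.
lines = [u'first line\n', u'second line\n']

# Writing with print: print appends its own newline after each line.
with_print = io.StringIO()
for line in lines:
    print(line, file=with_print)

# Writing with write(): the line is written exactly as it is.
with_write = io.StringIO()
for line in lines:
    with_write.write(line)

doubled = with_print.getvalue()  # every line is followed by an empty line
single = with_write.getvalue()   # same line spacing as the input
```

If print is preferred anyway, stripping the newline first (for example with `line.rstrip('\n')`) should give the same single-spaced result.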

As for the file modes, the issue is that you're opening the output file over and over. You should just open it once, at the start. It's usually a good idea to use with statements to handle opening files, since they'll take care of closing them for you when you're done with them.
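
As a rough sketch of that advice, the loop from the question could be restructured so that both files are opened exactly once, which also makes 'w' mode usable. The helper name convert_file, the temporary directory and the stand-in transform are assumptions made for this example only; in the real script the transform would be the question's replace_all with its dictionary.

```python
import os
import tempfile

def convert_file(in_path, out_path, transform):
    """Apply transform to every line of in_path and write the result to out_path."""
    # Both files are opened once; 'w' truncates any previous output,
    # and the with statement closes both files when the block ends.
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            out_file.write(transform(line))  # line already ends with '\n'

# Demonstration with throwaway files (hypothetical paths and contents).
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, 'test_file.txt')
out_path = os.path.join(work_dir, 'output_test_file.txt')

with open(in_path, 'w') as f:
    f.write('I am sample file.\ncolor, analyze\n')

def to_colour(line):
    # Stand-in for replace_all(line, spelling_dict).
    return line.replace('color', 'colour')

convert_file(in_path, out_path, to_colour)
convert_file(in_path, out_path, to_colour)  # a second run overwrites instead of appending

with open(out_path, 'r') as f:
    result = f.read()
```

Because the output file is opened in 'w' mode once per run, running the script repeatedly does not keep growing the output, which was the side effect of append mode.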

I don't understand your other issue, with only some of the replacements happening. Is your dictionary missing the spellings for 'analyze' and 'utilize'?

One suggestion I'd make is to not do your replacements line by line. You can read the whole file in at once with file.read() and then work on it as a single unit. This will probably be faster, since it won't need to loop as often over the items in your spelling dictionary (just once, rather than once per line):

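# spelling_dict is the imported spelling dictionary (named dict in the question);
# replace_all is the function defined in the question.
with open('test_file.txt', 'r') as in_file:
    text = in_file.read()

with open('output_test_file.txt', 'w') as out_file:
    out_file.write(replace_all(text, spelling_dict))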

To make your code correctly handle words that contain other words (like "entire" containing "tire"), you probably need to abandon the simple str.replace approach in favor of regular expressions.
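
A small sketch of the problem, using a made-up sample sentence: str.replace substitutes every occurrence of the substring, including the one inside "entire", while a pattern anchored with \b word boundaries only matches the standalone word.

```python
import re

text = 'The entire tire'

# str.replace works on substrings, so the "tire" inside "entire" is changed too.
naive = text.replace('tire', 'tyre')

# \b marks a word boundary, so only the standalone word "tire" matches.
whole_word = re.sub(r'\btire\b', 'tyre', text)
```

Here naive ends up as 'The entyre tyre', while whole_word is 'The entire tyre'.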

Here's a quickly thrown together solution that uses re.sub, given a dictionary of spelling changes from American to British English (that is, in the reverse order of your current dictionary):

import re

#from english_american_dictionary import ame_to_bre_spellings
ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def ame_to_bre(text):
    pattern = r'\b\w+\b'  # this pattern matches whole words only
    replacer = replacer_factory(ame_to_bre_spellings)
    return re.sub(pattern, replacer, text)

def main():
    #with open('test_file.txt') as in_file:
    #    text = in_file.read()
    text = 'foo color, entire, utilize'

    #with open('output_test_file.txt', 'w') as out_file:
    #    out_file.write(ame_to_bre(text))
    print(ame_to_bre(text))

if __name__ == '__main__':
    main()

One nice thing about this code structure is that you can easily convert from British English spellings back to American English ones, if you pass a dictionary in the other order to the replacer_factory function.
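
One way this could look, as a sketch only: the reversed dictionary is built by swapping keys and values, which assumes every British spelling appears once. The names bre_to_ame_spellings and convert are invented for this example, and the three dictionary entries are the sample ones from the code above.

```python
import re

ame_to_bre_spellings = {'tire': 'tyre', 'color': 'colour', 'utilize': 'utilise'}

# Swap keys and values to get the opposite direction.
# Assumption: no two American spellings map to the same British one.
bre_to_ame_spellings = {bre: ame for ame, bre in ame_to_bre_spellings.items()}

def convert(text, spelling_dict):
    """Replace each whole word found in spelling_dict; leave other words alone."""
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return re.sub(r'\b\w+\b', replacer, text)

british = convert('foo color, entire, utilize', ame_to_bre_spellings)
american = convert(british, bre_to_ame_spellings)
```

With these sample entries, british is 'foo colour, entire, utilise' and converting it back gives the original text again.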
