How to find and replace multiple lines in a text file?


Question

I am running Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt. find.txt contains several lines that I want to search for in data.txt, and I want to replace each matching section with the content of replace.txt. Here is a simple example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3

So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.

I am having some trouble finding the right approach. My data.txt is about 1MB, so I want to be as efficient as possible. One dumb way is to concatenate everything into one long string, use replace, and then write the result to a new text file so all the line breaks are restored.

import re

data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')

data_str = ""
find_str = ""
replace_str = "" 

for line in data: # concatenate it into one long string
    data_str += line

for line in find: # concatenate it into one long string
    find_str += line

for line in replace: 
    replace_str += line


# pass the concatenated strings, not the file objects
new_data = data_str.replace(find_str, replace_str)
new_file = open("new_data.txt", "w")
new_file.write(new_data)
new_file.close()

But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated so that's not good.
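
A minimal sketch of the whole-string approach described above, assuming all three files fit comfortably in memory (a 1MB file does). The helper names `replace_block_in_text` and `replace_block_in_file` are made up for illustration. It uses the `str.replace` method; as far as I know, what the Python 2 documentation lists as deprecated is the function-style `string.replace` from the `string` module, not the method.

```python
def replace_block_in_text(data_text, find_text, replace_text):
    """Replace every occurrence of find_text in data_text with replace_text."""
    # str.replace treats the text as one string, newlines included,
    # so a block of several lines is matched as a single substring.
    return data_text.replace(find_text, replace_text)


def replace_block_in_file(data_path, find_path, replace_path, out_path):
    """Read the three input files whole and write the result to out_path."""
    # Hypothetical helper: the paths are supplied by the caller.
    with open(data_path) as f:
        data_text = f.read()
    with open(find_path) as f:
        find_text = f.read()
    with open(replace_path) as f:
        replace_text = f.read()
    with open(out_path, 'w') as f:
        f.write(replace_block_in_text(data_text, find_text, replace_text))
```

Because this is a plain substring match, a block that begins in the middle of a line (for example `pineapple` followed by `banana` and `cherry`) would also match. find.txt and replace.txt also need consistent trailing newlines, otherwise line breaks get added or lost at the end of each replaced block.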

Another way is to step through the lines and keep track of the line on which you found a match.

Something like this:

location = 0

LOOP1: 
for find_line in find:
    for i, data_line in enumerate(data).startingAtLine(location):
        if find_line == data_line:
            location = i # found possibility

for idx in range(NUMBER_LINES_IN_FIND):
    if find_line[idx] != data_line[idx+location]  # compare line by line
        #if the subsequent lines don't match, then go back and search again
        goto LOOP1

Not fully formed code, I know. I don't even know if it's possible to search through a file from a certain line on or between certain lines but again, I'm just a bit confused about the logic of it all. What is the best way to do this?
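
One way the line-by-line idea above could be written without a `goto` is to compare a slice of the data lines against the find lines at each position. This is only a sketch: `replace_line_blocks` is a hypothetical name, and it assumes the lines have already been read into lists with their line endings stripped.

```python
def replace_line_blocks(data_lines, find_lines, replace_lines):
    """Return a new list in which every run of lines equal to find_lines
    is replaced by replace_lines."""
    data_lines = list(data_lines)
    find_lines = list(find_lines)
    n = len(find_lines)
    if n == 0:
        # Nothing to search for: return the data unchanged.
        return data_lines
    result = []
    i = 0
    while i < len(data_lines):
        # Compare the next n lines against the whole find block at once.
        if data_lines[i:i + n] == find_lines:
            result.extend(replace_lines)
            i += n  # skip past the matched block
        else:
            result.append(data_lines[i])
            i += 1
    return result
```

Only complete matches are replaced, so a partial run such as apple, banana followed by some other line is copied through unchanged. The lists could come from something like `open('data.txt').read().splitlines()`.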

Thanks!

Solution

If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.

# create a dict of find keys and replace values
with open('find.txt') as f:
    findlines = f.read().split('\n')
with open('replace.txt') as f:
    replacelines = f.read().split('\n')
# pair each find line with the replace line at the same position
find_replace = dict(zip(findlines, replacelines))

with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            for key in find_replace:
                # substring match: replaces key wherever it appears in the line
                if key in line:
                    line = line.replace(key, find_replace[key])
            new_data.write(line)

Edit: I changed the code to read().split('\n') instead of readlines() so \n isn't included in the find and replace strings.
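
The code above replaces each find line on its own, wherever it appears. If the lines should only be replaced when they occur together as a consecutive block, a streaming variant could keep a small buffer of the most recent lines. The sketch below is not the answer's method, just one possible way to do it; `stream_replace_blocks` and `rewrite_file` are hypothetical names, and lines are assumed to be compared without their trailing newline.

```python
from collections import deque


def stream_replace_blocks(lines, find_lines, replace_lines):
    """Yield lines from an iterable, replacing each consecutive block
    equal to find_lines with replace_lines."""
    find_lines = list(find_lines)
    n = len(find_lines)
    if n == 0:
        # Nothing to search for: pass every line through.
        for line in lines:
            yield line
        return
    window = deque()  # holds at most n pending lines
    for line in lines:
        window.append(line)
        if len(window) < n:
            continue
        if list(window) == find_lines:
            # Full block matched: emit the replacement and start over.
            for out in replace_lines:
                yield out
            window.clear()
        else:
            # The oldest pending line cannot start a match; emit it.
            yield window.popleft()
    # Flush whatever is left at the end of the input.
    for line in window:
        yield line


def rewrite_file(data_path, out_path, find_lines, replace_lines):
    """Stream data_path to out_path one line at a time."""
    with open(data_path) as data, open(out_path, 'w') as out:
        stripped = (line.rstrip('\n') for line in data)
        for line in stream_replace_blocks(stripped, find_lines, replace_lines):
            out.write(line + '\n')
```

Memory use stays bounded by the number of lines in find.txt rather than the size of data.txt, which matters more for files much larger than 1MB.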
