How to find and replace multiple lines in a text file?
Problem description
I am running Python 2.7.
I have three text files: data.txt, find.txt, and replace.txt. find.txt contains several lines that I want to search for in data.txt, and I want to replace that section with the content of replace.txt. Here is a simple example:

data.txt

    pumpkin
    apple
    banana
    cherry
    himalaya
    skeleton
    apple
    banana
    cherry
    watermelon
    fruit

find.txt

    apple
    banana
    cherry

replace.txt

    1
    2
    3

So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.

I am having some trouble finding the right approach, as my data.txt is about 1MB and I want to be as efficient as possible. One dumb way is to concatenate everything into one long string, use replace, and then write the result to a new text file so that all the line breaks are restored.

    data = open("data.txt", 'r')
    find = open("find.txt", 'r')
    replace = open("replace.txt", 'r')

    data_str = ""
    find_str = ""
    replace_str = ""

    for line in data:  # concatenate it into one long string
        data_str += line

    for line in find:  # concatenate it into one long string
        find_str += line

    for line in replace:
        replace_str += line

    # replace the find block with the replace block (strings, not file objects)
    new_data = data_str.replace(find_str, replace_str)
    new_file = open("new_data.txt", "w")
    new_file.write(new_data)

But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated, so that's not good.

Another way is to step through the lines and keep track of the line on which you found a match. Something like this:

    location = 0
    LOOP1:
    for find_line in find:
        for i, data_line in enumerate(data).startingAtLine(location):
            if find_line == data_line:
                location = i  # found possibility

    for idx in range(NUMBER_LINES_IN_FIND):
        if find_line[idx] != data_line[idx+location]  # compare line by line
            # if the subsequent lines don't match, then go back and search again
            goto LOOP1

Not fully formed code, I know. I don't even know if it's possible to search through a file from a certain line on or between certain lines but again, I'm just a bit confused in the logic of it all. What is the best way to do this?
Thanks!
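
For reference, the line-by-line idea in the pseudocode above can be written without goto by comparing a slice of the data lines against the whole find block. This is only a minimal sketch, not code from the original post: replace_block is a hypothetical helper name, and it assumes all three files have already been read into lists of lines with the newlines stripped.

```python
def replace_block(data_lines, find_lines, replace_lines):
    """Return a new list in which every run of lines equal to find_lines
    is replaced by replace_lines. Hypothetical helper for illustration;
    all three arguments are assumed to be lists of strings."""
    n = len(find_lines)
    if n == 0:
        return list(data_lines)  # nothing to search for
    result = []
    i = 0
    while i < len(data_lines):
        # Compare the next n lines against the whole find block at once.
        if data_lines[i:i + n] == find_lines:
            result.extend(replace_lines)
            i += n  # skip past the matched block
        else:
            result.append(data_lines[i])
            i += 1
    return result


# Sample data taken from the example files in the question.
data = ["pumpkin", "apple", "banana", "cherry", "himalaya", "skeleton",
        "apple", "banana", "cherry", "watermelon", "fruit"]
new_data = replace_block(data, ["apple", "banana", "cherry"], ["1", "2", "3"])
print(new_data)
```

Because a block is replaced only when all of its lines match consecutively, a stray apple line that is not followed by banana and cherry would be left untouched.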
Solution

If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.

    # create a dict of find keys and replace values
    findlines = open('find.txt').read().split('\n')
    replacelines = open('replace.txt').read().split('\n')
    find_replace = dict(zip(findlines, replacelines))

    with open('data.txt') as data:
        with open('new_data.txt', 'w') as new_data:
            for line in data:
                # apply every find/replace pair to the current line
                for key in find_replace:
                    if key in line:
                        line = line.replace(key, find_replace[key])
                new_data.write(line)

Edit: I changed the code to read().split('\n') instead of readlines() so that \n isn't included in the find and replace strings.
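
To make the difference described in the edit concrete, here is a small sketch that uses an in-memory string as a stand-in for the contents of find.txt (the sample text is an assumption). It also shows str.splitlines(), which the answer does not use, but which avoids the empty string that split('\n') leaves at the end when the text finishes with a newline.

```python
import io

text = u"apple\nbanana\ncherry\n"  # stand-in for the contents of find.txt

# readlines() keeps the newline character on every line
with_newlines = io.StringIO(text).readlines()

# split('\n') drops the newlines but leaves a trailing empty string
split_lines = text.split('\n')

# splitlines() drops the newlines and has no trailing empty string
clean_lines = text.splitlines()

print(with_newlines)
print(split_lines)
print(clean_lines)
```

The trailing empty string is worth watching for when building a dict of keys, because '' in line is true for every line.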