Remove specific lines from a large text file in python
Problem description
I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't want to read the entire dataset into memory and then modify it, as each file is over 100MB with over 4 million records.
Range 150.0dB -64.9dBm
Mobile unit 1 Base -17.19968 145.40369 999.8
Fixed unit 2 Mobile -17.20180 145.29514 533.0
Latitude Longitude Rx(dB) Best unit
-17.06694 145.23158 -050.5 2
-17.06695 145.23297 -044.1 2
So lines 1, 2 and 3 should be deleted, and in line 4 "Rx(dB)" should be just "Rx" and "Best unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.
I can't use command-line programs like grep (as in this question), because the first 3 lines are not the same in every file: the numbers (such as 150.0dB, -64*) change from file to file. So the whole of lines 1-3 has to be deleted, and then grep or similar can do the search-and-replace on line 4.
Thanks guys,
=== EDIT: new pythonic way to handle larger files, from @heltonbiker. It gives an error.
import os, re
##infile = arcpy.GetParameter(0)
##chunk_size = arcpy.GetParameter(1) # number of records in each dataset
infile='trc_emerald.txt'
fc= open(infile)
Name = infile[:infile.rfind('.')]
outfile = Name+'_db.txt'
line4 = fc.readlines(100)[3]
line4 = re.sub('\([^\)].*?\)', '', line4)
line4 = re.sub('Best(\s.*?)', 'Best_', line4)
newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
fc.close()
newfile = open(outfile, 'w')
newfile.write(newfilestring)
newfile.close()
del lines
del outfile
del Name
#return chunk_size, fl
#arcpy.SetParameterAsText(2, fl)
print "Completed"
Traceback (most recent call last):
  File "P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Emerald\working\clean_file_1c.py", line 13, in
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
TypeError: 'builtin_function_or_method' object is unsubscriptable
As wim said in the comments, sed is the right tool for this. The following command should do what you want:
sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever
To explain the command a little:
-i : executes the command in place, that is, it writes the output back into the input file
-e : executes a command
'4 s/(dB)//' : on line 4, substitute '' for '(dB)'
'4 s/Best Unit/Best_Unit/' : same as above, with different find and replace strings (the match is case-sensitive, so the pattern must match the header exactly; the sample header reads 'Best unit')
'1,3 d' : delete every line from line 1 to line 3 (inclusive)
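If the job has to stay in Python, the same three edits can be sketched as a streaming copy that never holds more than one chunk of the file in memory. This is a minimal sketch, not part of the original answer: the function name `clean_file` and its `skip` parameter are hypothetical, and the header match is made case-insensitive on the assumption that the header may read either "Best unit" or "Best Unit".

```python
import re
import shutil


def clean_file(src_path, dst_path, skip=3):
    """Copy src_path to dst_path, dropping the first `skip` lines
    and cleaning up the header line that follows them."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Read and discard lines 1-3 without storing them.
        for _ in range(skip):
            src.readline()

        # Line 4 is the header: drop "(dB)" and join "Best unit" with an underscore.
        header = src.readline()
        header = header.replace("(dB)", "")
        # Assumption: accept either capitalisation of "unit".
        header = re.sub(r"Best\s+unit", "Best_Unit", header, flags=re.IGNORECASE)
        dst.write(header)

        # Copy the remaining records in fixed-size chunks.
        shutil.copyfileobj(src, dst)
```

With the file names from the question, the call would be `clean_file('trc_emerald.txt', 'trc_emerald_db.txt')`. Unlike `sed -i`, this writes a new file and leaves the input untouched.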
sed is a really powerful tool that can do much more than just this, and it is well worth learning.
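As for the traceback in the question: `fc.readlines` without parentheses is the method object itself, so `fc.readlines[4:]` tries to slice a function, which is what the TypeError reports. Calling `fc.readlines()` would get past that, but it loads the whole file, and `line4 + [...]` would then fail because a string cannot be added to a list. The sketch below reproduces the error on an in-memory file (`io.StringIO` stands in for the real file here, purely for illustration) and shows `itertools.islice` as one way to skip leading lines lazily.

```python
import io
from itertools import islice

# Stand-in for the real file (assumption for illustration only).
sample = io.StringIO("line1\nline2\nline3\nheader\nrow1\nrow2\n")

# Subscripting the method itself (no parentheses) raises TypeError.
try:
    sample.readlines[4:]
    error_raised = False
except TypeError:
    error_raised = True

# islice skips the first four lines lazily instead of building a full list first.
remaining = list(islice(sample, 4, None))
```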