Remove specific lines from a large text file in python

Views: 701  Posted: 2018/5/28 19:13:46  Tags: python, text, grep


Problem description

I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't want to read the entire dataset into memory and then modify it, as each file is over 100MB with over 4 million records.
Range   150.0dB -64.9dBm
Mobile unit 1   Base    -17.19968    145.40369  999.8
Fixed unit  2   Mobile  -17.20180    145.29514  533.0
Latitude    Longitude   Rx(dB)  Best unit
-17.06694    145.23158  -050.5  2
-17.06695    145.23297  -044.1  2
So lines 1, 2 and 3 should be deleted, and in line 4 "Rx(dB)" should become just "Rx" and "Best Unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.
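The line-4 clean-up on its own is a small string transformation. A minimal sketch, assuming the only parenthesised text on that line is a unit suffix such as "(dB)" and that the last column is spelled "Best unit" or "Best Unit" (the sample data uses the lower-case form). The helper name `clean_header` is hypothetical. The patterns are written as raw strings (`r"..."`) so the backslashes reach the regex engine unchanged.

```python
import re


def clean_header(line):
    """Tidy the column-header line (line 4 of each file).

    Assumptions: the only parenthesised text is a unit suffix like "(dB)",
    and the last column is "Best unit" in any capitalisation.
    """
    line = re.sub(r"\([^)]*\)", "", line)  # "Rx(dB)" -> "Rx"
    # Join the two words with an underscore, ignoring case.
    return re.sub(r"Best\s+unit", "Best_Unit", line, flags=re.IGNORECASE)


print(clean_header("Latitude    Longitude   Rx(dB)  Best unit\n"), end="")
# prints: Latitude    Longitude   Rx  Best_Unit
```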

I can't use command-line programs like grep (as in this question), because the first 3 lines are not the same in every file: the numbers (such as 150.0dB, -64*) change from file to file. So the whole of lines 1-3 has to be deleted, and then grep or a similar tool can do the search-and-replace on line 4.
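Because lines 1-3 differ between files, they are easiest to drop by position rather than by content. A minimal sketch, assuming ordinary text files: `itertools.islice` skips the first n items of any iterable, and an open file object yields one line at a time, so only the current line needs to be in memory. The helper name `drop_first_lines` is hypothetical, and `io.StringIO` stands in for a real file here.

```python
import io
from itertools import islice


def drop_first_lines(lines, n=3):
    """Return an iterator over every line after the first n, whatever they contain.

    `lines` may be any iterable of lines, e.g. an open file object,
    which Python reads lazily one line at a time.
    """
    return islice(lines, n, None)


# A small in-memory stand-in for one of the data files.
sample = io.StringIO(
    "Range   150.0dB -64.9dBm\n"
    "Mobile unit 1   Base    -17.19968    145.40369  999.8\n"
    "Fixed unit  2   Mobile  -17.20180    145.29514  533.0\n"
    "Latitude    Longitude   Rx(dB)  Best unit\n"
    "-17.06694    145.23158  -050.5  2\n"
)
remaining = list(drop_first_lines(sample))
print(remaining[0], end="")
# prints: Latitude    Longitude   Rx(dB)  Best unit
```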

Thanks guys,

=== EDIT: a new, more Pythonic way to handle large files, from @heltonbiker. It raises the error shown below the code.
import os, re
##infile = arcpy.GetParameter(0)
##chunk_size = arcpy.GetParameter(1) # number of records in each dataset

infile='trc_emerald.txt'
fc= open(infile)
Name = infile[:infile.rfind('.')]
outfile = Name+'_db.txt'

line4 = fc.readlines(100)[3]
line4 = re.sub('\([^\)].*?\)', '', line4)
line4 = re.sub('Best(\s.*?)', 'Best_', line4)
newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
fc.close()
newfile = open(outfile, 'w')
newfile.write(newfilestring)
newfile.close()

del lines
del outfile
del Name
#return chunk_size, fl
#arcpy.SetParameterAsText(2, fl)
print "Completed"

Traceback (most recent call last):
  File "P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Emerald\working\clean_file_1c.py", line 13, in
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
TypeError: 'builtin_function_or_method' object is unsubscriptable

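The traceback comes from `fc.readlines[4:]`: without parentheses, `fc.readlines` is the method object itself, and slicing a method raises this TypeError. Adding the parentheses would not be enough on its own: `line4 + [...]` then adds a string to a list (another TypeError), `readlines()` loads the whole file into memory, and `del lines` refers to a name that is never assigned. A minimal streaming rewrite, assuming Python 3 and that line 4 is the header to clean, copies the file one line at a time. The function name `clean_file` is hypothetical.

```python
import re


def clean_file(infile, outfile):
    """Copy infile to outfile, dropping lines 1-3 and cleaning line 4.

    The input is read one line at a time, so memory use does not grow
    with the size of the file.
    """
    with open(infile) as src, open(outfile, "w") as dst:
        for lineno, line in enumerate(src, start=1):
            if lineno <= 3:
                continue  # header lines whose numbers vary from file to file
            if lineno == 4:
                # Assumption: "(dB)" is the only parenthesised text on line 4.
                line = re.sub(r"\([^)]*\)", "", line)
                line = re.sub(r"Best\s+unit", "Best_Unit", line, flags=re.IGNORECASE)
            dst.write(line)
```

For the file in the question this would be called as `clean_file('trc_emerald.txt', 'trc_emerald_db.txt')`. Writing each line as it is read avoids building the whole output as one string, which is what made the original approach memory-hungry.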
Solution
As wim said in the comments, sed is the right tool for this. The following command should do what you want:
sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever
To explain the command a little:

-i edits the file in place; that is, it writes the output back into the input file

-e adds a command (expression) to execute; it can be given several times

'4 s/(dB)//' on line 4, replace '(dB)' with '' (the empty string)

'4 s/Best Unit/Best_Unit/' same as above, but with different find and replace strings

'1,3 d' delete lines 1 to 3 (inclusive) entirely

sed is a very powerful tool that can do much more than this, and it is well worth learning.
