Replace a Single Character in a Line of a Text File with Python


Question

I have a text file in which every line currently ends with the same character (N), which is used to track the progress the system makes. I want to change the end character to "Y" once a line has been processed, so that if the program ends because of an error or other interruption, on restart it will search until it finds a line ending in "N" and begin working from there. Below is my code as well as a sample from the text file.

Updated code:

def GeoCode():
    f = open("geocodeLongLat.txt", "a")
    with open("CstoGC.txt",'r') as file:
        print("Geocoding...")
        new_lines = []
        for line in file.readlines():
            check = line.split('~')
            print(check)
            if 'N' in check[-1]:
                geolocator = Nominatim()
                dot_number, entry_name, PHY_STREET,PHY_CITY,PHY_STATE,PHY_ZIP = check[0],check[1],check[2],check[3],check[4],check[5] 
                address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                f.write(dot_number + '\n')
                try:
                    location = geolocator.geocode(address)
                    f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                except AttributeError:
                    try:
                        address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                        location = geolocator.geocode(address)
                        f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                    except AttributeError:
                        print("Cannot Geocode")
            check[-1] = check[-1].replace('N','Y')
        new_lines.append('~'.join(check))

    with open('CstoGC.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
        for line in new_lines:
            file.writelines(line)        

    f.close()

Output:

2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N

As you can see, I am already keeping track of which line I am on just by using x. Should I use something like file.readlines()?

Sample from the text file:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N

Thanks!

Updated code thanks to @idlehands.

Recommended answer

There are a few ways to do this.

My original thought was to use the tell() and seek() methods to step back a few characters, but it quickly turns out that you cannot do this conveniently when the file is not opened in binary mode, and definitely not inside a for loop over readlines(). See these reference threads:

Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"
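
The error named in the second thread can be reproduced with a few lines. The sketch below is an illustration only, assuming CPython 3 and a throwaway file; `tell_inside_iteration` and `demo.txt` are hypothetical names, not part of the original code. It iterates over a text-mode file object directly and records what happens when tell() is called inside the loop.

```python
import os
import tempfile


def tell_inside_iteration(path):
    """Return the message of the OSError raised by tell() during iteration, or None."""
    with open(path, "r") as handle:
        for _line in handle:
            try:
                handle.tell()  # position tracking is disabled while iterating
            except OSError as exc:
                return str(exc)
    return None


# Demo on a throwaway file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as out:
        out.write("1~A~N\n2~B~N\n")
    message = tell_inside_iteration(demo_path)

print(message)
```

Looping over file.readlines() avoids the exception, but by then the whole file has already been consumed, so tell() only reports the end of the file and is of no help for locating the current line.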

That investigation led to this code (Option #1: modify the file in place while reading it):

with open('file.txt', 'rb+') as file:
    line = file.readline()  # read the first line to start the loop
    while line:  # readline() returns b'' at end of file, which ends the loop
        print(line)
        check = line.split(b'~')[-1]  # last field plus its line ending, e.g. b'N\n'
        if check.startswith(b'N'):  # the line ending is still attached, so test the prefix

            # ... do stuff ... #

            file.seek(-len(check), 1)  # step back from the end of the line to the flag
            file.write(check.replace(b'N', b'Y'))  # overwrite "N" with "Y"; the length is unchanged
        line = file.readline()  # read the next line

In the first referenced thread, one of the answers mentions that this could lead to problems, and directly modifying the bytes in the buffer while reading it is probably considered a bad idea™. A lot of pros will probably scold me for even suggesting it.
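
For reference, here is a slightly more defensive sketch of the same in-place idea. It is not the code above, only a variant under stated assumptions: `mark_lines_done` and the `process` callback are hypothetical names, the flag is a single byte at the start of the last `~`-separated field, and each change is flushed right away so that progress made before an interruption is handed to the operating system.

```python
import os
import tempfile


def mark_lines_done(path, process):
    """Flip a trailing b'N' flag to b'Y' in place after process(line) succeeds."""
    with open(path, "rb+") as handle:
        while True:
            line = handle.readline()
            if not line:  # b"" means end of file
                break
            flag = line.split(b"~")[-1]  # e.g. b"N\n" or b"N\r\n"
            if not flag.startswith(b"N"):
                continue  # already marked, nothing to do
            process(line)  # hypothetical per-line work; may raise
            handle.seek(-len(flag), os.SEEK_CUR)  # step back onto the flag byte
            handle.write(b"Y")  # overwrite exactly one byte
            handle.flush()  # push the change out of Python's buffer
            handle.seek(len(flag) - 1, os.SEEK_CUR)  # skip the line ending again


# Demo: simulate an interruption while working on the third line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CstoGC.txt")
    with open(path, "wb") as out:
        out.write(b"1~A~N\n2~B~N\n3~C~N\n")

    def fail_on_third(line):
        if line.startswith(b"3~"):
            raise RuntimeError("simulated interruption")

    try:
        mark_lines_done(path, fail_on_third)
    except RuntimeError:
        pass

    with open(path, "rb") as check_file:
        result = check_file.read()

print(result)
```

Because only one byte is replaced by one byte, the file length and every later line offset stay the same, which is the property that makes in-place editing workable here.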

Option #2a: read, modify in memory, then write back (if the file is not very large)

with open('file.txt', 'r') as file:
    new_lines = []
    for line in file.readlines():
        check = line.split('~')
        if 'N' in check[-1]:

            # ... do stuff ... #

            check[-1] = check[-1].replace('N', 'Y')
        new_lines.append('~'.join(check))

with open('file.txt', 'r+') as file:  # IMPORTANT: open in 'r+' mode, as 'w'/'w+' would truncate the file on open
    file.writelines(new_lines)  # write all lines back in one call

This approach loads all the lines into memory first, so the modification happens in memory and the file buffer is left alone. You then reopen the file and write the lines back. The caveat is that you are technically rewriting the entire file line by line, not just the single "N" character, even though that was the only thing that changed.
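
If the whole file is going to be rewritten anyway, one common alternative (not part of the original answer) is to write the updated lines to a temporary file in the same directory and swap it in with os.replace, so the original stays untouched until the new copy is complete. A minimal sketch, assuming the same `~`-separated layout; `rewrite_flags` and `process` are hypothetical names:

```python
import os
import tempfile


def rewrite_flags(path, process):
    """Write an updated copy next to `path`, then swap it in with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched on every platform.
        with os.fdopen(fd, "w", newline="") as dst, open(path, "r", newline="") as src:
            for line in src:
                fields = line.split("~")
                if fields[-1].startswith("N"):
                    process(line)  # hypothetical per-line work
                    fields[-1] = "Y" + fields[-1][1:]  # flip only the flag character
                dst.write("~".join(fields))
        os.replace(tmp_path, path)  # swap the finished copy into place
    except BaseException:
        os.remove(tmp_path)  # discard the partial copy, keep the original
        raise


# Demo on a throwaway file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CstoGC.txt")
    with open(path, "w", newline="") as out:
        out.write("1~A~Y\n2~B~N\n3~C~N\n")
    seen = []
    rewrite_flags(path, seen.append)
    with open(path, "r", newline="") as check_file:
        rewritten = check_file.read()

print(rewritten)
```

The trade-off is that this is all-or-nothing: if `process` raises partway through, the original file is left as it was and the progress made in that run is not recorded.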

Option #2b: technically, you could open the file in r+ mode from the outset and, once the iterations have completed, do this (still inside the with block but outside the loop):

# ... new_lines.append('~'.join(check)) #
    file.seek(0)  # rewind to the start of the file
    file.writelines(new_lines)  # write every line back

I'm not sure what distinguishes this from Option #1, since you are still reading and modifying the file in the same go. If someone more proficient in IO/buffer/memory management wants to chime in, please do.
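
Put together, the single-handle variant might look like the sketch below. This is an assumption-laden illustration rather than the exact code from the answer: `update_with_single_handle` is a hypothetical name, and the truncate() call is an added guard in case the rewritten content ever ends up shorter than the original.

```python
import os
import tempfile


def update_with_single_handle(path):
    """Read every line, flip trailing N flags to Y, and write back via one r+ handle."""
    with open(path, "r+", newline="") as handle:
        new_lines = []
        for line in handle.readlines():
            fields = line.split("~")
            if fields[-1].startswith("N"):
                # ... per-line work would go here ...
                fields[-1] = "Y" + fields[-1][1:]
            new_lines.append("~".join(fields))
        handle.seek(0)  # rewind to the start of the file
        handle.writelines(new_lines)  # overwrite the old content
        handle.truncate()  # drop leftovers if the new content is shorter


# Demo on a throwaway file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CstoGC.txt")
    with open(path, "w", newline="") as out:
        out.write("1~A~Y\n2~B~N\n3~C~N\n")
    update_with_single_handle(path)
    with open(path, "r", newline="") as check_file:
        single_handle_result = check_file.read()

print(single_handle_result)
```

Unlike the byte-level version, nothing is written until every line has been read, so reading and writing never interleave on the same buffer.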

The disadvantage of Option 2a/b is that you always end up storing and rewriting every line in the file, even if only a few lines are left that need to be updated from 'N' to 'Y'.

After a complete run, the file would look like this:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y

And if, say, you encountered a break at the line starting with 220940, the file would become:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
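
On restart, the program then only needs to find the first line that is still flagged "N". A small sketch of that lookup, assuming the flag is the last `~`-separated field; `first_pending_index` is a hypothetical helper, and the sample lines are taken from the file above:

```python
import io


def first_pending_index(lines):
    """Return the index of the first line whose last field starts with 'N', or None."""
    for index, line in enumerate(lines):
        last_field = line.rstrip("\r\n").split("~")[-1]
        if last_field.startswith("N"):
            return index
    return None


sample = io.StringIO(
    "148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y\n"
    "220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N\n"
    "854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N\n"
)
resume_at = first_pending_index(sample)
print(resume_at)  # 1
```

Checking only the last field matters here, because other fields (a state code such as "NC", for example) can also contain the letter N.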

There are pros and cons to each of these approaches. Try them and see which one fits your use case best.
