Remove quotes holding 2 words and remove comma between them


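Problem description

This is a follow-up to the earlier question about replacing a symbol between 2 words in quotes.

Extended input and expected output: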

I am trying to replace the comma between the 2 words Durango and PC in the second line with & and then remove the quotes " as well. The same goes for the third line with Orbis and PC. The 4th line has 2 word combos in quotes that I would like to process: "AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC"

I would like to retain the rest of the lines as they are, using Python.

Input

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
... 
...

There can be 100 lines like these in my sample. The expected output is:

2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...

So far, my idea is to read the file line by line and, if a line contains a quote, replace the quote with nothing. Replacing the comma inside the quotes is where I am stuck.

This is what I have now:

for line in lines:
    expr2 = re.findall('"(.*?)"', line)
    if len(expr2) != 0:
        expr3 = re.split('"', line)
        expr4 = expr3[0] + expr3[1].replace(",", " &") + expr3[2]
        print >>k, expr4
    else:
        print >>k, line

but it does not handle the case in the 4th line. There can be more than 3 combos as well. For example:

3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open 

and I would like to turn it into

3,SIN-Audio,AAA - Audio & xxxx & yyyy,Orbis & PC,13 & 22,Open

How can I achieve this? Any suggestions are welcome; I am learning Python.

Recommended answer

By treating the input file as a .csv, we can easily turn each line into something easy to work with.

For example,

2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened

is read as:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']

Then, by replacing all instances of "," with " &" (a space followed by an ampersand), we would have the line:

['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']

This replaces every comma inside a field, however many there are in a line, and when the line is finally written out the original double quotes are gone.
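The parsing step described above can be checked in isolation. The following is a minimal sketch, assuming the second sample line from the question is held in memory; io.StringIO stands in for the input file, and the variable names are illustrative only.

```python
import csv
import io

# Second sample line from the question, held in memory instead of a file.
sample = '2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened\n'

# csv.reader treats the quoted "Durango, PC" as one field and drops the quotes.
fields = next(csv.reader(io.StringIO(sample)))
print(fields)

# Any comma left inside a field came from a quoted group, so replace it,
# then re-join the fields with plain commas.
converted = [field.replace(',', ' &') for field in fields]
print(','.join(converted))
```

The first print shows the list of six fields, and the second shows the line with Durango & PC unquoted.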

Here is the code, given that in.txt is your input file; it will write to out.txt.

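import csv

# newline='' lets the csv module handle line endings itself.
with open('in.txt', newline='') as infile, open('out.txt', 'w') as outfile:
    # csv.reader parses each quoted group as one field and drops the quotes.
    for row in csv.reader(infile):
        # Any comma still inside a field came from a quoted group.
        row = [field.replace(',', ' &') for field in row]
        outfile.write(','.join(row) + '\n')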

The output for the fourth line is:

LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For
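The same approach should also cover the question's extra case with three quoted groups in one line. As a rough sketch, the per-line logic can be wrapped in a small helper; convert_line is a hypothetical name introduced here for illustration, not part of the original answer.

```python
import csv


def convert_line(line):
    """Unquote the fields of one CSV line and turn their inner commas into ' &'."""
    # csv.reader accepts any iterable of strings, so a one-item list works.
    fields = next(csv.reader([line]))
    return ','.join(field.replace(',', ' &') for field in fields)


# The line with three quoted groups from the question.
extra = '3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open'
print(convert_line(extra))
```

Because the csv module does the splitting, the number of quoted groups per line does not matter, which is the case the split-on-quote attempt in the question could not handle.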
