Marking duplicates in a CSV file
Problem description
I'm stumped with a problem illustrated in the sample below:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
I want to detect duplicates in the "PHONE" column and mark the subsequent duplicates using the "REF" column, with a value pointing to the "ID" of the first item, and the value "Yes" in the "DISCARD" column:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,1,
2,"PETER",6232,2,
3,"JON",12345,1,"Yes"
4,"PETERSON",6232,2,"Yes"
5,"ALEX",7854,,
6,"JON",12345,1,"Yes"
So, how do I go about it? I tried this code but my logic wasn't right, of course.
import csv
myfile = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
myfile1 = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
dest = csv.writer(open("C:\Users\Eduardo\Documents\TESTFIXED.csv", "wb"), dialect="excel")
reader = csv.reader(myfile)
verum = list(reader)
verum.sort(key=lambda x: x[2])
for i, row in enumerate(verum):
    if row[2] == verum[i][2]:
        verum[i][3] = row[0]
print verum
Your direction and help would be much appreciated.
Solution
The only thing you have to keep in memory while this is running is a map from phone numbers to their IDs.
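import csv

# Maps each phone number to the ID of the first row that used it.
phone_to_id = {}

# newline='' lets the csv module handle line endings itself (Python 3).
with open(r'c:\temp\input.csv', 'r', newline='') as fin, \
        open(r'c:\temp\output.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    # omit this if the file has no header row
    writer.writerow(next(reader))
    for row in reader:
        (row_id, name, phone, ref, discard) = row
        if phone in phone_to_id:
            # Phone seen before: point at the first row and mark for discard.
            ref = phone_to_id[phone]
            discard = "YES"
        else:
            phone_to_id[phone] = row_id
        writer.writerow((row_id, name, phone, ref, discard))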
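One thing to watch: the single-pass loop above leaves REF empty on the first row of each duplicate group, whereas the desired output in the question shows those first rows referencing their own ID (rows 1 and 2), with unique rows such as ALEX left blank. If that detail matters, a two-pass variant is one way to get it. The sketch below is only an illustration under some assumptions: the function name mark_duplicates is made up here, the data is small enough to hold in memory, every row has exactly five columns, and the sample is read from a string instead of a file so the example runs on its own.

```python
import csv
import io
from collections import Counter


def mark_duplicates(rows):
    """Fill REF and DISCARD for rows shaped (id, name, phone, ref, discard)."""
    rows = [list(row) for row in rows]
    # First pass: count how often each phone number occurs.
    counts = Counter(row[2] for row in rows)
    first_id = {}  # phone -> ID of the first row that used it
    # Second pass: fill in REF and DISCARD.
    for row in rows:
        row_id, phone = row[0], row[2]
        if counts[phone] < 2:
            continue  # unique phone: leave REF and DISCARD empty
        if phone in first_id:
            row[3] = first_id[phone]  # later duplicate points at the first row
            row[4] = "Yes"
        else:
            first_id[phone] = row_id
            row[3] = row_id  # first occurrence references itself
    return rows


# Sample data from the question, unquoted for brevity.
sample = """ID,NAME,PHONE,REF,DISCARD
1,JOHN,12345,,
2,PETER,6232,,
3,JON,12345,,
4,PETERSON,6232,,
5,ALEX,7854,,
6,JON,12345,,
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)
marked = mark_duplicates(reader)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(marked)
print(out.getvalue())
```

The trade-off is memory: this version keeps every row in a list, while the single-pass version only keeps the phone-to-ID dictionary. For a large file, the counting pass could instead read the file once up front and the second pass could stream it again.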