How to delete duplicate lines in a file in Python
I have a file with duplicate lines. I want to remove the duplicates so that the file contains only unique lines, but I get an error:

output.writelines(uniquelines(filelines))
TypeError: writelines() argument must be a sequence of strings

I have searched for similar issues, but I still don't understand what is wrong. My code:
import codecs

def uniquelines(lineslist):
    # Keep the first occurrence of each line, comparing stripped content.
    unique = {}
    result = []
    for item in lineslist:
        if item.strip() in unique: continue
        unique[item.strip()] = 1
        result.append(item)
    return result

file1 = codecs.open('organizations.txt','r+','cp1251')
filelines = file1.readlines()  # list of unicode strings
file1.close()

output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))  # this call raises the TypeError
output.close()
The code opens files in two different ways: codecs.open when it reads, and the built-in open when it writes. readlines on a file object created with codecs.open returns a list of unicode strings, while writelines on a file object created with open expects a sequence of (byte) strings.
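The same kind of type mismatch can be reproduced on Python 3, with the roles reversed: a binary-mode file accepts only bytes, while a text-mode file accepts str. The following is a minimal sketch, assuming Python 3; the scratch file name and the sample lines are made up for illustration and are not part of the original question.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text (str) lines, like those a text-mode readlines() would return.
lines = ["alpha\n", "beta\n"]

# Writing text lines to a binary-mode file fails because the types differ.
try:
    with open(path, "wb") as out:
        out.writelines(lines)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

# Opening the file in text mode with an explicit encoding makes the types agree.
with open(path, "w", encoding="cp1251") as out:
    out.writelines(lines)

with open(path, "r", encoding="cp1251") as src:
    round_trip = src.readlines()
```

In both Python versions the remedy is the same idea: read and write through file objects that agree on whether they carry text or bytes.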
Replace the following lines:
output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()
with:
output = codecs.open("wordlist_unique.txt", "w", "cp1251")
output.writelines(uniquelines(filelines))
output.close()
or, preferably, using the with statement:
with codecs.open("wordlist_unique.txt", "w", "cp1251") as output:
    output.writelines(uniquelines(filelines))
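For reference, the whole task can also be written so that reading and writing go through the same text-mode API. This is only a sketch under a few assumptions: it uses io.open (available on Python 2.6+ and Python 3) instead of codecs.open, it tracks seen lines in a set rather than a dict, and the names unique_lines and dedupe_file are hypothetical helpers, not something from the original code.

```python
import io


def unique_lines(lines):
    """Return lines with duplicates removed, keeping first occurrences in order.

    As in the question, lines are compared by their stripped content.
    """
    seen = set()
    result = []
    for line in lines:
        key = line.strip()
        if key in seen:
            continue
        seen.add(key)
        result.append(line)
    return result


def dedupe_file(src_path, dst_path, encoding="cp1251"):
    """Hypothetical helper: copy src_path to dst_path without duplicate lines."""
    # Both files are opened in text mode with the same encoding,
    # so readlines() and writelines() agree on the string type.
    with io.open(src_path, "r", encoding=encoding) as src:
        lines = src.readlines()
    with io.open(dst_path, "w", encoding=encoding) as dst:
        dst.writelines(unique_lines(lines))
```

With the file names from the question, the call would be dedupe_file("organizations.txt", "wordlist_unique.txt").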