How to delete duplicate lines in a file in Python


Problem description

I have a file with duplicate lines. I want to remove the duplicates so that the file contains only unique lines. But I get an error:

output.writelines(uniquelines(filelines))
TypeError: writelines() argument must be a sequence of strings

I have searched for similar issues, but I still don't understand what is wrong. My code:

import codecs

def uniquelines(lineslist):
    # Keep the first occurrence of each line, compared after stripping whitespace
    unique = {}
    result = []
    for item in lineslist:
        if item.strip() in unique: continue
        unique[item.strip()] = 1
        result.append(item)
    return result

# Read the input as cp1251-encoded text
file1 = codecs.open('organizations.txt','r+','cp1251')
filelines = file1.readlines()
file1.close()
# Write the unique lines to a new file
output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()

Solution

The code uses two different open functions: codecs.open when it reads, and the built-in open when it writes.

The readlines method of a file object created with codecs.open returns a list of unicode strings, whereas the writelines method of a file object created with the built-in open expects a sequence of byte strings.
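The error message in the question appears to come from Python 2, so it cannot be reproduced exactly on Python 3. As a rough analogue only, the sketch below shows the same kind of text-versus-bytes mismatch using Python 3 in-memory streams. The helper name accepts_lines and the sample lines are hypothetical and not part of the original code.

```python
import io


def accepts_lines(stream, lines):
    """Return True if stream.writelines(lines) succeeds, False on TypeError."""
    try:
        stream.writelines(lines)
    except TypeError:
        return False
    return True


# Hypothetical sample data standing in for the decoded file contents
text_lines = ["first\n", "second\n"]

# A binary stream given text lines: the types do not match
print(accepts_lines(io.BytesIO(), text_lines))

# A text stream given text lines: the types match
print(accepts_lines(io.StringIO(), text_lines))

# A binary stream given lines that were encoded first: the types match
encoded = [line.encode("cp1251") for line in text_lines]
print(accepts_lines(io.BytesIO(), encoded))
```

The point is the same as in the answer: the writer has to agree with the kind of strings the reader produced, either by encoding the lines before writing or by opening the output with an encoding-aware function.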

Replace the following lines:

output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()

with:

output = codecs.open("wordlist_unique.txt", "w", "cp1251")
output.writelines(uniquelines(filelines))
output.close()

or, preferably, using a with statement:

with codecs.open("wordlist_unique.txt", "w", "cp1251") as output:
    output.writelines(uniquelines(filelines))
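For readers on Python 3, a minimal sketch of the same idea is given here, assuming the built-in open with its encoding parameter is used for both reading and writing so that both sides deal in text. The names unique_lines and dedupe_file are hypothetical, and a set replaces the dictionary from the question; the cp1251 default mirrors the encoding used there.

```python
def unique_lines(lines):
    """Return lines in their original order, keeping only the first
    occurrence of each line (compared after stripping whitespace)."""
    seen = set()
    result = []
    for line in lines:
        key = line.strip()
        if key in seen:
            continue
        seen.add(key)
        result.append(line)
    return result


def dedupe_file(src_path, dst_path, encoding="cp1251"):
    """Copy src_path to dst_path without duplicate lines.

    Both files are opened in text mode with the same encoding, so
    readlines() and writelines() agree on the string type.
    """
    with open(src_path, "r", encoding=encoding) as src:
        lines = src.readlines()
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.writelines(unique_lines(lines))
```

With the file names from the question, the call would be dedupe_file("organizations.txt", "wordlist_unique.txt").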
