Remove Duplicates from Text File


Question


I want to remove duplicate lines from a text file.

I have a text file that contains lines like the following:

None_None

ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624

None_None

ColumnConverter_56963312
ColumnConverter_56963312

PredicatesFactory_56963424
PredicatesFactory_56963424

PredicateConverter_56963648
PredicateConverter_56963648

ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888

The resulting output needs to be:

None_None

ConfigHandler_56663624

ColumnConverter_56963312

PredicatesFactory_56963424

PredicateConverter_56963648

ConfigHandler_80134888

I have tried just this command: en = set(open('file.txt')), but it does not work.

Could anyone help me with how to extract only the unique set from the file?

Thank you

Solution

Here's an option that preserves order (unlike a set) but still has the same behaviour (note that the EOL characters are deliberately stripped and blank lines are ignored):

from collections import OrderedDict

# Read the file, stripping EOL characters and skipping blank lines;
# OrderedDict.fromkeys keeps the first occurrence of each line, in order.
with open('/home/jon/testdata.txt') as fin:
    lines = (line.rstrip() for line in fin)
    unique_lines = OrderedDict.fromkeys(line for line in lines if line)

print(list(unique_lines))
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312', 'PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']

Then you just need to write the above to your output file.
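For completeness, a minimal sketch of that last step, wrapped in a hypothetical `dedupe_file` helper (the function name and file paths are assumptions, not part of the original answer):

```python
from collections import OrderedDict

def dedupe_file(src, dst):
    # Read src, strip EOL characters, skip blank lines, and keep
    # only the first occurrence of each remaining line, in order.
    with open(src) as fin:
        unique_lines = OrderedDict.fromkeys(
            line.rstrip() for line in fin if line.strip()
        )
    # Write the unique lines to dst, one per line, in original order.
    with open(dst, 'w') as fout:
        fout.write('\n'.join(unique_lines) + '\n')
    return list(unique_lines)
```

On Python 3.7+ a plain dict would work the same way here, since dicts preserve insertion order; OrderedDict just makes that intent explicit.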

