Find duplicate words in two files


Problem description


I have two text files and need to find the words that appear in both. Is there a more concise way than this code?

file1 = set(line.strip() for line in open('/home/user1/file1.txt'))
file2 = set(line.strip() for line in open('/home/user1/file2.txt'))

for line in file1 & file2:
    if line:
        print(line)

Solution

You can write more concise code, but more importantly you don't need to create two sets. set.intersection accepts any iterable, so you can build a set from one file and pass the other file's lines in directly. This saves memory on larger data sets and should also run faster:

with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # build one set from file1, then intersect it with file2's lines as they are read
    for line in set(map(str.rstrip, f1)).intersection(map(str.rstrip, f2)):
        print(line)
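The same approach can be wrapped in reusable functions. The following is a minimal sketch. The names `common_lines` and `common_lines_in_files` are hypothetical and do not come from the answer. It also restores the question's `if line:` behaviour. The one-liner above would print an empty line if both files contained a blank line, because `"\n".rstrip()` is `""`.

```python
from typing import Iterable, Set


def common_lines(lines1: Iterable[str], lines2: Iterable[str]) -> Set[str]:
    """Return the right-stripped lines that occur in both iterables.

    Only lines1 is turned into a full set; lines2 is consumed lazily
    by set.intersection, one line at a time.
    """
    first = set(map(str.rstrip, lines1))
    first.discard("")  # skip blank lines, like the question's `if line:` check
    return first.intersection(map(str.rstrip, lines2))


def common_lines_in_files(path1: str, path2: str) -> Set[str]:
    """Open both files and return the lines they share (hypothetical helper)."""
    # Both files are closed when the with-block exits; intersection has
    # already consumed f2 by the time the result is returned.
    with open(path1) as f1, open(path2) as f2:
        return common_lines(f1, f2)
```

Because `common_lines` takes any iterable of strings, it works equally on open file objects and on plain lists. For example, `for line in common_lines_in_files('/home/user1/file1.txt', '/home/user1/file2.txt'): print(line)` reproduces the answer's loop.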

For Python 2, use itertools.imap, since Python 2's built-in map builds a full list:

from itertools import imap
with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # imap is the lazy Python 2 counterpart of Python 3's map
    for line in set(imap(str.rstrip, f1)).intersection(imap(str.rstrip, f2)):
        print(line)

This builds only one full set (the lines of file1). intersection then iterates over the iterable passed in, i.e. the rstripped lines of file2, and keeps only those already in the set. The alternative would be to create two full sets of lines first and then intersect them.
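To make that mechanism concrete, here is a hand-written sketch of the same one-set strategy. It has one difference: it compares whitespace-separated words instead of whole lines. That is an assumption: the question asks about "words", but both snippets compare complete lines, which only matches if each file holds one word per line. The function name `common_words` is hypothetical.

```python
from typing import Iterable, Set


def common_words(lines1: Iterable[str], lines2: Iterable[str]) -> Set[str]:
    """Return whitespace-separated words that appear in both line iterables."""
    # The only full set: every word from the first source.
    words1 = {word for line in lines1 for word in line.split()}
    # Stream the second source word by word; the only thing stored from it
    # is words already known to be in words1.
    shared = set()
    for line in lines2:
        for word in line.split():
            if word in words1:
                shared.add(word)
    return shared
```

The comparison is case-sensitive and does not strip punctuation, so "Apple" and "apple," count as different words. Normalise each word (for example with `word.lower().strip(".,;:!?")`) if that matters for your files.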

