How to prevent duplicate text in the output file while using a for loop

Problem description

I have this code, which compares a number (called item in my code) against the numbers in the domain range to see whether it is already there. If it is, it prints to the output file; if it is not, it should print only once.

Question: How can I make sure that a number that is not within the domain range is printed only once? (I used a True/False flag, but that doesn't work: when the flag is False, several duplicates are printed. In the code below, I am not sure how to make a number that is outside the domain range print once instead of multiple times.)

for item in lookup[uniprotID]:
    for varain in wholelookup[uniprotID]:
        for names in wholeline[uniprotID]:
            statement=False
            if re.search(r'\d+',varain).group(0)==item and start <= int(item) <= end:
                result = str(int(item) - start + 1)
                if varain in names.split(' '):
                    statement = True
                    print ">{0} | at position {1} | start= {2}, end= {3} | description: {4} | {5}".format(uniprotID, result, start, end, varain, names)
                    if statement == True:
                        print(''.join(makeList[start-1:end]))


Recommended answer

Something based on this might work for you:

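import sys

already_seen = set()  # lines that have already been written
for line in sys.stdin:
    # write each distinct line only the first time it appears
    if line not in already_seen:
        already_seen.add(line)
        sys.stdout.write(line)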
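
The same "remember what was already written" idea could be applied to the items in the question rather than to whole lines. The following is only a minimal sketch: the function name `report_once`, the output wording, and the sample values are hypothetical and not taken from the asker's code, and it assumes each item is a string holding an integer.

```python
def report_once(items, start, end):
    """Return one output line per distinct item, however often it repeats."""
    seen = set()   # items that have already produced a line
    lines = []
    for item in items:
        if item in seen:
            continue  # already reported, skip the duplicate
        seen.add(item)
        pos = int(item)
        if start <= pos <= end:
            # inside the domain range: report the 1-based offset
            lines.append("{0} at position {1}".format(item, pos - start + 1))
        else:
            # outside the domain range: still reported, but only once
            lines.append("{0} outside {1}-{2}".format(item, start, end))
    return lines


if __name__ == "__main__":
    # hypothetical sample data
    for line in report_once(["12", "40", "12", "40", "7"], 10, 20):
        print(line)
```

Because the check against `seen` happens before anything is written, it does not matter how many nested loops produce the same item; each one reaches the output at most once.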

Note that if your files are large, you could end up consuming a lot of virtual memory doing this. If so, look into anydbm (the dbm module in Python 3) or a Bloom filter.
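
As a rough illustration of the disk-backed option, the set of seen lines can be kept in a dbm file instead of in memory. This sketch assumes Python 3 and uses `dbm.dumb`, the pure-Python backend that ships with the standard library, chosen here only because it is always available; the function name `unique_lines` and the file name `seen` are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def unique_lines(lines, db_path):
    """Yield each distinct line once, tracking seen lines in a dbm file."""
    db = dbm.dumb.open(db_path, "n")  # "n": always start from an empty database
    try:
        for line in lines:
            key = line.encode("utf-8")  # store keys as bytes
            if key not in db:
                db[key] = b""           # the value is unused; only the key matters
                yield line
    finally:
        db.close()


if __name__ == "__main__":
    # hypothetical sample data, written to a throwaway directory
    workdir = tempfile.mkdtemp()
    sample = ["a\n", "b\n", "a\n", "c\n", "b\n"]
    for line in unique_lines(sample, os.path.join(workdir, "seen")):
        print(line, end="")
```

This trades memory for disk I/O, so it is likely to be noticeably slower than an in-memory set; it is only worth considering when the set of distinct lines really does not fit in memory.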
