Python iterating through two files by line at the same time

Question

I am trying to compare columns in two files to see if the values match, and if there is a match, I want to merge (concatenate) the data for that row. My issue is that when reading line by line from the two files separately, I can't get Python to iterate through the files together and look for a match. Instead, it iterates properly through one file but iterates over the same line in the second file multiple times.

I have had this issue in the past and have never really found a way around it. I know indentation is part of the problem, since nesting "for line in a" and "for line in b" messes up the loops, so I thought what I tried below would work, but it hasn't. I have looked around for solutions, but nobody seems to use the same method, so I wonder whether I am completely off track. Can anyone explain a better way to do this, and whether my method would work at all, and if not, why not? Thanks, it is much appreciated!

These are the formats of my two files. Basically, I want to compare the filename column in both files, and if the values match, I want to merge the rows together.

file1:
cluster_id  hypothesis_id   filename    M1_name_offset  Orientation
1   71133076    unique_name_1.png   esc_sox2_Sox1_80_4  forward
1   50099120    unique_name_4.png   hb_cebpb_ETS1_139_7 forward
1   91895576    unique_name_11.png  he_tal1_at_AC_acptr_258_11  forward

file2:
Name                Cluster_No  Pattern     filename
esc_sox2_Sox1_80    Cluster1    AP1(1N)ETS      unique_name_4.png
hb_cebpb_ETS1_139   Cluster1    CREB(1N)ETS     unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2    ETS(-1N)ZIC     unique_name_3.png
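
For reference, one common way to express this kind of join is to read one file into a dictionary keyed by filename and then look up each row of the other file. This works even though matching rows sit on different line numbers (unique_name_4.png is the second data row of file1 but the first data row of file2). This is a minimal sketch, not from the original post: it assumes whitespace-separated columns with a single header row, takes filename from index 2 in file1 and from the last column in file2, uses io.StringIO stand-ins built from the samples above, and the helper name merge_on_filename is hypothetical.

```python
import io

# Stand-ins for the two files, copied from the samples above (assumption:
# columns are separated by runs of whitespace and the first line is a header).
FILE1 = """cluster_id  hypothesis_id   filename    M1_name_offset  Orientation
1   71133076    unique_name_1.png   esc_sox2_Sox1_80_4  forward
1   50099120    unique_name_4.png   hb_cebpb_ETS1_139_7 forward
1   91895576    unique_name_11.png  he_tal1_at_AC_acptr_258_11  forward
"""

FILE2 = """Name                Cluster_No  Pattern     filename
esc_sox2_Sox1_80    Cluster1    AP1(1N)ETS      unique_name_4.png
hb_cebpb_ETS1_139   Cluster1    CREB(1N)ETS     unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2    ETS(-1N)ZIC     unique_name_3.png
"""


def merge_on_filename(f1, f2):
    """Hypothetical helper: yield file1 fields + file2 fields when filenames match."""
    next(f1)  # skip the header row of each file
    next(f2)
    # Read file2 once into a dict keyed by its last column (filename).
    lookup = {}
    for line in f2:
        fields = line.split()
        if fields:
            lookup[fields[-1]] = fields
    # Stream file1 and look up each filename (column index 2) in the dict.
    for line in f1:
        fields = line.split()
        if fields and fields[2] in lookup:
            yield fields + lookup[fields[2]]


for row in merge_on_filename(io.StringIO(FILE1), io.StringIO(FILE2)):
    print("\t".join(row))
```

Reading file2 once into a dict avoids rescanning it for every line of file1. If a filename appears more than once in file2, this sketch keeps only the last such row.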

What I have tried:

for aline in file1:
    motif1 = aline.split()[2]        # filename column of file1
    for bline in file2:
        motif2 = bline.split()[-1]   # filename column of file2
        if motif1 == motif2:
            print("match", aline, bline)

I also tried:

for aline in file1:
    motif1 = aline.split()[2]
for bline in file2:
    motif2 = bline.split()[-1]
    if motif1 == motif2:
        print("match", aline, bline)

I have also tried using string formatting but that didn't make a difference. The first way iterates through file2 incorrectly and the second way doesn't give me any output. I have played around with it a lot and tried various indentations and extra bits but I am stumped as to how to even try and fix it! Please help me :(
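
A likely cause of the first attempt's behavior (an inference; the post does not confirm it) is that an open file is an iterator: once the inner loop has read file2 to the end, later passes over file2 yield nothing. This sketch uses hypothetical io.StringIO stand-ins for the two files to count how many file2 lines the inner loop sees for each file1 line, then shows one possible fix, reading file2 into a list once.

```python
import io

# Hypothetical stand-ins for the question's file1 and file2.
file1 = io.StringIO("x\ny\nz\n")
file2 = io.StringIO("a\nb\n")

# Nested loops as in the first attempt: count file2 lines seen per file1 line.
inner_counts = []
for aline in file1:
    count = 0
    for bline in file2:  # file2 is used up after the first outer pass
        count += 1
    inner_counts.append(count)
print(inner_counts)  # [2, 0, 0]

# One possible fix: read file2 into a list once; a list can be re-iterated.
file1.seek(0)
file2.seek(0)
lines2 = list(file2)
fixed_counts = []
for aline in file1:
    count = 0
    for bline in lines2:
        count += 1
    fixed_counts.append(count)
print(fixed_counts)  # [2, 2, 2]
```

Calling file2.seek(0) at the start of each outer pass also works, but it rereads file2 every time.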

Recommended answer

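Use the built-in zip function. It pairs the first line of one file with the first line of the other, then the second lines, and so on, so both files advance together and iteration stops at the end of the shorter file. Note that this only finds matches that sit on the same line number in both files.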

with open(file1) as f1, open(file2) as f2:  # file1, file2 hold the two paths
    for line1, line2 in zip(f1, f2):
        motif1 = line1.split()[2]   # filename column in file1
        motif2 = line2.split()[-1]  # filename column in file2
        ...
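
To see exactly what zip pairs, here is a small sketch using io.StringIO stand-ins that hold only the filename column of each sample file (a simplification of the real files). Because zip pairs lines by position, none of the sample filenames line up, so no pair matches; the last part shows zip stopping at the shorter input.

```python
import io

# Simplified stand-ins: only the filename column of each sample file.
f1 = io.StringIO("unique_name_1.png\nunique_name_4.png\nunique_name_11.png\n")
f2 = io.StringIO("unique_name_4.png\nunique_name_11.png\nunique_name_3.png\n")

# zip pairs line 1 with line 1, line 2 with line 2, and so on.
pairs = [(a.strip(), b.strip()) for a, b in zip(f1, f2)]
for a, b in pairs:
    print(a, b, "match" if a == b else "no match")

# zip stops as soon as the shorter input runs out.
short = io.StringIO("only_line\n")
longer = io.StringIO("x\ny\nz\n")
stopped = len(list(zip(short, longer)))
print(stopped)  # 1
```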

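Note that zip behaves differently in Python 2 and Python 3. In Python 2, zip reads both inputs to the end and returns a list of tuples, so both files end up entirely in memory; in Python 3, it returns a lazy iterator. In Python 2, it is therefore more efficient to use itertools.izip, which is lazy, instead.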
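
A common compatibility idiom, sketched here as an assumption rather than something from the answer, is to import izip where it exists and fall back to the built-in zip on Python 3, where itertools.izip was removed. The u"" literals are there because io.StringIO on Python 2 requires unicode text; the stand-in file contents are hypothetical.

```python
import io

try:
    from itertools import izip  # Python 2: lazy counterpart of zip
except ImportError:
    izip = zip  # Python 3: itertools.izip was removed; built-in zip is lazy

# Hypothetical stand-ins for two open files.
f1 = io.StringIO(u"a1\na2\na3\n")
f2 = io.StringIO(u"b1\nb2\nb3\n")

pairs = izip(f1, f2)
first = next(pairs)    # pulls exactly one line from each file
remaining = f1.read()  # the rest of f1 has not been read yet
print(repr(first))
print(repr(remaining))
```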
