How to read two files in parallel, line by line, in Python?


Problem description

I've been trying to solve this issue all day without success.

I have an 'original file', let's call it 'infile', which is the file I want to edit. Additionally, I have another file that functions as a 'dictionary', let's call it 'inlist'.

Here is an example of the infile:

PRMT6   10505   Q96LA8  HMGA1   02829   NP_665906
WDR77   14387   NP_077007   SNRPE   00548   NP_003085
NCOA3   03570   NP_858045   RELA    01241   NP_068810
ITCH    07565   Q96J02  DTX1    03991   NP_004407

And of the inlist:

NP_060607   Q96LA8
NP_001244066    Q96J02
NP_077007   Q9BQA1
NP_858045   Q9Y6Q9

My current approach is to split each line into its columns on the existing tabs. The objective is to read each line of the infile and check the following:

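  1. If the element in the 3rd column of the infile is found in the 1st column of the inlist, replace it with the corresponding element from the 2nd column of the inlist
  2. If the element in the 3rd column of the infile is found in the 2nd column of the inlist, do nothing
  3. The same applies to the last column of the infile (index 5 in the code)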
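
The rules above can be sketched as a single dictionary lookup. This is only an illustration under assumptions: it uses Python 3, the helper name `translate` is hypothetical, and the mapping holds just two entries copied from the inlist example. A code that is not a key of the mapping (which includes codes that appear only in the second column of the inlist) is returned unchanged.

```python
def translate(code, mapping):
    """Return the mapped code if `code` is a key of `mapping`, else `code` unchanged."""
    return mapping.get(code, code)

# Two entries taken from the inlist example (first column -> second column).
mapping = {"NP_077007": "Q9BQA1", "NP_858045": "Q9Y6Q9"}

# One row from the infile example, already split on tabs.
row = ["WDR77", "14387", "NP_077007", "SNRPE", "00548", "NP_003085"]

# Apply the rule to the 3rd and the last column (indices 2 and 5).
for i in (2, 5):
    row[i] = translate(row[i], mapping)

print("\t".join(row))
```

Here only index 2 changes, because `NP_003085` has no entry in this reduced mapping.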

Applied to the infile above, these rules should produce the output:

PRMT6   10505   Q96LA8  HMGA1   02829   Q(...)
WDR77   14387   Q9BQA1  SNRPE   00548   Q(...)
NCOA3   03570   Q9Y6Q9  RELA    01241   Q(...)
ITCH    07565   Q96J02  DTX1    03991   Q(...)

NOTE: not all codes start with Q

I've tried using a while loop, but wasn't successful, and I'm too ashamed to post the code here (I'm new to programming, so I don't want to get demotivated so early in the 'game'). Something that would be perfect to solve this would be:

for line in inlist #, infile: <--- THIS PART! Reading both files, splitting both files, replacing both files...
        inlistcolumns = line.split('\t')
        infilecolumns = line.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]) + "\n")
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]) + "\n")
        else:
            outfile.write('\t'.join(infilecolumns) + '\n')

Help would be much appreciated. Thanks!

OK, after the hints from Sephallia and Jlengrand I got this:

for line in infile:
    try:
        # Read lines in the dictionary
        line2 = inlist.readline()
        inlistcolumns = line.split('\t')
        infilecolumns = line.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
        else:
            outfile.write('\t'.join(infilecolumns))
    except IndexError:
        print "End of dictionary reached. Restarting from top."

The problem is that the if statements are apparently not doing their job, as the output file is still identical to the input file. What could I be doing wrong?

As requested by some, here is the full code:

import os

def replace(infilename, linename, outfilename):
    # Open original file and output file
    infile = open(infilename, 'rt')
    inlist = open(linename, 'rt')
    outfile = open(outfilename, 'wt')

    # Read lines and find those to be replaced
    for line in infile:
        infilecolumns = line.split('\t')
        line2 = inlist.readline()
        inlistcolumns = line2.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
        outfile.write('\t'.join(infilecolumns))

    # Close files
    infile.close()
    inlist.close()
    outfile.close()


if __name__ == '__main__':
    wdir = os.getcwd()
    outdir = os.path.join(wdir, 'results.txt')
    outname = os.path.basename(outdir)
    original = raw_input("Type the name of the file to be parsed\n")
    inputlist = raw_input("Type the name of the library to be used\n")
    linesdir = os.path.join(wdir, inputlist)
    linesname = os.path.basename(linesdir)
    indir = os.path.join(wdir, original)
    inname = os.path.basename(indir)

    replace(indir, linesdir, outdir)

    print "Successfully applied changes.\nOriginal: %s\nLibrary: %s\nOutput:%s" % (inname, linesname, outname)

The first file to be used is hprdtotal.txt: https://www.dropbox.com/s/hohvlcdqvziewte/hprdmap.txt And the second is hprdmap.txt: https://www.dropbox.com/s/9hd0e3a8rt95pao/hprdtotal.txt

Hope this helps.

Recommended answer

OK, I figured it out. This is what I did:

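# inlist, infile and outfile are the open file objects from the full code above.
# Build a lookup table from the inlist: first column -> second column.
data = {}
for line in inlist:
    k, v = [x.strip() for x in line.split('\t')]
    data[k] = v

# Replace the codes at indices 2 and 5 of each infile line when a mapping exists.
for line in infile:
    infilecolumns = line.strip().split('\t')

    value1 = data.get(infilecolumns[2])
    value2 = data.get(infilecolumns[5])

    if value1:
        infilecolumns[2] = value1
    if value2:
        infilecolumns[5] = value2

    outfile.write('\t'.join(infilecolumns) + '\n')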
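
For readers who want something they can run directly, here is a minimal self-contained sketch of the same idea. It makes several assumptions that are not in the original post: it targets Python 3, the names `load_mapping` and `replace_codes` are hypothetical, and `io.StringIO` objects stand in for the real files, using a few rows from the examples in the question.

```python
import io

def load_mapping(inlist):
    """Build {first column: second column} from a tab-separated file object."""
    mapping = {}
    for line in inlist:
        parts = line.strip().split("\t")
        if len(parts) == 2:  # skip blank or malformed lines
            mapping[parts[0]] = parts[1]
    return mapping

def replace_codes(infile, inlist, outfile, columns=(2, 5)):
    """Copy infile to outfile, translating the given column indices via inlist."""
    mapping = load_mapping(inlist)
    for line in infile:
        fields = line.rstrip("\n").split("\t")
        for i in columns:
            if i < len(fields):
                # Codes without a mapping entry are left unchanged.
                fields[i] = mapping.get(fields[i], fields[i])
        outfile.write("\t".join(fields) + "\n")

# In-memory stand-ins for the real files (assumed to be tab-separated).
infile = io.StringIO(
    "WDR77\t14387\tNP_077007\tSNRPE\t00548\tNP_003085\n"
    "ITCH\t07565\tQ96J02\tDTX1\t03991\tNP_004407\n"
)
inlist = io.StringIO("NP_077007\tQ9BQA1\nNP_001244066\tQ96J02\n")
outfile = io.StringIO()

replace_codes(infile, inlist, outfile)
result = outfile.getvalue()
print(result, end="")
```

With real files, the three `StringIO` objects would be replaced by handles from `open(...)`, ideally inside a `with` statement so they are closed even if an error occurs.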

This gives the desired output nice and easy. Thanks for all your answers, they helped me a lot!
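
A small sketch may help show why the dictionary approach succeeds where reading the two files in lockstep did not. It assumes Python 3 and uses the example rows from the question as in-memory data. Pairing line N of the infile with line N of the inlist (done here with `zip`) only ever compares same-numbered lines, while loading the inlist into a dictionary first lets every infile line see every entry. Only the 3rd column (index 2) is checked, to keep the example short.

```python
import io

INFILE = (
    "PRMT6\t10505\tQ96LA8\tHMGA1\t02829\tNP_665906\n"
    "WDR77\t14387\tNP_077007\tSNRPE\t00548\tNP_003085\n"
    "NCOA3\t03570\tNP_858045\tRELA\t01241\tNP_068810\n"
    "ITCH\t07565\tQ96J02\tDTX1\t03991\tNP_004407\n"
)
INLIST = (
    "NP_060607\tQ96LA8\n"
    "NP_001244066\tQ96J02\n"
    "NP_077007\tQ9BQA1\n"
    "NP_858045\tQ9Y6Q9\n"
)

# Lockstep: zip() pairs line 1 with line 1, line 2 with line 2, and so on.
lockstep_hits = 0
for row, entry in zip(io.StringIO(INFILE), io.StringIO(INLIST)):
    fields = row.strip().split("\t")
    key = entry.strip().split("\t")[0]
    if key == fields[2]:
        lockstep_hits += 1

# Lookup: load the whole inlist first, then test each infile line against it.
mapping = dict(line.strip().split("\t") for line in io.StringIO(INLIST))
lookup_hits = sum(
    1 for row in io.StringIO(INFILE) if row.strip().split("\t")[2] in mapping
)

print(lockstep_hits, lookup_hits)
```

On this sample the lockstep loop finds no matches, because matching codes never sit on the same line number in both files, whereas the lookup finds the two codes that need replacing.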
