How to read two files in parallel, line by line, in Python?

Views: 108 | Posted: 2020/5/5 14:00:22 | Tags: python, dictionary


Question

I've been trying to solve this issue all day without success.

I have an 'original file', let's call it 'infile', which is the file I want to edit. Additionally, I have another file that functions as a 'dictionary', let's call it 'inlist'.

Here is an example of the infile:

PRMT6   10505   Q96LA8  HMGA1   02829   NP_665906
WDR77   14387   NP_077007   SNRPE   00548   NP_003085
NCOA3   03570   NP_858045   RELA    01241   NP_068810
ITCH    07565   Q96J02  DTX1    03991   NP_004407

And of the inlist:

NP_060607   Q96LA8
NP_001244066    Q96J02
NP_077007   Q9BQA1
NP_858045   Q9Y6Q9

My current approach is to split each line into its columns at the existing tabs. The objective is to read each line of the infile and check the following:

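If the element in the 3rd column of infile is found in the 1st column of inlist, change that element to the corresponding element in the 2nd column of inlist.
If the element in the 3rd column of infile is found in the 2nd column of inlist, do nothing.
The same applies to the 6th column of infile (index 5 in the code below).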

This should produce the output:

PRMT6   10505   Q96LA8  HMGA1   02829   Q(...)
WDR77   14387   Q9BQA1  SNRPE   00548   Q(...)
NCOA3   03570   Q9Y6Q9  RELA    01241   Q(...)
ITCH    07565   Q96J02  DTX1    03991   Q(...)

NOTE: not all codes start with Q
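
The replacement rules listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: it uses Python 3 syntax (the code in this post is Python 2), the function names `load_mapping` and `replace_ids` are hypothetical, and small in-memory strings stand in for the two tab-separated files. It loads the whole inlist into a dict first, so every infile line can be checked against every inlist entry.

```python
import io

def load_mapping(inlist_lines):
    """Build {first-column ID: second-column ID} from tab-separated lines."""
    mapping = {}
    for line in inlist_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping

def replace_ids(infile_lines, mapping, columns=(2, 5)):
    """Yield each infile line with the chosen columns translated via mapping."""
    for line in infile_lines:
        fields = line.rstrip("\n").split("\t")
        for i in columns:
            if i < len(fields):
                # IDs that are not keys of the mapping (including IDs that
                # are already in second-column form) are left unchanged.
                fields[i] = mapping.get(fields[i], fields[i])
        yield "\t".join(fields)

# Made-up in-memory stand-ins for inlist and infile (assumption: tab-separated)
inlist = io.StringIO("NP_077007\tQ9BQA1\nNP_858045\tQ9Y6Q9\n")
infile = io.StringIO(
    "WDR77\t14387\tNP_077007\tSNRPE\t00548\tNP_003085\n"
    "ITCH\t07565\tQ96J02\tDTX1\t03991\tNP_004407\n"
)

mapping = load_mapping(inlist)
result = list(replace_ids(infile, mapping))
for row in result:
    print(row)
```

With real files, the `io.StringIO` objects would be replaced by opened file handles; the exact-match dict lookup also avoids the substring behaviour of `in` on strings.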

I've tried using a while loop, but wasn't successful, and I'm too ashamed to post the code here (I'm new to programming, so I don't want to get demotivated so early in the 'game'). Something that would be perfect for solving this would be:

for line in inlist #, infile: <--- THIS PART! Reading both files, splitting both files, replacing both files...
        inlistcolumns = line.split('\t')
        infilecolumns = line.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]) + "\n")
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]) + "\n")
        else:
            outfile.write('\t'.join(infilecolumns) + '\n')

Help would be much appreciated. Thanks!

OK, after the hints from Sephallia and Jlengrand I got this:

for line in infile:
    try:
        # Read lines in the dictionary
        line2 = inlist.readline()
        inlistcolumns = line.split('\t')
        infilecolumns = line.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
        else:
            outfile.write('\t'.join(infilecolumns))
    except IndexError:
        print "End of dictionary reached. Restarting from top."

The problem is that the if statements are apparently not doing their job, as the output file is still identical to the input file. What could I be doing wrong?
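
Two things can be read directly from the snippet above: `inlistcolumns` is built from `line` rather than `line2`, so both lists hold the same infile row, and reading the files in lockstep only ever compares line N of one file with line N of the other. The sketch below illustrates the second point only. It assumes Python 3 syntax and uses made-up in-memory data, so it is not the asker's code or files.

```python
import io

# Made-up data: the IDs match, but they sit on different line numbers
infile = io.StringIO("A\t1\tNP_2\nB\t2\tNP_1\n")
inlist = io.StringIO("NP_1\tQ1\nNP_2\tQ2\n")

# zip() advances both files together and stops at the shorter one
pairs = [
    (row.rstrip("\n").split("\t"), entry.rstrip("\n").split("\t"))
    for row, entry in zip(infile, inlist)
]

# Row i is only compared with entry i, so IDs stored on different
# line numbers are never matched.
matches = [row[2] == entry[0] for row, entry in pairs]
print(matches)
```

Here both comparisons fail even though every ID in the third column has an entry in the list, which is why a line-by-line pairing may not suit a lookup task like this one.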

As some have asked, here is the full code:

import os

def replace(infilename, linename, outfilename):
    # Open original file and output file
    infile = open(infilename, 'rt')
    inlist = open(linename, 'rt')
    outfile = open(outfilename, 'wt')

    # Read lines and find those to be replaced
    for line in infile:
        infilecolumns = line.split('\t')
        line2 = inlist.readline()
        inlistcolumns = line2.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
        outfile.write('\t'.join(infilecolumns))

    # Close files
    infile.close()
    inlist.close()
    outfile.close()


if __name__ == '__main__':
    wdir = os.getcwd()
    outdir = os.path.join(wdir, 'results.txt')
    outname = os.path.basename(outdir)
    original = raw_input("Type the name of the file to be parsed\n")
    inputlist = raw_input("Type the name of the libary to be used\n")
    linesdir = os.path.join(wdir, inputlist)
    linesname = os.path.basename(linesdir)
    indir = os.path.join(wdir, original)
    inname = os.path.basename(indir)

    replace(indir, linesdir, outdir)

    print "Successfully applied changes.\nOriginal: %s\nLibrary: %s\nOutput:%s" % (inname, linesname, outname)

The first file to be used is hprdtotal.txt: https://www.dropbox.com/s/hohvlcdqvziewte/hprdmap.txt And the second is hprdmap.txt: https://www.dropbox.com/s/9hd0e3a8rt95pao/hprdtotal.txt

Hope this helps.
