How to treat stdin like a text file


Question

I have a program that reads and parses a text file and does some analysis on it. I want to modify it so it can take parameters via the command line. In particular, it should read the file when that file is supplied as stdin.

The parser looks like this:

class FastAreader :
    '''
    Class to provide reading of a file containing one or more FASTA
    formatted sequences:
    object instantiation:
    FastAreader(<file name>):

    object attributes:
    fname: the initial file name

    methods:
    readFasta() : returns header and sequence as strings.
    Author: David Bernick
    Date: April 19, 2013
    '''
    def __init__ (self, fname):
        '''constructor: saves attribute fname '''
        self.fname = fname

    def readFasta (self):
        '''
        using filename given in init, returns each included FastA record
        as 2 strings - header and sequence.
        whitespace is removed, no adjustment is made to sequence contents.
        The initial '>' is removed from the header.
        '''
        header = ''
        sequence = ''

        with open(self.fname) as fileH:
            # initialize return containers
            header = ''
            sequence = ''

            # skip to first fasta header
            line = fileH.readline()
            while not line.startswith('>') :
                line = fileH.readline()
            header = line[1:].rstrip()

            # header is saved, get the rest of the sequence
            # up until the next header is found
            # then yield the results and wait for the next call.
            # next call will resume at the yield point
            # which is where we have the next header
            for line in fileH:
                if line.startswith ('>'):
                    yield header,sequence
                    header = line[1:].rstrip()
                    sequence = ''
                else :
                    sequence += ''.join(line.rstrip().split()).upper()
        # final header and sequence will be seen with an end of file
        # with clause will terminate, so we do the final yield of the data
        yield header,sequence

# presumed object instantiation and example usage
# myReader = FastAreader ('testTiny.fa');
# for head, seq in myReader.readFasta() :
#     print (head,seq)

It parses files like this:

>test
ATGAAATAG
>test2
AATGATGTAA
>test3
AAATGATGTAA

>test-1
TTA CAT CAT

>test-2
TTA CAT CAT A

>test-3
TTA CAT CAT AA

>test1A
ATGATGTAAA
>test2A
AATGATGTAAA
>test3A
AAATGATGTAAA

>test-1A
A TTA CAT CAT

>test-2A
AA TTA CAT CAT A

>test-3A
AA TTA CAT CAT AA

My test program looks like this:

import argparse
import sequenceAnalysis as s
import sys

class Test:
    def __init__(self, infile, longest, min, start):
        self.longest = longest
        self.start = set(start)
        self.infile = infile
        self.data = sys.stdin.read()
        self.fasta = s.FastAreader(self.data)
        for head, seq in self.fasta.readFasta():
            self.head = head
            self.seq = "".join(seq).strip()
        self.test()

    def test(self):
        print("YUP", self.start, self.head)


def main():
    parser = argparse.ArgumentParser(description = 'Program prolog', 
                                     epilog = 'Program epilog', 
                                     add_help = True, #default is True 
                                     prefix_chars = '-', 
                                     usage = '%(prog)s [options] -option1[default] <input >output')
    parser.add_argument('-i', '--inFile', action = 'store', help='input file name')
    parser.add_argument('-o', '--outFile', action = 'store', help='output file name') 
    parser.add_argument('-lG', '--longestGene', action = 'store', nargs='?', const=True, default=True, help='longest Gene in an ORF')
    parser.add_argument('-mG', '--minGene', type=int, choices= range(0, 2000), action = 'store', help='minimum Gene length')
    parser.add_argument('-s', '--start', action = 'append', nargs='?', help='start Codon') #allows multiple list options
    parser.add_argument('-v', '--version', action='version', version='%(prog)s 0.1')  
    args = parser.parse_args()
    test = Test(args.inFile, args.longestGene, args.minGene, args.start)


if __name__ == '__main__':
    main()

My command-line input looks like this:

python testcommand2.py -s ATG <tass2.fa >out.txt

where tass2.fa is a file which can be parsed by FastAreader. I can pass parameters like start and get them written to the output text file, but when I try to parse the input file, which should arrive on stdin, it prints everything instead of parsing it. And instead of writing to the designated text file, which should be stdout, it prints directly to the command line.

Recommended answer

When you use I/O redirection (i.e. you have < or | or > or << in the command line), that is handled by the shell even before your program runs. So when Python runs, its standard input is connected to the file or pipe you are redirecting from, and its standard output is connected to the file or pipe you are redirecting to. The file names are not (directly) visible to Python, because you are dealing with already open()ed file handles, not file names. Your argument parser simply returns nothing for the input file (args.inFile is None), because there are no file name arguments.
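To make this concrete, here is a minimal sketch of what the program can and cannot see. The helper describe_invocation is hypothetical (it is not part of the question's code), and an io.StringIO stands in for the redirected tass2.fa so the example does not depend on a real shell redirection.

```python
import io
import sys

def describe_invocation(argv, stdin):
    """Summarize what the program can see about its input (illustrative helper)."""
    return {
        # the shell consumes '<tass2.fa' and '>out.txt' before Python starts,
        # so they never show up among the arguments
        "arguments": list(argv[1:]),
        # a real sys.stdin reports the placeholder name '<stdin>',
        # not the path of the redirected file; StringIO has no name at all
        "stdin_name": getattr(stdin, "name", None),
        # the stream is already open and can be read like any text file
        "first_line": stdin.readline().rstrip("\n"),
    }

# Simulated run of: python testcommand2.py -s ATG <tass2.fa >out.txt
fake_stdin = io.StringIO(">test\nATGAAATAG\n")
info = describe_invocation(["testcommand2.py", "-s", "ATG"], fake_stdin)
print(info, file=sys.stderr)
```

In a real run you would pass sys.argv and sys.stdin instead of the stand-ins. The point is that the data arrives through the open stream, while the file name is nowhere to be found.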

To correctly cope with this, you should adapt your code to work with file handles directly, instead of, or in addition to, explicit file names.
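One way this could look is sketched below: a generator that takes an already open handle rather than a file name. The function name read_fasta is an assumption made for illustration. It mirrors the whitespace stripping and upper-casing of FastAreader.readFasta, but it is a rewrite, not the original author's code.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from any open text handle."""
    header = None
    chunks = []
    for line in handle:
        if line.startswith('>'):
            # a new header closes the previous record, if there was one
            if header is not None:
                yield header, ''.join(chunks)
            header = line[1:].rstrip()
            chunks = []
        elif header is not None:
            # drop all whitespace and upper-case, as the original parser does
            chunks.append(''.join(line.split()).upper())
    # the last record ends at end of input rather than at a new header
    if header is not None:
        yield header, ''.join(chunks)

# StringIO stands in for sys.stdin or for a handle returned by open()
sample = io.StringIO(">test-1\nTTA CAT CAT\n\n>test2\naatgatgtaa\n")
records = list(read_fasta(sample))
```

Because the function only iterates over the handle, the same call works with read_fasta(sys.stdin) for redirected input and with a handle obtained from open(). Passing the result of sys.stdin.read() to FastAreader, as the test program does, hands the file's contents to open() as if they were a file name.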

For the latter scenario, a common convention is to have a special case for the file name - and, when that is passed in, to use standard input (or standard output, depending on context) instead of opening a file. (You can still have files named like that by the simple workaround of using a relative path ./- so the name is not exactly a single dash.)
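The dash convention could be implemented roughly as follows. The helpers open_input and open_output are hypothetical names, and treating a missing name (None, which is what argparse yields when -i or -o is omitted) the same as - is an assumption of this sketch rather than part of the convention.

```python
import contextlib
import sys

def open_input(name):
    """Return a context manager that yields a readable text handle."""
    if name is None or name == '-':
        # nullcontext hands back sys.stdin and does not close it on exit
        return contextlib.nullcontext(sys.stdin)
    # any other name, including './-', is opened as a regular file
    return open(name)

def open_output(name):
    """Return a context manager that yields a writable text handle."""
    if name is None or name == '-':
        # same idea for output: leave sys.stdout open when the block ends
        return contextlib.nullcontext(sys.stdout)
    return open(name, 'w')
```

With the question's parser, the main function could then use "with open_input(args.inFile) as infile:" and pass infile to code that reads from a handle. Both "python testcommand2.py -s ATG <tass2.fa" and "python testcommand2.py -s ATG -i tass2.fa" would then follow the same code path.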
