Pull out information from the last line from an if/else statement within a for loop in Python


Problem description

I don't think this is possible, but I figured I would ask just in case. I am trying to write a memory-efficient Python program for parsing files that are typically 100+ GB in size. What I want to do is use a for loop to read in a line, split it on various characters multiple times, and write the result, all within the same loop.

The trick is that the file has lines that start with "#", which are not important except for the last line that starts with "#", which is the header of the file. I want to be able to pull information from that last line because it contains the sample names.

for line in seqfile:
    line = line.rstrip()
    if line.startswith("#"):
        continue (unless it's the last line that starts with #)
        SampleNames = lastline[8:-1]
        newheader.write(New header with sample names)
    else:
        columns = line.split("\t")
        then do more splitting
        then write

If this is not possible, then the only other alternative I can think of is to store the lines that start with # (which can still be 5 GB in size), then go back and write to the beginning of the file. I believe that can't be done directly, but if there is a memory-efficient way to do it, that would be nice.

Any help would be greatly appreciated.

Thank you

Solution

If you want the index of the last line starting with #, and all of those lines sit at the top of the file, read once using takewhile, consuming lines until you hit the first line that does not start with #. Then seek back to the start and use itertools.islice to get the line:

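from itertools import takewhile, islice

with open(file) as f:
    # Count the leading lines that start with "#"; subtracting 1 gives
    # the zero-based index of the last one.
    start = sum(1 for _ in takewhile(lambda x: x.startswith("#"), f)) - 1
    f.seek(0)  # rewind, then slice out just that one line
    data = next(islice(f, start, start + 1))
    print(data)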

The first argument to takewhile is a predicate. While the predicate is True, takewhile takes elements from the iterable passed in as the second argument. Because a file object returns its own iterator, once we consume the takewhile object using sum the file pointer is past the header line you want, so it is just a matter of seeking back and getting the line with islice. You can also seek much less if you just want to go back a short distance and take a few lines with islice, filtering until you reach the last line starting with a #.

file:

###
##
# i am the header
blah
blah
blah

Output:

 # i am the header
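To try the takewhile approach end to end, here is a minimal sketch that wraps it in a function and runs it against the sample file above. The function name last_leading_hash_line and the temporary file are made up for illustration, and the guard for a file with no leading # lines is an addition, not part of the answer.

```python
import os
import tempfile
from itertools import islice, takewhile


def last_leading_hash_line(path):
    """Return the last line of the leading '#' block, or None if there is none.

    Hypothetical helper: assumes all '#' lines sit at the top of the file.
    """
    with open(path) as f:
        # Count the leading run of lines that start with '#'.
        count = sum(1 for _ in takewhile(lambda line: line.startswith("#"), f))
        if count == 0:
            return None
        f.seek(0)  # rewind and pick out line number count - 1 (zero-based)
        return next(islice(f, count - 1, count))


# Write the sample file from the answer to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("###\n##\n# i am the header\nblah\nblah\nblah\n")

header = last_leading_hash_line(tmp.name)
os.remove(tmp.name)
print(repr(header))
```

Only one line is held in memory at a time, at the cost of reading the leading # block twice.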

If the line could be anywhere, the only memory-efficient way I can think of is to read the file once, updating an index variable every time you see a line starting with #. You could then pass that index to islice as in the code above, or use linecache.getline as shown here:

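import linecache

with open(file) as f:
    index = None
    # Number lines from 1 and remember the last one starting with "#".
    for ind, line in enumerate(f, 1):
        if line[0] == "#":
            index = ind
    # Fetch that line by its 1-based line number.
    data = linecache.getline(file, index)
    print(data)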

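We use a starting index of 1 with enumerate because getline counts lines starting from 1. Be aware that linecache reads and caches the entire file in memory, so for a very large file the islice approach is the safer choice.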
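For comparison, the same index-tracking idea can be combined with islice instead of linecache. This is only a sketch: the function name last_hash_line and the sample data are assumptions for illustration, and the index is zero-based here because islice counts from 0.

```python
import os
import tempfile
from itertools import islice


def last_hash_line(path):
    """Return the last line starting with '#', wherever it is, or None."""
    index = None
    with open(path) as f:
        # First pass: remember the zero-based index of the last '#' line.
        for i, line in enumerate(f):
            if line.startswith("#"):
                index = i
        if index is None:
            return None
        # Second pass: rewind and step forward to just that line.
        f.seek(0)
        return next(islice(f, index, index + 1))


# Hypothetical sample with '#' lines mixed in among data lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("blah\n# first\nblah\n# second\nblah\n")

found = last_hash_line(tmp.name)
os.remove(tmp.name)
print(repr(found))
```

This reads the file twice but never holds more than one line at a time.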

Or, if you only want that particular line and don't care about its position or the other lines, simply keep updating a variable data with each line that starts with a #:

with open(file) as f:
    data = None
    for line in f:
        if line[0] == "#":
            data = line
    print(data) # will be last occurrence of line starting with `#`
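Applied to the loop in the question, the same "remember the latest # line" idea lets everything happen in a single pass, provided the # lines all come before the data lines. The sketch below is an assumption-laden illustration, not the asker's real format: stream_with_header is a made-up name, and joining columns with commas stands in for the real splitting and writing.

```python
import io


def stream_with_header(lines, out):
    """Write the last '#' line seen before the data, then the processed rows."""
    last_comment = None
    header_written = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Keep only the most recent '#' line; earlier ones are dropped.
            if not header_written:
                last_comment = line
            continue
        if not header_written:
            # First data row: the remembered '#' line is the header.
            if last_comment is not None:
                out.write(last_comment + "\n")
            header_written = True
        columns = line.split("\t")
        out.write(",".join(columns) + "\n")  # placeholder for real processing


source = io.StringIO("###\n##\n# i am the header\na\tb\nc\td\n")
result = io.StringIO()
stream_with_header(source, result)
print(result.getvalue())
```

Only one comment line and one data line are held in memory at any point, so the size of the # block does not matter.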

Or use file.tell, keeping track of the previous pointer location and using that to seek, then call next on the file object to get the line or lines we want:

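with open(file) as f:
    # Start prev_tell at 0 so a "#" on the very first line is handled.
    curr_tell, prev_tell = None, 0
    # Iterate with readline: f.tell() is not reliable inside "for line in f".
    for line in iter(f.readline, ""):
        if line[0] == "#":
            curr_tell = prev_tell  # offset where this "#" line begins
        prev_tell = f.tell()
    f.seek(curr_tell)
    data = next(f)
    print(data)
    # i am the header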
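A variation on the offset idea, sketched here as an assumption rather than as part of the answer: opening the file in binary mode lets you keep a running byte offset yourself by adding up line lengths, so a plain for loop works and tell is not needed. The name last_hash_line_by_offset and the UTF-8 decoding are both assumptions.

```python
import os
import tempfile


def last_hash_line_by_offset(path):
    """Return the last line starting with '#', located by byte offset."""
    offset = 0          # byte position where the current line starts
    last_offset = None  # byte position of the last '#' line seen
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"#"):
                last_offset = offset
            offset += len(line)
        if last_offset is None:
            return None
        # Jump straight to the remembered position and read one line.
        f.seek(last_offset)
        return f.readline().decode("utf-8")  # assumes UTF-8 text


# Hypothetical sample, written as bytes so offsets are exact.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"blah\n# one\nblah\n# two\nblah\n")

line_found = last_hash_line_by_offset(tmp.name)
os.remove(tmp.name)
print(repr(line_found))
```

The seek goes directly to the header, so nothing before it is read again.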

There is also the consume recipe from the itertools documentation, which you could use to advance the file iterator up to your header line index - 1, then simply call next on the file object:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
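A short usage sketch of the recipe, assuming you already know the zero-based index of the header line (here 2, taken from the sample file above). An io.StringIO stands in for the real file object, and header_index is a hypothetical name.

```python
import collections
import io
from itertools import islice


def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)


# Stand-in for an open file; a real file object iterates the same way.
f = io.StringIO("###\n##\n# i am the header\nblah\nblah\nblah\n")

header_index = 2         # assumed known: zero-based index of the header line
consume(f, header_index) # skip the lines before the header
header_line = next(f)    # the very next line is the header
print(repr(header_line))
```

After this, continuing to iterate over f yields the data lines that follow the header.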
