Counting the jump (number of lines) between the first two 'String' occurrences in a file


Question

I have a huge data file in which a specific string is repeated after a fixed number of lines.

I want to count the jump between the first two 'Rank' occurrences. For example, the file looks like this:

  1 5 6 8 Rank                     line-start
  2 4 8 5
  7 5 8 6
  5 4 6 4
  1 5 7 4 Rank                     line-end  
  4 8 6 4
  2 4 8 5
  3 6 8 9
  5 4 6 4 Rank

You can see that the string Rank appears on every 4th line, with three other lines between consecutive occurrences, so the number of lines in a block is 4 for the above example. My question is: how do I get this number of lines using Python's readline()?

I am currently doing this:

data = open(filename).readlines()
count = 0
for j in range(len(data)):
  if(data[j].find('Rank') != -1): 
    if count == 0: line1 = j
    count = count +1 
  if(count == 2):
    no_of_lines = j - line1
    break

Any improvements or suggestions are welcome.

Recommended answer

I assume you want to find the number of lines in a block, where each block starts with a line that contains 'Rank'. For example, there are 3 blocks in your sample: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line:

from itertools import groupby

def block_start(line, start=[None]):
    # The mutable default keeps state between calls: the key flips on every
    # line that contains 'Rank', so all lines of one block share the same key.
    if 'Rank' in line:
        start[0] = not start[0]
    return start[0]

with open(filename) as file:
    block_sizes = [sum(1 for line in block)  # find number of lines in a block
                   for _, block in groupby(file, key=block_start)]  # group
print(block_sizes)
# -> [4, 4, 1]
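
One thing to keep in mind: the `start=[None]` default in `block_start` keeps its state between calls, so a second pass over another file would start from whatever value the previous pass left behind. Below is a minimal sketch of the same block counting with an explicit loop instead, under the same assumption that each block starts at a line containing 'Rank'. The helper name `block_sizes_of` and its `marker` parameter are hypothetical, not part of the original answer, and `io.StringIO` stands in for the real file:

```python
import io

def block_sizes_of(lines, marker='Rank'):
    """Return the size of each block; a new block starts at every line
    that contains `marker` (hypothetical helper, for illustration only)."""
    sizes = []
    for line in lines:
        # Open a new block on a marker line, or on the very first line
        # if the input does not begin with a marker.
        if marker in line or not sizes:
            sizes.append(0)
        sizes[-1] += 1
    return sizes

# The sample from the question, without the line-start/line-end annotations.
sample = io.StringIO(
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)
print(block_sizes_of(sample))
# -> [4, 4, 1]
```

Any iterable of lines works here, so an open file object can be passed in place of `sample`.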

If all blocks have the same number of lines, or you just want to find the number of lines in the first block that starts with 'Rank':

count = None
with open(filename) as file:
    for line in file:
        if 'Rank' in line:
            if count is None:  # found the start of the 1st block
                count = 1
            else:  # found the start of the 2nd block
                break
        elif count is not None:  # inside the 1st block
            count += 1
print(count)  # -> 4
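
Since the question mentions a huge file, it may help to note that iterating over a file object reads one line at a time, whereas `readlines()` loads the whole file into memory. Below is a minimal sketch of the same idea wrapped in a function that returns the jump between the first two occurrences and stops reading as soon as the second one is found. The name `first_jump` and the `marker` parameter are hypothetical, and `io.StringIO` stands in for the real file:

```python
import io

def first_jump(lines, marker='Rank'):
    """Return the number of lines between the first two lines containing
    `marker`, or None if there are fewer than two such lines
    (hypothetical helper, for illustration only)."""
    first = None
    for index, line in enumerate(lines):
        if marker in line:
            if first is None:
                first = index          # first occurrence
            else:
                return index - first   # second occurrence: stop reading here
    return None

sample = io.StringIO(
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
)
print(first_jump(sample))
# -> 4
```

With a real file this would be called as `first_jump(file)` inside a `with open(filename) as file:` block.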
