Parse Text File Line by Line, Skipping Certain Lines

Question

I have a file that looks like this (but is much bigger):

>some text
ABC
DEF
GHI
>some more text
JKL
MNO
PQR

I have been playing around with it in Java for some time and have been able to build arrays from the lines. A header (the lines beginning with '>') is usually one line, but can sometimes span 2, 3 or more lines. The lines that don't begin with '>' all have the same length in characters, but there may be 10, 20, 30 or more of them per block. I now want to create a string array in which each element is the concatenation of one block of lines that don't begin with '>', like so:

array element 1 = ABCDEFGHI
array element 2 = JKLMNOPQR

I feel like I am close but need a small kick in the butt to get me going. I'm sure this is easy for a pro, but I am still new to Java.

The specific problem is related to other posts I made on this board. It's a FASTA file:

>3BHS_BOVIN (P14893) 3 beta-hydroxysteroid
AGWSCLVTGGGGFLGQRIICLLVEEKDLQEIRVLDKVFRPEVREEFSKLQSKIKLTLLEG
DILDEQCLKGACQGTSVVIHTASVIDVRNAVPRETIMNVNVKGTQLLLEACVQASVPVFI
>41_BOVIN (Q9N179) Protein 4.1 
MHCKVSLLDDTVYECVVEKHAKGQDLLKRVCEHLNLLEEDYFGLAIWDNATSKTWLDSAK
EIKKQVRGVPWNFTFNVKFYPPDPAQLTEDITRYYLCLQLRQDIVSGRLPCSFATLALLG
SYTIQSELGDYDPELHGADYVSDFKLAPNQTKELEEKVMELHKSYRSMTPAQADLEFLEN
>5NTD_BOVIN (Q05927) 5'-nucleotidase 
MNPGAARTPALRILPLGALLWPAARPWELTILHTNDVHSRLEQTSEDSSKCVNASRCVGG
VARLATKVHQIRRAEPHVLLLDAGDQYQGTIWFTVYKGTEVAHFMNALGYESMALGNHEF
DNGVEGLIDPLLKEVNFPILSANIKAKGPLASKISGLYSPYKILTVGDEVVGIVGYTSKE
TPFLSNPGTNLVFEDEITALQPEVDKLKTLNVNKIIALGHSGFEVDKLIAQKVKGVDVVV

I ultimately need each sequence in its own array element so that I can manipulate them later.

Recommended answer

Assuming you can iterate over the lines:

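// Needs: import java.util.ArrayList; import java.util.List;
// `lines` is any Iterable<String> (or String[]) holding the file's lines.
List<String> array = new ArrayList<String>();
StringBuilder buf = new StringBuilder();
for (String line : lines) {
  if (line.startsWith(">")) {
    // A header line ends the current sequence, if one was collected.
    // Consecutive header lines leave buf empty, so nothing is added twice.
    if (buf.length() > 0) {
      array.add(buf.toString());
      buf.setLength(0); // Reset the buffer for the next sequence.
    }
  } else {
    buf.append(line); // Sequence line: join it onto the current element.
  }
}
if (buf.length() > 0) { // Add the final text element(s).
  array.add(buf.toString());
}
// If a plain array is required: String[] result = array.toArray(new String[0]);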
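For completeness, here is a minimal self-contained sketch of the same loop wrapped in a class, so that it can be compiled and run as is. The class name `FastaSequences` and the helpers `joinSequences` and `readSequences` are hypothetical names chosen for illustration and are not part of the original answer. Two details are assumptions layered on top of it: each sequence line is trimmed before being appended, and `readSequences` loads the whole file with `Files.readAllLines`, which may not suit very large files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FastaSequences {

    // Joins each run of non-header lines into a single string.
    // Lines starting with '>' are treated as headers and skipped.
    static List<String> joinSequences(Iterable<String> lines) {
        List<String> sequences = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                // A header closes the sequence collected so far, if any.
                if (buf.length() > 0) {
                    sequences.add(buf.toString());
                    buf.setLength(0);
                }
            } else {
                // Assumption: surrounding whitespace is not part of the sequence.
                buf.append(line.trim());
            }
        }
        // Flush the last sequence, which has no header after it.
        if (buf.length() > 0) {
            sequences.add(buf.toString());
        }
        return sequences;
    }

    // Hypothetical helper: reads every line of the file into memory, then joins.
    static List<String> readSequences(Path file) throws IOException {
        return joinSequences(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The small sample from the question, inlined so no file is needed.
        List<String> lines = Arrays.asList(
            ">some text", "ABC", "DEF", "GHI",
            ">some more text", "JKL", "MNO", "PQR");
        String[] array = joinSequences(lines).toArray(new String[0]);
        for (int i = 0; i < array.length; i++) {
            System.out.println("array element " + (i + 1) + " = " + array[i]);
        }
        // Prints:
        // array element 1 = ABCDEFGHI
        // array element 2 = JKLMNOPQR
    }
}
```

To parse an actual file, something like `FastaSequences.readSequences(Paths.get("proteins.fasta"))` should work, where the file name is only a placeholder.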
