Parse Text File Line by Line, Skipping Certain Lines
Problem Description
I have a file that looks like this (but is much bigger):
>some text
ABC
DEF
GHI
>some more text
JKL
MNO
PQR
I have been playing around with this in Java for some time and have been able to build arrays of the lines, etc. The header lines starting with '>' usually take up one line but sometimes span 2, 3 or more lines. The lines that don't begin with '>' all have the same length in characters, but there may be 10, 20, 30 or more of them per record. I am now at the point where I want to create a string array in which each element is the concatenation of one consecutive run of lines that don't begin with '>', like so:
array element 1 = ABCDEFGHI
array element 2 = JKLMNOPQR
I feel like I am close but need a small kick in the butt to get me going. I'm sure this is easy for a pro, but I am still new to Java.
The specific problem is related to other posts I have made on this board. It's a FASTA file:
>3BHS_BOVIN (P14893) 3 beta-hydroxysteroid
AGWSCLVTGGGGFLGQRIICLLVEEKDLQEIRVLDKVFRPEVREEFSKLQSKIKLTLLEG
DILDEQCLKGACQGTSVVIHTASVIDVRNAVPRETIMNVNVKGTQLLLEACVQASVPVFI
>41_BOVIN (Q9N179) Protein 4.1
MHCKVSLLDDTVYECVVEKHAKGQDLLKRVCEHLNLLEEDYFGLAIWDNATSKTWLDSAK
EIKKQVRGVPWNFTFNVKFYPPDPAQLTEDITRYYLCLQLRQDIVSGRLPCSFATLALLG
SYTIQSELGDYDPELHGADYVSDFKLAPNQTKELEEKVMELHKSYRSMTPAQADLEFLEN
>5NTD_BOVIN (Q05927) 5'-nucleotidase
MNPGAARTPALRILPLGALLWPAARPWELTILHTNDVHSRLEQTSEDSSKCVNASRCVGG
VARLATKVHQIRRAEPHVLLLDAGDQYQGTIWFTVYKGTEVAHFMNALGYESMALGNHEF
DNGVEGLIDPLLKEVNFPILSANIKAKGPLASKISGLYSPYKILTVGDEVVGIVGYTSKE
TPFLSNPGTNLVFEDEITALQPEVDKLKTLNVNKIIALGHSGFEVDKLIAQKVKGVDVVV
I ultimately need the sequences in their own array element so that I can manipulate them later.
Recommended Answer
Assuming you can iterate over the lines:
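    // Requires java.util.ArrayList and java.util.List;
    // `lines` is any Iterable<String> holding the file's lines.
    List<String> array = new ArrayList<>();
    StringBuilder buf = new StringBuilder();
    for (String line : lines) {
        if (line.startsWith(">")) {
            // A header line ends the previous record, so flush it.
            // Multi-line headers leave buf empty, so nothing extra is added.
            if (buf.length() > 0) {
                array.add(buf.toString());
                buf.setLength(0);
            }
        } else {
            // Sequence line: append it to the current record.
            buf.append(line);
        }
    }
    if (buf.length() > 0) { // Add the final record.
        array.add(buf.toString());
    }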
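The loop above assumes the lines are already available. Below is a minimal, self-contained sketch that streams them from a file with `BufferedReader`, so a large file is never held in memory as a list of lines, and returns a `String[]` as the question asks. The class name `FastaSequences` and the method `readSequences` are hypothetical names chosen for illustration; the `trim()` call is an added assumption to tolerate stray trailing whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: same logic as the answer, fed from a reader.
public class FastaSequences {

    // '>' lines are headers (possibly several in a row); every other
    // line is appended to the current sequence.
    public static String[] readSequences(BufferedReader reader) throws IOException {
        List<String> sequences = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim(); // assumption: ignore trailing whitespace
            if (line.startsWith(">")) {
                if (buf.length() > 0) {
                    sequences.add(buf.toString());
                    buf.setLength(0);
                }
            } else {
                buf.append(line);
            }
        }
        if (buf.length() > 0) { // final record has no header after it
            sequences.add(buf.toString());
        }
        return sequences.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        String[] sequences;
        if (args.length > 0) {
            // Read the file whose path is given on the command line.
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                sequences = readSequences(reader);
            }
        } else {
            // Fall back to the small example from the question.
            String sample = ">some text\nABC\nDEF\nGHI\n>some more text\nJKL\nMNO\nPQR\n";
            sequences = readSequences(new BufferedReader(new StringReader(sample)));
        }
        for (int i = 0; i < sequences.length; i++) {
            System.out.println("array element " + (i + 1) + " = " + sequences[i]);
        }
    }
}
```

Run without arguments, it prints `array element 1 = ABCDEFGHI` and `array element 2 = JKLMNOPQR`; pass a FASTA file path to process real data. Reading through a `BufferedReader` keeps only the current record in the `StringBuilder`, which matters for files that are "much bigger" than the sample.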