An algorithm for filtering text files
Question
Imagine you have a .txt file of the following structure:
>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...
I would like to read all the data except the lines denoted by >>> and the lines below the >>> end of file line.
So far I've solved this using read.table(comment.char = ">", skip = x, nrow = y) (x and y are currently fixed). This reads the data between the header and >>> end of file.
However, I would like to make my function a bit more flexible regarding the number of rows. The data may contain values larger than 800, and consequently more rows.
I could scan or readLines the file, see which row corresponds to >>> end of file, and calculate the number of lines to be read. What approach would you use?
Recommended answer
Here is one approach:
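Lines <- readLines("foo.txt")
# TRUE for every marker line (the headers and the end-of-file marker)
markers <- grepl(">", Lines)
# Lengths of the first two runs: the header block, then the data block
want <- rle(markers)$lengths[1:2]
# Indices of the data block, including the column-name line
want <- seq.int(want[1] + 1, sum(want), by = 1)
read.table(textConnection(Lines[want]), sep = " ", header = TRUE)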
This gives:
> read.table(textConnection(Lines[want]), sep = " ", header = TRUE)
K L M
1 200 0.10 1
2 201 0.80 1
3 202 0.01 3
4 800 0.40 2
This is the result on the data snippet you provide (saved in the file foo.txt, after removing the ... lines).
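The asker's own idea, locating the >>> end of file line and reading only what precedes it, can also be sketched as a small function. This is a minimal sketch and not part of the original answer: the function name read_until_eof and its eof_pattern argument are hypothetical, and it assumes that every marker line starts with >>> and that the first non-marker line holds the column names.

```r
# Hypothetical helper: read the data block that precedes ">>> end of file".
# Assumes marker lines start with ">>>" and that the first non-marker line
# holds the column names.
read_until_eof <- function(path, eof_pattern = "^>>> end of file") {
  lines <- readLines(path)
  eof <- grep(eof_pattern, lines)
  # Keep only the lines before the first end-of-file marker
  # (all lines are kept if no such marker is found)
  if (length(eof) > 0) {
    lines <- lines[seq_len(eof[1] - 1)]
  }
  # Drop the remaining marker (header) lines
  lines <- lines[!grepl("^>>>", lines)]
  read.table(text = lines, header = TRUE)
}

# Usage on a small temporary file that mimics the structure in the question
tmp <- tempfile(fileext = ".txt")
writeLines(c(">>> header", ">>> header", "K L M",
             "200 0.1 1", "201 0.8 1",
             ">>> end of file", "50 0.1 1"), tmp)
dat <- read_until_eof(tmp)
print(dat)
```

Unlike the run-length approach, this version does not depend on the header lines forming one contiguous block at the top of the file, but it does rely on the end-of-file marker matching the assumed pattern.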