An algorithm for filtering text files

Problem description

Imagine you have a .txt file of the following structure:

>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...

I would like to read all the data except lines denoted by >>> and lines below the >>> end of file line. So far I've solved this using read.table(comment.char = ">", skip = x, nrow = y) (x and y are currently fixed). This reads the data between the header and >>> end of file.

However, I would like to make my function more flexible regarding the number of rows. The data may contain values larger than 800, and consequently more rows.

I could scan or readLines the file and see which row corresponds to the >>> end of file and calculate the number of lines to be read. What approach would you use?

Recommended answer

Here is one approach:

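Lines <- readLines("foo.txt")
# TRUE for every line containing ">" (the header lines and the end-of-file line)
markers <- grepl(">", Lines)
# Lengths of the first two runs: the header block, then the data block
want <- rle(markers)$lengths[1:2]
# Line indices from just after the header up to the line before ">>> end of file"
want <- seq.int(want[1] + 1, sum(want), by = 1)
read.table(textConnection(Lines[want]), sep = " ", header = TRUE)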

This gives:

> read.table(textConnection(Lines[want]), sep = " ", header = TRUE)
    K    L M
1 200 0.10 1
2 201 0.80 1
3 202 0.01 3
4 800 0.40 2

On the data snippet you provide (in file foo.txt, and after removing the ... lines).
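
As a variation on the same idea, the end-of-file line can be located by name instead of by run lengths. The following is only a sketch: read_block is a hypothetical helper name, and it assumes that the marker line starts with exactly ">>> end of file" and that every other line to discard starts with ">>>".

```r
# Hypothetical helper: read the table that sits between the header lines
# and the ">>> end of file" marker.
read_block <- function(lines) {
  # Index of the first end-of-file marker (assumes this exact marker text)
  end <- grep("^>>> end of file", lines)[1]
  # No marker found: keep every line
  if (is.na(end)) end <- length(lines) + 1L
  body <- lines[seq_len(end - 1L)]
  # Drop the remaining ">>>" header lines
  body <- body[!grepl("^>>>", body)]
  read.table(text = body, header = TRUE)
}

# Example data taken from the question, with the "..." lines removed
Lines <- c(">>> header", ">>> header", ">>> header", "K L M",
           "200 0.1 1", "201 0.8 1", "202 0.01 3", "800 0.4 2",
           ">>> end of file", "50 0.1 1", "75 0.78 5")
dat <- read_block(Lines)
```

With a file on disk, the call would be read_block(readLines("foo.txt")). Because the marker is searched for by its text, the number of data rows does not need to be known in advance.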
