An algorithm for filtering text files

Views: 96 · Published: 2018/8/1 11:16:16 · Tags: r, import

Question

Imagine you have a .txt file of the following structure:

>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...

I would like to read all the data except the lines marked with >>> and everything below the >>> end of file line. So far I've solved this using read.table(comment.char = ">", skip = x, nrows = y), where x and y are currently hard-coded. This reads the data between the header and the >>> end of file line.

However, I would like to make my function more flexible regarding the number of rows. The data may contain values larger than 800, and consequently more rows.

I could scan or readLines the file and see which row corresponds to the >>> end of file and calculate the number of lines to be read. What approach would you use?
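
The readLines idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the temporary file, its contents (the question's snippet with the ... lines left out), and the variable names first_data and end_marker are made up for the example and are not part of the original question.

```r
# Hypothetical sample file reproducing the structure from the question
path <- tempfile(fileext = ".txt")
writeLines(c(
  ">>> header", ">>> header", ">>> header",
  "K L M",
  "200 0.1 1", "201 0.8 1", "202 0.01 3", "800 0.4 2",
  ">>> end of file",
  "50 0.1 1", "75 0.78 5"
), path)

lines <- readLines(path)

# Index of the first line that is not a ">>>" marker (the column names)
first_data <- which(!startsWith(lines, ">>>"))[1]
# Index of the ">>> end of file" marker
end_marker <- which(startsWith(lines, ">>> end of file"))[1]

skip  <- first_data - 1               # header lines to skip
nrows <- end_marker - first_data - 1  # data rows, not counting the column names

dat <- read.table(path, skip = skip, nrows = nrows, header = TRUE)
print(dat)
```

With skip and nrows computed from the file itself, the number of data rows no longer has to be fixed in advance.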

Recommended answer

Here is one way to do it:

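# Read the whole file, one element per line
Lines <- readLines("foo.txt")
# TRUE for marker lines (the header and end-of-file lines start with ">>>")
markers <- grepl("^>>>", Lines)
# Lengths of the first two runs: the header lines, then the data block
want <- rle(markers)$lengths[1:2]
# Indices of the data block (the column names plus the data rows)
want <- seq.int(want[1] + 1, sum(want), by = 1)
read.table(textConnection(Lines[want]), sep = " ", header = TRUE)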

which gives:

> read.table(textConnection(Lines[want]), sep = " ", header = TRUE)
    K    L M
1 200 0.10 1
2 201 0.80 1
3 202 0.01 3
4 800 0.40 2

This is for the data snippet you provided, saved as foo.txt with the ... placeholder lines removed.
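
To see why the first two run lengths are enough, it may help to inspect what rle() returns for the marker vector. The sketch below is an illustration only: it builds Lines by hand instead of reading foo.txt, and the stopifnot() guard is an added assumption check, not part of the original answer.

```r
# Lines as readLines() would return them for the question's snippet
# (with the "..." placeholder lines removed)
Lines <- c(">>> header", ">>> header", ">>> header", "K L M",
           "200 0.1 1", "201 0.8 1", "202 0.01 3", "800 0.4 2",
           ">>> end of file", "50 0.1 1", "75 0.78 5")

markers <- grepl(">", Lines)
runs <- rle(markers)
runs$values   # TRUE FALSE TRUE FALSE
runs$lengths  # 3 5 1 2

# The approach assumes the file starts with marker lines, so the first
# run must be a run of TRUE values; stop early if it is not.
stopifnot(runs$values[1])

# Run 1 is the header (3 lines), run 2 is the data block (5 lines),
# so the wanted lines are 4 to 8.
want <- seq.int(runs$lengths[1] + 1, sum(runs$lengths[1:2]), by = 1)
dat <- read.table(text = Lines[want], sep = " ", header = TRUE)
print(dat)
```

Because the run lengths are computed from the file, the same code keeps working when the data block grows beyond 800. If a file did not begin with >>> lines, the first run would be the data itself and the indexing would be off, which is what the guard checks for.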
