R - Reading lines from a .txt-file after a specific line
Question
I have a bunch of output .txt-files that each consist of a large parameter list and an X-Y-coordinate set. I need to extract these coordinates from all files so that only those lines are imported to a vector. This would work fine with
impcoord<-read.table("file.txt",skip= ,nrow= ,...)
but the files print the coordinate sets after different lengths of supporting parameters.
Luckily the coordinates always start after a line containing certain words.
Thus my question is, how do I start reading the .txt-file after these words? Let's say they are:
coordinatesXY
Thanks a lot for your time and help!
-Olli
--Edit--
Sorry for the confusion.
The relevant part of the file looks like this:
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043
... ...
... ...
The first line is the one where skip should end, and the coordinates that follow need to be imported to a vector. As you can see, the X-coordinates start at 131071 and end at 0.
Recommended answer
1) read.pattern
read.pattern in the gsubfn package can be used to read only lines matching a specific pattern. In this example we match: beginning of line, optional space(s), 1 or more digits, 1 or more spaces, an optional minus followed by 1 or more digits, optional space(s), end of line. The portions matching the parenthesized parts of the regexp are returned as columns in a data.frame. text = Lines in this self-contained example can be replaced with "myfile.txt", say, if the data is coming from a file. Modify the pattern to suit.
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"
library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")
giving:
> DF
V1 V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043
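If installing gsubfn is not an option, the same filtering idea can be sketched in base R. This is only an illustration, not part of the original answer: it keeps the lines that match an equivalent pattern (written with [0-9] instead of \\d) and hands them to read.table. The names all_lines, keep and DF2 are made up for this example.

```r
# Same sample text as in the example above.
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"

# Read all lines; for a real file, readLines("myfile.txt") would be used instead.
con <- textConnection(Lines)
all_lines <- readLines(con)
close(con)

# Keep only lines of the form "<digits> <optional minus><digits>".
keep <- grepl("^ *[0-9]+ +-?[0-9]+ *$", all_lines)

# Parse the surviving lines into a two-column data.frame (V1, V2).
DF2 <- read.table(text = all_lines[keep])
```

Unlike read.pattern, this does not use capture groups; it relies on read.table to split the kept lines on whitespace.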
2) read twice
Another possibility, using only base R, is simply to read the file once to determine the value of skip= and a second time to do the actual read using that value. To read from a file myfile.txt, replace text = Lines and textConnection(Lines) with "myfile.txt".
read.table(text = Lines,
skip = grep("##XYDATA=", readLines(textConnection(Lines))))
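Since the question mentions a whole set of output files, the skip-based idea could be wrapped in a small helper. The sketch below is an assumption-laden illustration rather than the answerer's code: read_xy, the column names x and y, and the folder name output_dir are all hypothetical. It reads each file once with readLines and parses everything after the first marker line.

```r
# Hypothetical helper: return the X-Y block that follows the marker line.
read_xy <- function(path, marker = "##XYDATA=") {
  all_lines <- readLines(path)
  # fixed = TRUE treats the marker as literal text rather than a regex.
  start <- grep(marker, all_lines, fixed = TRUE)
  if (length(start) == 0) stop("marker not found in ", path)
  # Drop everything up to and including the first marker line, then parse.
  read.table(text = all_lines[-seq_len(start[1])],
             col.names = c("x", "y"))
}

# Demo with a temporary file standing in for one of the output files.
tmp <- tempfile(fileext = ".txt")
writeLines(c("junk", "junk", "##XYDATA= (X++(Y..Y))",
             "131071 -2065", "131070 -4137"), tmp)
xy <- read_xy(tmp)

# For many files (assumed folder layout), something like:
# files  <- list.files("output_dir", pattern = "\\.txt$", full.names = TRUE)
# all_xy <- lapply(files, read_xy)
```

This assumes that at least one data line follows the marker and that nothing other than coordinate pairs comes after it.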
Added: some revisions and a second approach.