How do I read information from text files?
Problem description
I have hundreds of text files with the following information in each file:
*****Auto-Corelation Results******
1 .09 -.19 .18 non-Significant
*****STATISTICS FOR MANN-KENDELL TEST******
S= 609
VAR(S)= 162409.70
Z= 1.51
Random : No trend at 95%
*****SENs STATISTICS ******
SEN SLOPE = .24
Now, I want to read all these files, "collect" Sen's statistic from each file (e.g. .24), and compile the values into one file along with the corresponding file names. I have to do it in R.
I have worked with CSV files, but I am not sure how to work with text files.
This is the code I am using now:
require(gtools)
GG <- grep("*.txt", list.files(), value = TRUE)
GG<-mixedsort(GG)
S <- sapply(seq(GG), function(i){
X <- readLines(GG[i])
grep("SEN SLOPE", X, value = TRUE)
})
spl <- unlist(strsplit(S, ".*[^.0-9]"))
SenStat <- as.numeric(spl[nzchar(spl)])
SenStat<-data.frame( SenStat,file = GG)
write.table(SenStat, "sen.csv",sep = ", ",row.names = FALSE)
The current code does not read all values correctly and gives this warning:
Warning message:
NAs introduced by coercion
Also, I am not getting the file names in the other column of the output. Please help!
The code is reading the = sign as well. This is the output of print(spl):
[1] "" "5.55" "" "-.18" "" "3.08" "" "3.05" "" "1.19" "" "-.32"
[13] "" ".22" "" "-.22" "" ".65" "" "1.64" "" "2.68" "" ".10"
[25] "" ".42" "" "-.44" "" ".49" "" "1.44" "" "=-1.07" "" ".38"
[37] "" ".14" "" "=-2.33" "" "4.76" "" ".45" "" ".02" "" "-.11"
[49] "" "=-2.64" "" "-.63" "" "=-3.44" "" "2.77" "" "2.35" "" "6.29"
[61] "" "1.20" "" "=-1.80" "" "-.63" "" "5.83" "" "6.33" "" "5.42"
[73] "" ".72" "" "-.57" "" "3.52" "" "=-2.44" "" "3.92" "" "1.99"
[85] "" ".77" "" "3.01"
Diagnosis 2
I think I found the problem. The negative sign is a bit tricky. In some files it is written as
SEN SLOPE =-1.07
SEN SLOPE = -.11
Because of the different spacing after =, I get NA for the first form, while the code reads the second form correctly. How can I modify the regex to fix this? Thanks!
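One way to make the extraction independent of the spacing is to match the number itself rather than splitting on the text in front of it. A minimal sketch, assuming the slope is always written as an optional minus sign, optional digits, an optional decimal point and at least one digit (no leading "+" and no exponent notation):

```r
# The spellings reported in Diagnosis 2, plus the one from the sample file
lines <- c("SEN SLOPE =-1.07", "SEN SLOPE = -.11", "SEN SLOPE = .24")

# Optional "-", then digits with an optional decimal point.
# regexpr() finds the first match in each string; regmatches() extracts it.
num_pattern <- "-?[0-9]*\\.?[0-9]+"
values <- as.numeric(regmatches(lines, regexpr(num_pattern, lines)))
print(values)
```

Note that regmatches() silently drops strings with no match, so this assumes every input line actually contains a number.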
Recommended answer
Assume "text.txt" is one of your text files. Read it into R with readLines, then use grep to find the line containing SEN SLOPE. With no further arguments, grep returns the index number(s) of the elements where the regular expression was found; here it is the 11th line. Add the value = TRUE argument to get the contents of the line.
x <- readLines("text.txt")
grep("SEN SLOPE", x)
## [1] 11
( gg <- grep("SEN SLOPE", x, value = TRUE) )
## [1] "SEN SLOPE = .24"
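To turn the matching line into a number, one option is sub(), removing everything up to the "=" together with any spaces after it. A minimal, self-contained sketch that recreates the sample from the question in a temporary file (the tempfile() path stands in for "text.txt"):

```r
# Recreate the sample file from the question in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "*****Auto-Corelation Results******",
  "1 .09 -.19 .18 non-Significant",
  "*****STATISTICS FOR MANN-KENDELL TEST******",
  "S= 609",
  "VAR(S)= 162409.70",
  "Z= 1.51",
  "Random : No trend at 95%",
  "*****SENs STATISTICS ******",
  "SEN SLOPE = .24"
), path)

x <- readLines(path)
line <- grep("SEN SLOPE", x, value = TRUE)  # "SEN SLOPE = .24"

# Drop everything up to "=" plus any spaces, then convert to a number.
slope <- as.numeric(sub(".*=\\s*", "", line))
print(slope)
```

Because "\\s*" also accepts zero spaces, the same call handles "SEN SLOPE =-1.07".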
To find all the .txt files in the working directory, we can use list.files with a regular expression.
list.files(pattern = "\\.txt$")
## [1] "text.txt"
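The pattern argument of list.files() is a regular expression, not a shell wildcard, so "*.txt" does not mean what it would in a terminal ("." matches any character and the pattern is not anchored). utils::glob2rx() translates a wildcard into the equivalent regex. A sketch using hypothetical file names in a temporary directory:

```r
# Throwaway directory with a few example names (hypothetical files).
dir <- file.path(tempdir(), "sen_glob_demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("text.txt", "text2.txt", "text.txt.bak"))))

# glob2rx() converts a shell-style wildcard into an anchored regex.
rx <- utils::glob2rx("*.txt")
print(rx)

# An explicit regex anchored at the end selects the same files.
txt_files <- list.files(dir, pattern = "\\.txt$")
print(txt_files)
```

Either way, "text.txt.bak" is excluded because the match must end with ".txt".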
Over multiple files
I created a second text file, text2.txt, with a different SEN SLOPE value to illustrate how this method can be applied over multiple files. We can use sapply, followed by strsplit, to get the desired spl values.
GG <- list.files(pattern = "\\.txt$")
S <- sapply(seq_along(GG), function(i){
X <- readLines(GG[i])
ifelse(length(X) > 0, grep("SEN SLOPE", X, value = TRUE), NA)
## added 04/23/14 to account for empty files (as per comment)
})
spl <- unlist(strsplit(S, split = ".*((=|(\\s=))|(=\\s|\\s=\\s))"))
## above regex changed to capture up to and including "=" and
## surrounding space, if any - 04/23/14 (as per comment)
SenStat <- as.numeric(spl[nzchar(spl)])
Then we can put the results into a data frame and write it to a file with write.table:
( SenStatDf <- data.frame(SenStat, file = GG) )
## SenStat file
## 1 0.46 text2.txt
## 2 0.24 text.txt
write.table(SenStatDf, "myFile.csv", sep = ", ", row.names = FALSE)
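A variant of the same pipeline, sketched under the assumption that each file contains at most one "SEN SLOPE" line: a small helper (read_sen_slope() is a hypothetical name, not from any package) returns NA for empty files or files without a match, so vapply() yields exactly one value per file and the values stay aligned with the file names. The table is then written with write.csv(), which always uses a plain "," as separator, whereas sep = ", " in write.table() writes a comma followed by a space.

```r
# Hypothetical helper: SEN SLOPE value of one file, or NA if none is found.
read_sen_slope <- function(path) {
  x <- readLines(path, warn = FALSE)
  hit <- grep("SEN SLOPE", x, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  # Use the first match; strip everything up to "=" plus any spaces.
  as.numeric(sub(".*=\\s*", "", hit[1]))
}

# Demo files with made-up values: both spellings and one empty file.
dir <- file.path(tempdir(), "sen_demo_files")
dir.create(dir, showWarnings = FALSE)
writeLines("SEN SLOPE = .24", file.path(dir, "a.txt"))
writeLines("SEN SLOPE =-1.07", file.path(dir, "b.txt"))
writeLines(character(0), file.path(dir, "c.txt"))

GG <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
# vapply() checks that every call returns a single number.
SenStat <- vapply(GG, read_sen_slope, numeric(1), USE.NAMES = FALSE)
SenStatDf <- data.frame(SenStat, file = basename(GG))

# In practice the target would be something like "sen.csv".
out <- tempfile(fileext = ".csv")
write.csv(SenStatDf, out, row.names = FALSE)
back <- read.csv(out)
print(back)
```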
Update (07/21/14):
Since the result is being written to a file, this can be made much simpler (and faster) with:
( SenStatDf <- cbind(
SenSlope = c(lapply(GG, function(x){
y <- readLines(x)
z <- y[grepl("SEN SLOPE", y)]
unlist(strsplit(z, split = ".*=\\s*"))[-1]  # \\s* also allows no space after "="
}), recursive = TRUE),
file = GG
) )
# SenSlope file
# [1,] ".46" "test2.txt"
# [2,] ".24" "test.txt"
The result can then be written out and read back into R with:
write.table(SenStatDf, "myFile.txt", row.names = FALSE)
read.table("myFile.txt", header = TRUE)
# SenSlope file
# 1 0.46 test2.txt
# 2 0.24 test.txt
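The split pattern matters for the spacing problem from the question's Diagnosis 2: "\\s+" needs at least one space after "=", while "\\s*" also accepts none. A small check on the two spellings, using the same strsplit(...)[-1] idiom as the update:

```r
lines <- c("SEN SLOPE = .24", "SEN SLOPE =-1.07")

# "\\s+" requires a space after "=", so the second line is not split at all
# and [-1] then discards its only element.
plus <- lapply(lines, function(z) unlist(strsplit(z, split = ".*=\\s+"))[-1])
# "\\s*" also accepts zero spaces, so both lines yield their value.
star <- lapply(lines, function(z) unlist(strsplit(z, split = ".*=\\s*"))[-1])

str(plus)
str(star)
```

With "\\s+", a file written as "=-1.07" contributes character(0), so after c(..., recursive = TRUE) the SenSlope column ends up shorter than the file column.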