如何从文本文件中读取信息? [英] How do I read information from text files?

查看:143
本文介绍了如何从文本文件中读取信息?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有数百个文本文件,每个文件中都有以下信息:

I have hundreds of text files with the following information in each file:

*****Auto-Corelation Results******
1     .09    -.19     .18     non-Significant

*****STATISTICS FOR MANN-KENDELL TEST******
S=  609
VAR(S)=      162409.70
Z=           1.51
Random : No trend at 95%

*****SENs STATISTICS ******
SEN SLOPE =  .24

现在,我想读取所有这些文件,并从每个文件(例如.24)收集" Sen的统计信息,并将其与相应的文件名一起编译为一个文件.我必须在R中这样做.

Now, I want to read all these files, and "collect" Sen's Statistics from each file (eg. .24) and compile into one file along with the corresponding file names. I have to do it in R.

我使用过CSV文件,但不确定如何使用文本文件.

I have worked with CSV files but not sure how to use text files.

这是我现在正在使用的代码:

This is the code I am using now:

require(gtools)
GG <- grep("*.txt", list.files(), value = TRUE)
GG<-mixedsort(GG)
S <- sapply(seq(GG), function(i){
X <- readLines(GG[i])
grep("SEN SLOPE", X, value = TRUE)
})
spl <- unlist(strsplit(S, ".*[^.0-9]"))
SenStat <- as.numeric(spl[nzchar(spl)])
SenStat<-data.frame( SenStat,file = GG)
write.table(SenStat, "sen.csv",sep = ", ",row.names = FALSE)

当前代码无法正确读取所有值并给出此错误:

The current code is not able to read all values correctly and giving this error:

Warning message:
NAs introduced by coercion 

我也没有在输出的另一列中得到文件名.请帮忙!

Also I am not getting the file names the other column of Output. Please help!

代码也正在读取=符号.这是print(spl)

The code is reading the = sign as well. This is the output of print(spl)

 [1] ""       "5.55"   ""       "-.18"   ""       "3.08"   ""       "3.05"   ""       "1.19"   ""       "-.32"  
[13] ""       ".22"    ""       "-.22"   ""       ".65"    ""       "1.64"   ""       "2.68"   ""       ".10"   
[25] ""       ".42"    ""       "-.44"   ""       ".49"    ""       "1.44"   ""       "=-1.07" ""       ".38"   
[37] ""       ".14"    ""       "=-2.33" ""       "4.76"   ""       ".45"    ""       ".02"    ""       "-.11"  
[49] ""       "=-2.64" ""       "-.63"   ""       "=-3.44" ""       "2.77"   ""       "2.35"   ""       "6.29"  
[61] ""       "1.20"   ""       "=-1.80" ""       "-.63"   ""       "5.83"   ""       "6.33"   ""       "5.42"  
[73] ""       ".72"    ""       "-.57"   ""       "3.52"   ""       "=-2.44" ""       "3.92"   ""       "1.99"  
[85] ""       ".77"    ""       "3.01"

诊断2

发现了我认为的问题.负号有些棘手.在某些文件中是

Diagnosis 2

Found the problem I think. The negative sign is a bit tricky. In some files it is

SEN SLOPE =-1.07
SEN SLOPE = -.11

由于=后的空白,我得到了第一个的NA,但是代码正在读取第二个.如何修改正则表达式以解决此问题?谢谢!

Because of the gap after =, I am getting NAs for the first one, but the code is reading the second one. How can I modify the regex to fix this? Thanks!

推荐答案

假设"text.txt"是您的文本文件之一.用readLines读入R,可以使用grep查找包含SEN SLOPE的行.在没有其他参数的情况下,grep返回找到正则表达式的元素的索引号.在这里,我们发现这是第11行.添加value = TRUE参数以获取行的内容.

Assume "text.txt" is one of your text files. Read into R with readLines, you can use grep to find the line containing SEN SLOPE. With no further arguments, grep returns the index number(s) for the element where the regular expression was found. Here we find that it's the 11th line. Add the value = TRUE argument to get the line as it reads.

x <- readLines("text.txt")
grep("SEN SLOPE", x)
## [1] 11
( gg <- grep("SEN SLOPE", x, value = TRUE) )
## [1] "SEN SLOPE =  .24"

要在工作目录中查找所有.txt文件,我们可以将list.files与正则表达式结合使用.

To find all the .txt files in the working directory we can use list.files with a regular expression.

list.files(pattern = "*.txt")
## [1] "text.txt"

围绕多个文件

我创建了第二个文本文件text2.txt,具有不同的SEN SLOPE值,以说明如何将这种方法应用于多个文件.我们可以使用sapply,然后使用strsplit,以获得所需的spl值.

I created a second text file, text2.txt with a different SEN SLOPE value to illustrate how I might apply this method over multiple files. We can use sapply, followed by strsplit, to get the spl values that are desired.

GG <- list.files(pattern = "*.txt")
S <- sapply(seq_along(GG), function(i){
    X <- readLines(GG[i])
    ifelse(length(X) > 0, grep("SEN SLOPE", X, value = TRUE), NA)
    ## added 04/23/14 to account for empty files (as per comment)
})
spl <- unlist(strsplit(S, split = ".*((=|(\\s=))|(=\\s|\\s=\\s))"))
## above regex changed to capture up to and including "=" and 
## surrounding space, if any - 04/23/14 (as per comment)
SenStat <- as.numeric(spl[nzchar(spl)])

然后我们可以将结果放入数据框中,并使用write.table

Then we can put the results into a data frame and send it to a file with write.table

( SenStatDf <- data.frame(SenStat, file = GG) )
##   SenStat      file
## 1    0.46 text2.txt
## 2    0.24  text.txt

我们可以使用

write.table(SenStatDf, "myFile.csv", sep = ", ", row.names = FALSE)


2014年7月21日更新:

由于将结果写入文件,因此可以使用

Since the result is being written to a file, this can be made much more simple (and faster) with

( SenStatDf <- cbind(
      SenSlope = c(lapply(GG, function(x){
          y <- readLines(x)
          z <- y[grepl("SEN SLOPE", y)]
          unlist(strsplit(z, split = ".*=\\s+"))[-1]
          }), recursive = TRUE),
      file = GG
 ) )
#      SenSlope file       
# [1,] ".46"   "test2.txt"
# [2,] ".24"   "test.txt" 

然后用R写入并读入R

And then written and read into R with

write.table(SenStatDf, "myFile.txt", row.names = FALSE)
read.table("myFile.txt", header = TRUE)
#   SenSlope      file
# 1     1.24 test2.txt
# 2     0.24  test.txt

这篇关于如何从文本文件中读取信息?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆