Extracting table from text file

Question

I am trying to extract tables from text files and have found several earlier posts here that address similar questions. However, none seem to work efficiently with my problem. The most helpful answer I have found is to one of my earlier questions here: "R: removing header, footer and sporadic column headings when reading csv file".

An example dummy text file contains:

> 
> 
> ###############################################################################
> 
> # Display AICc Table for the models above
> 
> 
> collect.models(, adjust = FALSE)
      model npar  AICc  DeltaAICc weight  Deviance
13      P1   19    94      0.00     0.78      9
12      P2   21    94      2.64     0.20      9
10      P3   15    94      9.44     0.02      9
2       P4   11    94    619.26     0.00      9
> 
> 
> ###############################################################################
> 
> # the three lines below count the number of errors in the code above
> 
> cat("ERROR COUNT:", .error.count, "\n")
ERROR COUNT: 0 
> options(error = old.error.fun)
> rm(.error.count, old.error.fun, new.error.fun)
> 
> ##########
> 
> 

I have written the following code to extract the desired table:

my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')

top    <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'

my.data <- my.data[-c(grep(bottom, my.data):length(my.data))]
my.data <- my.data[-c(1:grep(top, my.data))]
my.data <- my.data[c(1:(length(my.data)-4))]
aa      <- as.data.frame(my.data)
aa

write.table(my.data, 'c:/users/mmiller21/simple R programs/dummy.log.extraction.txt', quote = FALSE, col.names = FALSE, row.names = FALSE)
my.data2 <- read.table('c:/users/mmiller21/simple R programs/dummy.log.extraction.txt', header = TRUE, row.names = c(1))
my.data2
   model npar AICc DeltaAICc weight Deviance
13    P1   19   94      0.00   0.78        9
12    P2   21   94      2.64   0.20        9
10    P3   15   94      9.44   0.02        9
2     P4   11   94    619.26   0.00        9

I would prefer to avoid having to write and then read my.data to obtain the desired data frame. Prior to that step the current code returns a vector of strings for my.data:

[1] "      model npar  AICc  DeltaAICc weight  Deviance" "13      P1   19    94      0.00     0.78      9"   
[3] "12      P2   21    94      2.64     0.20      9"    "10      P3   15    94      9.44     0.02      9"   
[5] "2       P4   11    94    619.26     0.00      9"

Is there some way I can convert the above vector of strings into a data frame like that in dummy.log.extraction.txt without writing and then reading my.data?

The line:

aa <- as.data.frame(my.data)

returns the following, which looks like what I want:

#                                              my.data
# 1       model npar  AICc  DeltaAICc weight  Deviance
# 2    13      P1   19    94      0.00     0.78      9
# 3    12      P2   21    94      2.64     0.20      9
# 4    10      P3   15    94      9.44     0.02      9
# 5    2       P4   11    94    619.26     0.00      9

However:

dim(aa)
# [1] 5 1

If I can split aa into columns then I think I will have what I want without having to write and then read my.data.

I found the post "Extracting Data from Text Files". However, in the posted answer the table in question seems to have a fixed number of rows. In my case the number of rows can vary between 1 and 20. Also, I would prefer to use base R. In my case I think the number of lines between the last row of the table and bottom is a constant (here 4).

I also found the post "How to extract data from a text file using R or PowerShell?". However, in my case the column widths are not fixed, and I do not know how to split the strings (or rows) so that there are only seven columns.

Given all of the above perhaps my question is really how to split the object aa into columns. Thank you for any advice or assistance.
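One base-R way to split the strings into columns is to break each line on runs of whitespace. This is a minimal sketch, assuming that no field contains embedded spaces and that `my.data` holds the five strings shown above. The header line has one field fewer than the data rows, so the first field of each data row is treated as the row name.

```r
# my.data as printed in the question: header line + four table rows
my.data <- c("      model npar  AICc  DeltaAICc weight  Deviance",
             "13      P1   19    94      0.00     0.78      9",
             "12      P2   21    94      2.64     0.20      9",
             "10      P3   15    94      9.44     0.02      9",
             "2       P4   11    94    619.26     0.00      9")

# Trim leading/trailing blanks, then split every line on runs of whitespace
fields <- strsplit(trimws(my.data), "[[:space:]]+")
header <- fields[[1]]                 # 6 column names
body   <- do.call(rbind, fields[-1])  # 4 x 7 character matrix: row label + 6 values

aa <- as.data.frame(body[, -1, drop = FALSE], stringsAsFactors = FALSE)
names(aa)    <- header
rownames(aa) <- body[, 1]

# Convert numeric-looking columns; "P1", "P2", ... stay character
aa[] <- lapply(aa, type.convert, as.is = TRUE)
aa
```

Unlike `as.data.frame(my.data)`, which wraps the whole vector in a single column, this produces a 4 x 6 data frame.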

EDIT:

The actual logs are produced by a supercomputer and contain up to 90,000 lines. However, the number of lines varies greatly among logs. That is why I was making use of top and bottom.
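Because the logs vary in length, one option is to locate only the `top` marker and end the table at the first following line that begins with `>` (an echoed R prompt). This avoids both the `bottom` marker and the constant offset of 4. The sketch below rests on that assumption, which holds for the dummy log, and uses a hypothetical helper, `extract_table()`, that appears in neither the question nor the answer. Matching with `fixed = TRUE` removes the need to escape `(` and `)` in the marker. Only the first match of `top` is used.

```r
# Hypothetical helper: return the table that follows the line matching `top`,
# assuming the table ends at the first later line that starts with ">".
extract_table <- function(lines, top) {
  start <- grep(top, lines, fixed = TRUE)[1]   # literal match, no regex escaping
  if (is.na(start) || start == length(lines)) {
    stop("marker not found, or it is the last line of the log")
  }
  rest   <- lines[(start + 1L):length(lines)]
  prompt <- which(startsWith(rest, ">"))       # echoed R prompt lines
  n      <- if (length(prompt) > 0L) prompt[1] - 1L else length(rest)
  block  <- rest[seq_len(n)]
  block  <- block[nzchar(trimws(block))]        # drop blank lines, if any
  # Parse the lines in memory; the header has one field fewer than the rows,
  # so row.names = 1 takes the leading field as the row name.
  read.table(text = block, header = TRUE, row.names = 1)
}

# Usage: a shortened log holding only the first two table rows from the
# question, to show that the row count is not hard-coded
log_lines <- c(
  "> collect.models(, adjust = FALSE)",
  "      model npar  AICc  DeltaAICc weight  Deviance",
  "13      P1   19    94      0.00     0.78      9",
  "12      P2   21    94      2.64     0.20      9",
  "> ",
  "> # the three lines below count the number of errors in the code above",
  "ERROR COUNT: 0 "
)
tab <- extract_table(log_lines, "> collect.models(, adjust = FALSE)")
tab
```

For a real log, `lines` would come from `readLines()` on the log file, as in the question.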

Solution

Maybe your real log file is completely different and more complex, but with this one you can use read.table directly; you just have to choose the right parameters.

data <- read.table("c:/users/mmiller21/simple R programs/dummy.log",
                   comment.char = ">",
                   nrows = 4,
                   skip = 1,
                   header = TRUE,
                   row.names = 1)
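To try this call without the original file, the sketch below writes the dummy log from the question to a temporary file. `tempfile()` stands in for the original Windows path, and the log text is copied from the question. `comment.char = ">"` turns every prompt line into a blank line, which is then skipped. `nrows = 4` stops reading before the `ERROR COUNT: 0` line. The answer's `str()` output reports `model` as a factor, which reflects R versions before 4.0.0. From 4.0.0 on, `stringsAsFactors` defaults to `FALSE`, so the sketch sets it explicitly to get the same result on any version.

```r
# Dummy log copied from the question
log_lines <- c(
  "> ", "> ",
  "> ###############################################################################",
  "> ",
  "> # Display AICc Table for the models above",
  "> ", "> ",
  "> collect.models(, adjust = FALSE)",
  "      model npar  AICc  DeltaAICc weight  Deviance",
  "13      P1   19    94      0.00     0.78      9",
  "12      P2   21    94      2.64     0.20      9",
  "10      P3   15    94      9.44     0.02      9",
  "2       P4   11    94    619.26     0.00      9",
  "> ", "> ",
  "> ###############################################################################",
  "> ",
  "> # the three lines below count the number of errors in the code above",
  "> ",
  "> cat(\"ERROR COUNT:\", .error.count, \"\\n\")",
  "ERROR COUNT: 0 ",
  "> options(error = old.error.fun)",
  "> rm(.error.count, old.error.fun, new.error.fun)",
  "> ", "> ##########", "> ", "> "
)
log_file <- tempfile(fileext = ".log")
writeLines(log_lines, log_file)

data <- read.table(log_file,
                   comment.char = ">",        # prompt lines become blank, then skipped
                   nrows = 4,                 # stop before "ERROR COUNT: 0"
                   skip = 1,
                   header = TRUE,
                   row.names = 1,             # leading field of each row
                   stringsAsFactors = FALSE)  # same result before and after R 4.0.0
str(data)
```

Note that `nrows = 4` is specific to this log. With tables of 1 to 20 rows, the row count has to be determined first.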

str(data)
## 'data.frame':    4 obs. of  6 variables:
##  $ model    : Factor w/ 4 levels "P1","P2","P3",..: 1 2 3 4
##  $ npar     : int  19 21 15 11
##  $ AICc     : int  94 94 94 94
##  $ DeltaAICc: num  0 2.64 9.44 619.26
##  $ weight   : num  0.78 0.2 0.02 0
##  $ Deviance : int  9 9 9 9

data
##    model npar AICc DeltaAICc weight Deviance
## 13    P1   19   94      0.00   0.78        9
## 12    P2   21   94      2.64   0.20        9
## 10    P3   15   94      9.44   0.02        9
## 2     P4   11   94    619.26   0.00        9
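The same `read.table()` machinery also answers the original wish to avoid writing and then re-reading `my.data`. Its `text` argument accepts a character vector and reads it through a text connection. This is a minimal sketch, assuming `my.data` already holds the five strings the question's code produces before the `write.table()` step.

```r
# my.data as printed in the question: header line + four table rows
my.data <- c("      model npar  AICc  DeltaAICc weight  Deviance",
             "13      P1   19    94      0.00     0.78      9",
             "12      P2   21    94      2.64     0.20      9",
             "10      P3   15    94      9.44     0.02      9",
             "2       P4   11    94    619.26     0.00      9")

# Parse in memory: no intermediate dummy.log.extraction.txt file
aa <- read.table(text = my.data, header = TRUE, row.names = 1)
aa
dim(aa)
# [1] 4 6
```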
