Import data into R with an unknown number of columns?
Problem description
I'm trying to read a text file with different row lengths:
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
To overcome this problem, I'm using the argument fill=TRUE in read.table, so:
data<-read.table("test",sep=" ",fill=TRUE)
Unfortunately, to assess the maximum row length, read.table reads only the first 5 lines of the file, and generates an object looking like this:
data
V1 V2 V3 V4 V5
1 1 NA NA NA NA
2 1 2 NA NA NA
3 1 2 3 NA NA
4 1 2 3 4 NA
5 1 2 3 4 5
6 1 2 3 4 5
7 6 NA NA NA NA
8 1 2 3 4 5
9 6 7 NA NA NA
10 1 2 3 4 5
11 6 7 8 NA NA
Is there a way to force read.table to scan the whole file to assess the maximum row length? I know a possible solution would be to supply the column names explicitly, like:
data<-read.table("test",sep=" ",fill=TRUE,col.names=c(1:8))
But since I have a lot of files, I wanted to assess this automatically within R. Any suggestion? :-)
The original file doesn't contain consecutive numbers, so this is not a solution:
data1<-read.table("test",sep=" ",fill=TRUE)
data2<-read.table("test",sep=" ",fill=TRUE,col.names=c(1:max(data1)))
Recommended answer
There is a handy function, count.fields (see ?count.fields), which counts the number of fields on each line:
count.fields("test", sep = " ")
#[1] 1 2 3 4 5 6 7 8
So, using your second solution:
no_col <- max(count.fields("test", sep = " "))
data <- read.table("test",sep=" ",fill=TRUE,col.names=1:no_col)
data
# X1 X2 X3 X4 X5 X6 X7 X8
# 1 1 NA NA NA NA NA NA NA
# 2 1 2 NA NA NA NA NA NA
# 3 1 2 3 NA NA NA NA NA
# 4 1 2 3 4 NA NA NA NA
# 5 1 2 3 4 5 NA NA NA
# 6 1 2 3 4 5 6 NA NA
# 7 1 2 3 4 5 6 7 NA
# 8 1 2 3 4 5 6 7 8
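Since the question mentions many files, the two steps above could be wrapped in a small helper. This is only a sketch: the name read_ragged and the "V" column prefix are my own choices, not part of base R, and the sample file is rebuilt in a temporary location so the snippet runs on its own.

```r
# Hypothetical helper: read a delimited file whose rows have varying lengths.
read_ragged <- function(path, sep = " ") {
  # Widest row in the file; na.rm guards against NA counts that
  # count.fields can return inside multi-line quoted strings.
  n_col <- max(count.fields(path, sep = sep), na.rm = TRUE)
  read.table(path, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n_col)))
}

# Recreate the sample file from the question in a temporary location.
tmp <- tempfile()
rows <- vapply(1:8, function(n) paste(seq_len(n), collapse = " "), character(1))
writeLines(rows, tmp)

data <- read_ragged(tmp)
dim(data)

# Applying the helper to several files (here the same file twice, for illustration).
all_data <- lapply(c(tmp, tmp), read_ragged)
```

Because the column count is computed per file, each element of all_data can have a different width. Note that the file is read twice (once by count.fields, once by read.table), which may matter for very large inputs.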