Text file to dataframe with a list column
Problem description
I am trying to read in a text file like this:
exp1 sample1 2 5
exp2 sample1 2 3 5 7
exp1 sample2 1 2 6
to a dataframe with a list column like this:
tibble(exp = c("exp1", "exp2", "exp3"),
       sample = c("sample1", "sample1", "sample2"),
       listdata = list(list(2,5), list(2,3,5,7), list(1,2,6)))
# A tibble: 3 x 3
  exp   sample  listdata
  <chr> <chr>   <list>
1 exp1  sample1 <list [2]>
2 exp2  sample1 <list [4]>
3 exp3  sample2 <list [3]>
The purpose is to use the metadata in the first two columns to select and operate on the lists.
I can read in the lines as lists, but don't know how to separate the metadata:
listdata <- read_lines("list_c_data.txt") %>% strsplit(., " ") %>% tibble()
Any suggestions? I may need to read the file line by line, since the number of observations could be >100000 and the length of the list in each row could be >1000.
Solution
We read the file using read.table/read.csv with fill = TRUE, then gather (from tidyr) the 3rd to last columns of the dataset to reshape it to 'long' format. Grouped by 'V1' and 'V2', we summarise 'Val' as a list and then rename the columns if necessary.
library(dplyr)
library(tidyr)
df1 <- read.table("yourfile.txt", header = FALSE, fill = TRUE)
gather(df1, Var, Val, V3:ncol(df1), na.rm = TRUE) %>%
  group_by(V1, V2) %>%
  summarise(Val = list(Val)) %>%
  rename(exp = V1, sample = V2, listdata = Val)
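One thing to watch with read.table and fill = TRUE: unless col.names is supplied, the number of columns is inferred from the first five lines of input only, so a longer row further down the file may not fit. The base R sketch below is one possible way around that, counting the fields with count.fields first and naming every column up front. The temporary file and the arrangement of its rows (the question's rows, ordered so that the long row comes sixth) are assumptions made purely for illustration.

```r
# Illustrative input: rows taken from the question, arranged so that
# the widest row is the sixth line (an assumption for this demo).
path <- tempfile(fileext = ".txt")
writeLines(c(rep("exp1 sample1 2 5", 5),
             "exp2 sample1 2 3 5 7"), path)

# Width of the widest row, counted in whitespace-separated fields.
n_col <- max(count.fields(path))

# Supplying col.names makes read.table allocate n_col columns instead of
# inferring the width from the first five lines.
df1 <- read.table(path, header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(n_col)))

dim(df1)  # 6 rows, 6 columns; short rows are padded with NA
```

The resulting df1 can then go through the same gather, group_by and summarise steps as above.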
Or we can use scan to read the rows, strsplit by space, convert the elements in 'lst' (filtering out the 1st and 2nd) to numeric, while we rbind the 1st and 2nd elements to a data.frame and create 'lst2' as the third column.
l1 <- trimws(scan("yourfile.txt", what = "", sep = "\n", quiet = TRUE))
lst <- strsplit(l1, " ")
lst2 <- lapply(lst, function(x) as.numeric(x[-(1:2)]))
d1 <- setNames(do.call(rbind.data.frame, lapply(lst,
          function(x) x[1:2])), c("exp", "sample"))
d1$listdata <- lst2
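Since the stated goal is to use the metadata to select and operate on the lists, here is a small base R sketch of that last step. It is a variant of the approach above, not the same code: it builds the metadata columns with vapply rather than rbind.data.frame, and it keeps the sample rows in memory instead of reading a file. The names parts, keep and sums are made up for this example.

```r
# Sample rows from the question, held in memory for illustration.
lines <- c("exp1 sample1 2 5",
           "exp2 sample1 2 3 5 7",
           "exp1 sample2 1 2 6")

# Split on runs of whitespace; fields 1-2 are metadata, the rest are values.
parts <- strsplit(trimws(lines), "[[:space:]]+")
d1 <- data.frame(exp    = vapply(parts, `[`, character(1), 1),
                 sample = vapply(parts, `[`, character(1), 2),
                 stringsAsFactors = FALSE)
d1$listdata <- lapply(parts, function(x) as.numeric(x[-(1:2)]))

# Select rows by metadata, then operate on the matching list elements.
keep <- d1$sample == "sample1"
sums <- vapply(d1$listdata[keep], sum, numeric(1))
sums  # 7 17
```

Here each element of listdata is a plain numeric vector rather than a nested list, which tends to be easier to pass to functions such as sum or mean.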