Text file to dataframe with a list column


Question

I am trying to read in a text file like this:

exp1 sample1 2 5  
exp2 sample1 2 3 5 7
exp1 sample2 1 2 6

to a dataframe with a list column like this:

tibble(exp = c("exp1", "exp2", "exp3"), 
       sample = c("sample1","sample1","sample2"), 
       listdata = list(list(2,5), list(2,3,5,7), list(1,2,6)))

# A tibble: 3 x 3
    exp  sample   listdata
  <chr>   <chr>     <list>
1  exp1 sample1 <list [2]>
2  exp2 sample1 <list [4]>
3  exp3 sample2 <list [3]>

The purpose is to use the metadata in the first two columns to select and operate on the lists.

I can read in the lines as lists, but don't know how to separate the metadata:

listdata <- read_lines("list_c_data.txt") %>% strsplit(., " ") %>% tibble()

Any suggestions? I may need to read the file line by line, since the number of observations could exceed 100,000 and the list in each row could have more than 1,000 elements.

Solution

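Read the file with read.table/read.csv and fill = TRUE, so that short rows are padded with NA. Then gather (from tidyr) the 3rd through last columns into 'long' format, dropping the NAs, group by 'V1' and 'V2', summarise 'Val' as a list, and rename the columns if necessary. Note that rows sharing the same 'V1'/'V2' pair are combined into a single list.

library(dplyr)
library(tidyr)
# read.table() sizes the table from the first five lines only, so pass
# col.names wide enough for the longest row in the file
n <- max(count.fields("yourfile.txt"))
df1 <- read.table("yourfile.txt", header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(n)))
gather(df1, Var, Val, V3:ncol(df1), na.rm = TRUE) %>%
         group_by(V1, V2) %>%
         summarise(Val = list(Val)) %>%
         ungroup() %>%
         rename(exp = V1, sample = V2, listdata = Val)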
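
Since the rows can be very long, it may help to see how read.table picks the number of columns when fill = TRUE. The sketch below uses only base R and a made-up six-line file (the file contents and the names `naive` and `full` are illustrative assumptions, not part of the original answer). The last line is wider than any of the first five, which is the case where the column count inferred from the first five lines falls short.

```r
# Hypothetical test file: the sixth line has more fields than the first five
path <- tempfile(fileext = ".txt")
writeLines(c("exp1 sample1 2 5",
             "exp2 sample1 2 3 5 7",
             "exp1 sample2 1 2 6",
             "exp2 sample2 4",
             "exp3 sample1 8 9",
             "exp3 sample2 1 2 3 4 5 6 7"), path)

# Without col.names, the column count comes from the first five lines
naive <- read.table(path, header = FALSE, fill = TRUE)

# count.fields() scans every line, so the widest row sets the column count
n_col <- max(count.fields(path))
full <- read.table(path, header = FALSE, fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)))

dim(naive)  # too few columns for the last line
dim(full)   # one row per input line, one column per field of the widest row
```

Scanning the file once with count.fields costs an extra pass, but it keeps each input line on its own row.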


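Alternatively, scan the file to read each row as a string and strsplit it on whitespace. The elements of 'lst' after the 1st and 2nd are converted to numeric to form 'lst2', the 1st and 2nd elements are rbind-ed into a data.frame, and 'lst2' is attached as the third column.

# Read whole lines as strings and drop leading/trailing whitespace
l1 <- trimws(scan("yourfile.txt", what = "", sep = "\n", quiet = TRUE))
# Split each line into fields on runs of whitespace
lst <- strsplit(l1, "\\s+")
# Everything after the first two fields becomes a numeric vector
lst2 <- lapply(lst, function(x) as.numeric(x[-(1:2)]))
# The first two fields become the metadata columns
d1 <- setNames(do.call(rbind.data.frame, lapply(lst,
                function(x) x[1:2])), c("exp", "sample"))
d1$listdata <- lst2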
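
The question's stated goal is to use the metadata columns to select and operate on the lists. A minimal base R sketch of that step follows, assuming a temporary file with the three sample lines from the question; the columns `total` and `n` and the variable `s1` are hypothetical names introduced here for illustration.

```r
# Hypothetical sample file mirroring the input shown in the question
path <- tempfile(fileext = ".txt")
writeLines(c("exp1 sample1 2 5  ",
             "exp2 sample1 2 3 5 7",
             "exp1 sample2 1 2 6"), path)

# Split each trimmed line into fields on runs of whitespace
fields <- strsplit(trimws(readLines(path)), "\\s+")

# First two fields are metadata; the rest go into a list column
d1 <- data.frame(exp    = vapply(fields, `[`, character(1), 1),
                 sample = vapply(fields, `[`, character(1), 2),
                 stringsAsFactors = FALSE)
d1$listdata <- lapply(fields, function(x) as.numeric(x[-(1:2)]))

# Select rows by metadata, then operate on the matching list elements
s1 <- d1[d1$sample == "sample1", ]
s1$total <- vapply(s1$listdata, sum, numeric(1))  # 7 and 17
s1$n     <- lengths(s1$listdata)                  # 2 and 4
```

Using vapply rather than sapply fixes the result type, so an unexpected value in a row fails loudly instead of silently changing the shape of the output.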
