Unpacking a list in an R dataframe
Problem description
I have a data frame in which one field comprises lists of varying lengths. I would like to extract each element of the list in this field into its own field, so that I can gather the results into a long data frame with one row per list element per id.
Here is an example data frame:
dat <- structure(list(id = c("509935", "727889", "864607", "1234243",
"1020959", "221975"), some_date = c("2/09/1967", "28/04/1976",
"22/12/2017", "7/02/2006", "10/03/2019", "21/10/1935"), df_list = list(
"018084131", c("062197171", "062171593"), c("064601923",
"068994009", "069831651"), c("071141584", "073129537"), c("061498574",
"065859718", "067251995", "069447806"), "064623976")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
I have come up with code that produces the final result I want, but it is not DRY. Here is what I have tried, where res_n is the following function:
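library(dplyr)  # mutate(), select(), filter(), %>%
library(purrr)  # map()
library(tidyr)  # gather(), used further down
# Return the n-th element of a vector (NA if the vector is shorter than n)
res_n <- function(field, n) {
  field[n]
}
dat <- dat %>% mutate(res1 = map(df_list, res_n, 1))
dat <- dat %>% mutate(res2 = map(df_list, res_n, 2))
dat <- dat %>% mutate(res3 = map(df_list, res_n, 3))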
This returns a data frame with each of the first three list elements from df_list in its own column.
From this I can achieve what I set out to do and produce a final data frame of results, as follows:
dat_final <- gather(dat, test, labno, -df_list, -some_date, -id) %>%
  select(-df_list) %>%
  mutate(labno = as.integer(labno)) %>%
  filter(!is.na(labno))
To avoid the repetitive (non-DRY) approach above, I resorted to a for loop to eliminate the duplicated code, but I am struggling to get it to work in the way I need to achieve the final result. This is the for loop I tried:
for (i in 3) {
dat %>% mutate(paste(res, i, sep = '_') = map(results, res_n, i)) }
Can someone help me refine the code to eliminate the repetitive lines that produce the result?
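For reference, here is a minimal sketch of how a loop like the one above could be made to run. It assumes the intent was to loop over 1:3 (for (i in 3) visits only the value 3), to read from df_list rather than results, and to build the column name as a string. The col variable and the trimmed three-row dat are illustrative, and map_chr() is swapped in for map() so the new columns are plain character vectors.

```r
library(dplyr)
library(purrr)

# Trimmed copy of the example data (first three ids only), for illustration.
dat <- tibble(
  id = c("509935", "727889", "864607"),
  some_date = c("2/09/1967", "28/04/1976", "22/12/2017"),
  df_list = list(
    "018084131",
    c("062197171", "062171593"),
    c("064601923", "068994009", "069831651")
  )
)

# Return the n-th element of a vector (NA if the vector is shorter than n).
res_n <- function(field, n) field[n]

for (i in 1:3) {
  # Build the column name first; `!!col :=` lets mutate() use it on the left-hand side.
  col <- paste("res", i, sep = "_")
  # The result must be assigned back to dat, otherwise it is discarded.
  dat <- dat %>% mutate(!!col := map_chr(df_list, res_n, i))
}
```

The loop still hard-codes the number of columns, so it only helps if the wide layout is really needed as an intermediate step.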
Recommended answer
If the final goal is to get the data in long format, we can use unnest from tidyr:
tidyr::unnest(dat, cols = df_list)
#   id      some_date  df_list
#   <chr>   <chr>      <chr>
# 1 509935  2/09/1967  018084131
# 2 727889  28/04/1976 062197171
# 3 727889  28/04/1976 062171593
# 4 864607  22/12/2017 064601923
# 5 864607  22/12/2017 068994009
# 6 864607  22/12/2017 069831651
# 7 1234243 7/02/2006  071141584
# 8 1234243 7/02/2006  073129537
# 9 1020959 10/03/2019 061498574
#10 1020959 10/03/2019 065859718
#11 1020959 10/03/2019 067251995
#12 1020959 10/03/2019 069447806
#13 221975  21/10/1935 064623976
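To get something shaped like the dat_final from the question (a test label plus an integer labno), the unnested result could be post-processed along these lines. This is only a sketch: it assumes the test column should number the elements within each id, and it rebuilds the example data with tibble() so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt with tibble() for brevity.
dat <- tibble(
  id = c("509935", "727889", "864607", "1234243", "1020959", "221975"),
  some_date = c("2/09/1967", "28/04/1976", "22/12/2017",
                "7/02/2006", "10/03/2019", "21/10/1935"),
  df_list = list(
    "018084131",
    c("062197171", "062171593"),
    c("064601923", "068994009", "069831651"),
    c("071141584", "073129537"),
    c("061498574", "065859718", "067251995", "069447806"),
    "064623976"
  )
)

dat_final <- dat %>%
  unnest(cols = df_list) %>%                      # one row per list element
  group_by(id) %>%
  mutate(test = paste0("res", row_number())) %>%  # position of the element within its id
  ungroup() %>%
  transmute(id, some_date, test, labno = as.integer(df_list))
```

Two differences from the original approach are worth noting: all 13 elements are kept, including the fourth one for id 1020959 that res1 to res3 would miss, and as.integer() drops the leading zeros, so keep labno as character if those matter.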