lapply() output as a dataframe of multiple functions - R
Question
I have been trying to create a new dataframe from several computations with lapply(). After reading several questions (1, 2, 3), this is what I have so far:
lapply(mtcars, function(x) c(colnames(x),
                             NROW(unique(x)),
                             sum(is.na(x)),
                             round(sum(is.na(x))/NROW(x),2)
                             )
       )
However, colnames(x) doesn't return the column name, because x is a plain vector. Second, I can't figure out a way to transform this output into a dataframe:
lapply(mtcars, function(x) data.frame(NROW(unique(x)), # if I put colnames(x) here it gives an error
                                      sum(is.na(x)),
                                      round(sum(is.na(x))/NROW(x),2)
                                      )
       )
As you might see above, the final dataframe should follow a structure like:
| Variable_name | sum_unique | NA_count | NA_percent |
Recommended answer
The following will work. First, create a list with each element as a data frame, and then combine all data frames to get the final output.
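# Iterate over column positions so both the name and the values are available
lst <- lapply(seq_along(mtcars), function(i){
  x <- mtcars[[i]]
  data.frame(
    Variable_name = colnames(mtcars)[[i]],
    sum_unique = NROW(unique(x)),
    NA_count = sum(is.na(x)),
    NA_percent = round(sum(is.na(x))/NROW(x),2))
})
# Stack the one-row data frames into the final result
do.call(rbind, lst)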
# Variable_name sum_unique NA_count NA_percent
# 1 mpg 25 0 0
# 2 cyl 3 0 0
# 3 disp 27 0 0
# 4 hp 22 0 0
# 5 drat 22 0 0
# 6 wt 29 0 0
# 7 qsec 30 0 0
# 8 vs 2 0 0
# 9 am 2 0 0
# 10 gear 3 0 0
# 11 carb 6 0 0
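As a side note on why colnames(x) fails in the question: lapply() hands the function each column as a bare vector, and a bare vector carries no column name. The following is a minimal base R sketch that avoids the index bookkeeping by walking the names and the columns in parallel with Map(). The helper name summarise_column is made up for this illustration and is not part of the original answer.

```r
# Hypothetical helper (illustration only): summarise a single column.
summarise_column <- function(name, x) {
  data.frame(
    Variable_name = name,
    sum_unique    = length(unique(x)),        # number of distinct values
    NA_count      = sum(is.na(x)),            # number of missing values
    NA_percent    = round(mean(is.na(x)), 2)  # share of missing values
  )
}

# Map() passes the i-th name and the i-th column to each call.
lst <- Map(summarise_column, names(mtcars), mtcars)

# Stack the one-row data frames and drop the row names inherited from Map().
result <- do.call(rbind, lst)
rownames(result) <- NULL
result
```

For an atomic vector, length(unique(x)) and mean(is.na(x)) give the same values as NROW(unique(x)) and sum(is.na(x))/NROW(x) in the answer above, so the result should match it.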
Since you tagged this post with tidyverse, here is an alternative that uses map_dfr, which leads to more concise code.
library(tidyverse)
# .id stores the name of each list element (here, the column name) in a new column
map_dfr(mtcars, function(x){
  tibble(sum_unique = NROW(unique(x)),
         NA_count = sum(is.na(x)),
         NA_percent = round(sum(is.na(x))/NROW(x),2))
}, .id = "Variable_name")
# # A tibble: 11 x 4
# Variable_name sum_unique NA_count NA_percent
# <chr> <int> <int> <dbl>
# 1 mpg 25 0 0
# 2 cyl 3 0 0
# 3 disp 27 0 0
# 4 hp 22 0 0
# 5 drat 22 0 0
# 6 wt 29 0 0
# 7 qsec 30 0 0
# 8 vs 2 0 0
# 9 am 2 0 0
# 10 gear 3 0 0
# 11 carb 6 0 0
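If you are on purrr 1.0.0 or later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). A minimal sketch of the same computation in that style, assuming a recent purrr and tibble are installed, could look like this:

```r
library(purrr)
library(tibble)

# One single-row tibble per column; map() keeps the column names as list names.
summaries <- map(mtcars, function(x) {
  tibble(
    sum_unique = length(unique(x)),
    NA_count   = sum(is.na(x)),
    NA_percent = round(mean(is.na(x)), 2)
  )
})

# names_to plays the role of .id in map_dfr(): list names become a column.
result <- list_rbind(summaries, names_to = "Variable_name")
result
```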
Finally, another solution using functions from dplyr and tidyr.
mtcars %>%
  # Produces one column per variable/statistic pair, e.g. mpg_sum_unique
  summarize_all(
    list(
      sum_unique = function(x) NROW(unique(x)),
      NA_count = function(x) sum(is.na(x)),
      NA_percent = function(x) round(sum(is.na(x))/NROW(x),2)
    )
  ) %>%
  pivot_longer(everything(),
               names_to = "column",
               values_to = "value") %>%
  # Split at the first underscore only; extra = "merge" keeps the rest together
  separate(column, into = c("Variable_name", "parameter"), sep = "_", extra = "merge") %>%
  pivot_wider(names_from = "parameter", values_from = "value")
# # A tibble: 11 x 4
# Variable_name sum_unique NA_count NA_percent
# <chr> <dbl> <dbl> <dbl>
# 1 mpg 25 0 0
# 2 cyl 3 0 0
# 3 disp 27 0 0
# 4 hp 22 0 0
# 5 drat 22 0 0
# 6 wt 29 0 0
# 7 qsec 30 0 0
# 8 vs 2 0 0
# 9 am 2 0 0
# 10 gear 3 0 0
# 11 carb 6 0 0
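In dplyr 1.0.0 and later, summarize_all() is superseded by across(). The sketch below is one possible rewrite under that assumption. The "__" separator is an arbitrary choice made here so that it cannot clash with the single underscores inside the statistic names, and the special ".value" entry in names_to lets pivot_longer() produce the final shape without the separate() and pivot_wider() steps.

```r
library(dplyr)
library(tidyr)

result <- mtcars %>%
  summarise(across(
    everything(),
    list(
      sum_unique = function(x) length(unique(x)),
      NA_count   = function(x) sum(is.na(x)),
      NA_percent = function(x) round(mean(is.na(x)), 2)
    ),
    .names = "{.col}__{.fn}"   # e.g. mpg__sum_unique
  )) %>%
  pivot_longer(
    everything(),
    # The part before "__" becomes Variable_name; the part after it names
    # the output column (that is what ".value" means).
    names_to  = c("Variable_name", ".value"),
    names_sep = "__"
  )
result
```

This variant would break if a column name in the data already contained "__", in which case another separator should be chosen.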