lapply() output as a dataframe of multiple functions - R

Question

I have been trying to create a new data frame from several computations using lapply(). After reading several related questions (1, 2, 3), I have got this far:

lapply(mtcars, function(x) c(colnames(x), 
                             NROW(unique(x)), 
                             sum(is.na(x)), 
                             round(sum(is.na(x))/NROW(x),2)   
                        )
       )

However, colnames(x) doesn't return the column name, because inside the function x is just a vector. Second, I can't figure out how to turn this output into a data frame:

lapply(mtcars, function(x) data.frame(NROW(unique(x)), # if I put colnames(x) here it gives an error
                                      sum(is.na(x)), 
                                      round(sum(is.na(x))/NROW(x),2)   
                        )
       )
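A quick base-R check shows why colnames(x) comes back empty: lapply() passes each column to the function as a plain vector, which has no column names, so colnames() returns NULL, and c() then silently drops that NULL. The names are still attached to the data frame itself. A minimal sketch:

```r
# Inside lapply(), x is a bare vector, so colnames(x) is NULL.
res <- lapply(mtcars[1:2], function(x) colnames(x))
str(res)
# List of 2
#  $ mpg: NULL
#  $ cyl: NULL

# c() drops NULL, which is why the first attempt contains no name.
length(c(NULL, 1, 2))
# [1] 2

# The column names live on the data frame, not on each column.
names(mtcars)[1:2]
# [1] "mpg" "cyl"
```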

The final data frame should have a structure like this:

| Variable_name | sum_unique | NA_count | NA_percent |

Answer

The following works. First, create a list in which each element is a one-row data frame, then combine all the data frames with do.call(rbind, ...) to get the final output.

lst <- lapply(1:ncol(mtcars), function(i){
  x <- mtcars[[i]]
  data.frame(
    Variable_name = colnames(mtcars)[[i]],
    sum_unique = NROW(unique(x)), 
    NA_count = sum(is.na(x)), 
    NA_percent = round(sum(is.na(x))/NROW(x),2))  
  })

do.call(rbind, lst)
#    Variable_name sum_unique NA_count NA_percent
# 1            mpg         25        0          0
# 2            cyl          3        0          0
# 3           disp         27        0          0
# 4             hp         22        0          0
# 5           drat         22        0          0
# 6             wt         29        0          0
# 7           qsec         30        0          0
# 8             vs          2        0          0
# 9             am          2        0          0
# 10          gear          3        0          0
# 11          carb          6        0          0
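The same list-then-rbind pattern can be wrapped into a reusable function. The sketch below is a minimal, hypothetical version (`column_summary()` and the small `example_df` with missing values are made up for illustration). It loops over `names(df)` instead of column indices, so each name is directly in scope, and uses `length()`, which gives the same result as `NROW()` for a vector.

```r
# Hypothetical helper (not from the original answer): one summary row per
# column of `df`. Looping over names(df) keeps each column name in scope.
column_summary <- function(df) {
  rows <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    n_na <- sum(is.na(x))
    data.frame(
      Variable_name = nm,
      sum_unique    = length(unique(x)),  # NA counts as one distinct value
      NA_count      = n_na,
      NA_percent    = round(n_na / length(x), 2),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up example data with some missing values
example_df <- data.frame(
  a = c(1, 2, 2, NA),
  b = c("x", NA, NA, "y"),
  stringsAsFactors = FALSE
)
column_summary(example_df)
#   Variable_name sum_unique NA_count NA_percent
# 1             a          3        1       0.25
# 2             b          3        2       0.50
```

Calling `column_summary(mtcars)` should reproduce the table above, since mtcars has no missing values.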

Since you tagged this post with tidyverse, here is an alternative that uses map_dfr() from purrr, which leads to more concise code.

library(tidyverse)

map_dfr(mtcars, function(x){
  tibble(sum_unique = NROW(unique(x)), 
         NA_count = sum(is.na(x)), 
         NA_percent = round(sum(is.na(x))/NROW(x),2))
}, .id = "Variable_name")
# # A tibble: 11 x 4
#    Variable_name sum_unique NA_count NA_percent
#    <chr>              <int>    <int>      <dbl>
#  1 mpg                   25        0          0
#  2 cyl                    3        0          0
#  3 disp                  27        0          0
#  4 hp                    22        0          0
#  5 drat                  22        0          0
#  6 wt                    29        0          0
#  7 qsec                  30        0          0
#  8 vs                     2        0          0
#  9 am                     2        0          0
# 10 gear                   3        0          0
# 11 carb                   6        0          0
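The `.id = "Variable_name"` argument is what supplies the names here: a data frame is a named list of columns, so map_dfr() can record each column's name while row-binding the per-column tibbles. A rough base-R analogue of that binding step is sketched below. `bind_with_id()` is a hypothetical helper written for illustration, not a purrr function, and it assumes every element of the list is a data frame with the same columns.

```r
# Hypothetical helper: row-bind a named list of data frames and add the
# element names as a new first column, roughly what .id does in map_dfr().
bind_with_id <- function(lst, id = "id") {
  n_rows <- vapply(lst, nrow, integer(1))
  out <- do.call(rbind, unname(lst))
  ids <- data.frame(rep(names(lst), times = n_rows), stringsAsFactors = FALSE)
  names(ids) <- id
  out <- cbind(ids, out)
  rownames(out) <- NULL
  out
}

# lapply() over a data frame keeps the column names on the result list.
per_column <- lapply(mtcars, function(x) {
  data.frame(
    sum_unique = length(unique(x)),
    NA_count   = sum(is.na(x)),
    NA_percent = round(sum(is.na(x)) / length(x), 2)
  )
})
res <- bind_with_id(per_column, id = "Variable_name")
head(res, 3)
```

The result should hold the same values as the tibble above, as a plain data.frame.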

Finally, another solution using functions from dplyr and tidyr.

mtcars %>%
  summarize_all(
    list(
      sum_unique = function(x) NROW(unique(x)), 
      NA_count = function(x) sum(is.na(x)), 
      NA_percent = function(x) round(sum(is.na(x))/NROW(x),2)
    )
  ) %>%
  pivot_longer(everything(), 
               names_to = "column", 
               values_to = "value") %>%
  separate(column, into = c("Variable_name", "parameter"), sep = "_", extra = "merge") %>%
  pivot_wider(names_from = "parameter", values_from = "value")
# # A tibble: 11 x 4
#    Variable_name sum_unique NA_count NA_percent
#    <chr>              <dbl>    <dbl>      <dbl>
#  1 mpg                   25        0          0
#  2 cyl                    3        0          0
#  3 disp                  27        0          0
#  4 hp                    22        0          0
#  5 drat                  22        0          0
#  6 wt                    29        0          0
#  7 qsec                  30        0          0
#  8 vs                     2        0          0
#  9 am                     2        0          0
# 10 gear                   3        0          0
# 11 carb                   6        0          0
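The separate() step depends on how summarize_all() names its output columns: each one has the form <variable>_<statistic>, such as mpg_NA_count. With sep = "_" and extra = "merge", separate() splits only at the first underscore, so the variable name comes first and the statistic keeps its own underscores. The base-R sketch below mimics that split with sub() on a few hand-written names; the last name, my_var_NA_count, is hypothetical and shows the caveat that a variable name containing an underscore would be split in the wrong place.

```r
cols <- c("mpg_sum_unique", "cyl_NA_count", "disp_NA_percent", "my_var_NA_count")

# Text before the first underscore -> variable name
variable <- sub("_.*$", "", cols)
# Text after the first underscore -> statistic name
statistic <- sub("^[^_]*_", "", cols)

data.frame(variable, statistic)
#   variable    statistic
# 1      mpg   sum_unique
# 2      cyl     NA_count
# 3     disp   NA_percent
# 4       my var_NA_count
```

This is why the pipeline above works for mtcars, whose column names contain no underscores.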
