Adding a column of total n for each group in a stacked frequency table


Problem description

I have the following data:

id    animal    color     shape
1     bear      orange    circle
2     dog       NA        triangle
3     NA        yellow    square
4     cat       yellow    square
5     NA        yellow    rectangle
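To make the example reproducible, the table above can be rebuilt as a tibble. This is a sketch that assumes `id` is an integer, the other three columns are character, and `NA` means a missing value rather than the string "NA":

```r
library(tibble)

# Hypothetical reconstruction of the question's data; column types are assumed.
df <- tibble(
  id     = 1:5,
  animal = c("bear", "dog", NA, "cat", NA),
  color  = c("orange", NA, "yellow", "yellow", "yellow"),
  shape  = c("circle", "triangle", "square", "square", "rectangle")
)

print(df)
```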

If I run this code:

library(dplyr)
library(tidyr)

df1 <- df %>% 
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>% 
  group_by(Variable, Level) %>% 
  summarise(freq = n()) %>% 
  mutate(percent = freq/sum(freq)*100) %>% 
  # Blank out repeated Variable labels so each name appears once per group
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable)) %>% 
  ungroup()

I get the following output:

Variable     Level       freq(n=5)   percent

animal        bear          1           33.3
              dog           1           33.3
              cat           1           33.3

color         orange        1           25.0
              yellow        3           75.0

shape         circle        1           20.0
              triangle      1           20.0
              square        2           40.0
              rectangle     1           20.0

However, I would also like to add a total row after each variable:

Variable     Level       freq(n=5)   percent

animal        bear          1           33.3
              dog           1           33.3
              cat           1           33.3
              total         3           100.0

color         orange        1           25.0
              yellow        3           75.0
              total         4           100.0

shape         circle        1           20.0
              triangle      1           20.0
              square        2           40.0
              rectangle     1           20.0
              total         5           100.0

I have tried different variations of mutate and summarise, but I keep getting the error "invalid 'type' (closure) of argument".
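The question does not show the failing call, so the following is only an assumption about a likely cause: this error appears whenever a function (a "closure" in R) is passed to `sum()` instead of a numeric vector. Inside a dplyr verb, writing something like `sum(n)` when no column named `n` exists (the count column here is `freq`) would likely resolve `n` to the function `dplyr::n` and fail in the same way. The base R sketch below reproduces the message with `mean` standing in for any function:

```r
# sum() accepts numeric, logical or complex vectors; a function object is rejected.
msg <- tryCatch(sum(mean), error = function(e) conditionMessage(e))
print(msg)
```

Referring to the count column by name (for example `sum(freq)`), or calling `n()` with parentheses, avoids the problem.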

Recommended answer

If we stop one step earlier when defining df1 (leaving out the duplicated() blanking and the ungroup()):

df1 <- df %>%
  pivot_longer( -id, names_to = "Variable", values_to = "Level" ) %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq/sum(freq)*100)

df1
# # A tibble: 11 x 4
# # Groups:   Variable [3]
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   bear          1      20
#  2 animal   cat           1      20
#  3 animal   dog           1      20
#  4 animal   <NA>          2      40
#  5 color    orange        1      20
#  6 color    yellow        3      60
#  7 color    <NA>          1      20
#  8 shape    circle        1      20
#  9 shape    rectangle     1      20
# 10 shape    square        2      40
# 11 shape    triangle      1      20

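Then we can augment it with the per-group summaries and reorder the rows. Because arrange() sorts FALSE before TRUE, `!is.na(Level)` puts the NA level first within each variable and `Level == "total"` pushes the total row to the end: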

df1 %>%
  group_by(Variable) %>%
  summarize(Level = "total", across(freq:percent, sum)) %>%
  bind_rows(df1) %>%
  arrange(Variable, !is.na(Level), Level == "total", Level) %>%
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable))
# # A tibble: 14 x 4
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   <NA>          2      40
#  2 <NA>     bear          1      20
#  3 <NA>     cat           1      20
#  4 <NA>     dog           1      20
#  5 <NA>     total         5     100
#  6 color    <NA>          1      20
#  7 <NA>     orange        1      20
#  8 <NA>     yellow        3      60
#  9 <NA>     total         5     100
# 10 shape    circle        1      20
# 11 <NA>     rectangle     1      20
# 12 <NA>     square        2      40
# 13 <NA>     triangle      1      20
# 14 <NA>     total         5     100
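Note that this answer keeps NA as a level, so every total is 5 and the percentages differ from the desired output in the question. If the NA rows should be excluded instead, one option (a sketch, not part of the original answer, and assuming the data reconstruction shown here) is to drop them during the pivot with tidyr's `values_drop_na = TRUE` and then add the totals the same way:

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the question's data (column types assumed).
df <- tibble(
  id     = 1:5,
  animal = c("bear", "dog", NA, "cat", NA),
  color  = c("orange", NA, "yellow", "yellow", "yellow"),
  shape  = c("circle", "triangle", "square", "square", "rectangle")
)

# Count each level per variable, dropping missing values during the pivot.
counts <- df %>%
  pivot_longer(-id, names_to = "Variable", values_to = "Level",
               values_drop_na = TRUE) %>%
  count(Variable, Level, name = "freq") %>%
  group_by(Variable) %>%
  mutate(percent = freq / sum(freq) * 100) %>%
  ungroup()

# One "total" row per variable.
totals <- counts %>%
  group_by(Variable) %>%
  summarise(Level = "total", freq = sum(freq), percent = sum(percent))

# Stack, place each total after its levels, then blank repeated Variable labels.
result <- bind_rows(counts, totals) %>%
  arrange(Variable, Level == "total", Level) %>%
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable))

print(result)
```

With the data above this gives totals of 3 (animal), 4 (color) and 5 (shape), matching the desired output; the percentages are left unrounded.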

