Conditional Cross Tabulation in R


Question

I am looking for the quickest way to achieve the task below using the "expss" package.

With the great "expss" package we can easily do cross tabulation (it also has other advantages and useful functions for cross tabulations), and we can cross-tabulate multiple variables at once, as below.

 #install.packages("expss")

 library("expss")
 data(mtcars)


  var1 <- "vs, am, gear, carb"
  var_names = trimws(unlist(strsplit(var1, split = ","))) 


  mtcars %>%
    tab_prepend_values %>%
    tab_cols(total(), ..[(var_names)]) %>%
    tab_cells(cyl) %>%
    tab_stat_cpct() %>%
    tab_pivot()

The above gives the following output (column %):

                      #Total    vs          am          gear            carb                        
                                0     1     0     1     3     4   5     1   2    3   4    6    8 

  cyl             4    34.4   5.6  71.4  15.8  61.5   6.7  66.7  40  71.4  60                    
                  6    21.9  16.7  28.6  21.1  23.1  13.3  33.3  20  28.6           40  100      
                  8    43.8  77.8        63.2  15.4  80.0        40        40  100  60       100 
       #Total cases    32.0  18.0  14.0  19.0  13.0  15.0  12.0   5   7.0  10    3  10    1    1 

However, I am looking for an approach to create a table like the one below:

 CYL    |  VS = 0   |  AM = 1   |   Gear = 4 or Gear = 5    |  Carb (All)
   4        5.56        61.54               58.82                34.38
   6        16.67       23.08               29.41                21.88
   8        77.78       15.38               11.76                43.75

Total(col%) 100.00      100.00              100.00               100.00

I can achieve this using dplyr and join functions, but that is too complex in case we have to pass the variables at runtime or dynamically.

Any help would be appreciated. Thanks!

Recommended answer

You can try this:

1) Make a function that converts each value into its percentage of the column sum.

myprop_tbl <- function(x){
    return(round(x*100/sum(x),2))
}
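To see what the function does on its own, here is a small sketch that applies it to the per-cylinder counts of cars with vs == 0. The variable names vs0_counts and vs0_pct are only for illustration and are not part of the original answer.

```r
# Function from the answer: each value as a percentage of the vector total
myprop_tbl <- function(x) {
  round(x * 100 / sum(x), 2)
}

data(mtcars)

# Number of cars with vs == 0 for cyl = 4, 6 and 8 (illustrative names)
vs0_counts <- as.vector(table(mtcars$cyl[mtcars$vs == 0]))

# Convert the counts to column percentages
vs0_pct <- myprop_tbl(vs0_counts)
print(vs0_pct)
# [1]  5.56 16.67 77.78
```

These are the same figures as the "VS = 0" column in the table the question asks for.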

2) Using purrr's map, apply the function to the columns of your data frame and then bind the results.

library(tidyverse)
tab <- mtcars %>% 
    group_by(cyl) %>% 
    summarise(vs_sum = sum(vs==0), am_sum = sum(am==1), 
              gear_sum = sum(gear == 4|gear==5), carb_sum= n())

finaltab <- bind_cols(tab[,1],map_df(tab[,2:length(tab)], myprop_tbl))

Output:

#    cyl vs_sum am_sum gear_sum carb_sum
#  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
#1  4.00   5.56   61.5     58.8     34.4
#2  6.00  16.7    23.1     29.4     21.9
#3  8.00  77.8    15.4     11.8     43.8
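The table requested in the question also has a "Total (col%)" row. The following is a minimal base R sketch of one way to get the same percentages plus that row, assuming the same four conditions. It is an alternative to the tidyverse code above, not the answer's own method, and the names flags, counts, raw and with_total are made up for illustration.

```r
data(mtcars)

# One logical column per condition; TRUE for every row gives the "all" column
flags <- with(mtcars, data.frame(
  vs_sum   = vs == 0,
  am_sum   = am == 1,
  gear_sum = gear == 4 | gear == 5,
  carb_sum = TRUE
))

# Count matching rows per cyl level (row names become "4", "6", "8")
counts <- rowsum(as.matrix(flags) * 1, group = mtcars$cyl)

# Column percentages, kept unrounded so that the totals come out at 100
raw <- prop.table(counts, margin = 2) * 100

# Append the total row, then round once at the end
with_total <- round(rbind(raw, Total = colSums(raw)), 2)
print(with_total)
```

Rounding is done after the total row is added. Summing already rounded percentages can give 100.01 instead of 100.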

After a discussion with the OP, it seems he also wanted to pass the summary functions as strings.

Here I am using the package seplyr:

library(seplyr)

tab <- mtcars %>% 
    group_by(cyl) %>% 
    summarise_se(c("vs_sum = sum(vs==0)",
              "am_sum = sum(am==1)",
              "gear_sum = sum(gear == 4|gear==5)", 
              "carb_sum = n()"))

This also works, but you will get weird column names. To fix that, you can do the following, which gives the same result as the original answer I posted:

tab <- mtcars %>% 
    group_by(cyl) %>% 
    summarise_se(c("vs_sum" := "sum(vs==0)",
              "am_sum" := "sum(am==1)",
              "gear_sum" := "sum(gear == 4|gear==5)", 
              "carb_sum" := "n()"))

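If pulling in seplyr is not an option, the same idea of passing conditions as strings at runtime can be sketched in base R. The helper cond_col_pct() below is hypothetical (it is not part of expss, dplyr or seplyr), and the condition names in conds are made up for illustration. It evaluates strings with eval(parse()), so it should only be used with strings you trust.

```r
# Hypothetical helper: column percentages of row_var for each condition string
cond_col_pct <- function(data, row_var, conds) {
  groups <- sort(unique(data[[row_var]]))
  counts <- sapply(conds, function(cond) {
    # Evaluate the condition inside the data frame; "TRUE" selects all rows
    keep <- eval(parse(text = cond), envir = data)
    keep <- rep_len(as.logical(keep), nrow(data))
    # Count matching rows within each level of row_var
    vapply(groups, function(g) sum(keep[data[[row_var]] == g]), numeric(1))
  })
  rownames(counts) <- groups
  round(prop.table(counts, margin = 2) * 100, 2)
}

data(mtcars)

# Conditions supplied as a named character vector, e.g. built at runtime
conds <- c(vs_0     = "vs == 0",
           am_1     = "am == 1",
           gear_4_5 = "gear %in% c(4, 5)",
           carb_all = "TRUE")

res <- cond_col_pct(mtcars, "cyl", conds)
print(res)
```

The names of conds become the column names of the result, so there are no odd names to clean up afterwards.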
You can find more details at this link.
