R: group by multiple columns and count


Problem description

I have the following data frame, df:

LeftOrRight SpeedCategory   NumThruLanes
R           25to45          3             
L           45to62          2           
R           Gt62            1           

I want to group it by SpeedCategory and loop through the other columns to get the frequency of each unique code in each speed category, something like this:

                 25to45 45to62 Gt62
LeftOrRight    L      0      1    0
               R      1      0    1
NumThruLanes   1      0      0    1
               2      0      1    0
               3      1      0    0

The closest I have been able to come is this:

for (col in df) {
  tbl <- table(col, df$SpeedCategory)
  print(tbl)
}

This prints the following (first for LeftOrRight, then for NumThruLanes):

col   25to45 45to62 Gt62
  L        0      1    0
  R        1      0    1

col   25to45 45to62 Gt62
  1        0      0    1
  2        0      1    0
  3        1      0    0

I am pretty sure I can accomplish my goal with aggregate(), or maybe with group_by from dplyr, but I am new to R and can't figure out the syntax. In pandas I would use a MultiIndex, but I don't know what the R equivalent is, so it's difficult to google.

I'd like to do everything in one pass, or with a loop, since I have over a dozen columns to get through.

Recommended answer

The tables package makes it easy to format tables in very specific ways. The syntax takes some getting used to, but for this problem it's pretty straightforward:

exd <- read.table(text = "LeftOrRight SpeedCategory   NumThruLanes
R           25to45          3             
L           45to62          2           
R           Gt62            1", header = TRUE)       

## to get counts by default, every column needs to be categorical (a factor)
exd[] <- lapply(exd, factor)

library(tables)
tabular(LeftOrRight + NumThruLanes ~ SpeedCategory, data = exd)

##                SpeedCategory            
##                25to45        45to62 Gt62
## LeftOrRight  L 0             1      0   
##              R 1             0      1   
## NumThruLanes 1 0             0      1   
##              2 0             1      0   
##              3 1             0      0

If you have a lot of columns to iterate over, you can construct the formula programmatically, e.g.,

tabular(as.formula(paste(paste(names(exd)[-2], collapse = " + "),
                         names(exd)[2], sep = " ~ ")),
        data = exd)
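
To make the nested paste() calls above easier to follow, here is a small sketch that builds the same formula step by step. The variable names cols, lhs, formula_text and f are illustrative and not part of the original answer; the column names are hard-coded here so the snippet runs without the data frame.

```r
# Column names as in the example data; column 2 is the grouping variable
cols <- c("LeftOrRight", "SpeedCategory", "NumThruLanes")

# Left-hand side: every column except the grouping one, joined with " + "
lhs <- paste(cols[-2], collapse = " + ")

# Full formula text, then converted to a formula object
formula_text <- paste(lhs, cols[2], sep = " ~ ")
f <- as.formula(formula_text)

print(f)
```

The resulting object f can be passed as the first argument to tabular() in place of the formula typed out by hand.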

As a bonus, there are html and latex methods, making it easy to mark your table up for inclusion in an article or report.
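
For comparison, the same counts can probably be assembled in base R alone, by extending the loop from the question. This is only a sketch and not part of the original answer: the names group_col, other_cols and counts are made up for illustration, and the "column:value" row labels are an arbitrary choice standing in for the two-level row header.

```r
# Example data from the question
exd <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1)
)

group_col  <- "SpeedCategory"
other_cols <- setdiff(names(exd), group_col)

# Cross-tabulate each remaining column against the grouping column,
# label the rows as "column:value", and stack the results into one matrix
counts <- do.call(rbind, lapply(other_cols, function(col) {
  tbl <- unclass(table(exd[[col]], exd[[group_col]]))
  rownames(tbl) <- paste(col, rownames(tbl), sep = ":")
  tbl
}))

print(counts)
```

Because every table is built against the same grouping column, the columns line up and rbind() can stack them directly. The result is a plain integer matrix, so it lacks the formatting options that tabular() offers.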
