r: group by multiple columns and count
Problem description
I have the following data frame, df:
LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1
I want to group it by SpeedCategory and loop through the other columns to get the frequency of each unique code in each speed category, something like this:
                 25to45 45to62 Gt62
LeftOrRight  L        0      1    0
             R        1      0    1
NumThruLanes 1        0      0    1
             2        0      1    0
             3        1      0    0
The closest I have been able to come is this:
for (col in df){
tbl <- table(col, df$SpeedCategory)
print(tbl)
}
Which prints out the following (first LeftOrRight, then NumThruLanes):
col 25to45 45to62 Gt62
  L      0      1    0
  R      1      0    1

col 25to45 45to62 Gt62
  1      0      0    1
  2      0      1    0
  3      1      0    0
I am pretty sure I can accomplish my goal with aggregate() or maybe group_by from dplyr, but I am new to R and can't figure out the syntax. In pandas I would use a MultiIndex, but I don't know what the R equivalent is, so it's difficult to google.
I'd like to do everything in one pass, or with a loop, since I have over a dozen columns to get through.
Recommended answer
The tables package makes it easy to format tables in very specific ways. The syntax takes some getting used to, but for this problem it's pretty straightforward:
exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)
## to get counts by default we need everything to be categorical,
## so convert every column (including the numeric NumThruLanes) to a factor
exd[] <- lapply(exd, factor)
library(tables)
tabular(LeftOrRight + NumThruLanes ~ SpeedCategory, data = exd)
##                 SpeedCategory
##                 25to45 45to62 Gt62
## LeftOrRight  L       0      1    0
##              R       1      0    1
## NumThruLanes 1       0      0    1
##              2       0      1    0
##              3       1      0    0
If you have a lot of columns to iterate over, you can construct the formula programmatically, e.g.,
tabular(as.formula(paste(paste(names(exd)[-2], collapse = " + "),
names(exd)[2], sep = " ~ ")),
data = exd)
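The same formula can also be built from column names instead of positions, which may be easier to read when there are a dozen or more columns. This is only a sketch: group_col and row_cols are names introduced here for illustration, and exd is recreated so the block runs on its own.

```r
exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

# Hypothetical helper names, not part of the original answer
group_col <- "SpeedCategory"                 # column that defines the table columns
row_cols  <- setdiff(names(exd), group_col)  # every other column becomes a row block

# Paste the pieces into "A + B ~ G" and convert the string to a formula
fml <- as.formula(paste(paste(row_cols, collapse = " + "), "~", group_col))
print(fml)
```

For this data the result is the formula LeftOrRight + NumThruLanes ~ SpeedCategory, which can be passed as the first argument to tabular() in place of the literal formula.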
As a bonus, there are html and latex methods, making it easy to mark your table up for inclusion in an article or report.
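If installing tables is not an option, the raw counts (without the nested row headers) can probably be assembled in base R by extending the loop from the question. The sketch below is not part of the original answer. It swaps in table() plus rbind() for tabular(), and group_col, other_cols, and the "column = level" row labels are choices made here for illustration.

```r
exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes
R 25to45 3
L 45to62 2
R Gt62 1", header = TRUE)

group_col  <- "SpeedCategory"                 # hypothetical name for the grouping column
other_cols <- setdiff(names(exd), group_col)  # columns to cross-tabulate against it

# One contingency table per column, with rows labelled "column = level"
counts <- lapply(other_cols, function(col) {
  tbl <- table(exd[[col]], exd[[group_col]])
  rownames(tbl) <- paste(col, rownames(tbl), sep = " = ")
  tbl
})

# Every table shares the same column levels, so they can be stacked row-wise
stacked <- do.call(rbind, counts)
print(stacked)
```

The result is an ordinary integer matrix with one row per column/level pair and one column per speed category, so it can be indexed or written out like any other matrix.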