Calculating the frequency of occurrences in every column

Question

I'm trying to count the frequency of a specific value in every column.

Basically, I am looking at how different bacterial isolates (one per row) respond to treatment with different antibiotics (one per column). "1" means the isolate is resistant to the antibiotic, while "0" means the isolate is susceptible to it.

antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)

ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

ab
  antibiotic1 antibiotic2 antibiotic3
1           1           0           0
2           1           0           1
3           0          NA           1
4           1           0           0
5           0           1           0
6           1           1          NA
7          NA           0           1
8           0           0           0
9           1           0           0

So looking at the first row, isolate 1 is resistant to antibiotic 1, sensitive to antibiotic 2, and sensitive to antibiotic 3.

I want to calculate the percentage of isolates resistant to each antibiotic, i.e. sum the number of "1"s in each column and divide by the number of isolates in that column (excluding NAs from the denominator).

I know how to get the counts (with count() from the plyr package):

apply(ab, 2, count)

$antibiotic1
   x freq
1  0    3
2  1    5
3 NA    1

$antibiotic2
   x freq
1  0    6
2  1    2
3 NA    1

$antibiotic3
   x freq
1  0    5
2  1    3
3 NA    1

But my actual dataset contains many different antibiotics and hundreds of isolates, so I want to be able to run a function across all columns at the same time to yield a dataframe.

I tried:

counts <- ldply(ab, function(x) sum(x=="1")/(sum(x=="1") +  sum(x=="0")))

but it produces NA:

          .id V1
1 antibiotic1 NA
2 antibiotic2 NA
3 antibiotic3 NA

I also tried:

library(dplyr)
ab %>%
 summarise_each(n = n())) %>%
 mutate(prop.resis = n/sum(n))

but get an error message that reads:

Error in n() : This function should not be called directly

Any advice would be much appreciated.

Recommended answer

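I would just use colMeans. Because every value is either 0 or 1, the column mean with NAs removed is exactly the proportion of resistant isolates among those tested: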

colMeans(ab, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3 
#       0.625       0.250       0.375 
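
To see why this matches the calculation described in the question, here is a minimal sketch that spells out the numerator and denominator explicitly. The variable names resistant, tested and manual are illustrative only and do not come from the original post. It also shows why the ldply attempt returned NA: sum() propagates missing values unless na.rm = TRUE is passed.

```r
# Sample data from the question
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Comparing against a vector that contains NA yields NA for that element,
# so a plain sum() is NA; na.rm = TRUE drops the missing comparison.
sum(antibiotic1 == 1)                # NA
sum(antibiotic1 == 1, na.rm = TRUE)  # 5

# Explicit version: resistant isolates divided by isolates actually tested
resistant <- colSums(ab == 1, na.rm = TRUE)  # number of 1s per column
tested    <- colSums(!is.na(ab))             # non-missing values per column
manual    <- resistant / tested

# Same result as the one-liner
all.equal(manual, colMeans(ab, na.rm = TRUE))  # TRUE
```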

As a side note, this can easily be generalized to calculate the frequency of any value. If, for instance, you were looking for the frequency of the number 2 in all columns, you could simply modify it to colMeans(ab == 2, na.rm = TRUE).
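
The sample data contains no 2s, so as a small illustration the same pattern can be applied to the value 0, which gives the proportion of susceptible isolates per antibiotic. This is only a sketch on the data from the question. The comparison leaves NA entries as NA, so na.rm = TRUE still excludes them from the denominator.

```r
# Sample data from the question
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# ab == 0 is a logical matrix (TRUE / FALSE / NA); its column means
# are the share of 0s among the non-missing values in each column.
susceptible <- colMeans(ab == 0, na.rm = TRUE)
susceptible
# antibiotic1 antibiotic2 antibiotic3
#       0.375       0.750       0.625
```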

Alternatively, sapply gives the same result as colMeans(ab, na.rm = TRUE); it avoids the conversion to a matrix, at the cost of evaluating the columns one by one:

sapply(ab, mean, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3 
#       0.625       0.250       0.375
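
The question asks for a data frame rather than a named vector. One possible way to get there, sketched with base R only, is to wrap the result in data.frame(). The column names antibiotic, n_tested and prop_resistant are illustrative choices, not something the original post specifies.

```r
# Sample data from the question
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Proportion of resistant isolates per antibiotic, ignoring NAs
prop <- sapply(ab, mean, na.rm = TRUE)

# One row per antibiotic; n_tested is the NA-free denominator
result <- data.frame(
  antibiotic     = names(prop),
  n_tested       = unname(colSums(!is.na(ab))),
  prop_resistant = unname(prop),
  stringsAsFactors = FALSE  # keep names as character on R < 4.0 too
)
result
```

Keeping the number of tested isolates next to each proportion makes it easier to spot antibiotics whose percentage rests on very few observations.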
