Using dplyr in a for loop in R
Problem description
I have a sample data set df like this:
issue_1 issue_2 issue_3 check cat_1 cat_2 cat_3
a - - 0 1 0 0
- b - 1 0 1 0
- - c 1 0 0 1
p - - 0 1 0 0
- - q 1 0 0 1
- r - 0 0 1 0
a - - 1 1 0 0
a b - 1 1 1 0
To explain: issue_1, issue_2 and issue_3 each contain repeated values, and for each row the value of check is either 0 or 1.
For each issue, I need the total number of occurrences of each value and the number of those occurrences where check is 1. So in the sample, issue_1 has 3 occurrences of a, 2 of which have check = 1, and one occurrence of p, which never has check = 1. The same applies to the other two issues.
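To make the two counts for issue_1 concrete, here is a small base R sketch. It is only an illustration: it rebuilds the sample data as a plain data frame with character columns (an assumption; the original df may use factors), then uses table() for the total occurrences and tapply() for the number of rows with check = 1.

```r
# Assumption: the sample data from the question, rebuilt with character columns
df <- data.frame(
  issue_1 = c("a", "-", "-", "p", "-", "-", "a", "a"),
  issue_2 = c("-", "b", "-", "-", "-", "r", "-", "b"),
  issue_3 = c("-", "-", "c", "-", "q", "-", "-", "-"),
  check   = c(0, 1, 1, 0, 1, 0, 1, 1),
  cat_1   = c(1, 0, 0, 1, 0, 0, 1, 1),
  cat_2   = c(0, 1, 0, 0, 0, 1, 0, 1),
  cat_3   = c(0, 0, 1, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only the rows where issue_1 actually has a value
x <- df[df$issue_1 != "-", ]

# Total occurrences of each value of issue_1
total <- table(x$issue_1)
# Number of those occurrences with check == 1
ones <- tapply(x$check, x$issue_1, sum)

print(total)
print(ones)
```

Here total gives a = 3 and p = 1, and ones gives a = 2 and p = 0, which are the figures described above.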
I used a nested for loop, but instead of counting at the grouped level it gives the total number of rows. Can someone suggest a better way?
Sample code:
library(dplyr)

abc <- c('issue_1', 'issue_2', 'issue_3')
qwe <- c('cat_1', 'cat_2', 'cat_3')
for (i in abc) {
  for (j in qwe) {
    temp <- df[, c(i, j, 'check')]
    # keep only the rows that belong to category j
    temp <- subset(temp, temp[[j]] != 0)
    temp <- temp %>%
      group_by(temp[[i]]) %>%
      # temp[[i]] is the whole (ungrouped) column of temp, not the current
      # group, so total_issue ends up as the total number of rows
      mutate(total_issue = length(temp[[i]])) %>%
      mutate(check_again = length(check[check == 1])) %>%
      mutate(percentage = (check_again / total_issue) * 100)
    temp <- subset(temp, !(duplicated(temp[[i]])))
    temp <- temp[, c(i, 'total_issue', 'check_again', 'percentage')]
    assign(paste(i, 'stats', sep = '_'), temp)
    write.csv(temp, paste0('path', i, '_', j, '_stats', '.csv'))
  }
}
For example, for issue_1 and cat_1 the expected output is:
issue_1 total_issue check_again percentage
a 3 2 2/3*100
p 1 0 0
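For comparison with the loop in the question, here is a minimal sketch of a loop that does count per group, assuming dplyr >= 1.0. The key difference is that columns are referenced through the .data pronoun inside filter() and group_by(), so n() and sum(check == 1) are evaluated per group instead of against the whole temp data frame. The names issues, cats, results and the value column are hypothetical; the sketch also drops the "-" placeholders and keeps the results in a list instead of writing CSV files.

```r
library(dplyr)

# Assumption: the sample data from the question, rebuilt with character columns
df <- data.frame(
  issue_1 = c("a", "-", "-", "p", "-", "-", "a", "a"),
  issue_2 = c("-", "b", "-", "-", "-", "r", "-", "b"),
  issue_3 = c("-", "-", "c", "-", "q", "-", "-", "-"),
  check   = c(0, 1, 1, 0, 1, 0, 1, 1),
  cat_1   = c(1, 0, 0, 1, 0, 0, 1, 1),
  cat_2   = c(0, 1, 0, 0, 0, 1, 0, 1),
  cat_3   = c(0, 0, 1, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

issues  <- c("issue_1", "issue_2", "issue_3")
cats    <- c("cat_1", "cat_2", "cat_3")
results <- list()  # hypothetical container: one tibble per issue/category pair

for (i in issues) {
  for (j in cats) {
    res <- df %>%
      # .data[[j]] and .data[[i]] refer to columns of df, looked up by name
      filter(.data[[j]] != 0, .data[[i]] != "-") %>%
      group_by(value = .data[[i]]) %>%
      summarise(
        total_issue = n(),               # rows in this group
        check_again = sum(check == 1),   # rows in this group with check == 1
        percentage  = check_again / total_issue * 100,
        .groups = "drop"
      )
    results[[paste(i, j, sep = "_")]] <- res
  }
}

results[["issue_1_cat_1"]]
```

results[["issue_1_cat_1"]] then holds a (3 rows, 2 with check = 1, about 66.7%) and p (1 row, 0 with check = 1), matching the expected output above.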
Recommended answer
This is probably what you are after. Using the first four columns of the data, I used melt() to reshape the data into long format. Then I removed the rows with -. Grouping the data by variable and value, I counted how many times each value (each letter) occurred for each issue, summed up check, and calculated the percentage.
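library(reshape2)
library(dplyr)
# reshape issue_1..issue_3 into long format: one row per (check, variable, value)
melt(mydf[,1:4], id.vars = "check") %>%
  # drop the "-" placeholders
  filter(value != "-") %>%
  group_by(variable, value) %>%
  # in percent, check already refers to the summed check created just before it
  summarise(total = n(), check = sum(check), percent = check / total * 100)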
# variable value total check percent
# (fctr) (chr) (int) (int) (dbl)
#1 issue_1 a 3 2 66.66667
#2 issue_1 p 1 0 0.00000
#3 issue_2 b 2 2 100.00000
#4 issue_2 r 1 0 0.00000
#5 issue_3 c 1 1 100.00000
#6 issue_3 q 1 1 100.00000
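reshape2 is retired and its maintainers recommend tidyr instead, so as an alternative (a swapped-in technique, not the answer's own code) the same summary can be sketched with tidyr::pivot_longer() in place of melt(). This assumes tidyr >= 1.0 and dplyr >= 1.0, and rebuilds the sample data with character columns rather than the factors used in mydf.

```r
library(dplyr)
library(tidyr)

# Assumption: the sample data rebuilt with character columns instead of factors
df <- data.frame(
  issue_1 = c("a", "-", "-", "p", "-", "-", "a", "a"),
  issue_2 = c("-", "b", "-", "-", "-", "r", "-", "b"),
  issue_3 = c("-", "-", "c", "-", "q", "-", "-", "-"),
  check   = c(0, 1, 1, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

res <- df %>%
  # long format: one row per (check, variable, value), like melt()
  pivot_longer(issue_1:issue_3, names_to = "variable", values_to = "value") %>%
  # drop the "-" placeholders
  filter(value != "-") %>%
  group_by(variable, value) %>%
  summarise(total = n(), check = sum(check), percent = check / total * 100,
            .groups = "drop")

res
```

The rows come out in the same order and with the same totals as the melt() version: a, p, b, r, c, q.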
DATA
mydf <- structure(list(issue_1 = structure(c(2L, 1L, 1L, 3L, 1L, 1L,
2L, 2L), .Label = c("-", "a", "p"), class = "factor"), issue_2 = structure(c(1L,
2L, 1L, 1L, 1L, 3L, 1L, 2L), .Label = c("-", "b", "r"), class = "factor"),
issue_3 = structure(c(1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L), .Label = c("-",
"c", "q"), class = "factor"), check = c(0L, 1L, 1L, 0L, 1L,
0L, 1L, 1L), cat_1 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L), cat_2 = c(0L,
1L, 0L, 0L, 0L, 1L, 0L, 1L), cat_3 = c(0L, 0L, 1L, 0L, 1L,
0L, 0L, 0L)), .Names = c("issue_1", "issue_2", "issue_3",
"check", "cat_1", "cat_2", "cat_3"), class = "data.frame", row.names = c(NA,
-8L))