How to vectorize length-frequency calculation?
Question
At the moment I have quite long code with a for loop that calculates the frequency of the various lengths at different maturities in a dataset. I would like to vectorize the code or find a more elegant solution, but so far I've not been able to work out how to do that. The frequency calculation is a relatively simple one:
(count of occurrences of a specific length at a certain maturity / total number of females or males) * 100
Sample data:
Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12
Code that I'm currently using:
reps <- seq(min(Length), max(Length), by = 1)
m1 <- m2 <- m3 <- m4 <- m5 <- rep(NA, length(reps))
f1 <- f2 <- f3 <- f4 <- f5 <- rep(NA, length(reps))
# Makes vectors for each maturity stage for both sexes
# same length as the reps vector filled with NA for the loop:
# Loop:
for (i in 1:length(reps)) # repeats for each value of the x axis
{
m1[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 1])/total.m*100
m2[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 2])/total.m*100
m3[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 3])/total.m*100
m4[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 4])/total.m*100
m5[i]<- length(Length[Length == reps[i] & Sex == "M" & Maturity == 5])/total.m*100
f1[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 1])/total.f*100
f2[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 2])/total.f*100
f3[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 3])/total.f*100
f4[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 4])/total.f*100
f5[i]<- length(Length[Length == reps[i] & Sex == "F" & Maturity == 5])/total.f*100
}
#Stitching together the output of the loop.
males_all<-rbind(m1, m2, m3, m4, m5)
females_all<-rbind(f1, f2, f3, f4, f5)
This is the output I usually get from the loop:
mat X8 X9 X10 X11 X12 X14 X15
1 m1 0.104712 0.104712 0.6282723 1.3612565 1.884817 0.1047120 0.2094241
2 m2 0.000000 0.000000 0.3141361 0.8376963 2.198953 2.4083770 1.3612565
3 m3 0.000000 0.000000 0.0000000 0.0000000 0.104712 0.2094241 0.1047120
4 m4 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
5 m5 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.2094241
The columns after mat are the lengths. For the sake of brevity I've not included all of them; they would go up to 30 or so. females_all looks the same, just with f1, f2, etc. in the mat column.
Recommended answer
Near as I can tell, this is what you want:
library(dplyr)
counts = count(df, Sex, Maturity, Length)
totals = count(df, Sex, name = "total")
counts = counts %>% left_join(totals) %>%
mutate(prop = n / total)
# # Joining, by = "Sex"
# # A tibble: 6 x 6
# Sex Maturity Length n total prop
# <fct> <int> <int> <int> <int> <dbl>
# 1 F 2 10 1 3 0.333
# 2 F 4 12 1 3 0.333
# 3 F 5 25 1 3 0.333
# 4 M 1 7 1 3 0.333
# 5 M 2 24 1 3 0.333
# 6 M 3 25 1 3 0.333
counts %>% select(Sex, Maturity, Length, prop) %>%
tidyr::spread(key = Length, value = prop, fill = 0)
# # A tibble: 6 x 7
# Sex Maturity `7` `10` `12` `24` `25`
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 F 2 0 0.333 0 0 0
# 2 F 4 0 0 0.333 0 0
# 3 F 5 0 0 0 0 0.333
# 4 M 1 0.333 0 0 0 0
# 5 M 2 0 0 0 0.333 0
# 6 M 3 0 0 0 0 0.333
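The question asks for percentages (multiplied by 100), whereas the code above returns proportions. What follows is a minimal sketch of one way to get percentages in the same wide layout. It is not part of the original answer: it assumes a tidyr version that provides pivot_wider() (the successor to spread()), and the names pct and pct_wide are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- read.table(text = " Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12", header = TRUE)

pct_wide <- df %>%
  count(Sex, Maturity, Length) %>%      # n per Sex x Maturity x Length
  group_by(Sex) %>%
  mutate(pct = n / sum(n) * 100) %>%    # sum(n) within a sex is that sex's total
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = Length, values_from = pct,
              values_fill = 0, names_sort = TRUE)

pct_wide
```

Grouping by Sex and dividing by sum(n) replaces the separate totals table and the join, since the counts within one sex add up to the number of fish of that sex.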
Using this data:
df = read.table(text = " Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12", header = TRUE)
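If installing packages is not an option, the same calculation can probably be done in base R with table() and prop.table(). This is only a sketch and not part of the original answer: the maturity levels 1 to 5 are an assumption taken from the loop in the question, and the names maturity, len, counts and pct are made up for illustration.

```r
df <- read.table(text = " Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12", header = TRUE)

# Fix the factor levels so that maturity stages and lengths with no
# observations still get a (zero) cell, like the reps vector in the loop.
# Stages 1 to 5 are assumed from the question's loop.
maturity <- factor(df$Maturity, levels = 1:5)
len <- factor(df$Length, levels = seq(min(df$Length), max(df$Length)))

# Three-way contingency table: Maturity x Length x Sex
counts <- table(Maturity = maturity, Length = len, Sex = df$Sex)

# margin = 3 divides each sex's slice by the total count for that sex
pct <- prop.table(counts, margin = 3) * 100

males_all <- pct[, , "M"]    # rows: maturity stage, columns: length
females_all <- pct[, , "F"]
```

Here males_all and females_all have one row per maturity stage and one column per length, which matches the layout produced by rbind() in the question.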