Loop through dataset to calculate diversity


Problem description

I have a dataset like so:

 set.seed(1345)
 df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
           species=sample(LETTERS[1:10], 30, replace= TRUE))

I would like to loop through each month and calculate species diversity. I am aware of functions like diversity in library("vegan") and know how to solve my problem that way (code provided below), but as an exercise with loops I am trying to write a for loop or function that spells out the calculations for Shannon's diversity and Simpson's diversity, so that neither index is a black box. They are calculated using the following formulas:
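For reference, and assuming the standard definitions (which is what the code in this post computes, e.g. 1/sum(p.sqrd) for Simpson's), the two indices can be written as follows, where p_i is the proportion of observations in a month that belong to species i and S is the number of species observed in that month. The symbols H', D and S are notation chosen here for illustration.

```latex
% Shannon diversity
H' = -\sum_{i=1}^{S} p_i \ln p_i

% Simpson diversity in its inverse form
D = \frac{1}{\sum_{i=1}^{S} p_i^{2}}
```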

Thus far I have tried the following for Simpson's:

df <- 
 df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div<-NA
 for (i in length(unique(df$month))) {
 sum<- sum(df$freq)
 for (i in unique (df$freq)){
 p<- df$freq /sum
 p.sqrd<-p*p
 div[i]<-1/sum(p.sqrd)
   }}

And the following for Shannon's:

df <- 
 df %>% 
  group_by(month, species) %>% 
  summarise(freq = n()) 

div<-NA
 for (i in length(unique(df$month))) {
 sum<- sum(df$freq)
 for (i in unique (df$freq)){
 p<- df$freq /sum
 log.p<-ln(p)
 div[i]<- sum(p[i]*ln(p[i]))
   }}

My loops do not work. I would like help indexing the loop correctly, making it as efficient as possible (i.e. incorporating df <- df %>% group_by(month, species) %>% summarise(freq = n()) into the loop), and writing a for loop that clearly illustrates the equation inside it.

Using the diversity function, here are the answers for Simpson's diversity:

library("tidyverse")
df <- 
 df %>% 
 group_by(month, species) %>% 
 summarise(freq = n()) 

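# Cast the data frame of species frequencies into a month-by-species table
library("reshape2")
ph_mat <- dcast(df, month ~ species, value.var = "freq")
ph_mat[is.na(ph_mat)] <- 0 # a species absent from a month gets a count of 0

library("vegan")
# Drop the month column so it is not counted as a species
df <- data.frame(div = diversity(ph_mat[, -1], index = "simpson"),
                 month = ph_mat$month)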

And for Shannon's:

library("vegan")
# Drop the month column so it is not counted as a species
df <- data.frame(div = diversity(ph_mat[, -1], index = "shannon"),
                 month = ph_mat$month)

Recommended answer

My solution does not use for loops. Instead, I define and explain a function for each index (no mystery!) and calculate each diversity metric for each month, using the group_by() and summarize() functions from dplyr.

library("dplyr") # provides %>%, group_by() and summarize()
set.seed(1345)
df<-data.frame(month= c(rep(1,10), rep(2, 10), rep(3, 10)), 
               species=sample(LETTERS[1:10], 30, replace= TRUE))

calc_shannon <- function(community) {
  p <- table(community)/length(community) # Find proportions
  p <- p[p > 0] # Get rid of zero proportions (log zero is undefined)
  -sum(p * log(p)) # Calculate index
}

calc_simpson <- function(community) {
  p <- table(community)/length(community) # Find proportions
  1 / sum(p^2) # Calculate index
}

diversity_metrics <- 
  df %>% 
  group_by(month) %>% 
  summarize(shannon = calc_shannon(species),
            simpson = calc_simpson(species))
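Since the question asked specifically for a for loop, here is a minimal base R sketch of the same calculation written as one, so that each step of the two equations is visible. It is one possible way to write it, not the answerer's method. The names months, div, sp, freq and p are chosen here for illustration. Note that the attempts in the question use for (i in length(unique(df$month))), which iterates over the single value 3 rather than over 1, 2, 3; seq_along() avoids that.

```r
# Same toy data as in the question
set.seed(1345)
df <- data.frame(month   = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                 species = sample(LETTERS[1:10], 30, replace = TRUE))

months <- unique(df$month)

# Pre-allocate one row per month for the results
div <- data.frame(month = months, shannon = NA_real_, simpson = NA_real_)

for (i in seq_along(months)) {
  # Species observed in month i only
  sp <- df$species[df$month == months[i]]

  # Count each species and drop zero counts
  # (zero counts can appear if species is stored as a factor)
  freq <- table(sp)
  freq <- freq[freq > 0]

  # Convert counts to proportions p_i
  p <- as.numeric(freq) / sum(freq)

  # Shannon: H = -sum(p_i * ln(p_i)); log() is the natural log in R
  div$shannon[i] <- -sum(p * log(p))

  # Inverse Simpson: D = 1 / sum(p_i^2)
  div$simpson[i] <- 1 / sum(p^2)
}

div
```

When comparing against vegan, keep in mind that diversity(..., index = "simpson") returns 1 - sum(p^2), while the 1 / sum(p^2) form used here corresponds to index = "invsimpson".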
