R 中多样性的 Blau 指数 [英] Blau index of Diversity in R
问题描述
我正在尝试在我的数据框中计算 R 中的 Blau 多样性指数 (gini-simpson).我为一组中的每个人设置了 6 列,值范围从学生"、教师"、校友"不适用".如果组小于 6,则列中也有 NA.
我想计算跨行的 Blau 指数(整个组的多样性),而不是在每列中,na.rm= TRUE.
有谁知道如何在 R 中做到这一点?
非常感谢!
其中 R 表示类型总数, 是 ith 类型的比例丰度.
我们定义了一个函数,它接受一个字符串向量并返回 GS 索引:
#定义计算基尼-辛普森指数的函数# 我们确保 x 的相同级别(存在或不存在)# 按因子(x,级别 = 类型)# 请注意,默认情况下不会考虑 NAget.GS.index <- function(x, type) {x <-因子(x,水平=类型);return(1 - sum(prop.table(table(x))^2));}
我们现在可以将 get.GS.index
应用到数据帧的所有行
apply(df, 1, get.GS.index, type)#[1] 0.6250000 0.4444444 0.6250000 0.0000000 0.6400000 0.6666667 0.5000000#[8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000#[15] 0.0000000 0.0000000 0.6111111 0.4444444 0.6666667 0.6400000
<小时>
更新
如果一组中只有一种类型,我们可以修改函数get.GS.index
以返回NA
.
get.GS.index <- function(x, type) {x <-因子(x,水平=类型);t <-表(x);if (length(t[t>0]) == 1) return(NA) else return(1 - sum(prop.table(t)^2));}应用(df,1,get.GS.index,类型);# [1] 0.6250000 0.4444444 0.6250000 不适用 0.6400000 0.6666667 0.5000000# [8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000#[15] 不适用 不适用 0.6111111 0.4444444 0.6666667 0.6400000
I am trying to calculate the Blau index of diversity (gini-simpson) in R on my data frame. I have 6 columns for each person in a group, with values ranging from "Student", "Faculty", "Alumni" "Not Applicable". There are also NA's within the columns if a group is smaller than 6.
I would like to calculate the Blau index across the rows (the diversity across the entire group) not within each column, with na.rm= TRUE.
Does anyone know how to do this in R?
Thanks so much!
See here for a picture of data frame
We can calculate the Gini-Simpson index quite easily by hand.
First off, I'll generate some sample data:
# Generate sample data
set.seed(2017);
type <- c("Student", "Faculty", "Alumni");
data <- sample(type, 6 * 20, replace = TRUE);
# Replace 40 entries with NAs
set.seed(2017);
data[sample(6 * 20, 40)] <- NA;
# Reformat as 6 column dataframe
df <- as.data.frame(matrix(data, ncol = 6), stringsAsFactors = FALSE);
names(df) <- paste0("e", seq(1:6), "_affiliation");
head(df);
#e1_affiliation e2_affiliation e3_affiliation e4_affiliation e5_affiliation
#1 <NA> Faculty <NA> Student Student
#2 <NA> <NA> <NA> Faculty Alumni
#3 <NA> Alumni Student Faculty Faculty
#4 Student <NA> <NA> <NA> <NA>
#5 <NA> Student Alumni Alumni Student
#6 Alumni Alumni Faculty Faculty Student
# e6_affiliation
#1 Alumni
#2 Alumni
#3 <NA>
#4 Student
#5 Faculty
#6 Student
The Gini-Simpson (= Gibbs-Martin = Blau) index of diversity is given by
where R denotes the total number of types, and is the proportional abundance of the ith type.
We define a function that takes a vector of strings and returns the GS index:
# Define function to calculate the Gini-Simpson index
# We ensure the same levels (present or absent) of x
# by factor(x, levels = type)
# Note that NAs will not be considered by default
get.GS.index <- function(x, type) {
x <- factor(x, levels = type);
return(1 - sum(prop.table(table(x))^2));
}
We can now apply get.GS.index
to all rows of the dataframe
apply(df, 1, get.GS.index, type)
#[1] 0.6250000 0.4444444 0.6250000 0.0000000 0.6400000 0.6666667 0.5000000
#[8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000
#[15] 0.0000000 0.0000000 0.6111111 0.4444444 0.6666667 0.6400000
Update
We can modify the function get.GS.index
to return NA
if there is only one type present in a group.
get.GS.index <- function(x, type) {
x <- factor(x, levels = type);
t <- table(x);
if (length(t[t>0]) == 1) return(NA) else return(1 - sum(prop.table(t)^2));
}
apply(df, 1, get.GS.index, type);
# [1] 0.6250000 0.4444444 0.6250000 NA 0.6400000 0.6666667 0.5000000
# [8] 0.6250000 0.6400000 0.5000000 0.4444444 0.6400000 0.3750000 0.3750000
#[15] NA NA 0.6111111 0.4444444 0.6666667 0.6400000
这篇关于R 中多样性的 Blau 指数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!