How to obtain species richness and abundance for sites with multiple samples using dplyr


Question

I have a number of sites, with 10 sampling points at each site.

Site Time Sample Species1 Species2 Species3 etc
Home    A      1        1        0        4 ...
Home    A      2        0        0        2 ...
Work    A      1        0        1        1 ...
Work    A      2        1        0        1 ...
Home    B      1        1        0        4 ...
Home    B      2        0        0        2 ...
Work    B      1        0        1        1 ...
Work    B      2        1        0        1 ...
...

I would like to obtain the richness and abundance of each site at each time. Richness is the total number of species present at a site, and abundance is the total number of individuals of all species at a site, like this:

Site Time Richness Abundance
Home    A        2         7
Work    A        3         4
Home    B        2         7
Work    B        3         4
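
To make the two definitions concrete, here is a minimal base R sketch for a single site/time combination. It uses the two Home/A samples from the example table above; the object names (`home_a`, `species_totals`) are illustrative only and are not part of the original question.

```r
# Two samples from site "Home" at time "A", copied from the example table
home_a <- data.frame(
  Species1 = c(1L, 0L),
  Species2 = c(0L, 0L),
  Species3 = c(4L, 2L)
)

# Sum each species over the samples taken at the site
species_totals <- colSums(home_a)

# Richness: number of species with at least one individual at the site
richness <- sum(species_totals > 0)

# Abundance: total number of individuals across all species
abundance <- sum(species_totals)

c(Richness = richness, Abundance = abundance)
```

This gives a richness of 2 and an abundance of 7, matching the Home/A row of the desired output.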

I can get there in two steps (below). However, I would like to do both in a single dplyr pipeline. The range 7:34 refers to my species matrix (each row is a site/sample, with species as columns).

df1 <- df %>%
  mutate(Abundance = rowSums(.[, 4:30])) %>%
  group_by(Site, Time) %>%
  summarise_all(sum)

df1$Richness <- apply(df1[,4:30]>0, 1, sum)

If I try to do both in one pipeline, I get the following error:

df1 <- df  %>% mutate(Abundance = rowSums(.[,4:30]) ) %>%
   group_by(Site, Time) %>%   
   summarise_all(sum) %>% 
   mutate(Richness = apply(.[,4:30]>0, 1, sum))

Error in mutate_impl(.data, dots) : 
  Column `Richness` must be length 5 (the group size) or one, not 19

The Richness part has to come after the summarise function, since it has to operate on summed and grouped data.

How can I make this work?

(Note: This was previously marked as a duplicate of this question: Manipulating seperated species quantity data into a species abundance matrix

It is a completely different question, however. That question is essentially about transposing a dataset and summing within a single species/column. This one is about summing all species across multiple columns. In addition, I actually think the answer to this question is very helpful: ecologists like me calculate richness and abundance all the time, and I'm sure they'll appreciate a dedicated question.)

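Recommended answer

After summarise, we need to ungroup. With two grouping variables, summarise only drops the last one (Time), so the result is still grouped by Site. Inside mutate, `.` refers to the whole data frame rather than the current group, so apply returns one value per row of the full data, which does not match the group size and triggers the error.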

library(tidyverse)
df %>% 
  mutate(Abundance = rowSums(.[4:ncol(.)])) %>% 
  group_by(Site, Time) %>% 
  summarise_all(sum) %>%
  ungroup %>% 
  mutate(Richness = apply(.[4:(ncol(.)-1)] > 0, 1, sum)) %>%
  #or
  #mutate(Richness = rowSums(.[4:(ncol(.)-1)] > 0)) %>%
  select(Site, Time, Abundance, Richness)
# A tibble: 4 x 4
#  Site  Time  Abundance Richness
#  <chr> <chr>     <dbl>    <int>
#1 Home  A             7        2
#2 Home  B             7        2
#3 Work  A             4        3
#4 Work  B             4        3
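
The reason ungroup is needed can be checked directly. The sketch below uses a small stand-in data frame (`toy`, a hypothetical name, not the question's data) and dplyr's `group_vars()` to show which grouping survives the summarise step.

```r
library(dplyr)

# Small stand-in data set with the same layout as the question's data
toy <- data.frame(
  Site = c("Home", "Home", "Work", "Work"),
  Time = c("A", "B", "A", "B"),
  Species1 = c(1L, 0L, 0L, 1L)
)

summed <- toy %>%
  group_by(Site, Time) %>%
  summarise_all(sum)

# summarise only peels off the last grouping variable (Time)
group_vars(summed)           # "Site"

# after ungroup, no grouping is left
group_vars(ungroup(summed))  # character(0)
```

Because the summarised result is still grouped by Site, a following mutate runs once per Site group, which is why a whole-data-frame expression such as `apply(.[, 4:30] > 0, 1, sum)` has the wrong length there.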


It can also be written by first doing the group_by sum and then transmute:

df %>% 
  group_by(Site, Time) %>%
  summarise_at(vars(matches("Species")), sum)  %>% 
  ungroup %>%
  transmute(Site, Time, Abundance = rowSums(.[3:ncol(.)]), 
                        Richness =  rowSums(.[3:ncol(.)] > 0))


Another option is sum with map:

df %>% 
   group_by(Site, Time) %>%
   summarise_at(vars(matches("Species")), sum) %>% 
   group_by(Time, add = TRUE) %>%
   nest %>% 
   mutate(data = map(data, ~ 
                 tibble(Richness = sum(.x > 0), 
                        Abundance = sum(.x)))) %>% 
   unnest
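
The answers above use the scoped verbs (`summarise_all`, `summarise_at`) that were current when the question was asked. As a hedged sketch, assuming dplyr 1.0.0 or later, the same result could be obtained with `across()` and the `.groups` argument of `summarise()`, which removes the need for a separate ungroup step. The data frame is a copy of the example data from this page; `result` is an illustrative name.

```r
library(dplyr)

# Copy of the example data from this page
df <- data.frame(
  Site = c("Home", "Home", "Work", "Work", "Home", "Home", "Work", "Work"),
  Time = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Sample = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Species1 = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L),
  Species2 = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  Species3 = c(4L, 2L, 1L, 1L, 4L, 2L, 1L, 1L)
)

result <- df %>%
  group_by(Site, Time) %>%
  # sum each species column per site/time; .groups = "drop" returns ungrouped data
  summarise(across(starts_with("Species"), sum), .groups = "drop") %>%
  mutate(
    # total individuals across all species columns
    Abundance = rowSums(select(., starts_with("Species"))),
    # number of species columns with a non-zero total
    Richness = rowSums(select(., starts_with("Species")) > 0)
  ) %>%
  select(Site, Time, Richness, Abundance)

result
```

Selecting the species columns by name with `starts_with("Species")` assumes the columns share that prefix, as in the example data; with real species names, a column range or another tidyselect helper would be needed instead.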

Data

df <- structure(list(Site = c("Home", "Home", "Work", "Work", "Home", 
"Home", "Work", "Work"), Time = c("A", "A", "A", "A", "B", "B", 
"B", "B"), Sample = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), Species1 = c(1L, 
0L, 0L, 1L, 1L, 0L, 0L, 1L), Species2 = c(0L, 0L, 1L, 0L, 0L, 
0L, 1L, 0L), Species3 = c(4L, 2L, 1L, 1L, 4L, 2L, 1L, 1L)), 
class = "data.frame", row.names = c(NA, 
 -8L))
