dplyr filter columns with value 0 for all rows with unique combinations of other columns

Problem description

I have a dataframe that looks like this:

library(tidyverse)

# Dates must be real Date values: an unquoted 2020-01-01 is evaluated as arithmetic (2018)
df <- tibble(date = rep(as.Date("2020-01-01"), 8),
             site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
             treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
             species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
             frequency = c(0, 1, 1, 1, 1, 0, 1, 0))

But with lots of dates, sites and treatments. What I want is to filter out observations where all frequencies of that species (across all treatments and dates) are 0 for that site. So in the above I want to remove clover at site "Z" because it did not occur in any treatment or on any date at that site, but I want to keep clover at site "X" because it did occur in at least one of the treatments. So I want:

tibble(date = rep(as.Date("2020-01-01"), 6),
       site = c("X", "X", "X", "X", "Z", "Z"),
       treatment = c("a", "a", "b", "b", "a", "b"),
       species = c("vetch", "clover", "vetch", "clover", "vetch", "vetch"),
       frequency = c(0, 1, 1, 1, 1, 1))

My first thought was to pivot_wider, select columns then pivot_longer again, but this didn't work because the clover column was still selected by having a 1 in site "X":

  df %>%
    pivot_wider(names_from = species, names_prefix = "spp.", values_from = frequency, values_fill = 0) %>%
    group_by(site) %>%
    select_if(~ !is.numeric(.) || sum(.) != 0) %>%
    pivot_longer(starts_with("spp."), names_to = "species", names_prefix = "spp.", values_to = "frequency") -> df

So I guess I need to filter instead, but I can't figure out how to do that.

Recommended answer

An easy solution can be achieved by creating another column that contains the frequency of each species grouped by date, site and species (ignoring treatment). Then you can easily filter using this new column and afterwards eliminate it.

library(tidyverse)
df %>%
    # Group by date, site and species
    group_by(date, site, species) %>%
    # Create a new column that sums the frequency values within each group
    mutate(appears = sum(frequency)) %>%
    # Keep only rows where appears is not 0
    filter(appears != 0) %>%
    # Drop the grouping, then eliminate the appears column
    ungroup() %>%
    select(-appears)
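
The question asks for a species to be dropped only when it is absent across all treatments and all dates at a site. Because the code above also groups by date, on data with several dates it would drop a species on any single date where it was absent, even if it occurred on another date. The following is a minimal sketch of a variant that groups by site and species only and filters directly, without a helper column. It assumes frequency is never negative and reuses the example data from the question; the name result is introduced here purely for illustration.

```r
library(dplyr)

# Example data from the question (a single date, two sites, two treatments)
df <- tibble(
  date      = rep(as.Date("2020-01-01"), 8),
  site      = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
  treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
  species   = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
  frequency = c(0, 1, 1, 1, 1, 0, 1, 0)
)

result <- df %>%
  # One group per site/species pair, pooling all dates and treatments
  group_by(site, species) %>%
  # Keep the whole group if the species occurred at least once at that site
  filter(any(frequency != 0)) %>%
  # Remove the grouping so later steps operate on the full table
  ungroup()

print(result)
```

With the example data this keeps all four rows for site "X" and only the two vetch rows for site "Z". Using any(frequency != 0) rather than a sum avoids creating and then removing a temporary column.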
