How can I create a column identifying rows consisting only of new data in a summarized data frame?


Question

I have a dataset that builds on a prior study but also includes a number of entirely new entries. Once cleaned, the dataset consists of the mean value for each species in the study, which I compute with the summarise function from the tidyverse (dplyr).

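library(dplyr)  # provides %>%, group_by() and summarise()

df <- data.frame(species = c("Species1", "Species1", "Species2", "Species2", "Species3", "Species3"),
                 new = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), var = c(1, 1, 2, 2, 3, 3))
# One row per species, holding the mean of var
df2 <- df %>%
  group_by(species) %>%
  summarise(var = mean(var))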

I have a column recording whether each observation is new to this study or drawn from the parent study. I want to carry that information into the cleaned data frame so that I can easily show and summarize how many new species this study adds. Some observations are additional data for species that were already present, while other species are entirely novel to the present analysis. I am trying to create a column that is TRUE if and only if the species is entirely new to this study, producing a data frame/tibble like the following.

data.frame(species=c("Species1","Species2","Species3"),new=c("TRUE","FALSE","FALSE"),var=c(1,2,3))

In this data frame, Species 1 is entirely new, Species 2 has both old and new observations, and Species 3 has only old observations. So only Species 1 should be TRUE in the "new" column I am trying to create.

I know how to mutate columns based on and/or conditions using "&" and "|" for certain levels in another column, but I'm not sure how to create a column where new = TRUE only if none of the entries used to create the summarized value have a certain level or character string. I think it might be possible with an ifelse() statement, but I'm not sure how to write the code so that it considers each level of species.

Recommended answer

Use all() inside summarise(): within each species group it returns TRUE only if every value of new is TRUE.

df %>%
  group_by(species) %>%
  summarize(new = all(new), var = mean(var))
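
As a self-contained check, the sketch below rebuilds the example data from the question, applies the all() approach, and then counts the entirely new species. The names species_summary and n_new_species are illustrative only and do not appear in the original post.

```r
library(dplyr)

# Example data from the question: one row per observation
df <- data.frame(
  species = c("Species1", "Species1", "Species2", "Species2", "Species3", "Species3"),
  new     = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  var     = c(1, 1, 2, 2, 3, 3)
)

# species_summary is a hypothetical name for the per-species result
species_summary <- df %>%
  group_by(species) %>%
  summarise(
    new = all(new),   # TRUE only when every observation of the species is new
    var = mean(var)
  )

# Logical TRUE counts as 1, so sum() gives the number of entirely new species
n_new_species <- sum(species_summary$new)

print(species_summary)
print(n_new_species)
```

With this data only Species1 is flagged, so n_new_species is 1. Two caveats, both assumptions beyond the original question: if the flag were stored as a character column (say a hypothetical status column holding "new" or "old"), the same idea would be written as all(status == "new"); and if new can contain NA, all() may return NA for that group unless na.rm = TRUE is passed.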
