How do you aggregate rows to a factor variable with three levels?
Problem description
I have a dataset in which some participants have multiple rows, and I need to aggregate the data so that every participant has only one row. The dataset contains different variable types (e.g., factors, dates, age, etc.). I have written code that works and looks like this:
example4 <- SMARTdata_50j_diagc_2016 %>%
  group_by( Patient_Id ) %>%
  summarise( Groep = first( Groep ),
             Ziekenhuis_Nr = first( Ziekenhuis_Nr ),
             Ziekenhuistype = first( Ziekenhuistype ),
             aantalDBC = n(),
             aantalVervolg = sum( as.numeric( Zorgtype_Code ) ),
             Leeftijd = mean( Lft_patient_openenDBC ),
             MRI_nee_ja = max( ifelse( MRI_nee_ja == 0, 0, 1 ) ),
             aantalMRI = sum( MRI_Aantal ),
             Artroscopie_nee_ja = max( ifelse( Artroscopie_nee_jaz_jam == 0, 0, 1 ) ),
             aantalArtroscopie = sum( Artroscopie_aantal ),
             overigDBC = mean( Aantal_overigeDBC_bijopenen ),
             DBC_open = min( open_DBC ),
             DBC_sluiten = max( sluiten_DBC ) ) %>%
  as.data.frame()
This code gives me a single row for each participant. However, there is one more variable that I need to include in the new data frame, and I do not know how to do that. The variable is called 'Diagnose_Code' and is a factor with two levels, namely 0 (standing for 1801) and 1 (standing for 1805).
Among the participants that have multiple rows in the original data frame, some have both a 0 and a 1 for that variable. In my new data frame, I want 'Diagnose_Code' to be a variable with three levels: 0 if all rows of that participant are 0, 1 if all rows of that participant are 1, and 2 if the rows of that participant contain both a 0 and a 1.
I do not know how to make this work. I struggled a bit with ifelse, but without success. Does anyone know how I can do this within my code? Thank you in advance!
Recommended answer
Using a toy dataset, this can be achieved like so:
library(dplyr)

df <- data.frame(
  id = rep(1:3, each = 3),
  diagnosis_code = c(rep(1, 3), rep(0, 3), c(1, 0, 1)),
  stringsAsFactors = FALSE
)

df %>%
  group_by(id) %>%
  summarise(diagnosis_code = case_when(
    all(diagnosis_code == 1) ~ 1,
    all(diagnosis_code == 0) ~ 0,
    TRUE ~ 2
  ))
#> # A tibble: 3 x 2
#>      id diagnosis_code
#>   <int>          <dbl>
#> 1     1              1
#> 2     2              0
#> 3     3              2
Created on 2020-03-29 by the reprex package (v0.3.0)
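The toy example uses a numeric column, whereas the question describes 'Diagnose_Code' as a factor and asks for a three-level result. A minimal sketch of how the same case_when() idea might be adapted is shown below. The data frame toy and its values are made up for illustration, it is an assumption that the factor labels are the strings "0" and "1", and only aantalDBC is carried over from the original summarise() call to keep the sketch short.

```r
library(dplyr)

# Hypothetical toy data; only the column names Patient_Id and Diagnose_Code
# come from the question. The factor labels "0" and "1" are an assumption.
toy <- data.frame(
  Patient_Id    = c(1, 1, 2, 2, 3, 3, 4),
  Diagnose_Code = factor(c("0", "0", "1", "1", "0", "1", "1"),
                         levels = c("0", "1"))
)

result <- toy %>%
  group_by(Patient_Id) %>%
  summarise(
    aantalDBC = n(),
    # Comparing a factor with a string compares against its labels
    Diagnose_Code = case_when(
      all(Diagnose_Code == "1") ~ "1",  # every row of the participant is 1
      all(Diagnose_Code == "0") ~ "0",  # every row of the participant is 0
      TRUE ~ "2"                        # anything else, i.e. mixed rows
    )
  ) %>%
  # Turn the character result into a factor with the three requested levels
  mutate(Diagnose_Code = factor(Diagnose_Code, levels = c("0", "1", "2"))) %>%
  as.data.frame()

print(result)
```

In this sketch, a participant with a missing Diagnose_Code in any row would also end up in level 2, because neither all() condition evaluates to TRUE. If missing values occur in the real data, they probably need separate handling.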