Parse values based on groups in R


Problem description

I have a very large dataset; a sample of it looks like this:

| Id | Name    | Start_Date | End_Date   |
|----|---------|------------|------------|
| 10 | Mark    | 4/2/1999   | 7/5/2018   |
| 10 |         | 1/1/2000   | 9/24/2018  |
| 25 |         | 5/3/1968   | 6/3/2000   |
| 25 |         | 6/6/2009   | 4/23/2010  |
| 25 | Anthony | 2/20/2010  | 7/21/2016  |
| 25 |         | 9/12/2014  | 11/26/2019 |

I need to fill in the blank entries of the Name column based on Id, so that every row in a group carries that group's name and the output table looks like this:

| Id | Name    | Start_Date | End_Date   |
|----|---------|------------|------------|
| 10 | Mark    | 4/2/1999   | 7/5/2018   |
| 10 | Mark    | 1/1/2000   | 9/24/2018  |
| 25 | Anthony | 5/3/1968   | 6/3/2000   |
| 25 | Anthony | 6/6/2009   | 4/23/2010  |
| 25 | Anthony | 2/20/2010  | 7/21/2016  |
| 25 | Anthony | 9/12/2014  | 11/26/2019 |

How can I achieve the output shown above? I looked at the substitute and parse functions, but could not see how they apply to this problem.

My dataset is:

df=data.frame(Id=c("10","10","25","25","25","25"),Name=c("Mark","","","","Anthony",""),
              Start_Date=c("4/2/1999", "1/1/2000","5/3/1968","6/6/2009","2/20/2010","9/12/2014"),
              End_Date=c("7/5/2018","9/24/2018","6/3/2000","4/23/2010","7/21/2016","11/26/2019"))

Recommended answer

We can change the blanks ("") to NA and, within each Id group, use fill to replace the NA elements with the previous non-NA element. A second fill in the "up" direction handles groups whose first rows are blank (such as Id 25):

library(dplyr)
library(tidyr)
df1 %>%
   mutate(Name = na_if(Name, "")) %>%      # turn blanks into NA
   group_by(Id) %>%
   fill(Name, .direction = "down") %>%     # carry the last non-NA name forward
   fill(Name, .direction = "up")           # then backfill any leading NAs
# A tibble: 6 x 4
# Groups:   Id [2]
#  Id    Name    Start_Date End_Date  
#  <chr> <chr>   <chr>      <chr>     
#1 10    Mark    4/2/1999   7/5/2018  
#2 10    Mark    1/1/2000   9/24/2018 
#3 25    Anthony 5/3/1968   6/3/2000  
#4 25    Anthony 6/6/2009   4/23/2010 
#5 25    Anthony 2/20/2010  7/21/2016 
#6 25    Anthony 9/12/2014  11/26/2019

In the devel version of tidyr ('0.8.3.9000'), this can be done in a single fill call, because .direction = "downup" is also an option (it has since been released, as of tidyr 1.0.0):

df1 %>%      
   mutate(Name = na_if(Name, "")) %>%
   group_by(Id) %>%
   fill(Name, .direction = "downup") 
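
The pipelines above return a grouped tibble. As a minimal, self-contained sketch (assuming a tidyr version that supports .direction = "downup"), the same steps can be wrapped in a small function that also drops the grouping afterwards. The name fill_name_by_group is a hypothetical helper introduced here for illustration, not part of dplyr or tidyr, and only the Id and Name columns are rebuilt to keep the example short.

```r
library(dplyr)
library(tidyr)

# Reduced copy of the sample data (Id and Name only)
df1 <- data.frame(
  Id   = c("10", "10", "25", "25", "25", "25"),
  Name = c("Mark", "", "", "", "Anthony", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: blank -> NA, fill within each Id, then ungroup
fill_name_by_group <- function(data) {
  data %>%
    mutate(Name = na_if(Name, "")) %>%
    group_by(Id) %>%
    fill(Name, .direction = "downup") %>%
    ungroup()
}

filled <- fill_name_by_group(df1)
filled$Name
```

Calling ungroup() at the end is a design choice: it keeps later operations on the result from being applied per Id by accident.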


Another option is to group by 'Id' and set 'Name' to the first non-blank element of each group

df1 %>%
    group_by(Id) %>%        
    mutate(Name = first(Name[Name!=""])) 
# A tibble: 6 x 4
# Groups:   Id [2]
#  Id    Name    Start_Date End_Date  
#  <chr> <chr>   <chr>      <chr>     
#1 10    Mark    4/2/1999   7/5/2018  
#2 10    Mark    1/1/2000   9/24/2018 
#3 25    Anthony 5/3/1968   6/3/2000  
#4 25    Anthony 6/6/2009   4/23/2010 
#5 25    Anthony 2/20/2010  7/21/2016 
#6 25    Anthony 9/12/2014  11/26/2019
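
The two approaches agree on the sample data but can differ on other inputs. The sketch below uses made-up data (df2 is hypothetical, not from the question) in which one group holds two different names and another holds none. Under those assumptions, first() should overwrite every row in a group with that group's first non-blank name, while fill() should only touch the blank rows; a group with no name at all should end up as NA in both cases.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Id "10" has two distinct names, Id "30" has none
df2 <- data.frame(
  Id   = c("10", "10", "10", "30", "30"),
  Name = c("Mark", "", "Marcus", "", ""),
  stringsAsFactors = FALSE
)

# Approach 1: first non-blank name per group replaces every row
by_first <- df2 %>%
  group_by(Id) %>%
  mutate(Name = first(Name[Name != ""])) %>%
  ungroup()

# Approach 2: only blank rows are filled from their neighbours
by_fill <- df2 %>%
  mutate(Name = na_if(Name, "")) %>%
  group_by(Id) %>%
  fill(Name, .direction = "downup") %>%
  ungroup()

by_first$Name
by_fill$Name
```

If each Id is guaranteed to have exactly one name, either approach works; if existing non-blank names must be preserved, fill() is the safer of the two.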

Data

df1 <- structure(list(Id = c("10", "10", "25", "25", "25", "25"), Name = c("Mark", 
"", "", "", "Anthony", ""), Start_Date = c("4/2/1999", "1/1/2000", 
"5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"), End_Date = c("7/5/2018", 
"9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"
)), class = "data.frame", row.names = c(NA, -6L))
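
For readers who prefer not to load dplyr or tidyr, the "first non-blank element per group" idea can also be sketched in base R with ave(). This is an alternative technique, not the one used in the answer above; the data frame name ex is introduced here for illustration and repeats only the Id and Name columns of the sample data.

```r
# Reduced copy of the sample data (Id and Name only)
ex <- data.frame(
  Id   = c("10", "10", "25", "25", "25", "25"),
  Name = c("Mark", "", "", "", "Anthony", ""),
  stringsAsFactors = FALSE
)

# For each Id, take the first non-blank name; ave() recycles it
# across all rows of that group (NA if the group has no name)
ex$Name <- ave(ex$Name, ex$Id, FUN = function(x) x[x != ""][1])

ex$Name
```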
