Merge/combine columns with same name but incomplete data


Question


I have two data frames that have some columns with the same names and others with different names. The data frames look something like this:

df1
  ID hello world football baseball
1  1     2     3       43        6
2  2     5     1       24       32
3  3    NA    NA        2       23
4  4    NA    NA        5       15
5  5     9     7       12       23

df2
  ID hello world hockey soccer
1  1    NA    NA      7      4
2  2    NA    NA      2      5
3  3    10     8      8     23
4  4     4    17      5     12
5  5    NA    NA      3     43


As you can see, in 2 of the shared columns ("hello" and "world"), some of the data is in one of the data frames and the rest is in the other.

What I am trying to do is (1) merge the 2 data frames by "ID", (2) combine all the data from the "hello" and "world" columns in both frames into 1 "hello" column and 1 "world" column, and (3) have the final data frame also contain all of the other columns in the 2 original frames ("hockey", "soccer", "football", "baseball"). So, I want the final result to be this:

  ID hello world hockey soccer football baseball
1  1     2     3      7      4        43       6
2  2     5     1      2      5        24      32
3  3    10     8      8     23         2      23
4  4     4    17      5     12         5      15
5  5     9     7      3     43        12      23

I'm pretty new to R, so the only code I've tried is variations on merge, plus the answer I found here to a similar question: "R: merging copies of the same variable". However, my data sets are actually much bigger than what I'm showing here (about 20 matching columns, like "hello" and "world", and hundreds of non-matching ones, like "hockey" and "football"), so I'm looking for something that won't require me to write them all out manually.


Any idea if this can be done? I'm sorry I can't provide a sample of my efforts, but I really don't know where to start besides:

mydata <- merge(df1, df2, by=c("ID"), all = TRUE)

To reproduce the data frames:

df1 <- data.frame(ID = 1:5,
                  hello = c(2, 5, NA, NA, 9),
                  world = c(3, 1, NA, NA, 7),
                  football = c(43, 24, 2, 5, 12),
                  baseball = c(6, 32, 23, 15, 23))

df2 <- data.frame(ID = 1:5,
                  hello = c(NA, NA, 10, 4, NA),
                  world = c(NA, NA, 8, 17, NA),
                  hockey = c(7, 2, 8, 5, 3),
                  soccer = c(4, 5, 23, 12, 43))

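For context, here is a small sketch of what the plain merge call returns on this example data. It rebuilds the two frames so it can be run on its own. The point is only that each shared column comes back twice, with merge's default ".x" and ".y" suffixes, and those pairs are what still need to be collapsed.

```r
# Rebuild the two example frames (same values as in the reproduction code)
df1 <- data.frame(ID = 1:5,
                  hello = c(2, 5, NA, NA, 9),
                  world = c(3, 1, NA, NA, 7),
                  football = c(43, 24, 2, 5, 12),
                  baseball = c(6, 32, 23, 15, 23))
df2 <- data.frame(ID = 1:5,
                  hello = c(NA, NA, 10, 4, NA),
                  world = c(NA, NA, 8, 17, NA),
                  hockey = c(7, 2, 8, 5, 3),
                  soccer = c(4, 5, 23, 12, 43))

# Plain outer merge on ID: the shared columns are duplicated, not combined
mydata <- merge(df1, df2, by = "ID", all = TRUE)
names(mydata)
# [1] "ID"       "hello.x"  "world.x"  "football" "baseball"
# [6] "hello.y"  "world.y"  "hockey"   "soccer"
```
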
Recommended answer


Here's an approach that involves melting your data, merging the molten data, and using dcast to get it back to a wide form. I've added comments to help understand what is going on.

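## Required package. Recent data.table versions supply their own melt()
## and dcast() methods, so reshape2 is not loaded here (loading it after
## data.table would mask them).
library(data.table)

## Melt both data.frames to long form: one row per ID/variable pair
long1 <- melt(as.data.table(df1), id.vars = "ID")
long2 <- melt(as.data.table(df2), id.vars = "ID")

## Full outer merge on ID and variable; you'll have 2 value columns
## (value.x and value.y)...
merged <- merge(long1, long2, by = c("ID", "variable"), all = TRUE)

## ... combine them into 1 with ifelse
merged[, value := ifelse(is.na(value.x), value.y, value.x)]

## Reshape back to wide form; ID ~ variable is your reshaping formula
dcast(merged, ID ~ variable, value.var = "value")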
#    ID hello world football baseball hockey soccer
# 1:  1     2     3       43        6      7      4
# 2:  2     5     1       24       32      2      5
# 3:  3    10     8        2       23      8     23
# 4:  4     4    17        5       15      5     12
# 5:  5     9     7       12       23      3     43
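If adding a package is not an option, the same idea can be sketched in base R. This is a variant technique, not the answerer's method: merge first, then collapse each ".x"/".y" pair in a loop, so the roughly 20 shared columns never have to be typed out. It assumes each ID appears at most once per frame, and that when both frames hold a non-missing value the one from df1 should win. The names shared, merged and result are made up for this illustration.

```r
# Rebuild the two example frames (same values as in the question)
df1 <- data.frame(ID = 1:5,
                  hello = c(2, 5, NA, NA, 9),
                  world = c(3, 1, NA, NA, 7),
                  football = c(43, 24, 2, 5, 12),
                  baseball = c(6, 32, 23, 15, 23))
df2 <- data.frame(ID = 1:5,
                  hello = c(NA, NA, 10, 4, NA),
                  world = c(NA, NA, 8, 17, NA),
                  hockey = c(7, 2, 8, 5, 3),
                  soccer = c(4, 5, 23, 12, 43))

# Outer merge on ID; shared columns come back as <name>.x and <name>.y
merged <- merge(df1, df2, by = "ID", all = TRUE)

# Columns present in both frames, other than the key, found programmatically
shared <- setdiff(intersect(names(df1), names(df2)), "ID")

for (col in shared) {
  x <- merged[[paste0(col, ".x")]]
  y <- merged[[paste0(col, ".y")]]
  # Assumption: keep the df1 value, fall back to df2 where df1 is NA
  merged[[col]] <- ifelse(is.na(x), y, x)
  # Drop the two suffixed copies once they are combined
  merged[[paste0(col, ".x")]] <- NULL
  merged[[paste0(col, ".y")]] <- NULL
}

# Put ID and the combined columns first, then everything else
result <- merged[, c("ID", shared, setdiff(names(merged), c("ID", shared)))]
result
```

On the example data this should produce the same values as the data.table answer. If the two frames can disagree on a non-missing value, the ifelse line is where a different rule would go.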
