Merge/combine columns with same name but incomplete data

Question

I have two data frames that have some columns with the same names and others with different names. The data frames look something like this:

df1
      ID hello world hockey soccer
    1  1    NA    NA      7      4
    2  2    NA    NA      2      5
    3  3    10     8      8     23
    4  4     4    17      5     12
    5  5    NA    NA      3     43

df2    
      ID hello world football baseball
    1  1     2     3       43        6
    2  2     5     1       24       32
    3  3    NA    NA        2       23
    4  4    NA    NA        5       15
    5  5     9     7       12       23

As you can see, in 2 of the shared columns ("hello" and "world"), some of the data is in one of the data frames and the rest is in the other.

What I am trying to do is (1) merge the 2 data frames by "ID", (2) combine all the data from the "hello" and "world" columns in both frames into 1 "hello" column and 1 "world" column, and (3) have the final data frame also contain all of the other columns in the 2 original frames ("hockey", "soccer", "football", "baseball"). So, I want the final result to be this:

  ID hello world hockey soccer football baseball
1  1     2     3      7      4        43       6
2  2     5     1      2      5        24      32
3  3    10     8      8     23         2      23
4  4     4    17      5     12         5      15
5  5     9     7      3     43        12      23

I'm pretty new to R, so the only code I've tried is variations on merge. I've also tried the answer I found here, which was based on a similar question: R: merging copies of the same variable. However, my data sets are actually much bigger than what I'm showing here (there are about 20 matching columns, like "hello" and "world", and hundreds of non-matching ones, like "hockey" and "football"), so I'm looking for something that won't require me to write them all out manually.

Any idea if this can be done? I'm sorry I can't provide a sample of my efforts, but I really don't know where to start besides:

mydata <- merge(df1, df2, by=c("ID"), all = TRUE)

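To reproduce the data frames (note that df1 and df2 here are swapped relative to the printed tables above; this changes only the column order of the merged result, not its values):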

df1 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L), hello = c(2, 5, NA, NA, 9), 
       world = c(3, 1, NA, NA, 7), football = c(43, 24, 2, 5, 12), 
       baseball = c(6, 32, 23, 15, 23)), .Names = c("ID", "hello", "world", 
       "football", "baseball"), class = "data.frame", row.names = c(NA, -5L)) 

df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L), hello = c(NA, NA, 10, 4, NA), 
       world = c(NA, NA, 8, 17, NA), hockey = c(7, 2, 8, 5, 3), 
       soccer = c(4, 5, 23, 12, 43)), .Names = c("ID", "hello", "world", "hockey", 
       "soccer"), class = "data.frame", row.names = c(NA, -5L))

Recommended answer

Here's an approach that involves melting your data, merging the molten data, and using dcast to get it back to a wide form. I've added comments to help understand what is going on.
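
As a small illustration of the melting step on its own, the sketch below reshapes a toy data frame into long form. The `toy` data and its values are made up for this example and are not the question's data; it assumes the data.table package is installed.

```r
library(data.table)

# Toy data for illustration only (hypothetical values, same general shape
# as the frames in the question)
toy <- data.frame(ID = 1:3, hello = c(2, NA, 10), hockey = c(7, 2, 8))

# Melting gives one row per (ID, variable) pair: the original column names
# move into `variable` and the cell contents into `value`
long <- data.table::melt(as.data.table(toy), id.vars = "ID")
print(long)
```

Once both frames are in this shape, every shared column becomes a set of rows with the same `ID` and `variable`, which is why a single merge can line up all of them at once without naming them.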

## Required packages
library(data.table)
library(reshape2)

dcast.data.table(
  merge(
    ## melt the first data.frame and set the key as ID and variable
    setkey(melt(as.data.table(df1), id.vars = "ID"), ID, variable), 
    ## melt the second data.frame
    melt(as.data.table(df2), id.vars = "ID"), 
    ## you'll have 2 value columns...
    all = TRUE)[, value := ifelse(
      ## ... combine them into 1 with ifelse
      is.na(value.x), value.y, value.x)], 
  ## This is your reshaping formula
  ID ~ variable, value.var = "value")
#    ID hello world football baseball hockey soccer
# 1:  1     2     3       43        6      7      4
# 2:  2     5     1       24       32      2      5
# 3:  3    10     8        2       23      8     23
# 4:  4     4    17        5       15      5     12
# 5:  5     9     7       12       23      3     43
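
If you would rather stay in base R, the same idea can be sketched without reshaping. This is an alternative to the approach above, not part of the original answer. The helper name `combine_shared` is hypothetical, and the sketch assumes that no original column name already ends in `.x` or `.y`. The shared columns are detected with `intersect()`, so none of them has to be written out by hand.

```r
# Data as given in the question's reproducible example
df1 <- data.frame(ID = 1:5, hello = c(2, 5, NA, NA, 9), world = c(3, 1, NA, NA, 7),
                  football = c(43, 24, 2, 5, 12), baseball = c(6, 32, 23, 15, 23))
df2 <- data.frame(ID = 1:5, hello = c(NA, NA, 10, 4, NA), world = c(NA, NA, 8, 17, NA),
                  hockey = c(7, 2, 8, 5, 3), soccer = c(4, 5, 23, 12, 43))

# Hypothetical helper: merge x and y on `by`, then collapse each pair of
# suffixed columns (col.x, col.y) into one column, preferring x's value
combine_shared <- function(x, y, by = "ID") {
  # Columns present in both frames, excluding the merge key
  shared <- setdiff(intersect(names(x), names(y)), by)
  merged <- merge(x, y, by = by, all = TRUE, suffixes = c(".x", ".y"))
  for (col in shared) {
    left  <- merged[[paste0(col, ".x")]]
    right <- merged[[paste0(col, ".y")]]
    # Take y's value only where x's value is missing
    merged[[col]] <- ifelse(is.na(left), right, left)
  }
  # Keep the key, the combined columns, and all non-shared columns;
  # this drops the suffixed intermediates
  suffixed <- c(paste0(shared, ".x"), paste0(shared, ".y"))
  others <- setdiff(names(merged), c(by, shared, suffixed))
  merged[c(by, shared, others)]
}

result <- combine_shared(df1, df2)
print(result)
```

As with the `ifelse` in the answer above, when both frames hold a non-missing value for the same cell, the value from the first frame is kept.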
