Add missing rows to dataframe from other dataframe in R

Problem description

I'm trying to fill in missing data in data frames.

I've got two data frames, but each one includes information that is missing from the other. They look something like this, where each x stands for a number:

           DATA FRAME 1                                      DATA FRAME 2    
    Headword   Spelling   Freq                    Headword     Spelling   Freq
     Word1       Sp1a      x                        Word1         Sp1a      x
     Word1       Sp1b      x                        Word1         Sp1c      x
     Word1       Sp1d      x                        Word2         Sp2a      x
     Word2       Sp2a      x                        Word2         Sp2b      x     etc

So, DF1 has spellings 1a, 1b, and 1d for Word1. DF2 has spellings 1a and 1c for Word1. However, DF1 has only spelling 2a for Word2, but DF2 has spellings 2a and 2b for Word2.

What I need is for both data frames to include all the spellings, even if they're not present in that data. So where Sp1c is missing in data frame 1, I'd like it to be there and the frequency = 0.

So I'd like it to look like this:

    DATA FRAME 1                               DATA FRAME 2
Headword Spelling Freq                    Headword Spelling Freq
  Word1    Sp1a    x                        Word1    Sp1a     x
  Word1    Sp1b    x                        Word1    Sp1b     0
  Word1    Sp1c    0                        Word1    Sp1c     x
  Word1    Sp1d    x                        Word1    Sp1d     0
  Word2    Sp2a    x                        Word2    Sp2a     x
  Word2    Sp2b    0                        Word2    Sp2b     x

I think I need to use more than one join and combine them together to make this work but I'm not sure how.

How do I add any headword/spelling combinations that appear only in data frame 2 to data frame 1, and set their frequency to 0? (And vice versa, to add the combinations that appear only in data frame 1 to data frame 2.)

Recommended answer

Using the dplyr library, first create the data frames:

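library(dplyr)

# stringsAsFactors = FALSE keeps the key columns as character vectors
# (this is already the default from R 4.0.0 onward)
df1 <- data.frame(headword = c("word1", "word1", "word1", "word2"),
                  spelling = c("sp1a", "sp1b", "sp1d", "sp2a"),
                  freq = runif(4),  # four arbitrary frequencies
                  stringsAsFactors = FALSE)
df2 <- data.frame(headword = c("word1", "word1", "word2", "word2"),
                  spelling = c("sp1a", "sp1c", "sp2a", "sp2b"),
                  freq = runif(4),
                  stringsAsFactors = FALSE)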

Now find the rows in df1 whose headword/spelling pair is not in df2, set their freq to 0, and add them to df2:

# Rows of df1 whose headword/spelling pair does not occur in df2
sub_res1 <- anti_join(df1, df2, by = c("headword", "spelling"))
#    headword spelling      freq
#  1    word1     sp1b 0.6738556
#  2    word1     sp1d 0.4972938

# Set their frequency to 0 and append them to df2
sub_res1$freq <- 0
df2 <- full_join(df2, sub_res1, by = c("headword", "spelling", "freq"))
#    headword spelling       freq
#  1    word1     sp1a 0.50293511
#  2    word1     sp1c 0.67857973
#  3    word2     sp2a 0.05604982
#  4    word2     sp2b 0.83378253
#  5    word1     sp1b 0.00000000
#  6    word1     sp1d 0.00000000

The reverse direction works the same way: the rows of df2 that are not in df1 are added to df1 with freq set to 0.
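As a rough sketch of how both directions could be handled with the same steps, the anti-join can be wrapped in a small helper. The name `add_missing_rows`, the `set.seed()` call, and the final sort are illustrative assumptions and are not part of the original answer; `bind_rows()` is used here in place of `full_join()` simply to append the new rows.

```r
library(dplyr)

# Example data mirroring the answer; set.seed() is only here so the sketch is reproducible.
set.seed(1)
df1 <- data.frame(headword = c("word1", "word1", "word1", "word2"),
                  spelling = c("sp1a", "sp1b", "sp1d", "sp2a"),
                  freq = runif(4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(headword = c("word1", "word1", "word2", "word2"),
                  spelling = c("sp1a", "sp1c", "sp2a", "sp2b"),
                  freq = runif(4),
                  stringsAsFactors = FALSE)

# Hypothetical helper: append to `target` every headword/spelling pair
# that occurs only in `source`, with freq set to 0.
add_missing_rows <- function(target, source) {
  missing <- anti_join(source, target, by = c("headword", "spelling"))
  missing <- mutate(missing, freq = 0)
  # Sorting is optional; it just makes the two results line up row by row.
  arrange(bind_rows(target, missing), headword, spelling)
}

# Both calls use the ORIGINAL df1 and df2, so the order of the calls does not matter.
df1_full <- add_missing_rows(df1, df2)
df2_full <- add_missing_rows(df2, df1)
```

After this, `df1_full` and `df2_full` should each contain all six headword/spelling pairs, with freq equal to 0 for the pairs that were absent from the respective original data frame.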

Clarification: in your question you used x to mean some arbitrary number, so I used runif to generate arbitrary numbers instead of x.
