Merging two data frames with different sizes and missing values


Question

I'm having a problem merging two data frames in R.

The first data frame has 103731 observations of 6 variables. The variable I have to merge on has 77111 unique values; the remaining entries are NAs (with a value of 0). The second data frame contains the frequency of each of those values plus the frequency of the NAs, so it has 77112 observations of 2 variables.

The result I need is the first data frame joined with the frequency of the merging variable: a data frame of 103731 observations in which every row carries the frequency of its merging-variable value (so the frequency is repeated across rows when freq > 1, and likewise for each NA (or 0)).

Can anyone help me?

The result I'm getting now is a data frame of 1 894 919 observations, and I used:

tot <- merge(df1, df2, by = "mergingVar", all = FALSE, sort = FALSE)

I also experimented a lot with 'all=', and none of the variations gave the right data frame.

Recommended answer

Why don't you just take the frequency table of your first table?

a <- data.frame(a = c(NA, NA, 2,2,3,3,3))
data.frame(table(a, useNA = 'ifany'))

     a Freq
1    2    2
2    3    3
3 <NA>    2

Or use ddply from plyr:

library(plyr)
ddply(a, .(a), mutate, freq = length(a))

   a freq
1  2    2
2  2    2
3  3    3
4  3    3
5  3    3
6 NA    2
7 NA    2
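
If plyr is not available, the same per-row frequency can be sketched in base R. This is a minimal sketch and not part of the original answer: df1 and mergingVar are the names used in the question, and the small data frame is made up for illustration. It relies on match() treating NA as equal to NA, so the missing values get a count of their own.

```r
# Toy stand-in for the asker's df1 (hypothetical data, column name taken from the question)
df1 <- data.frame(mergingVar = c(NA, NA, 2, 2, 3, 3, 3))

# Map each row to the index of its distinct value; match() pairs NA with NA
key <- match(df1$mergingVar, unique(df1$mergingVar))

# Count the rows per distinct value, then spread the counts back over the rows
df1$freq <- tabulate(key)[key]

df1
```

Because nothing is joined, the result keeps the row order and the row count of df1, so it cannot grow the way the merge() call in the question did.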
