如何在r中将缺失的国家包含到df中 [英] How to include missing countries to df in r

查看:35
本文介绍了如何在r中将缺失的国家包含到df中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题是我之前的发布.

我有一个关于并购(M&A)的大数据框(90 万行).

I have a big data frame (900k rows) about mergers and acquisitions (M&As).

df 有四列:date(并购完成的时间)、target_nation(合并/收购哪个国家的公司)、acquiror_nation(收购方是哪个国家的公司)和 big_corp(是否为收购方)是否是一家大公司,其中 TRUE 表示公司很大).

The df has four columns: date (when the M&A was completed), target_nation (a company of which country was merged/acquired), acquiror_nation (corporation of which country was the acquiror), and big_corp (whether the acquiror was a big corporation or not, where TRUE means that corporation is big).

这是我的 df 示例:

Here is a sample of my df:

    df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L, 
2003L, 1999L, 2001L, 2002L, 2002L, 2002L), target_nation = c("Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique", 
"Mozambique", "Mozambique", "Mozambique", "Mozambique", "Mozambique"
), acquiror_nation = c("France", "Germany", "France", "France", 
"Germany", "Germany", "Germany", "Germany", "France", "France", 
"Germany", "Japan"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, 
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)), row.names = c(NA, 
-12L), class = c("data.table", "data.frame"))

> df

  date target_nation acquiror_nation big_corp_TF
 1: 2000        Uganda          France        TRUE
 2: 2000        Uganda         Germany       FALSE
 3: 2001        Uganda          France        TRUE
 4: 2001        Uganda          France       FALSE
 5: 2001        Uganda         Germany       FALSE
 6: 2003        Uganda         Germany        TRUE
 7: 2003    Mozambique         Germany       FALSE
 8: 1999    Mozambique         Germany       FALSE
 9: 2001    Mozambique          France        TRUE
10: 2002    Mozambique          France       FALSE
11: 2002    Mozambique         Germany        TRUE
12: 2002    Mozambique           Japan        TRUE

根据这些数据,我想创建一个新列,表示特定收购国的大公司在特定目标国家的并购份额,计算 2 年的平均值.(对于我的实际练习,我将计算 5 年的平均值,但让我们在这里简化一下).

From these data, I want to create a new column that denotes the share of M&As done in specific target nations by big corporations of specific acquiror nations, counting the average for 2 years. (For my actual exercise, I will count the averages for 5 years, but let's keep things simpler here).

有一组我特别感兴趣的收购国(例如,法国、德国和日本).我希望有一个列来表示这些国家/地区的上述份额.

There is a group of acquiror nations that I am particularly interested in (for this example, let it be France, Germany, and Japan). I want there to be a column that would denote the abovementioned share for these countries.

@AnilGoyal 之前帮我写了一个代码.代码如下:

@AnilGoyal previously helped me with a code. Here is the code:

df_calc <- df %>%
  mutate(d = 1) %>%
  group_by(target_nation) %>%
  complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
           fill = list(d = 0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(total_MAs = sum(d)) %>%
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(total_MAs = mean(total_MAs),
            total_MAs_bigcorp = sum(big_corp_TF), .groups = 'drop') %>%
  group_by(target_nation, acquiror_nation) %>%
  mutate(share = sum_run(total_MAs_bigcorp, k=2)/sum_run(total_MAs, k=2))

这是输出:

  date   targ_nat    acq_nat tot_MA big_MA  share
1   1999    Mozambique  France  1   0   0.0000000
2   1999    Mozambique  Germany 1   0   0.0000000
3   1999    Mozambique  Japan   1   0   0.0000000
4   2000    Mozambique  France  0   0   0.0000000
5   2000    Mozambique  Germany 0   0   0.0000000
6   2000    Mozambique  Japan   0   0   0.0000000
7   2001    Mozambique  France  1   1   1.0000000
8   2001    Mozambique  Germany 1   0   0.0000000
9   2001    Mozambique  Japan   1   0   0.0000000
10  2002    Mozambique  France  3   0   0.2500000
11  2002    Mozambique  Germany 3   1   0.2500000
12  2002    Mozambique  Japan   3   1   0.2500000
13  2003    Mozambique  France  1   0   0.0000000
14  2003    Mozambique  Germany 1   0   0.2500000
15  2003    Mozambique  Japan   1   0   0.2500000
16  2000    Uganda     France   2   1   0.5000000
17  2000    Uganda    Germany   2   0   0.0000000
18  2001    Uganda    France    3   1   0.4000000
19  2001    Uganda    Germany   3   0   0.0000000
20  2002    Uganda    France    0   0   0.3333333
21  2002    Uganda    Germany   0   0   0.0000000
22  2003    Uganda    France    1   0   0.0000000
23  2003    Uganda    Germany   1   1   1.0000000

所有数字都如您所愿.但是,我希望日本在乌干达的投资有结果,但无法成功实现.我该如何实现? 我了解日本在乌干达没有结果的原因是日本在任何一年都没有对乌干达进行任何投资(如上数据样本所示);但这种缺乏投资对我来说是一个有意义的结果,我希望日本作为收购国也有争议.就像这样(出于空间原因,我将莫桑比克排除为 targ_nat):

All the numbers are as desired. However, I would like there to be the results for Japan's investments in Uganda but cannot succeed in achieving that. How can I achieve this? I understand that the reason for there being no results for Japan in Uganda is that Japan has not done any investment in Uganda in any year (as shown in the data sample above); but this lack of investment is a meaningful result for me, and I would like there to be rows for Japan as acquiror nation as well. Just like so (I exclude Mozambique as targ_nat for space reasons):

  date   targ_nat    acq_nat tot_MA big_MA  share
16  2000    Uganda     France   2   1   0.5000000
17  2000    Uganda    Germany   2   0   0.0000000
18  2000    Uganda    Japan     2   0   0.0000000
19  2001    Uganda    France    3   1   0.4000000
20  2001    Uganda    Germany   3   0   0.0000000
21  2001    Uganda    Japan     3   0   0.0000000
22  2002    Uganda    France    0   0   0.3333333
22  2002    Uganda    Germany   0   0   0.0000000
23  2002    Uganda    Japan     0   0   0.0000000
24  2003    Uganda    France    1   0   0.0000000
25  2003    Uganda    Germany   1   1   1.0000000
26  2003    Uganda    Japan     1   0   0.0000000

关于如何实现这一目标的任何想法?就我的实际目的而言,我希望将 13 个国家/地区的结果视为收购国(因此不仅仅是法国、德国和日本).这些国家在数据集中显示为收购国(但并非所有 target_nations (!) --- 就像这里的乌干达和日本的例子一样).

Any ideas on how to achieve that? For my actual purposes, I have a group of 13 countries for which I want to see the results as acquiror nations (so not just France, Germany, and Japan). These countries appear in the data set as acquiror nations (but not for all target_nations (!) --- just like in the example of Uganda and Japan here).

非常感谢任何帮助.

推荐答案

需要 complete

library(dplyr)
library(tidyr)
out <- df_calc %>% 
   group_by(target_nation, date, total_MAs) %>%
   complete(acquiror_nation = unique(.$acquiror_nation),
   fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
   ungroup

-检查乌干达"的输出

out %>% 
   filter(target_nation == 'Uganda')
# A tibble: 12 x 6
#   target_nation  date total_MAs acquiror_nation total_MAs_bigcorp share
#   <chr>         <dbl>     <dbl> <chr>                       <dbl> <dbl>
# 1 Uganda         2000         2 France                          1 0.5  
# 2 Uganda         2000         2 Germany                         0 0    
# 3 Uganda         2000         2 Japan                           0 0    
# 4 Uganda         2001         3 France                          1 0.4  
# 5 Uganda         2001         3 Germany                         0 0    
# 6 Uganda         2001         3 Japan                           0 0    
# 7 Uganda         2002         0 France                          0 0.333
# 8 Uganda         2002         0 Germany                         0 0    
# 9 Uganda         2002         0 Japan                           0 0    
#10 Uganda         2003         1 France                          0 0    
#11 Uganda         2003         1 Germany                         1 1    
#12 Uganda         2003         1 Japan                           0 0    

这篇关于如何在r中将缺失的国家包含到df中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆