如何在r中将缺失的国家包含到df中 [英] How to include missing countries to df in r
问题描述
这个问题是我之前的发布.
我有一个关于并购(M&A)的大数据框(90 万行).
I have a big data frame (900k rows) about mergers and acquisitions (M&As).
df 有四列:date(并购完成的时间)、target_nation(合并/收购哪个国家的公司)、acquiror_nation(收购方是哪个国家的公司)和 big_corp(是否为收购方)是否是一家大公司,其中 TRUE 表示公司很大).
The df has four columns: date (when the M&A was completed), target_nation (a company of which country was merged/acquired), acquiror_nation (corporation of which country was the acquiror), and big_corp (whether the acquiror was a big corporation or not, where TRUE means that corporation is big).
这是我的 df 示例:
Here is a sample of my df:
df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L,
2003L, 1999L, 2001L, 2002L, 2002L, 2002L), target_nation = c("Uganda",
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique",
"Mozambique", "Mozambique", "Mozambique", "Mozambique", "Mozambique"
), acquiror_nation = c("France", "Germany", "France", "France",
"Germany", "Germany", "Germany", "Germany", "France", "France",
"Germany", "Japan"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE,
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)), row.names = c(NA,
-12L), class = c("data.table", "data.frame"))
> df
date target_nation acquiror_nation big_corp_TF
1: 2000 Uganda France TRUE
2: 2000 Uganda Germany FALSE
3: 2001 Uganda France TRUE
4: 2001 Uganda France FALSE
5: 2001 Uganda Germany FALSE
6: 2003 Uganda Germany TRUE
7: 2003 Mozambique Germany FALSE
8: 1999 Mozambique Germany FALSE
9: 2001 Mozambique France TRUE
10: 2002 Mozambique France FALSE
11: 2002 Mozambique Germany TRUE
12: 2002 Mozambique Japan TRUE
根据这些数据,我想创建一个新列,表示特定收购国的大公司在特定目标国家的并购份额,计算 2 年的平均值.(对于我的实际练习,我将计算 5 年的平均值,但让我们在这里简化一下).
From these data, I want to create a new column that denotes the share of M&As done in specific target nations by big corporations of specific acquiror nations, counting the average for 2 years. (For my actual exercise, I will count the averages for 5 years, but let's keep things simpler here).
有一组我特别感兴趣的收购国(例如,法国、德国和日本).我希望有一个列来表示这些国家/地区的上述份额.
There is a group of acquiror nations that I am particularly interested in (for this example, let it be France, Germany, and Japan). I want there to be a column that would denote the abovementioned share for these countries.
@AnilGoyal 之前帮我写了一个代码.代码如下:
@AnilGoyal previously helped me with a code. Here is the code:
df_calc <- df %>%
mutate(d = 1) %>%
group_by(target_nation) %>%
complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
fill = list(d = 0, big_corp_TF = FALSE)) %>%
group_by(date, target_nation) %>%
mutate(total_MAs = sum(d)) %>%
group_by(date, target_nation, acquiror_nation) %>%
summarise(total_MAs = mean(total_MAs),
total_MAs_bigcorp = sum(big_corp_TF), .groups = 'drop') %>%
group_by(target_nation, acquiror_nation) %>%
mutate(share = sum_run(total_MAs_bigcorp, k=2)/sum_run(total_MAs, k=2))
这是输出:
date targ_nat acq_nat tot_MA big_MA share
1 1999 Mozambique France 1 0 0.0000000
2 1999 Mozambique Germany 1 0 0.0000000
3 1999 Mozambique Japan 1 0 0.0000000
4 2000 Mozambique France 0 0 0.0000000
5 2000 Mozambique Germany 0 0 0.0000000
6 2000 Mozambique Japan 0 0 0.0000000
7 2001 Mozambique France 1 1 1.0000000
8 2001 Mozambique Germany 1 0 0.0000000
9 2001 Mozambique Japan 1 0 0.0000000
10 2002 Mozambique France 3 0 0.2500000
11 2002 Mozambique Germany 3 1 0.2500000
12 2002 Mozambique Japan 3 1 0.2500000
13 2003 Mozambique France 1 0 0.0000000
14 2003 Mozambique Germany 1 0 0.2500000
15 2003 Mozambique Japan 1 0 0.2500000
16 2000 Uganda France 2 1 0.5000000
17 2000 Uganda Germany 2 0 0.0000000
18 2001 Uganda France 3 1 0.4000000
19 2001 Uganda Germany 3 0 0.0000000
20 2002 Uganda France 0 0 0.3333333
21 2002 Uganda Germany 0 0 0.0000000
22 2003 Uganda France 1 0 0.0000000
23 2003 Uganda Germany 1 1 1.0000000
所有数字都如您所愿.但是,我希望日本在乌干达的投资有结果,但无法成功实现.我该如何实现? 我了解日本在乌干达没有结果的原因是日本在任何一年都没有对乌干达进行任何投资(如上数据样本所示);但这种缺乏投资对我来说是一个有意义的结果,我希望日本作为收购国也有争议.就像这样(出于空间原因,我将莫桑比克排除为 targ_nat):
All the numbers are as desired. However, I would like there to be the results for Japan's investments in Uganda but cannot succeed in achieving that. How can I achieve this? I understand that the reason for there being no results for Japan in Uganda is that Japan has not done any investment in Uganda in any year (as shown in the data sample above); but this lack of investment is a meaningful result for me, and I would like there to be rows for Japan as acquiror nation as well. Just like so (I exclude Mozambique as targ_nat for space reasons):
date targ_nat acq_nat tot_MA big_MA share
16 2000 Uganda France 2 1 0.5000000
17 2000 Uganda Germany 2 0 0.0000000
18 2000 Uganda Japan 2 0 0.0000000
19 2001 Uganda France 3 1 0.4000000
20 2001 Uganda Germany 3 0 0.0000000
21 2001 Uganda Japan 3 0 0.0000000
22 2002 Uganda France 0 0 0.3333333
22 2002 Uganda Germany 0 0 0.0000000
23 2002 Uganda Japan 0 0 0.0000000
24 2003 Uganda France 1 0 0.0000000
25 2003 Uganda Germany 1 1 1.0000000
26 2003 Uganda Japan 1 0 0.0000000
关于如何实现这一目标的任何想法?就我的实际目的而言,我希望将 13 个国家/地区的结果视为收购国(因此不仅仅是法国、德国和日本).这些国家在数据集中显示为收购国(但并非所有 target_nations (!) --- 就像这里的乌干达和日本的例子一样).
Any ideas on how to achieve that? For my actual purposes, I have a group of 13 countries for which I want to see the results as acquiror nations (so not just France, Germany, and Japan). These countries appear in the data set as acquiror nations (but not for all target_nations (!) --- just like in the example of Uganda and Japan here).
非常感谢任何帮助.
推荐答案
需要 complete
library(dplyr)
library(tidyr)
out <- df_calc %>%
group_by(target_nation, date, total_MAs) %>%
complete(acquiror_nation = unique(.$acquiror_nation),
fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
ungroup
-检查乌干达"的输出
out %>%
filter(target_nation == 'Uganda')
# A tibble: 12 x 6
# target_nation date total_MAs acquiror_nation total_MAs_bigcorp share
# <chr> <dbl> <dbl> <chr> <dbl> <dbl>
# 1 Uganda 2000 2 France 1 0.5
# 2 Uganda 2000 2 Germany 0 0
# 3 Uganda 2000 2 Japan 0 0
# 4 Uganda 2001 3 France 1 0.4
# 5 Uganda 2001 3 Germany 0 0
# 6 Uganda 2001 3 Japan 0 0
# 7 Uganda 2002 0 France 0 0.333
# 8 Uganda 2002 0 Germany 0 0
# 9 Uganda 2002 0 Japan 0 0
#10 Uganda 2003 1 France 0 0
#11 Uganda 2003 1 Germany 1 1
#12 Uganda 2003 1 Japan 0 0
这篇关于如何在r中将缺失的国家包含到df中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!