Condense and join data frame
Problem description
I have a data frame that I'm trying to merge with another, and I'm having some issues that I believe trace back to the fact that each observation is an incident rather than a cumulative amount. In the data frame below, each row is an individual observation. I then want to use week and code to merge it with another data frame that has the same week and code variables.
Data frame a has each row as a specific observation, but I need it to have one cumulative row per code/date. I'm completely stumped.
  date       count  code  week
  <date>     <dbl> <dbl> <dbl>
1 2020-06-07     4 13309    23
2 2020-06-07     5 13309    23
3 2020-07-12     6 18099    28
4 2020-07-12     8 18099    28
needs to become
  date       count  code  week
  <date>     <dbl> <dbl> <dbl>
1 2020-06-07     9 13309    23
2 2020-07-12    14 18099    28
Then it can be merged with data frame b:
  date       color  name    code  week
  <date>     <char> <char> <dbl> <dbl>
1 2020-06-07 Blue   A      13309    23
1 2020-06-07 Yellow B      13309    23
1 2020-06-07 Purple D      13309    23
3 2020-07-12 Yellow A      18099    28
3 2020-07-12 Blue   E      18099    28
The final result would be
  date       color  name    code  week count
  <date>     <char> <char> <dbl> <dbl> <dbl>
1 2020-06-07 Blue   A      13309    23     9
1 2020-06-07 Yellow B      13309    23     9
1 2020-06-07 Purple D      13309    23     9
3 2020-07-12 Yellow A      18099    28    14
3 2020-07-12 Blue   E      18099    28    14
I originally used the code below to do this, but it completely blew up my data frame. My dimensions went from dim(a) == (209807, 86) to dim(merged) == (1367029, 89). I tried multiple types of joins (right, left, inner, etc.), but all of them still blew up the data frame (the row counts varied by a hundred or so observations, but every join still produced well over a million rows). That's why I think the issue is that a holds individual observations rather than a summary observation for a specific code on a specific date.
merged <- right_join(x = b,
y = a,
by = c("code" = "code",
"week" = "week"))
Recommended answer
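The row explosion described in the question is what a many-to-many join produces: when a key (code, week) appears n_a times in a and n_b times in b, the join emits n_a × n_b rows for that key. The sketch below is only an illustration under assumptions: a and b are small stand-ins rebuilt from the sample rows in the question (not the real 209,807-row data), and the names a_keys, b_keys, and expected_rows are made up for this example. It counts the rows per key on each side and compares the sum of the products with the size of the join on the un-summarized data.

```r
library(dplyr)

# Small stand-ins for the question's data frames (sample rows only)
a <- tibble(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-07-12", "2020-07-12")),
  count = c(4, 5, 6, 8),
  code  = c(13309, 13309, 18099, 18099),
  week  = c(23, 23, 28, 28)
)
b <- tibble(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-06-07",
                    "2020-07-12", "2020-07-12")),
  color = c("Blue", "Yellow", "Purple", "Yellow", "Blue"),
  name  = c("A", "B", "D", "A", "E"),
  code  = c(13309, 13309, 13309, 18099, 18099),
  week  = c(23, 23, 23, 28, 28)
)

# Number of rows per (code, week) key on each side
a_keys <- count(a, code, week, name = "n_a")
b_keys <- count(b, code, week, name = "n_b")

# Predicted size of the join: sum over shared keys of n_a * n_b
expected_rows <- inner_join(a_keys, b_keys, by = c("code", "week")) %>%
  summarize(rows = sum(n_a * n_b)) %>%
  pull(rows)

# The join on the un-summarized data; date is dropped from a only to
# avoid date.x / date.y columns in this illustration
merged <- right_join(b, select(a, -date), by = c("code", "week"))

expected_rows  # 10: 2 * 3 for code 13309 plus 2 * 2 for code 18099
nrow(merged)   # 10, twice the 5 rows of b
```

Recent dplyr versions (1.1.0 and later) also emit a warning when a join detects a many-to-many match like this one. Summarizing a to one row per key before joining makes n_a equal to 1 for every key, so the join returns exactly one row per matching row of b.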
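library(dplyr)

# Collapse a to one row per date/code/week, summing count
a %>%
  group_by(date, code, week) %>%
  summarize(count = sum(count)) %>%
  ungroup() %>%
  # Attach the totals to b; each key is now unique, so b keeps its row count
  left_join(b, ., by = c("date", "code", "week"))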
#   date       color  name  code week count
# 1 2020-06-07 Blue   A    13309   23     9
# 2 2020-06-07 Yellow B    13309   23     9
# 3 2020-06-07 Purple D    13309   23     9
# 4 2020-07-12 Yellow A    18099   28    14
# 5 2020-07-12 Blue   E    18099   28    14
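On the full data it may be worth confirming that the summarized table really has one row per key and that the join leaves the row count of b unchanged. The following is a minimal sketch of such a check, again using small stand-in data frames rebuilt from the question's sample rows; a_summary and result are names chosen for this example, and .groups = "drop" assumes dplyr 1.0.0 or later (it replaces the separate ungroup() step).

```r
library(dplyr)

# Stand-ins rebuilt from the sample rows in the question
a <- data.frame(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-07-12", "2020-07-12")),
  count = c(4, 5, 6, 8),
  code  = c(13309, 13309, 18099, 18099),
  week  = c(23, 23, 28, 28)
)
b <- data.frame(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-06-07",
                    "2020-07-12", "2020-07-12")),
  color = c("Blue", "Yellow", "Purple", "Yellow", "Blue"),
  name  = c("A", "B", "D", "A", "E"),
  code  = c(13309, 13309, 13309, 18099, 18099),
  week  = c(23, 23, 23, 28, 28)
)

# One row per date/code/week, with count summed within each group
a_summary <- a %>%
  group_by(date, code, week) %>%
  summarize(count = sum(count), .groups = "drop")

# Each key should now appear exactly once in the summary
stopifnot(!any(duplicated(a_summary[c("date", "code", "week")])))

result <- left_join(b, a_summary, by = c("date", "code", "week"))

# A left join against unique keys cannot add rows to b
stopifnot(nrow(result) == nrow(b))
```

If either stopifnot() call fails on the real data, the keys are still duplicated on the summarized side, which would point to a grouping column that differs between the two tables.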