Condense and join data frame

Problem description

I'm trying to merge one data frame with another, and I'm running into issues that I believe trace back to each observation being a single incident rather than a cumulative amount. In the data frame below, each row is an individual observation; I then want to merge it with a second data frame on the variables week and code, which both data frames share.

In data frame a, each row is a single observation, but I need one row per code/date that holds the summed (cumulative) count. I'm completely stumped.

  date       count       code  week
  <date>     <dbl>      <dbl> <dbl>
1 2020-06-07     4      13309    23
2 2020-06-07     5      13309    23
3 2020-07-12     6      18099    28
4 2020-07-12     8      18099    28

needs to become:

  date       count       code  week
  <date>     <dbl>      <dbl> <dbl>
1 2020-06-07     9      13309    23
2 2020-07-12    14      18099    28
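For reference, this collapsing step can be sketched in base R, with aggregate() swapped in for a dplyr pipeline. The sketch assumes the four sample rows above are typed in by hand as a data.frame; only the grouping columns and count survive the aggregation.

```r
# Hypothetical reconstruction of the four-row sample of `a` shown above
a <- data.frame(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-07-12", "2020-07-12")),
  count = c(4, 5, 6, 8),
  code  = c(13309, 13309, 18099, 18099),
  week  = c(23, 23, 28, 28)
)

# Sum `count` within each date/code/week combination
# (base R stand-in for group_by() + summarize()); one row per combination
a_sum <- aggregate(count ~ date + code + week, data = a, FUN = sum)

# Sort by date so the rows line up with the desired output above
a_sum <- a_sum[order(a_sum$date), ]
print(a_sum)
```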

Then it can be merged with data frame b:

  date       color     name       code  week
  <date>     <chr>     <chr>      <dbl> <dbl>
1 2020-06-07 Blue         A      13309    23
2 2020-06-07 Yellow       B      13309    23
3 2020-06-07 Purple       D      13309    23
4 2020-07-12 Yellow       A      18099    28
5 2020-07-12 Blue         E      18099    28

The end result would be:

  date       color     name     code   week    count
  <date>     <chr>     <chr>    <dbl>  <dbl>    <dbl>
1 2020-06-07 Blue         A    13309     23        9
2 2020-06-07 Yellow       B    13309     23        9
3 2020-06-07 Purple       D    13309     23        9
4 2020-07-12 Yellow       A    18099     28       14
5 2020-07-12 Blue         E    18099     28       14

I originally used the code below to do this, but it completely blew up my data frame: the dimensions went from dim(a) == (209807, 86) to dim(merged) == (1367029, 89). I tried several join types (right, left, inner, etc.), but all of them still blew up the data frame (the row counts varied by a few hundred, but every result had well over a million rows). That's why I think the issue is that a holds one row per individual observation rather than one summary row per code and date.

library(dplyr)

# Join on the shared key columns; right_join() keeps every row of `a` (the y table)
merged <- right_join(x = b,
                     y = a,
                     by = c("code", "week"))
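To see where the extra rows come from, here is a small base R sketch on the sample tables (typed in by hand as an assumption; merge() stands in for right_join()). When a code/week key repeats in both tables, every row of a is paired with every row of b that shares the key, so each key contributes n_a * n_b rows.

```r
# Hypothetical reconstruction of the sample tables from the question
a <- data.frame(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-07-12", "2020-07-12")),
  count = c(4, 5, 6, 8),
  code  = c(13309, 13309, 18099, 18099),
  week  = c(23, 23, 28, 28)
)
b <- data.frame(
  date  = as.Date(c(rep("2020-06-07", 3), rep("2020-07-12", 2))),
  color = c("Blue", "Yellow", "Purple", "Yellow", "Blue"),
  name  = c("A", "B", "D", "A", "E"),
  code  = c(13309, 13309, 13309, 18099, 18099),
  week  = c(23, 23, 23, 28, 28)
)

# Inner join on non-unique keys: each a row pairs with each matching b row
merged <- merge(b, a, by = c("code", "week"))

# Expected size: per-key product of row counts, summed over shared keys
n_a <- table(paste(a$code, a$week))
n_b <- table(paste(b$code, b$week))
keys <- intersect(names(n_a), names(n_b))
expected_rows <- sum(n_a[keys] * n_b[keys])

cat("rows in a:", nrow(a), " rows in b:", nrow(b),
    " rows after join:", nrow(merged), " expected:", expected_rows, "\n")
```

Here key 13309/23 yields 2 * 3 = 6 rows and key 18099/28 yields 2 * 2 = 4, so a 4-row a and a 5-row b produce 10 rows. The same multiplication across many repeated keys is a plausible explanation for the jump past a million rows described above.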

Recommended answer

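library(dplyr)

a %>%
  # Collapse a to one row per date/code/week, summing the incident counts
  group_by(date, code, week) %>%
  summarize(count = sum(count)) %>%
  ungroup() %>%
  # Attach the summed count to every matching row of b
  # (the `.` placeholder passes the summarized table in as the y argument)
  left_join(b, ., by = c("date", "code", "week"))
#         date  color name  code week count
# 1 2020-06-07   Blue    A 13309   23     9
# 2 2020-06-07 Yellow    B 13309   23     9
# 3 2020-06-07 Purple    D 13309   23     9
# 4 2020-07-12 Yellow    A 18099   28    14
# 5 2020-07-12   Blue    E 18099   28    14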
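The same two steps can be sketched in base R, again assuming the sample tables are typed in by hand. The anyDuplicated() check is an addition, not part of the answer: it confirms that date/code/week is unique in the summarized table before joining, which is what keeps the join from multiplying rows of b.

```r
# Hypothetical reconstruction of the sample tables from the question
a <- data.frame(
  date  = as.Date(c("2020-06-07", "2020-06-07", "2020-07-12", "2020-07-12")),
  count = c(4, 5, 6, 8),
  code  = c(13309, 13309, 18099, 18099),
  week  = c(23, 23, 28, 28)
)
b <- data.frame(
  date  = as.Date(c(rep("2020-06-07", 3), rep("2020-07-12", 2))),
  color = c("Blue", "Yellow", "Purple", "Yellow", "Blue"),
  name  = c("A", "B", "D", "A", "E"),
  code  = c(13309, 13309, 13309, 18099, 18099),
  week  = c(23, 23, 23, 28, 28)
)

# Step 1: collapse `a` to one row per date/code/week, summing `count`
a_sum <- aggregate(count ~ date + code + week, data = a, FUN = sum)

# Sanity check: the join keys must be unique in `a_sum`,
# otherwise the join below could still duplicate rows of `b`
stopifnot(!anyDuplicated(a_sum[c("date", "code", "week")]))

# Step 2: left join onto `b` (all.x = TRUE keeps every row of `b`)
merged <- merge(b, a_sum, by = c("date", "code", "week"), all.x = TRUE)
merged <- merged[order(merged$date, merged$name), ]
print(merged)
```

If you stay with dplyr, versions 1.1.0 and later also accept `relationship = "many-to-one"` in left_join(), which raises an error when a row of b matches more than one row of the summarized table instead of silently adding rows.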
