Manipulating variables to produce a new dataset in R

Question

I'm a relatively new R user. I would really appreciate any help with my dataset please.

I have a dataset with 24 million rows. There are 3 variables in the dataset: patient name, pharmacy name, and count of medications picked up from the pharmacy at that visit.

Some patients appear in the dataset more than once (i.e., they have picked up medications from different pharmacies at different time points).

The data frame looks like this:

df <- data.frame(name = c("Tom", "Rob", "Tom", "Tom",  "Amy"), 
                 pharmacy = c("A", "B", "B", "B", "C"), 
                 meds = c(3, 2, 5, 8, 2))

From this data I want to generate a new dataset, which has ONE pharmacy for each patient. This pharmacy needs to be the one where the patient has picked up the highest number of medications.

For example: for Tom his most frequent pharmacy is Pharmacy B because he has picked up 13 medications from there (5+8 meds). The dataset I would like to generate:

data.frame(name = c("Tom", "Rob",  "Amy"), 
           pharmacy = c("B", "B", "C"), 
           meds = c(13, 2, 2))

Can someone please help me write code to do this? I have tried various tools in R, such as dplyr, tidyr, and aggregate(), with no success. Any help would be genuinely appreciated.

Many thanks,

Alex

Recommended answer

If I understood you correctly, I think you're looking for something like this.

library(tidyverse)
# Sample data, copied from the question.
df <- data.frame(name = c("Tom", "Rob", "Tom", "Tom",  "Amy"), 
                 pharmacy = c("A", "B", "B", "B", "C"), 
                 meds = c(3, 2, 5, 8, 2))


Edit: I changed group_by() and summarise(), and added a filter.


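df %>% 
  # Total the meds for each patient/pharmacy pair.
  group_by(name, pharmacy) %>%
  summarise(SumMeds = sum(meds, na.rm = TRUE)) %>% 
  # summarise() drops the last grouping level, so the result is still grouped by name;
  # keep each patient's largest total (tied pharmacies are all kept).
  filter(SumMeds == max(SumMeds))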

Result:

  name  pharmacy SumMeds
  <fct> <fct>      <dbl>
1 Amy   C             2.
2 Rob   B             2.
3 Tom   B            13.
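
Since the question mentions trying aggregate() without success, here is a minimal base R sketch of the same idea: sum per patient/pharmacy pair, then keep the pharmacy with the largest total for each patient. The names totals and top_pharmacy are made up for illustration. Unlike the filter above, this keeps exactly one row per patient, so if two pharmacies tie, only one of them is kept (whichever sorts first). I have not checked how it performs on 24 million rows.

```r
# Sample data, copied from the question.
df <- data.frame(name = c("Tom", "Rob", "Tom", "Tom", "Amy"),
                 pharmacy = c("A", "B", "B", "B", "C"),
                 meds = c(3, 2, 5, 8, 2))

# Step 1: total meds for each patient/pharmacy pair.
# (totals is a hypothetical name, not from the original answer.)
totals <- aggregate(meds ~ name + pharmacy, data = df, FUN = sum)

# Step 2: sort so that each patient's largest total comes first.
totals <- totals[order(totals$name, -totals$meds), ]

# Step 3: keep only the first row for each patient.
top_pharmacy <- totals[!duplicated(totals$name), ]
rownames(top_pharmacy) <- NULL

top_pharmacy
```

For the sample data this should give Tom's row as pharmacy B with a total of 13, matching the dplyr result.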
