Merging two data frames in R, removing duplicates, and aggregating


Question

I have two data frames in R, named house and candidates.

house

      House       Region                 Military_Strength
1 Stark           The North              20000
2 Targaryen       Slaver's Bay           110000
3 Lannister       The Westerlands        60000
4 Baratheon       The Stormlands         40000
5 Tyrell          The Reach              30000


candidates

  House               Name                  Region
1 Lannister           Jamie Lannister       Westros
2 Stark               Robb Stark            North
3 Stark               Arya Stark            Westros
4 Lannister           Cersi Lannister       Westros
5 Targaryen           Daenerys Targaryen    Mereene
6 Baratheon           Robert Baratheon      Westros
7 Mormont             Jorah Mormont         Mereene

I want to merge the two data frames on the House column. To do that, I ran:

merge(candidates, house, by="House", sort=FALSE)

The output is:

       House        Name         Region.x        Region.y   Military_Strength
 1 Lannister    Jamie Lannister  Westros     The Westerlands             60000
 2 Lannister    Cersi Lannister  Westros     The Westerlands             60000
 3 Stark         Robb Stark      North       The North                   20000
 4 Stark         Arya Stark      Westros     The North                   20000
 5 Targaryen Daenerys Targaryen  Mereene     Slaver's Bay                110000
 6 Baratheon   Robert Baratheon  Westros     The Stormlands              40000
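To reproduce this step, the sketch below rebuilds the two input tables from the listings above and runs the same merge() call. The column types are an assumption (plain character and numeric columns rather than factors). Because merge() performs an inner join by default (all = FALSE), houses that appear in only one table, Tyrell and Mormont here, are dropped, and the two clashing Region columns are suffixed .x and .y.

```r
# Rebuild the two input tables shown in the question
# (assumption: plain character and numeric columns, not factors)
house <- data.frame(
  House = c("Stark", "Targaryen", "Lannister", "Baratheon", "Tyrell"),
  Region = c("The North", "Slaver's Bay", "The Westerlands",
             "The Stormlands", "The Reach"),
  Military_Strength = c(20000, 110000, 60000, 40000, 30000),
  stringsAsFactors = FALSE
)

candidates <- data.frame(
  House = c("Lannister", "Stark", "Stark", "Lannister",
            "Targaryen", "Baratheon", "Mormont"),
  Name = c("Jamie Lannister", "Robb Stark", "Arya Stark", "Cersi Lannister",
           "Daenerys Targaryen", "Robert Baratheon", "Jorah Mormont"),
  Region = c("Westros", "North", "Westros", "Westros",
             "Mereene", "Westros", "Mereene"),
  stringsAsFactors = FALSE
)

# Inner join on House: only houses present in both tables survive,
# and the clashing Region columns become Region.x and Region.y
df1 <- merge(candidates, house, by = "House", sort = FALSE)

# For contrast, a left join keeps every candidate;
# Jorah Mormont then gets NA in the columns that come from house
df1_all <- merge(candidates, house, by = "House", all.x = TRUE, sort = FALSE)
```

The question does not say whether candidates without a matching house should be kept; the all.x = TRUE variant is shown only to make the default inner-join behaviour visible.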

For each house, I want to keep only the first candidate and remove any further candidates, but the removed candidate's Military_Strength should be added to that of the first candidate of the same house.

For example:

4 Stark         Arya Stark      Westros     The North                   20000

would be removed, and its 20000 would be added to the Military_Strength of row 3 (Robb Stark). What is the appropriate way to do this?

Recommended answer

Starting from df1, the data frame returned by merge(), one could proceed with:

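# Replace each row's Military_Strength with the total for its House
df1$Military_Strength <- with(df1, ave(Military_Strength, House, FUN=sum))
# Keep only the first row of each House
df1[!duplicated(df1$House),]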
#      House               Name Region.x        Region.y Military_Strength
#1 Lannister    Jamie Lannister  Westros The Westerlands            120000
#3     Stark         Robb Stark    North       The North             40000
#5 Targaryen Daenerys Targaryen  Mereene    Slaver's Bay            110000
#6 Baratheon   Robert Baratheon  Westros  The Stormlands             40000
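The answer relies on two base R functions. ave(x, g, FUN = sum) applies FUN within each group of g and returns a vector of the same length as x, so every row receives its group's total; duplicated() flags every occurrence of a value after the first, so negating it keeps the first row per house. A minimal illustration on toy vectors (the values are chosen for the demo, not taken from the question's output):

```r
strength <- c(60000, 60000, 20000, 20000, 110000)
house    <- c("Lannister", "Lannister", "Stark", "Stark", "Targaryen")

# Group sums, written back onto every row of the group
totals <- ave(strength, house, FUN = sum)
# totals is c(120000, 120000, 40000, 40000, 110000)

# TRUE for the second and later occurrences of each house
dup <- duplicated(house)
# dup is c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Keep one row per house, carrying the summed strength
result <- data.frame(house, totals)[!dup, ]
```

Because ave() preserves the original row order and length, the totals can be assigned straight back into the data frame before filtering, which is why the answer needs no separate join step.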

Data used in this example:

df1 <- structure(list(House = structure(c(2L, 2L, 3L, 3L, 4L, 1L), 
                .Label = c("Baratheon", "Lannister", "Stark", "Targaryen"), 
                class = "factor"), Name = structure(c(4L, 2L, 5L, 1L, 3L, 6L), 
                .Label = c("Arya Stark", "Cersi Lannister", "Daenerys Targaryen", 
                "Jamie Lannister", "Robb Stark", "Robert Baratheon"), 
                class = "factor"), Region.x = structure(c(3L, 3L, 2L, 3L, 1L, 3L), 
                .Label = c("Mereene", "North", "Westros"), class = "factor"), 
                Region.y = structure(c(4L, 4L, 2L, 2L, 1L, 3L), 
                .Label = c("Slaver's Bay", "The North", "The Stormlands",
                  "The Westerlands"), class = "factor"), 
                Military_Strength = c(60000L, 60000L, 20000L, 20000L, 110000L, 
                40000L)), .Names = c("House", "Name", "Region.x", "Region.y", 
                "Military_Strength"), class = "data.frame", row.names = c("1", 
                "2", "3", "4", "5", "6"))
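The same result can also be reached by computing one total per house first and then looking it up for the rows that are kept. The sketch below uses tapply() for that step; it is an alternative to the answer's ave() approach, not part of it, and it rebuilds a reduced df1 with plain character columns (an assumption) instead of the factors in the structure() call above.

```r
# Compact stand-in for df1: only the columns needed here,
# character columns instead of factors
df1 <- data.frame(
  House = c("Lannister", "Lannister", "Stark", "Stark", "Targaryen", "Baratheon"),
  Name = c("Jamie Lannister", "Cersi Lannister", "Robb Stark", "Arya Stark",
           "Daenerys Targaryen", "Robert Baratheon"),
  Military_Strength = c(60000, 60000, 20000, 20000, 110000, 40000),
  stringsAsFactors = FALSE
)

# One total per House, indexed by house name
totals <- tapply(df1$Military_Strength, df1$House, sum)

# Keep the first row of each House, then look up its total by name
result <- df1[!duplicated(df1$House), ]
result$Military_Strength <- as.vector(totals[result$House])
print(result)
```

The tapply() version makes the per-house totals available as a separate summary, which can be handy if they are needed elsewhere; for the question as asked, the two-line ave() solution is shorter.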
