Finding maximum value of one column (by group) and inserting value into another data frame in R


Problem description


All,

I was hoping someone could find a solution to an issue of mine that isn't necessarily causing headaches, but, as of right now, invites the possibility for human error in creating a data set for a project on which I'm working.

The data set I'm using right now is a directed dyad-year (A vs. B, B vs. A) data set for select pairs of countries for every year between 1950 and 2010. Some countries, like A in my example, will be paired with every country in the world and every country will be paired with it. Some countries, like B and C in my example, will be paired with just a few countries. Some pairs will have missing data, which I don't show in my example.

What I would like to do is use R to find the maximum value of a given column, for a given country, in a given year, and insert that value into another data frame. Hopefully this illustration will clarify what I would like to do.

country1 country2 year    x1   x2   x3   x4
   A        B     2000    50   30   1    20
   A        C     2000    70    2   5    90
   A        D     2000    10   90   20   30
   A        E     2000    95   10   10   5
   A        F     2000    10   10   10   0
   A        G     2000    5     5   0    0
   A        H     2000    10   30   25   40

  ........................................

  B        A      1998    5    10   30   2
  B        D      1998    30   6    9    0
  B        I      1998    10   9    7    0

  ........................................

  C        A      2005    10   15   2    6
  C        D      2005    90   0    0    40
  C        X      2005    49   90   5    0

Say, for example, that I'm interested in Country A in the year 2000. I want to know its max value of x1 in 2000 (which is 95, in its pairing with Country E). I also want to know its max values for x2, x3, and x4 in any pairing in that year (which are 90, 25, and 90, with Country D, Country H, and Country C respectively).

The same follows for Country B in 1998, and Country C in 2005.

After isolating the max value of those columns for a given country in a given year, I'd like to dump those values into a dataframe, like this.

country   year    x1max    x2max    x3max    x4max
  A       2000      95       90       25       90
  B       1998      30       10       30        2
  C       2005      90       90        5       40

I'm flexible on this part. It might just be easiest to dump those max values for each country into their own data frames of dimensions 1x5, and then use rbind to stack them together.

Does anyone have any advice on how to proceed? It'd save me the hassle of having to do it manually, which, more than anything, invites the possibility of human error.

Reproducible code follows, though, since my question does hinge on isolating a particular year for a particular country (e.g. 2000 for Country A instead of 2001), I'm not sure the reproducible code is necessarily helpful. I hope it is, or, at least, that my question is clear.

country1 <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
country2 <- c("B","C","D","E","F","G","H","A","D","I","A","D","X")
year <- c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 1998, 1998, 1998, 2005, 2005, 2005)
x1 <- c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49)
x2 <- c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90)
x3 <- c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5)
x4 <- c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)

Data <- data.frame(country1 = country1, country2 = country2, year = year, x1 = x1, x2 = x2, x3 = x3, x4 = x4)
Data

Solution

It sounds like you're just looking for aggregate:

> aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max)
  country1 year x1 x2 x3 x4
1        B 1998 30 10 30  2
2        A 2000 95 90 25 90
3        C 2005 90 90  5 40
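
The question mentions that some pairs have missing data. With the formula interface, aggregate uses na.action = na.omit by default, so a row with an NA in any of x1 to x4 is dropped before max is computed, and the non-missing values on that row are lost too. The sketch below shows one way around that. The NA placed in the A-E row is an assumption made purely for illustration; it is not part of the original data.

```r
# Example data from the question.
Data <- data.frame(
  country1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  country2 = c("B", "C", "D", "E", "F", "G", "H", "A", "D", "I", "A", "D", "X"),
  year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000,
           1998, 1998, 1998, 2005, 2005, 2005),
  x1 = c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49),
  x2 = c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90),
  x3 = c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5),
  x4 = c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)
)

# Illustrative assumption: x2 is missing for the A-E pairing in 2000.
# That row still holds x1 = 95, the true maximum of x1 for A in 2000.
Data$x2[4] <- NA

# Default behaviour: the whole A-E row is omitted, so x1 for A becomes 70.
dropped <- aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max)

# Keep incomplete rows (na.pass) and skip NAs per column (na.rm = TRUE),
# so x1 for A stays at 95.
kept <- aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max,
                  na.rm = TRUE, na.action = na.pass)

dropped
kept
```

One caveat with this variant: if a column is entirely NA within a group, max(..., na.rm = TRUE) returns -Inf with a warning, so such groups may need separate handling.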

It's not very clear from your question how you want to proceed from there though....
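
One possible way to continue, sketched under assumptions: rename the aggregated columns to the x1max ... x4max layout shown in the question, then attach them to another data frame with merge. The data frame named target below is hypothetical (the question does not show the destination data frame); it stands in for any data frame keyed by country and year.

```r
# Example data from the question.
Data <- data.frame(
  country1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  country2 = c("B", "C", "D", "E", "F", "G", "H", "A", "D", "I", "A", "D", "X"),
  year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000,
           1998, 1998, 1998, 2005, 2005, 2005),
  x1 = c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49),
  x2 = c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90),
  x3 = c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5),
  x4 = c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)
)

# Group maxima per country1-year, as in the answer.
maxima <- aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max)

# Rename to the layout requested in the question.
names(maxima) <- c("country", "year",
                   paste0(c("x1", "x2", "x3", "x4"), "max"))

# Hypothetical destination data frame, one row per country-year of interest.
target <- data.frame(country = c("A", "B", "C"),
                     year = c(2000, 1998, 2005))

# Left join: keep every row of target and attach the maxima where the
# country-year key matches (unmatched rows would get NA).
result <- merge(target, maxima, by = c("country", "year"), all.x = TRUE)
result
#   country year x1max x2max x3max x4max
# 1       A 2000    95    90    25    90
# 2       B 1998    30    10    30     2
# 3       C 2005    90    90     5    40
```

Because aggregate already returns one row per group, the manual step of building a separate one-row data frame per country and stacking them with rbind is not needed here.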
