Finding maximum value of one column (by group) and inserting value into another data frame in R
All,
I was hoping someone could find a solution to an issue of mine that isn't necessarily causing headaches, but, as of right now, invites the possibility for human error in creating a data set for a project on which I'm working.
The data set I'm using right now is a directed dyad-year (A vs. B, B vs. A) data set for select pairs of countries for every year between 1950 and 2010. Some countries, like A in my example, will be paired with every country in the world and every country will be paired with it. Some countries, like B and C in my example, will be paired with just a few countries. Some pairs will have missing data, which I don't show in my example.
What I would like to do is use R to find the maximum value of a given column, for a given country, in a given year, and insert that value into another data frame. Hopefully this illustration will clarify what I would like to do.
country1 country2 year x1 x2 x3 x4
A B 2000 50 30 1 20
A C 2000 70 2 5 90
A D 2000 10 90 20 30
A E 2000 95 10 10 5
A F 2000 10 10 10 0
A G 2000 5 5 0 0
A H 2000 10 30 25 40
........................................
B A 1998 5 10 30 2
B D 1998 30 6 9 0
B I 1998 10 9 7 0
........................................
C A 2005 10 15 2 6
C D 2005 90 0 0 40
C X 2005 49 90 5 0
Say, for example, that I'm interested in Country A in the year 2000. I want to know what is its max value of x1 in 2000 (which is 95, in its pairing with Country E). I also want to know what is its max value for x2, x3, and x4 in any pairing in that given year (which are 90, 25, and 90 with Country D, Country H, and Country C respectively).
The same follows for Country B in 1998, and Country C in 2005.
After isolating the max value of those columns for a given country in a given year, I'd like to dump those values into a dataframe, like this.
country year x1max x2max x3max x4max
A 2000 95 90 25 90
B 1998 30 10 30 2
C 2005 90 90 5 40
I'm flexible on this part. It might just be easiest to dump those max values for each country into their own data frames of dimensions 1x5, and then use rbind to stack them together.
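The split-and-stack idea described above could be sketched roughly as follows. This is only one possible reading of it, in base R: the names `pieces`, `rows`, and `result` are made up for illustration, the output column names follow the layout shown above, and each one-row frame has six columns (country, year, and four maxima) rather than five.

```r
# Example data, rebuilt here so the sketch runs on its own.
Data <- data.frame(
  country1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  country2 = c("B", "C", "D", "E", "F", "G", "H", "A", "D", "I", "A", "D", "X"),
  year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 1998, 1998, 1998, 2005, 2005, 2005),
  x1 = c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49),
  x2 = c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90),
  x3 = c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5),
  x4 = c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)
)

# One piece per country1/year combination that actually occurs in the data.
pieces <- split(Data, list(Data$country1, Data$year), drop = TRUE)

# Reduce each piece to a single-row data frame holding the column maxima.
rows <- lapply(pieces, function(d) {
  data.frame(
    country = d$country1[1],
    year    = d$year[1],
    x1max   = max(d$x1),
    x2max   = max(d$x2),
    x3max   = max(d$x3),
    x4max   = max(d$x4)
  )
})

# Stack the one-row frames and drop the row names that split() generated.
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

Because every group is handled by the same function, no country or year has to be typed by hand, which is where the risk of human error comes from.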
Does anyone have any advice on how to proceed? It'd save me the hassle of having to do it manually, which, more than anything, invites the possibility of human error.
Reproducible code follows, though, since my question does hinge on isolating a particular year for a particular country (e.g. 2000 for Country A instead of 2001), I'm not sure the reproducible code is necessarily helpful. I hope it is, or, at least, that my question is clear.
country1 <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
country2 <- c("B", "C", "D", "E", "F", "G", "H", "A", "D", "I", "A", "D", "X")
year <- c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 1998, 1998, 1998, 2005, 2005, 2005)
x1 <- c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49)
x2 <- c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90)
x3 <- c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5)
x4 <- c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)
Data <- data.frame(country1 = country1, country2 = country2, year = year, x1 = x1, x2 = x2, x3 = x3, x4 = x4)
Data
It sounds like you're just looking for aggregate:
> aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, Data, max)
  country1 year x1 x2 x3 x4
1        B 1998 30 10 30  2
2        A 2000 95 90 25 90
3        C 2005 90 90  5 40
It's not very clear from your question how you want to proceed from there though....
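If the goal is simply the table laid out in the question, one way to continue from the aggregate result might be to rename and reorder it. This is a sketch under that assumption: the name `maxima` is made up here, and the `x1max` ... `x4max` column names are taken from the layout the question shows.

```r
# Example data from the question, rebuilt so the sketch runs on its own.
Data <- data.frame(
  country1 = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  country2 = c("B", "C", "D", "E", "F", "G", "H", "A", "D", "I", "A", "D", "X"),
  year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 1998, 1998, 1998, 2005, 2005, 2005),
  x1 = c(50, 70, 10, 95, 10, 5, 10, 5, 30, 10, 10, 90, 49),
  x2 = c(30, 2, 90, 10, 10, 5, 30, 10, 6, 9, 15, 0, 90),
  x3 = c(1, 5, 20, 10, 10, 0, 25, 30, 9, 7, 2, 0, 5),
  x4 = c(20, 90, 30, 5, 0, 0, 40, 2, 0, 0, 6, 40, 0)
)

# Maximum of each x column within every country1/year group.
maxima <- aggregate(cbind(x1, x2, x3, x4) ~ country1 + year, data = Data, FUN = max)

# Rename the columns to the layout sketched in the question.
names(maxima) <- c("country", "year", "x1max", "x2max", "x3max", "x4max")

# Order by country, then year, and reset the row names.
maxima <- maxima[order(maxima$country, maxima$year), ]
rownames(maxima) <- NULL
maxima
```

One caveat, since the question mentions missing data in the full set: the formula method of aggregate uses `na.action = na.omit` by default, so a row with an NA in any of x1 to x4 is dropped entirely before the maxima are taken. If that is not wanted, the non-formula form, for example `aggregate(Data[c("x1", "x2", "x3", "x4")], by = list(country = Data$country1, year = Data$year), FUN = max, na.rm = TRUE)`, would be worth trying instead.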