Select observation by group with max date in SQL via R
I have data structured like the example below. It is written in R, but if you can just write the query without the R parts, that's fine too.
I have multiple groups and there are dates for each observation. I want to select a single row from each group that corresponds to the max date (most recent date) for that particular group. There are no duplicate dates.
Df = data.frame(dates=c('2012-01-25','2012-08-20','2013-07-31','2013-05-30'),
group=c('a','a','b','b'),
value=c(1,2,3,4))
library(sqldf)
(Desiredresults = Df[2:3,])
# 1 2012-08-20 a 2
# 2 2013-07-31 b 3
It's not clear whether you want an R or an SQL solution, so here are both.
First, I'm assuming your dates column is of class Date, as in
Df$dates <- as.Date(Df$dates)
SQL
Using the sqldf package, you basically have two simple solutions. Either explicitly select the columns where dates is maximum
sqldf('select max(dates) as dates, "group", value from Df group by "group"')
# dates group value
# 1 2012-08-20 a 2
# 2 2013-07-31 b 3
Or you can select all the columns
sqldf('select * from Df where dates in (select max(dates) from Df group by "group")')
# dates group value
# 1 2012-08-20 a 2
# 2 2013-07-31 b 3
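A caveat on the two queries above. The first relies on SQLite (sqldf's default backend) taking the bare columns from the row that holds max(dates), which other databases may reject. The second matches on dates alone, so a date that is the maximum in one group and also occurs in another group would be returned twice. A more portable variant, sketched here under the assumption that the dates are ISO-formatted strings (which sort correctly as text), joins each row to its own group's maximum. The aliases d, m and max_date are illustrative names, not part of the original answer.

```r
library(sqldf)

# Same example data as in the question; dates kept as ISO strings.
Df <- data.frame(dates = c('2012-01-25', '2012-08-20', '2013-07-31', '2013-05-30'),
                 group = c('a', 'a', 'b', 'b'),
                 value = c(1, 2, 3, 4))

# Compute the latest date per group in a subquery, then join back
# on BOTH the group and the date so rows are matched within their group.
res <- sqldf('select d.*
              from Df d
              join (select "group", max(dates) as max_date
                    from Df
                    group by "group") m
                on d."group" = m."group" and d.dates = m.max_date
              order by d."group"')
print(res)
```

Because the join condition includes the group, this should keep working if the same date shows up in more than one group.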
R
In R there are many possible solutions, for example
library(data.table)
setDT(Df)[, .SD[which.max(dates)], by = group]
# group dates value
# 1: a 2012-08-20 2
# 2: b 2013-07-31 3
Or
library(dplyr)
Df %>%
group_by(group) %>%
filter(dates == max(dates))
# Source: local data table [2 x 3]
# Groups: group
#
# dates group value
# 1 2012-08-20 a 2
# 2 2013-07-31 b 3
Or
do.call(rbind, by(Df, Df$group, function(x) x[which.max(x$dates), ]))
# dates group value
# 1: 2012-08-20 a 2
# 2: 2013-07-31 b 3
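If you would rather avoid extra packages, another base R sketch is to sort each group newest-first and keep the first row per group. The variable names sorted and latest are illustrative and not from the original answer.

```r
# Same example data as in the question, with dates converted to class Date.
Df <- data.frame(dates = as.Date(c('2012-01-25', '2012-08-20', '2013-07-31', '2013-05-30')),
                 group = c('a', 'a', 'b', 'b'),
                 value = c(1, 2, 3, 4))

# Sort by group, then by date descending (negating the numeric form of the Date).
sorted <- Df[order(Df$group, -as.numeric(Df$dates)), ]

# duplicated() flags every row after the first of each group,
# so negating it keeps only the latest row per group.
latest <- sorted[!duplicated(sorted$group), ]
print(latest)
```

Like which.max, this keeps exactly one row per group even if a group has tied maximum dates, whereas the dplyr filter above would keep all tied rows.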