Select observation by group with max date SQL via R

Problem description

I have data in a structure similar to the one shown below. This is R code, but if you can just write the query without the R parts, that's fine too.

I have multiple groups and there are dates for each observation. I want to select a single row from each group that corresponds to the max date (most recent date) for that particular group. There are no duplicate dates.

Df = data.frame(dates=c('2012-01-25','2012-08-20','2013-07-31','2013-05-30'), 
                group=c('a','a','b','b'), 
                value=c(1,2,3,4))

library(sqldf)
(Desiredresults = Df[2:3,])
# 2 2012-08-20     a     2
# 3 2013-07-31     b     3

Solution

It's not clear whether you want an R or an SQL solution, so here are both. First, I'm assuming your dates column is of class Date, as in

Df$dates <- as.Date(Df$dates)

SQL

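Using the sqldf package, you basically have two simple solutions. Either explicitly select the columns for the row where dates is maximal. This relies on sqldf's default SQLite backend, which fills the bare column (value) of a max() query from the row that holds the maximum; other databases may reject the query or return an arbitrary value.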

sqldf('select max(dates) as dates, "group", value from Df group by "group"')
#        dates group value
# 1 2012-08-20     a     2
# 2 2013-07-31     b     3

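Or you can select all the columns. The subquery is not tied to a particular group, so this is safe here only because no date occurs in more than one group.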

sqldf('select * from Df where dates in (select max(dates) from Df group by "group")')
#        dates group value
# 1 2012-08-20     a     2
# 2 2013-07-31     b     3
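
If the same date could appear in several groups, a variant that matches on both the group and its maximum date avoids false matches. The following is a minimal sketch, not part of the original answer. It assumes the dates are kept as ISO-formatted character strings, so that string comparison agrees with date order. The aliases src and mx and the column name max_date are illustrative.

```r
library(sqldf)

# Same sample data as in the question; dates stay as ISO-formatted strings
Df <- data.frame(dates = c('2012-01-25', '2012-08-20', '2013-07-31', '2013-05-30'),
                 group = c('a', 'a', 'b', 'b'),
                 value = c(1, 2, 3, 4))

# Join every row to the maximum date of its own group,
# so a date only matches within the group it belongs to
res <- sqldf('select src.*
              from Df src
              join (select "group", max(dates) as max_date
                    from Df
                    group by "group") mx
                on src."group" = mx."group" and src.dates = mx.max_date
              order by src."group"')
res
```

The join condition on "group" is what distinguishes this from the in (...) form: a row is returned only when its date is the maximum of its own group.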


R

In R there are many possible solutions, for example

library(data.table)
setDT(Df)[, .SD[which.max(dates)], by = group]
#    group      dates value
# 1:     a 2012-08-20     2
# 2:     b 2013-07-31     3

Or

library(dplyr)
Df %>%
  group_by(group) %>%
  filter(dates == max(dates))

# Source: local data table [2 x 3]
# Groups: group
# 
#        dates group value
# 1 2012-08-20     a     2
# 2 2013-07-31     b     3

Or

do.call(rbind, by(Df, Df$group, function(x) x[which.max(x$dates), ]))
#         dates group value
# 1: 2012-08-20     a     2
# 2: 2013-07-31     b     3
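
A base R variant that needs no extra packages could look like the sketch below. It is not from the original answer and assumes, as above, that dates is of class Date: sort by group and by descending date, then keep the first row of each group. The names sorted and latest are illustrative.

```r
# Same sample data as in the question
Df <- data.frame(dates = c('2012-01-25', '2012-08-20', '2013-07-31', '2013-05-30'),
                 group = c('a', 'a', 'b', 'b'),
                 value = c(1, 2, 3, 4))
Df$dates <- as.Date(Df$dates)

# Sort by group, then by date with the newest first
sorted <- Df[order(Df$group, -as.numeric(Df$dates)), ]

# Keep only the first (newest) row of each group
latest <- sorted[!duplicated(sorted$group), ]
latest
#        dates group value
# 2 2012-08-20     a     2
# 3 2013-07-31     b     3
```

Like which.max, this keeps exactly one row per group even if a group had tied maximum dates, whereas the dates == max(dates) filter would keep every tied row.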
