Sorting a data frame based on month-year time format

Question

I'm struggling with something very basic: sorting a data frame based on a time format (month-year, or "%B-%y" in this case). My goal is to calculate various monthly statistics, starting with the sum.

The relevant part of the data frame looks like this (this step works as intended; I'm including it here to show where the problem could originate):

> tmp09
   Instrument AccountValue   monthYear   ExitTime
1         JPM         6997    april-07 2007-04-10
2         JPM         7261      mei-07 2007-05-29
3         JPM         7545     juli-07 2007-07-18
4         JPM         7614     juli-07 2007-07-19
5         JPM         7897 augustus-07 2007-08-22
10        JPM         7423 november-07 2007-11-02
11        KFT         6992      mei-07 2007-05-14
12        KFT         6944      mei-07 2007-05-21
13        KFT         7069     juli-07 2007-07-09
14        KFT         6919     juli-07 2007-07-16
# Order on the exit time, which corresponds with 'monthYear'
> tmp09.sorted <- tmp09[order(tmp09$ExitTime),]
> tmp09.sorted
   Instrument AccountValue   monthYear   ExitTime
1         JPM         6997    april-07 2007-04-10
11        KFT         6992      mei-07 2007-05-14
12        KFT         6944      mei-07 2007-05-21
2         JPM         7261      mei-07 2007-05-29
13        KFT         7069     juli-07 2007-07-09
14        KFT         6919     juli-07 2007-07-16
3         JPM         7545     juli-07 2007-07-18
4         JPM         7614     juli-07 2007-07-19
5         JPM         7897 augustus-07 2007-08-22
10        JPM         7423 november-07 2007-11-02

So far, so good: sorting based on ExitTime works. The trouble starts when I calculate the totals per month and then try to sort that output:

# Calculate the total results per month
> Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum)
> Tmp09Totals <- data.frame(Tmp09Totals)
> Tmp09Totals
            Tmp09Totals
april-07           6997
augustus-07        7897
juli-07           29147
mei-07            21197
november-07        7423

How can I sort this output in chronological order?

I've already tried (besides various attempts to convert monthYear to another date format): order, sort, sort.list, sort_df, reshape, and calculating the sum with tapply, lapply, sapply, and aggregate. Even rewriting the row names (by giving them a number from 1 to length(tmp09.sorted2$AccountValue)) didn't work. I also tried to give each month-year a different ID, based on what I learned in another question, but R also had difficulty discriminating between the various month-year values.

The correct order for this output would be april-07, mei-07, juli-07, augustus-07, november-07:

apr-07  6997
mei-07  21197
jul-07  29147
aug-07  7897
nov-07  7423

Recommended answer

It would be easier to have separate Month and Year factors, with their levels in the correct order, and use tapply on both variables together, e.g.:

## The Month factor
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                                    levels = month.name)))
## For @Jura25's (non-English) locale the built-in English constant month.name
## won't match, so build the levels from the locale's own month names instead,
## as suggested in ?month.name: format(ISOdate(2000, 1:12, 1), "%B")
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                                    levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))

Which gives us (in my locale):

> head(tmp09)
   Instrument AccountValue   monthYear   ExitTime    Month Year
1         JPM         6997    april-07 2007-04-10    April 2007
2         JPM         7261      mei-07 2007-05-29      May 2007
3         JPM         7545     juli-07 2007-07-18     July 2007
4         JPM         7614     juli-07 2007-07-19     July 2007
5         JPM         7897 augustus-07 2007-08-22   August 2007
10        JPM         7423 november-07 2007-11-02 November 2007
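
The locale dependence can also be sidestepped by deriving the month from its number rather than its name. The following is only a minimal sketch, assuming ExitTime is a Date column; the small vector `d` is made-up illustration data, and labelling with the English month.name constant is a choice made here, not something taken from the original data:

```r
# Hypothetical dates for illustration; in the question these would be tmp09$ExitTime
d <- as.Date(c("2007-11-02", "2007-04-10", "2007-07-18", "2007-05-29"))

# "%m" yields the month number as text ("01".."12"), independent of the locale
month_num <- as.integer(format(d, "%m"))

# Label with the English constant month.name; the levels stay in calendar order,
# and droplevels() removes the months that do not occur in the data
Month <- droplevels(factor(month.name[month_num], levels = month.name))

levels(Month)
```

Because the levels come from month.name rather than from the data, the remaining levels are in calendar order regardless of the order of the rows.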

Then use tapply with both factors:

> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
          2007
April     6997
May      21197
July     29147
August    7897
November  7423

Or via aggregate:

> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
     Month Year     x
1    April 2007  6997
2      May 2007 21197
3     July 2007 29147
4   August 2007  7897
5 November 2007  7423
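
If the original monthYear labels should be kept, another option is to give that column explicit factor levels ordered by ExitTime. The alphabetical order in the question arises because tapply groups by factor levels, and a factor created without explicit levels sorts them. This is a sketch rather than a definitive solution: it rebuilds the sample data from the question, and the names `chrono` and `totals` are introduced here for illustration. It assumes every monthYear label corresponds to exactly one calendar month.

```r
# Rebuild the sample data shown in the question
tmp09 <- data.frame(
  Instrument   = c(rep("JPM", 6), rep("KFT", 4)),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423, 6992, 6944, 7069, 6919),
  monthYear    = c("april-07", "mei-07", "juli-07", "juli-07", "augustus-07",
                   "november-07", "mei-07", "mei-07", "juli-07", "juli-07"),
  ExitTime     = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18", "2007-07-19",
                           "2007-08-22", "2007-11-02", "2007-05-14", "2007-05-21",
                           "2007-07-09", "2007-07-16")),
  stringsAsFactors = FALSE
)

# Put the labels in date order, then use that order as the factor levels
chrono <- unique(tmp09$monthYear[order(tmp09$ExitTime)])
tmp09$monthYear <- factor(tmp09$monthYear, levels = chrono)

# tapply returns one entry per level, in level order, so the result is chronological
totals <- tapply(tmp09$AccountValue, tmp09$monthYear, sum)
totals
```

The same factor can be passed to aggregate, which also reports groups in level order.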
