Sorting data frame based on month-year time format
Question
I'm struggling with something very basic: sorting a data frame based on a time format (month-year, i.e. "%B-%y" in this case). My goal is to calculate various monthly statistics, starting with the sum.
The relevant part of the data frame looks like this (this step works and matches my goal; I'm including it here to show where the problem could originate):
> tmp09
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
2 JPM 7261 mei-07 2007-05-29
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
# Order on the exit time, which corresponds with 'monthYear'
> tmp09.sorted <- tmp09[order(tmp09$ExitTime),]
> tmp09.sorted
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
2 JPM 7261 mei-07 2007-05-29
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
So far, so good: sorting based on ExitTime works. The trouble starts when I calculate the totals per month and then try to sort that output:
# Calculate the total results per month
> Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum)
> Tmp09Totals <- data.frame(Tmp09Totals)
> Tmp09Totals
Tmp09Totals
april-07 6997
augustus-07 7897
juli-07 29147
mei-07 21197
november-07 7423
How can I sort this output chronologically?
I've already tried (besides various attempts to convert monthYear to another date format): order, sort, sort.list, sort_df, reshape, and calculating the sum with tapply, lapply, sapply, and aggregate. Even rewriting the row names (by giving them a number from 1 to length(tmp09.sorted2$AccountValue)) didn't work. I also tried to give each month-year a different ID, based on what I learned in another question, but R also had difficulty discriminating between the various month-year values.
The correct order for this output would be april-07, mei-07, juli-07, augustus-07, november-07:
apr-07 6997
mei-07 21197
jul-07 29147
aug-07 7897
nov-07 7423
Recommended answer
It would be easier to have separate Month and Year factors, with their levels in the correct order, and use tapply on the combination of both variables, e.g.:
## The Month factor
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = month.name)))
## For @Jura25's locale we can't use the built-in English constant month.name;
## instead, we can use this solution, from ?month.name:
## format(ISOdate(2000, 1:12, 1), "%B")
tmp09 <- within(tmp09,
Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))
Which gives us (in my locale):
> head(tmp09)
Instrument AccountValue monthYear ExitTime Month Year
1 JPM 6997 april-07 2007-04-10 April 2007
2 JPM 7261 mei-07 2007-05-29 May 2007
3 JPM 7545 juli-07 2007-07-18 July 2007
4 JPM 7614 juli-07 2007-07-19 July 2007
5 JPM 7897 augustus-07 2007-08-22 August 2007
10 JPM 7423 november-07 2007-11-02 November 2007
Then use tapply with both factors:
> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
2007
April 6997
May 21197
July 29147
August 7897
November 7423
Or via aggregate:
> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
Month Year x
1 April 2007 6997
2 May 2007 21197
3 July 2007 29147
4 August 2007 7897
5 November 2007 7423
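If the original monthYear labels should be kept, a related approach is to turn monthYear itself into a factor whose levels follow ExitTime. The following is only a minimal sketch under some assumptions: the data frame is retyped by hand from the printout in the question, and the names chrono, totals and totals_ym are made up for illustration. It relies on the fact that tapply reports groups in factor-level order, and that a plain character vector is converted to a factor with alphabetically sorted levels, which is why the totals in the question came out alphabetically.

```r
## Data retyped from the question's printout (an assumption, not the asker's actual object)
tmp09 <- data.frame(
  Instrument   = c("JPM", "JPM", "JPM", "JPM", "JPM", "JPM",
                   "KFT", "KFT", "KFT", "KFT"),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423,
                   6992, 6944, 7069, 6919),
  monthYear    = c("april-07", "mei-07", "juli-07", "juli-07", "augustus-07",
                   "november-07", "mei-07", "mei-07", "juli-07", "juli-07"),
  ExitTime     = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                           "2007-07-19", "2007-08-22", "2007-11-02",
                           "2007-05-14", "2007-05-21", "2007-07-09",
                           "2007-07-16")),
  stringsAsFactors = FALSE
)

## Labels in the order of the dates they belong to (chrono is a hypothetical name)
chrono <- unique(tmp09$monthYear[order(tmp09$ExitTime)])

## Use that order as the factor levels instead of the alphabetical default
tmp09$monthYear <- factor(tmp09$monthYear, levels = chrono)

## tapply follows the level order, so the totals are now chronological
totals <- tapply(tmp09$AccountValue, tmp09$monthYear, sum)
totals

## Locale-independent variant: group on a "YYYY-MM" key, which sorts
## chronologically even as plain text
totals_ym <- tapply(tmp09$AccountValue, format(tmp09$ExitTime, "%Y-%m"), sum)
totals_ym
```

The first variant keeps the labels exactly as they appear in the data, so it does not depend on the session locale. The second variant gives up the month names in exchange for a key that also sorts correctly across several years.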