Create 10,000 date data.frames with fake years based on a 365-day window


Question

This is my time period range:

start_day = as.Date('1974-01-01', format = '%Y-%m-%d')
end_day = as.Date('2014-12-21', format = '%Y-%m-%d')

df = as.data.frame(seq(from = start_day, to = end_day, by = 'day'))
colnames(df) = 'date'

I need to create 10,000 data.frames, each with different fake years of 365 days. This means that each of the 10,000 data.frames needs to have a different start and end of year.

In total, df has 14,965 days, which divided by 365 days gives 41 years. In other words, df needs to be grouped 10,000 different ways into 41 years (of 365 days each). The start of each year has to be random, so it can be 1974-10-03, 1974-08-30, 1976-01-03, etc., and the remaining dates at the end of df need to be recycled, wrapping around to the start.
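As a quick check of the arithmetic above, here is a minimal sketch that rebuilds the date range and confirms it splits evenly into 365-day blocks. The names n_days and n_years are introduced only for this illustration and are not part of the original code.

```r
# Rebuild the daily date range from the question
start_day <- as.Date("1974-01-01")
end_day   <- as.Date("2014-12-21")
df <- data.frame(date = seq(from = start_day, to = end_day, by = "day"))

n_days  <- nrow(df)       # total number of daily rows
n_years <- n_days / 365   # number of 365-day fake years

n_days    # 14965
n_years   # 41
```

Because the range divides exactly, every fake year can hold a full 365 days with nothing left over.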

The grouped fake years need to appear in a third column of the data.frames.

I would put all the data.frames into a list, but I don't know how to write the function that generates 10,000 different year start dates and then groups each data.frame with a 365-day window 41 times.

Can anyone help me?

@gringer gave a good answer, but it solved only 90% of the problem:

dates.df <- data.frame(replicate(10000, seq(sample(df$date, 1),
                                            length.out=365, by="day"),
                                 simplify=FALSE))
colnames(dates.df) <- 1:10000

What I need is 10,000 columns with 14,965 rows each, made of dates taken from df, which need to be recycled when the end of df is reached.

I tried changing to length.out = 14965, but R does not recycle the dates.

Another option could be to change to length.out = 1 and then add the remaining df rows to each column, keeping the same order:

dates.df <- data.frame(replicate(10000, seq(sample(df$date, 1),
                                            length.out=1, by="day"),
                                 simplify=FALSE))
colnames(dates.df) <- 1:10000

How can I add the remaining df rows to each column?

Recommended answer

The seq method also works if the to argument is unspecified, so it can be used to generate a specific number of days starting at a particular date:

> seq(from=df$date[20], length.out=10, by="day")
[1] "1974-01-20" "1974-01-21" "1974-01-22" "1974-01-23" "1974-01-24"
[6] "1974-01-25" "1974-01-26" "1974-01-27" "1974-01-28" "1974-01-29"

When used in combination with replicate and sample, I think this will give what you want in a list:

> replicate(2,seq(sample(df$date, 1), length.out=10, by="day"), simplify=FALSE)
[[1]]
 [1] "1985-07-24" "1985-07-25" "1985-07-26" "1985-07-27" "1985-07-28"
 [6] "1985-07-29" "1985-07-30" "1985-07-31" "1985-08-01" "1985-08-02"

[[2]]
 [1] "2012-10-13" "2012-10-14" "2012-10-15" "2012-10-16" "2012-10-17"
 [6] "2012-10-18" "2012-10-19" "2012-10-20" "2012-10-21" "2012-10-22"

Without the simplify=FALSE argument, it produces a numeric array (i.e. R's internal representation of dates), which is a bit trickier to convert back to dates. A slightly more convoluted way to do this and still produce Date output is to use data.frame on the unsimplified replicate result. Here's an example that will produce a 10,000-column data frame with 365 dates in each column (takes about 5s to generate on my computer):

dates.df <- data.frame(replicate(10000, seq(sample(df$date, 1),
                                            length.out=365, by="day"),
                                 simplify=FALSE));
colnames(dates.df) <- 1:10000;
> dates.df[1:5,1:5];
           1          2          3          4          5
1 1988-09-06 1996-05-30 1987-07-09 1974-01-15 1992-03-07
2 1988-09-07 1996-05-31 1987-07-10 1974-01-16 1992-03-08
3 1988-09-08 1996-06-01 1987-07-11 1974-01-17 1992-03-09
4 1988-09-09 1996-06-02 1987-07-12 1974-01-18 1992-03-10
5 1988-09-10 1996-06-03 1987-07-13 1974-01-19 1992-03-11
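To make the simplify behaviour described above concrete, here is a small sketch using 3 replicates of 5 days rather than 10,000 of 365. The seed and the variable names m and l are arbitrary choices for this illustration.

```r
df <- data.frame(date = seq(as.Date("1974-01-01"), as.Date("2014-12-21"),
                            by = "day"))

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Default (simplify = TRUE): the Date class is dropped and a numeric
# matrix of day counts since 1970-01-01 comes back.
m <- replicate(3, seq(sample(df$date, 1), length.out = 5, by = "day"))
is.matrix(m)   # TRUE
class(m[1, 1]) # "numeric"

# Converting a column back needs the origin spelled out.
as.Date(m[, 1], origin = "1970-01-01")

# simplify = FALSE keeps each element as a Date vector in a list.
l <- replicate(3, seq(sample(df$date, 1), length.out = 5, by = "day"),
               simplify = FALSE)
class(l[[1]])  # "Date"
```

Passing the unsimplified list to data.frame, as in the answer, keeps each column as a Date without any manual conversion.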

To get the date wraparound working, a slight modification can be made to the original data frame, pasting a copy of itself on the end:

df <- as.data.frame(c(seq(from = start_day, to = end_day, by = 'day'),
                      seq(from = start_day, to = end_day, by = 'day')));
colnames(df) <- "date";

This is easier to code for downstream; the alternative is a double seq for each result column, with additional calculations for the start/end and if statements to deal with boundary cases.
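For comparison, wraparound can also be done with modular index arithmetic on the undoubled dates. This is a different technique from both the doubled data frame and the double seq mentioned above, shown only as a sketch; rotate_dates is a hypothetical helper name, and the block rebuilds its own single-length df so it can run on its own.

```r
# Single-length date range (not the doubled version)
df <- data.frame(date = seq(as.Date("1974-01-01"), as.Date("2014-12-21"),
                            by = "day"))
n <- nrow(df)

# Hypothetical helper: return all dates starting at row `start_pos`,
# wrapping back to row 1 after the last row is reached.
rotate_dates <- function(dates, start_pos) {
  len <- length(dates)
  idx <- ((start_pos - 1 + seq_len(len) - 1) %% len) + 1
  dates[idx]
}

rotated <- rotate_dates(df$date, 100)
head(rotated, 3)  # starts at the 100th date
tail(rotated, 3)  # ends at the 99th date
```

The modulo version avoids storing a second copy of the dates, at the cost of computing an index vector for every column.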

Now, instead of doing date arithmetic, the result columns are subsets of the original data frame (where the arithmetic is already done): each one starts at a date in the first half of the frame and takes 14,965 consecutive values from there. I'm using nrow(df)/2 instead of a hard-coded 14965 to keep the code more generic:

dates.df <-
    as.data.frame(lapply(sample.int(nrow(df)/2, 10000),
                         function(startPos){
                             df$date[startPos:(startPos+nrow(df)/2-1)];
                         }));
colnames(dates.df) <- 1:10000;

>dates.df[c(1:5,(nrow(dates.df)-5):nrow(dates.df)),1:5];
               1          2          3          4          5
1     1988-10-21 1999-10-18 2009-04-06 2009-01-08 1988-12-28
2     1988-10-22 1999-10-19 2009-04-07 2009-01-09 1988-12-29
3     1988-10-23 1999-10-20 2009-04-08 2009-01-10 1988-12-30
4     1988-10-24 1999-10-21 2009-04-09 2009-01-11 1988-12-31
5     1988-10-25 1999-10-22 2009-04-10 2009-01-12 1989-01-01
14960 1988-10-15 1999-10-12 2009-03-31 2009-01-02 1988-12-22
14961 1988-10-16 1999-10-13 2009-04-01 2009-01-03 1988-12-23
14962 1988-10-17 1999-10-14 2009-04-02 2009-01-04 1988-12-24
14963 1988-10-18 1999-10-15 2009-04-03 2009-01-05 1988-12-25
14964 1988-10-19 1999-10-16 2009-04-04 2009-01-06 1988-12-26
14965 1988-10-20 1999-10-17 2009-04-05 2009-01-07 1988-12-27

This takes a bit less time now, presumably because the date values have been pre-calculated.
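The question also asks for a column holding the fake-year grouping, which the code above does not add. One possible sketch, under assumptions: each result is kept as its own data.frame in a list, the column is named fake_year (a made-up name), and only 3 frames are built here instead of 10,000. In the asker's real data the grouping would presumably sit next to other columns; here it is simply the second column.

```r
df <- data.frame(date = seq(as.Date("1974-01-01"), as.Date("2014-12-21"),
                            by = "day"))
n       <- nrow(df)     # 14965 days
n_years <- n / 365      # 41 fake years

set.seed(42)            # arbitrary seed, only for repeatability
n_frames <- 3           # small for illustration; the question asks for 10000

doubled <- c(df$date, df$date)  # same wraparound trick as in the answer

frames <- lapply(sample.int(n, n_frames), function(start_pos) {
  data.frame(
    # 14965 consecutive dates from a random start, wrapping at the end
    date      = doubled[start_pos:(start_pos + n - 1)],
    # rows 1-365 are fake year 1, rows 366-730 are fake year 2, and so on
    fake_year = rep(seq_len(n_years), each = 365)
  )
})

table(frames[[1]]$fake_year)[1:3]  # each fake year holds 365 rows
```

Since the fake_year column is identical in every frame, it could also be computed once and reused rather than rebuilt inside the function.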
