How to split epochs into year, month, etc


Problem description

I have a data frame containing many time columns, stored as seconds since the Unix epoch. For each time column, I want to add columns for the year, month, day, etc.

Here is what I have so far:

library(dplyr)
library(lubridate)

times <- c(133456789, 143456789, 144456789 ) 
train2 <- data.frame(sent_time = times, open_time = times)

time_col_names <- c("sent_time", "open_time")
dt_part_names <- c("year", "month", "hour", "wday", "day")

train3 <- as.data.frame(train2)

dummy <- lapply(time_col_names, function(col_name) { 
  pct_times <- as.POSIXct(train3[,col_name], origin = "1970-01-01", tz = "GMT")
  lapply(dt_part_names, function(part_name) {
    part_col_name <- paste(col_name, part_name, sep = "_")
    train3[, part_col_name] <- rep(NA, nrow(train3))
    train3[, part_col_name] <- factor(get(part_name)(pct_times))
  })
})

Everything seems to work, except the columns never get created or assigned. The components do get extracted, and the assignment succeeds without error, but train3 does not have any new columns.

I have checked that the assignment works when I call it outside the nested lapply context:

train3[, "x"] <- rep(NA, nrow(train3))

In this case, column x does get created.

Solution

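It is often believed that the apply family offers a performance advantage over a for loop. But the most important difference between a for loop and a loop from the *apply() family is that the latter is designed to have no side effects: the function passed to *apply() runs in its own local environment, so ordinary assignments inside it do not change variables in the calling environment.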

The absence of side effects favors the development of clean, well-structured, and concise code. A problem occurs if one wishes to have side effects, which is usually a symptom of a flawed code design.
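The scoping difference behind this can be seen in a few lines. This is a minimal sketch; the variable names x and y are arbitrary:

```r
# A for loop body is evaluated in the calling environment,
# so an assignment inside it changes the variable that already exists there.
x <- 1
for (i in 1:3) {
  x <- x + 1
}
print(x)
# [1] 4

# The function passed to lapply() gets a fresh local environment on every call:
# `y <- y + 1` reads the global y but writes a local y that is then discarded.
y <- 1
invisible(lapply(1:3, function(i) {
  y <- y + 1
}))
print(y)
# [1] 1
```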

Here is a simple example to illustrate this:

myvector <- 10:1
sapply(myvector, prod, 2)
# [1] 20 18 16 14 12 10  8  6  4  2

It looks correct, right? The sapply() loop has seemingly multiplied the entries of myvector by two (granted, this result could have been achieved more easily, but this is just a simple example to discuss how *apply() works).

Upon inspection, however, one realizes that this operation has not changed myvector at all:

> myvector
# [1] 10  9  8  7  6  5  4  3  2  1

That is because sapply() did not have the side effect of modifying myvector. In this example the sapply() loop is equivalent to the command print(myvector*2), not to myvector <- myvector * 2. The *apply() loops return an object, but they don't modify the original one.
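The equivalence described above can be checked directly. A minimal sketch; result is an arbitrary name introduced here:

```r
myvector <- 10:1
result <- sapply(myvector, prod, 2)

# sapply() returned a new vector holding the doubled values ...
identical(result, myvector * 2)
# [1] TRUE

# ... while myvector itself still holds the original values.
identical(myvector, 10:1)
# [1] TRUE
```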

If one really wants to change the object within the loop, the superassignment operator <<- is necessary to modify the object outside the scope of the loop. This should almost never be done, and things quickly become ugly. For example, the following loop does change myvector:

sapply(seq_along(myvector), function(x) myvector[x] <<- myvector[x]*2)
> myvector
# [1] 20 18 16 14 12 10  8  6  4  2

Coding in R should not look like this. Note that even in this more convoluted case, if the normal assignment operator <- is used instead of <<-, myvector remains unchanged. The correct approach is to assign the object returned by *apply() instead of modifying it within the loop.
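Both points can be sketched side by side. This is an illustrative example that starts again from a fresh myvector; the anonymous functions are arbitrary:

```r
myvector <- 10:1

# With the ordinary `<-`, each call only modifies a local copy of myvector.
invisible(sapply(seq_along(myvector), function(x) myvector[x] <- myvector[x] * 2))
print(myvector)
# [1] 10  9  8  7  6  5  4  3  2  1

# The idiomatic approach: let sapply() compute the values and assign the result.
myvector <- sapply(myvector, function(v) v * 2)
print(myvector)
# [1] 20 18 16 14 12 10  8  6  4  2
```

For this particular task the vectorized myvector * 2 is simpler still; the sapply() form only stands in for a computation that genuinely needs a loop.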

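In the specific case described by the OP, the variable dummy does receive the values computed in the loop: each inner function returns the value of its last assignment, so dummy is a nested list of the extracted factors. But one cannot expect the object train3 to be modified within the loop, because each assignment to train3 inside the anonymous function creates a modified local copy that is discarded when the function returns. Modifying train3 from inside the loop would require the <<- operator.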
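Applied to the OP's code, a minimal sketch of the "assign the returned object" approach returns the new columns from lapply() and binds them to the data frame once, outside the loop. It assumes lubridate is installed; the name new_cols, the use of match.fun() in place of get(), and flattening with do.call(c, ...) are choices made for this sketch, not part of the original question:

```r
library(lubridate)  # provides year(), month(), hour(), wday(), day()

times <- c(133456789, 143456789, 144456789)
train2 <- data.frame(sent_time = times, open_time = times)

time_col_names <- c("sent_time", "open_time")
dt_part_names <- c("year", "month", "hour", "wday", "day")

# For each time column, *return* a named list of factor columns instead of
# assigning into a data frame inside the anonymous function.
new_cols <- lapply(time_col_names, function(col_name) {
  pct_times <- as.POSIXct(train2[[col_name]], origin = "1970-01-01", tz = "GMT")
  parts <- lapply(dt_part_names, function(part_name) {
    factor(match.fun(part_name)(pct_times))
  })
  names(parts) <- paste(col_name, dt_part_names, sep = "_")
  parts
})

# new_cols is a list of two lists; c() concatenates them into one named list
# of ten factors, which is then bound to the original columns in one step.
train3 <- cbind(train2, as.data.frame(do.call(c, new_cols)))

str(train3)
```

Building all the columns first and binding them once also avoids growing the data frame column by column.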

A quote mentioned in fortunes::fortune(212) possibly summarizes the problem:

Basically R is reluctant to let you shoot yourself in the foot unless you are really determined to do so. -- Bill Venables
