Unexpected "rbind.fill" behavior
Problem description
I'm confused by the behavior of Hadley's "rbind.fill" function. I have a list of data frames that I would like to combine with a simple rbind operation, but rbind.fill gives me results that I cannot explain. Note that the "rbind" function does give me the output I expect. Here is a minimal example:
library(reshape)
data1 <- structure(list(DATE = structure(c(1277859600, 1277856000), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), BACK = c(0, -1)), .Names = c("DATE",
"BACK"), row.names = 1:2, class = "data.frame")
data2 <- structure(list(DATE = structure(c(1277856000, 1277852400), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), BACK = c(0, -1)), .Names = c("DATE",
"BACK"), row.names = 1:2, class = "data.frame")
bind1 <- rbind.fill(list(data1, data2))
bind2 <- rbind(data1, data2)
data1
data2
bind1
bind2
                 DATE BACK
1 2010-06-30 01:00:00    0
2 2010-06-30 00:00:00   -1

                 DATE BACK
1 2010-06-30 00:00:00    0
2 2010-06-29 23:00:00   -1

                 DATE BACK
1 2010-06-29 18:00:00    0
2 2010-06-29 17:00:00   -1
3 2010-06-29 17:00:00    0
4 2010-06-29 16:00:00   -1

                 DATE BACK
1 2010-06-30 01:00:00    0
2 2010-06-30 00:00:00   -1
3 2010-06-30 00:00:00    0
4 2010-06-29 23:00:00   -1
As you can see, bind1, which contains the rbind.fill output, shows times in the DATE column that were not even in the original data. Is this expected behavior? I am aware that I can simply use
bind <- do.call(rbind, list(data1, data2))
to bind the 5000+ data frames I have, but can anyone explain the behavior above?
Thank you.
As @DWin pointed out below, this was not a problem with the rbind.fill function itself. The times in the output are stored in GMT, but they were being printed in Pacific time.
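To make the storage-versus-display distinction concrete, here is a minimal base R sketch. It is not taken from the original post: the zone name "America/Los_Angeles" is an assumed stand-in for "Pacific time", and dropping the "tzone" attribute is only a simulation of a column that prints in local time, not a claim about what rbind.fill does internally.

```r
# Two instants stored as seconds since 1970-01-01 00:00:00 UTC
# (the same values as data1$DATE in the question).
x <- as.POSIXct(c(1277859600, 1277856000), origin = "1970-01-01", tz = "GMT")

# Render the same instants in two zones; only the text differs.
# "America/Los_Angeles" is an assumed name for the poster's Pacific zone.
gmt_text     <- format(x, tz = "GMT", usetz = TRUE)
pacific_text <- format(x, tz = "America/Los_Angeles", usetz = TRUE)
print(gmt_text)
print(pacific_text)

# Simulate a column without a "tzone" attribute, then restore it.
y <- x
attr(y, "tzone") <- NULL   # y would now print in the session's local zone
same_instants <- all(as.numeric(x) == as.numeric(y))
attr(y, "tzone") <- "GMT"  # y prints in GMT again
print(same_instants)
```

If a bound column prints in the wrong zone, setting attr(column, "tzone") in this way changes only the display; the underlying numeric values are untouched.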
sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tcltk grid stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] tcltk2_1.1-5 reshape_0.8.4 plyr_1.4 proto_0.3-9.1
loaded via a namespace (and not attached):
[1] ggplot2_0.8.9 tools_2.12.1
Recommended answer
Most likely, what you are seeing is the behavior of print.POSIXct interacting with the time zone settings on your machine. I get exactly the same output from the two function calls.
> rbind.fill(list(data1,data2)) == rbind(data1,data2)
DATE BACK
1 TRUE TRUE
2 TRUE TRUE
3 TRUE TRUE
4 TRUE TRUE
> identical( rbind.fill(list(data1,data2)) , rbind(data1,data2) )
[1] TRUE
I'm reasonably sure that POSIXct times are stored in GMT (as seconds since the epoch), and that the time zone only affects how they are converted and printed. Note that as.POSIXct has a tz argument:
tz A timezone specification to be used for the conversion, if one is required.
System-specific (see time zones), but "" is the current timezone, and "GMT" is
UTC (Universal Time, Coordinated).
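As a small illustration of the tz argument described above, the sketch below parses the same character string under two zones. The string and the zone name "America/Los_Angeles" are illustrative choices, not values from the original answer.

```r
# The same wall-clock string interpreted in two different zones.
stamp <- "2010-06-30 01:00:00"
as_gmt     <- as.POSIXct(stamp, tz = "GMT")
as_pacific <- as.POSIXct(stamp, tz = "America/Los_Angeles")

# The stored values differ by the zone's offset from GMT on that date,
# because tz decides which instant the string refers to.
offset_hours <- (as.numeric(as_pacific) - as.numeric(as_gmt)) / 3600
print(as.numeric(as_gmt))
print(offset_hours)
```

Here tz changes the stored instant because it is used during conversion from text. Changing the "tzone" attribute of an existing POSIXct value, by contrast, changes only how it is printed.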
If you type ?locales, you will see the functions for getting and setting locale settings. These vary from OS to OS, so my experience on a Mac may not match yours on a different system. I try to use the Date class rather than the POSIX classes, but that is only because I have no particular need for time-of-day detail. The chron and lubridate packages contain additional functions that you may want to examine.