Why does as.character() return an integer on a list of dates?

Question

I was surprised to observe the following behavior in R:

as.character(c(Sys.Date()))
#> [1] "2018-02-05"

as.character(list(Sys.Date()))
#> [1] "17567"

Why does this happen? That is, clearly the "17567" is the result of as.integer(Sys.Date()), but I do not follow the logic for why as.character(list(Sys.Date())) should wind up invoking as.integer().

(Usually strings being treated as integers can be blamed on not setting options(stringsAsFactors=FALSE), but that doesn't appear to be the case here.)

EDIT: As Josh observes, this is due to the underlying behavior of as.vector, but I do not find that any more intuitive:

as.vector(Sys.Date())
#> 17567
as.vector(Sys.Date(), "character")
#> "17567"

Why? (Yes, I believe dates are stored as integers in the lower-level internals, but this coercion to a literal integer in this circumstance without a warning seems surprising to me).

Also this manifests in more subtle ways:

tbl <- tibble::as_tibble(list(col1 = list(Sys.Date(), "stuff")))
df <- as.data.frame(tbl)
df
#>    col1
#> 1 17567
#> 2 stuff

df[1, 1]
#> [[1]]
#> [1] "2018-02-05"

Note that the print method for data.frame is showing the date as an integer, when in fact it is a list column and the date is still the date.

It's not clear what is going on with the print method in this case, and why it shows such a misleading representation of the data.

EDIT: Other examples where the Date class surprisingly falls off, exposing the underlying numeric base type:

vapply(list(Sys.Date()), I, Sys.Date())
vapply(list(Sys.Date()), lubridate::as_date, Sys.Date())

and my favorite so far:

unlist(list(Sys.Date()))

It appears that vector operations with Date (and POSIX objects) are fragile; one should focus on the mode / typeof and not class to anticipate how the vector will behave.

Recommended answer

The issue ultimately has to do with the behavior of the function as.vector().

When you apply as.character() to a list, it sees an object of class "list" (not one of class "Date"). Since there is no as.character() method for lists, the default method as.character.default gets dispatched. It does the following:

as.character.default
# function (x, ...) 
# .Internal(as.vector(x, "character"))
# <bytecode: 0x0000000006793e88>
# <environment: namespace:base>

As you can see, it first prepares the data object by coercing it to a vector. Running as.vector() directly on a list of Date objects shows, in turn, that this call is what produces the coercion to a bare number and then to character.

as.vector(list(Sys.Date()), "character")
# [1] "17567"
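A minimal sketch of why dispatch misses the Date method: the class attribute lives on each element, not on the list that wraps it. The variable name x and the fixed date are illustrative choices (the date matches the one shown in the question's output).

```r
# A one-element list holding a Date; `x` is a hypothetical name for illustration
x <- list(as.Date("2018-02-05"))

class(x)             # "list": this is what as.character() dispatches on
class(x[[1]])        # "Date": the class is only visible on the element
inherits(x, "Date")  # FALSE

as.character(x[[1]]) # "2018-02-05", the Date method runs on the element
as.character(x)      # "17567", the list goes through the default method
```

Extracting the element first is enough to get the Date method back, which suggests the loss happens at the list level rather than in the Date itself.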






As Carl points out, the explanation above, even if accurate, is not really satisfying. A more complete answer requires looking at what happens under the hood, in the C code executed by the call to .Internal(as.vector(x, "character")). All of the relevant C code is in the source file coerce.c.

First up is do_asvector(), which calls ascommon(), which calls coerceVector(), which calls coerceVectorList() and then, finally, coerceToString(). coerceToString() examines the typeof of the element it is processing and, in our case, seeing that it is a "REAL", switches to this code block:

case REALSXP:
    PrintDefaults();
    savedigits = R_print.digits; R_print.digits = DBL_DIG; /* MAX precision */
    for (i = 0; i < n; i++) {
//      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
        SET_STRING_ELT(ans, i, StringFromReal(REAL(v)[i], &warn));
    }
    R_print.digits = savedigits;
    break;

And why does it use the block for objects with a typeof of REALSXP? Because that is the storage mode of R Date objects, as can be seen by running mode(Sys.Date()) or typeof(Sys.Date()), which return "numeric" and "double" respectively.
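As a quick check of the storage claim, the sketch below inspects a Date from the R side. The name d and the fixed date are illustrative; the date is chosen to match the value in the question.

```r
# `d` is a hypothetical name; the date matches the question's example
d <- as.Date("2018-02-05")

typeof(d)      # "double"
mode(d)        # "numeric"
attributes(d)  # only a class attribute, "Date"
unclass(d)     # 17567, the number of days since 1970-01-01

# Re-attaching the class to the bare number gives the date back
structure(17567, class = "Date")
```

So a Date is a double with a class attribute on top, and any code path that ignores the attribute sees only the day count.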

The take-home is this: In the chain of events described above, the elements of the list are not somehow caught and treated as "Date" objects while in the realm of R function calls and method dispatch. Instead, they get passed along as a "list" (aka VECSXP) to a series of C functions. And at that point, it's kind of too late, as the C functions that process that list know nothing about the "Date" class of its elements. In particular, the function that ultimately does the conversion to character, coerceToString(), only sees the elements' storage mode, which is REAL/numeric/double, and processes them as if that was all that they were.
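Given that explanation, one way to keep the Date class in play is to make sure dispatch happens on the elements, or on a real Date vector, rather than on the list. The following is a sketch of two such approaches, not something proposed in the thread itself; the names dates, chr and vec are illustrative.

```r
# `dates`, `chr` and `vec` are hypothetical names for illustration
dates <- list(as.Date("2018-02-05"), as.Date("2018-02-06"))

# Option 1: format each element separately, so format() dispatches
# to its Date method once per element
chr <- vapply(dates, format, character(1))
chr                # "2018-02-05" "2018-02-06"

# Option 2: combine the elements into a Date vector first; c() dispatches
# on the class of its first argument, so the result is still a Date
vec <- do.call(c, dates)
class(vec)         # "Date"
as.character(vec)  # "2018-02-05" "2018-02-06"

# For contrast, unlist() drops the class and returns bare numbers
class(unlist(dates))  # "numeric"
```

Option 2 assumes every element is a Date; for a mixed list such as the one in the question's tibble example, per-element formatting as in Option 1 is the safer choice.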
