What is the difference between . and .data?
Question
I'm trying to develop a deeper understanding of using the dot (".") with dplyr and using the .data pronoun with dplyr. The code I was writing that motivated this post looked something like this:
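library(dplyr)

# Empty table that accumulates one block of counts per variable
cat_table <- tibble(
  variable = vector("character"),
  category = vector("numeric"),
  n = vector("numeric")
)

for (i in c("cyl", "vs", "am")) {
  cat_stats <- mtcars %>%
    count(.data[[i]]) %>%                # count by the column named in i
    mutate(variable = names(.)[1]) %>%   # record that column's name
    rename(category = 1)                 # then rename it to "category"
  cat_table <- bind_rows(cat_table, cat_stats)
}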
# A tibble: 7 x 3
  variable category     n
  <chr>       <dbl> <dbl>
1 cyl             4    11
2 cyl             6     7
3 cyl             8    14
4 vs              0    18
5 vs              1    14
6 am              0    19
7 am              1    13
The code does what I wanted it to do and isn’t really the focus of this question. I was just providing it for context.
I'm trying to develop a deeper understanding of why it does what I want it to do and, more specifically, why I can't use . and .data interchangeably. I've read the Programming with dplyr article, but I guess in my mind both . and .data just mean "our result up to this point in the pipeline." But it appears that I'm oversimplifying my mental model of how they work, because I get an error when I use .data inside of names() below:
mtcars %>%
count(.data[["cyl"]]) %>%
mutate(variable = names(.data)[1])
Error: Problem with `mutate()` input `variable`.
x Can't take the `names()` of the `.data` pronoun
ℹ Input `variable` is `names(.data)[1]`.
Run `rlang::last_error()` to see where the error occurred.
And I get an unexpected (to me) result when I use . inside of count():
mtcars %>%
count(.[["cyl"]]) %>%
mutate(variable = names(.)[1])
  .[["cyl"]]  n   variable
1          4 11 .[["cyl"]]
2          6  7 .[["cyl"]]
3          8 14 .[["cyl"]]
I suspect it has something to do with, "Note that .data is not a data frame; it’s a special construct, a pronoun, that allows you to access the current variables either directly, with .data$x or indirectly with .data[[var]]. Don’t expect other functions to work with it," from the Programming with dplyr article. This tells me what .data isn't (a data frame), but I'm still not sure what .data is and how it differs from the dot.
I tried to figure it out like this:
mtcars %>%
count(.data[["cyl"]]) %>%
mutate(variable = list(.data))
But the result, <S3: rlang_data_pronoun>, doesn't mean anything to me that helps me understand. If anybody out there has a better grasp on this, I would appreciate a brief lesson. Thanks!
Recommended answer
Up front, I think .data's intent is a little confusing until one also considers its sibling pronoun, .env.
The dot . is something that magrittr::%>% sets up and uses; since dplyr re-exports %>%, the dot is available there too. Whenever you reference it, it is a real object, so names(.), nrow(.), etc. all work as expected. It does reflect the data up to this point in the pipeline.
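As a quick check of that claim, the sketch below passes the dot to a few ordinary base R functions. The variable names (dot_names, dot_rows, dot_is_df) are made up for illustration, and the braces are only there so that the dot is used on its own rather than as a first argument.

```r
library(dplyr)

# Inside the braces, `.` is whatever %>% handed over: a real data frame,
# so base R functions such as names() and nrow() accept it.
dot_names <- mtcars %>% count(cyl) %>% { names(.) }
dot_rows  <- mtcars %>% count(cyl) %>% { nrow(.) }
dot_is_df <- mtcars %>% { is.data.frame(.) }

dot_names
dot_rows
dot_is_df
```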
.data, on the other hand, is defined within rlang for the purpose of disambiguating symbol resolution. Along with .env, it allows you to be perfectly clear on where you want a particular symbol resolved (when ambiguity is expected). From ?.data, I think this is a clarifying contrast:
disp <- 10
mtcars %>% mutate(disp = .data$disp * .env$disp)
mtcars %>% mutate(disp = disp * disp)
However, as stated in the help pages, .data (and .env) is just a "pronoun" (we have verbs, so now we have pronouns too), so it is just a pointer to explain to the tidy internals where the symbol should be resolved. It's just a hint of sorts.
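To see what that hint changes in practice, here is a small sketch that runs the two help-page calls and compares the resulting columns. The names explicit and ambiguous are illustrative only.

```r
library(dplyr)

disp <- 10  # a variable in the calling environment that shares a column's name

# With pronouns: column disp (from the data) times variable disp (from the env).
explicit <- mtcars %>% mutate(disp = .data$disp * .env$disp)

# Without pronouns: both symbols resolve to the column, which masks the variable.
ambiguous <- mtcars %>% mutate(disp = disp * disp)

all.equal(explicit$disp, mtcars$disp * 10)   # column times the outside variable
all.equal(ambiguous$disp, mtcars$disp^2)     # column squared
```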
So your statement that

both . and .data just mean "our result up to this point in the pipeline."
is not correct: . represents the data up to this point, while .data is just a declarative hint to the internals.
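This difference also accounts for the column names seen in the question. The sketch below repeats the two count() calls side by side; with_pronoun and with_dot are illustrative names, and the labelling behaviour is taken from the output shown in the question, so it may differ across dplyr versions.

```r
library(dplyr)

# .data[["cyl"]] is a hint that dplyr understands, so the grouping column
# is labelled with the column's own name.
with_pronoun <- mtcars %>% count(.data[["cyl"]])

# .[["cyl"]] is an ordinary expression evaluated on the piped-in data frame,
# so the grouping column is labelled with the text of that expression.
with_dot <- mtcars %>% count(.[["cyl"]])

names(with_pronoun)[1]
names(with_dot)[1]
```

Both calls produce the same counts; only the label of the first column differs.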
Consider another way of thinking about .data: let's say we have two functions that completely disambiguate the environment a symbol is referenced against:
- get_internally: this symbol must always reference a column name; it will not reach out to the enclosing environment if the column does not exist; and
- get_externally: this symbol must always reference a variable/object in the enclosing environment; it will never match a column.
In that case, translating the above examples, one might use
disp <- 10
mtcars %>%
mutate(disp = get_internally(disp) * get_externally(disp))
In that case, it seems more obvious that get_internally is not a frame, so you can't call names(get_internally) and expect it to do something meaningful (other than NULL). It'd be like names(mutate).
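get_internally and get_externally are hypothetical, but the real pronouns can be tried outside dplyr with rlang::eval_tidy(), which evaluates an expression against a data mask. The following is a rough sketch under that assumption; hp_limit, first_product and lookup are names invented for the example.

```r
library(rlang)

disp <- 10        # lives in the calling environment
hp_limit <- 100   # also in the environment, and NOT a column of mtcars

# .data$disp is looked up in the data, .env$disp in the environment.
first_product <- eval_tidy(quote(.data$disp[1] * .env$disp), data = mtcars)

# .data does not fall back to the environment: asking it for a non-column
# fails even though hp_limit exists outside the data.
lookup <- tryCatch(
  eval_tidy(quote(.data$hp_limit), data = mtcars),
  error = function(e) "not a column"
)

first_product
lookup
```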
So don't think of .data as an object; think of it as a mechanism to disambiguate the environment of the symbol. I think the $ it uses is both terse/easy-to-use and absolutely misleading: it is not a list-like or environment-like object, even if it is being treated as such.
BTW: one can write an S3 method for $ that makes any classed object look like a frame/environment:
`$.quux` <- function(x, nm) paste0("hello, ", nm, "!")
obj <- structure(0, class = "quux")
obj$r2evans
# [1] "hello, r2evans!"
names(obj)
# NULL
(The presence of a $ accessor does not always mean the object is a frame/env.)