What is the difference between . and .data?

Question

I'm trying to develop a deeper understanding of using the dot (".") with dplyr and using the .data pronoun with dplyr. The code I was writing that motivated this post looked something like this:

library(dplyr)

cat_table <- tibble(
  variable = vector("character"), 
  category = vector("numeric"), 
  n        = vector("numeric")
) 

for(i in c("cyl", "vs", "am")) {
  cat_stats <- mtcars %>% 
    count(.data[[i]]) %>% 
    mutate(variable = names(.)[1]) %>%
    rename(category = 1)
  
  cat_table <- bind_rows(cat_table, cat_stats)
}

# A tibble: 7 x 3
  variable category     n
  <chr>       <dbl> <dbl>
1 cyl             4    11
2 cyl             6     7
3 cyl             8    14
4 vs              0    18
5 vs              1    14
6 am              0    19
7 am              1    13

The code does what I wanted it to do and isn’t really the focus of this question. I was just providing it for context.

I'm trying to develop a deeper understanding of why it does what I want it to do. And more specifically, why I can't use . and .data interchangeably. I've read the Programming with dplyr article, but I guess in my mind, both . and .data just mean "our result up to this point in the pipeline." But, it appears as though I'm oversimplifying my mental model of how they work because I get an error when I use .data inside of names() below:

mtcars %>% 
  count(.data[["cyl"]]) %>% 
  mutate(variable = names(.data)[1])

Error: Problem with `mutate()` input `variable`.
x Can't take the `names()` of the `.data` pronoun
ℹ Input `variable` is `names(.data)[1]`.
Run `rlang::last_error()` to see where the error occurred.

And I get an unexpected (to me) result when I use . inside of count():

mtcars %>% 
  count(.[["cyl"]]) %>% 
  mutate(variable = names(.)[1])

  .[["cyl"]]  n   variable
1          4 11 .[["cyl"]]
2          6  7 .[["cyl"]]
3          8 14 .[["cyl"]]

I suspect it has something to do with, "Note that .data is not a data frame; it’s a special construct, a pronoun, that allows you to access the current variables either directly, with .data$x or indirectly with .data[[var]]. Don’t expect other functions to work with it," from the Programming with dplyr article. This tells me what .data isn't (a data frame), but I'm still not sure what .data is and how it differs from the dot (.).

I tried to figure it out like this:

mtcars %>% 
  count(.data[["cyl"]]) %>% 
  mutate(variable = list(.data))

But, the result <S3: rlang_data_pronoun> doesn't mean anything to me that helps me understand. If anybody out there has a better grasp on this, I would appreciate a brief lesson. Thanks!

Recommended answer

Up front, I think .data's intent is a little confusing until one also considers its sibling pronoun, .env.

The dot . is something that magrittr::%>% sets up and uses; since dplyr re-exports %>%, the dot is available whenever dplyr is loaded. Whenever you reference it, it is a real object, so names(.), nrow(.), etc. all work as expected. It does reflect the data up to this point in the pipeline.
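A minimal sketch of that point, assuming dplyr is installed (the variable names dot_class, dot_names and dot_rows are only for illustration). Wrapping the right-hand side in braces keeps magrittr from also inserting the data as a first argument, so the dot is the only thing passed along:

```r
library(dplyr)

# Inside a %>% step, `.` is the actual object coming from the left-hand side.
dot_class <- mtcars %>% { class(.) }

# Ordinary functions such as names() and nrow() work on it directly.
dot_names <- mtcars %>% count(cyl) %>% { names(.) }
dot_rows  <- mtcars %>% { nrow(.) }
```

Here dot_class is "data.frame", dot_names holds the two column names produced by count(), and dot_rows is the number of rows in mtcars.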

.data, on the other hand, is defined within rlang for the purpose of disambiguating symbol resolution. Along with .env, it allows you to be perfectly clear on where you want a particular symbol resolved (when ambiguity is expected). From ?.data, I think this is a clarifying contrast:

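disp <- 10
# .data$disp is the column; .env$disp is the variable defined above (10)
mtcars %>% mutate(disp = .data$disp * .env$disp)
# Without pronouns both symbols resolve to the column, so this squares it
mtcars %>% mutate(disp = disp * disp)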

However, as stated in the help pages, .data (and .env) is just a "pronoun" (we have verbs, so now we have pronouns too), so it is just a pointer to explain to the tidy internals where the symbol should be resolved. It's just a hint of sorts.

So your statement

both . and .data just mean "our result up to this point in the pipeline."

is not correct: . represents the data up to this point, .data is just a declarative hint to the internals.
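To make the contrast concrete, here is a small sketch, assuming dplyr is installed. The column names first_col, n_cols, cyl_copy, rows_in_dot and rows_in_group are made up for the illustration. The second half uses grouped data, which is one place where the two visibly diverge: the dot is still the whole piped object, while .data looks columns up in the current group.

```r
library(dplyr)

counted <- mtcars %>% count(cyl)

# `.` is the whole data frame piped into mutate(), so names() and ncol()
# work on it; `.data[["cyl"]]` only looks up the column called "cyl".
res <- counted %>%
  mutate(
    first_col = names(.)[1],
    n_cols    = ncol(.),
    cyl_copy  = .data[["cyl"]]
  )

# With grouped data, `.` is still all of the rows, while `.data$mpg`
# is the mpg column for the group currently being summarised.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    rows_in_dot   = nrow(.),
    rows_in_group = length(.data$mpg)
  )
```

In by_cyl, rows_in_dot is the full row count of mtcars in every row, whereas rows_in_group matches the per-cylinder counts shown in the question.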

Consider another way of thinking about .data: let's say we have two functions that completely disambiguate the environment a symbol is referenced against:

  • get_internally, this symbol must always reference a column name, it will not reach out to the enclosing environment if the column does not exist; and
  • get_externally, this symbol must always reference a variable/object in the enclosing environment, it will never match a column.

In that case, translating the above examples, one might use

# get_internally() and get_externally() are hypothetical, for illustration only
disp <- 10
mtcars %>%
  mutate(disp = get_internally(disp) * get_externally(disp))

In that case, it seems more obvious that get_internally is not a frame, so you can't call names(get_internally) and expect it to do something meaningful (other than NULL). It'd be like names(mutate).
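The two rules attributed to get_internally and get_externally can be checked against the real pronouns. This is only a sketch, assuming dplyr is installed; threshold and flag are names invented for the example, and the tryCatch() is there just to record that an error happened rather than to show its message:

```r
library(dplyr)

threshold <- 100  # exists only in the calling environment, not as a column

# A bare symbol falls back to the environment when no column matches.
fallback <- mtcars %>% mutate(flag = hp > threshold)

# .data never falls back: asking for a column that does not exist is an error.
strict <- tryCatch(
  mtcars %>% mutate(flag = hp > .data$threshold),
  error = function(e) "error"
)

# .env never matches a column: .env$hp is the variable below, not mtcars$hp.
hp <- 0
env_only <- mtcars %>% mutate(flag = .data$hp > .env$hp)
```

Since every horsepower value in mtcars is positive, env_only$flag is TRUE throughout, which shows that .env$hp picked up the 0 rather than the column.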

So don't think of .data as an object, think of it as a mechanism to disambiguate the environment of the symbol. I think the $ it uses is both terse/easy-to-use and absolutely-misleading: it is not a list-like or environment-like object, even if it is being treated as such.

BTW: one can write any S3 method for $ that makes any classed-object look like a frame/environment:

`$.quux` <- function(x, nm) paste0("hello, ", nm, "!")
obj <- structure(0, class = "quux")
obj$r2evans
# [1] "hello, r2evans!"
names(obj)
# NULL

(The presence of a $ accessor does not always mean the object is a frame/env.)
