What does the dplyr period character "." reference?
Question
What does the period . reference in the following dplyr code?
(df <- as.data.frame(matrix(rep(1:5, 5), ncol=5)))
# V1 V2 V3 V4 V5
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
dplyr::mutate_each(df, funs(. == 5))
# V1 V2 V3 V4 V5
# 1 FALSE FALSE FALSE FALSE FALSE
# 2 FALSE FALSE FALSE FALSE FALSE
# 3 FALSE FALSE FALSE FALSE FALSE
# 4 FALSE FALSE FALSE FALSE FALSE
# 5 TRUE TRUE TRUE TRUE TRUE
Is this shorthand for "all columns"? Is this . specific dplyr syntax or is it general R syntax (as discussed here)?
Also, why does the following code result in an error?
dplyr::filter(df, . == 5)
# Error: object '.' not found
Recommended answer
The dot is used within dplyr mainly (not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it stands for each of the columns, in turn, to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame, so you can reference a single column with .$xyz, where "xyz" is the column name.
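As a rough illustration of those two meanings, the sketch below assumes dplyr 1.0.0 or later, where mutate_each and funs are deprecated in favour of across. The names res and sizes are made up for this example. In the formula passed to across, .x plays the role the bare dot played inside funs; in do, the dot is the data frame for the current group.

```r
library(dplyr)

df <- as.data.frame(matrix(rep(1:5, 5), ncol = 5))

# Assumption: dplyr >= 1.0.0. `.x` stands for each selected column in turn.
res <- mutate(df, across(everything(), ~ .x == 5))

# In do(), `.` is the data frame of the current group,
# so `.$V2` is a single column of that group.
sizes <- df %>%
  group_by(V1) %>%
  do(data.frame(n = nrow(.), v2_max = max(.$V2)))
```

Here res has the same shape as df with every column turned into a logical vector, and sizes has one row per value of V1.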
The reasons you cannot run
filter(df, . == 5)
are that a) filter is not designed to work with multiple columns the way mutate_each is, for example, and b) the dot is only defined when the data is passed in through the pipe operator %>% (originally from magrittr).
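A small sketch of that failure and two ways around it follows. It assumes a fresh session in which no object called `.` has been defined, and the all-columns variant assumes dplyr 1.0.4 or later, which provides if_all. The variable names are illustrative only.

```r
library(dplyr)

df <- as.data.frame(matrix(rep(1:5, 5), ncol = 5))

# Outside a pipe nothing named `.` exists, so the call fails.
outside_pipe <- try(filter(df, . == 5), silent = TRUE)
inherits(outside_pipe, "try-error")

# Testing a single column needs no dot at all.
one_col <- filter(df, V1 == 5)

# Assumption: dplyr >= 1.0.4. if_all() keeps rows where the test
# holds in every selected column.
all_cols <- filter(df, if_all(everything(), ~ .x == 5))
```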
However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:
> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found
> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
You should also take a look at the magrittr help files:
library(magrittr)
help("%>%")
From the help page:
Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
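To make the placeholder rule concrete, here is a minimal sketch using base R functions in place of the help page's generic f. The variable names a and b are invented for the example.

```r
library(magrittr)

# Dot as a positional argument: same as rep("a", 2)
a <- 2 %>% rep("a", .)

# Dot as a named argument: same as paste("x", "y", sep = "-")
b <- "-" %>% paste("x", "y", sep = .)
```

In both calls the dot appears as a direct argument, so the left-hand side is placed only where the dot is and not in the first position.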
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
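The difference between the nested-call rule and the braces form can be seen side by side in a short sketch. The variable names are illustrative, and the last line repeats the help page's own example.

```r
library(magrittr)

# Dot only inside a nested call: lhs is still inserted as the first
# argument, so this is c(1:3, length(1:3)).
nested <- 1:3 %>% c(length(.))

# Braces switch that off: this is just c(length(1:3)).
braced <- 1:3 %>% { c(length(.)) }

# The help page's example: c(min(1:10), max(1:10)).
range_1_10 <- 1:10 %>% { c(min(.), max(.)) }
```

Wrapping the right-hand side in braces is the safer choice whenever the left-hand side should appear only where the dot is written.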