Meaning of error using . shorthand inside dplyr function


Problem description


I'm getting a dplyr::bind_rows error. It's a very trivial problem, because I can easily get around it, but I'd like to understand the meaning of the error message.

I have the following data of some population groups for New England states, and I'd like to bind on a copy of these same values with the name changed to "New England," so that I can group by name and add them up, giving me values for the individual states, plus an overall value for the region.

df <- structure(list(name = c("CT", "MA", "ME", "NH", "RI", "VT"), 
        estimate = c(501074, 1057316, 47369, 76630, 141206, 27464)),
        class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

I'm doing this as part of a much larger flow of piped steps, so I can't just do bind_rows(df, df %>% mutate(name = "New England")). The pipe (%>%, which comes from magrittr and is re-exported by dplyr) provides the convenient . shorthand for the data frame being piped from one function to the next, but I can't use it to bind the data frame to itself in the way I'd like.

What does work and gets me the output I want:

library(tidyverse)

df %>%
  # arbitrary piped operation
  mutate(name = str_to_lower(name)) %>%
  bind_rows(mutate(., name = "New England")) %>%
  group_by(name) %>%
  summarise(estimate = sum(estimate))
#> # A tibble: 7 x 2
#>   name        estimate
#>   <chr>          <dbl>
#> 1 ct            501074
#> 2 ma           1057316
#> 3 me             47369
#> 4 New England  1851059
#> 5 nh             76630
#> 6 ri            141206
#> 7 vt             27464

But when I try to do the same thing by piping the . shorthand into mutate(), I get this error:

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows(. %>% mutate(name = "New England"))
#> Error in bind_rows_(x, .id): Argument 2 must be a data frame or a named atomic vector, not a fseq/function

Like I said, doing it the first way is fine, but I'd like to understand the error because I write a lot of multi-step piped code.

Solution

As @aosmith noted in the comments, it's due to the way magrittr parses the dot in this case.

From ?'%>%':

Using the dot-place holder as lhs

When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input.
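To make the quoted behaviour concrete, here is a minimal sketch using only magrittr and base R. The name root_of_sum is made up for illustration and does not come from the question.

```r
library(magrittr)

# A chain whose lhs is a bare dot is not evaluated right away;
# magrittr stores it as a function (a "functional sequence").
root_of_sum <- . %>% sum() %>% sqrt()

class(root_of_sum)                # "fseq" "function"

# Calling the stored function runs the whole chain on its input.
result <- root_of_sum(c(9, 16))   # sqrt(9 + 16)
result                            # 5
```

This is why bind_rows() complains about an fseq/function: inside the call, . %>% mutate(...) hands it a function rather than a data frame.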

To avoid triggering this, any modification of the expression on the lhs will do, because the lhs is then no longer a bare dot:

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows((.) %>% mutate(name = "New England"))

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows({.} %>% mutate(name = "New England"))

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows(identity(.) %>% mutate(name = "New England"))
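As a quick check, the sketch below runs the nested form from the question and the (.) form side by side and compares the results. It assumes dplyr and stringr are installed; df is rebuilt with tibble() instead of structure() only for brevity, and the names nested and parenthesised are illustrative.

```r
library(dplyr)
library(stringr)

df <- tibble(
  name = c("CT", "MA", "ME", "NH", "RI", "VT"),
  estimate = c(501074, 1057316, 47369, 76630, 141206, 27464)
)

# Form from the question that already works: the dot is an argument of mutate().
nested <- df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows(mutate(., name = "New England")) %>%
  group_by(name) %>%
  summarise(estimate = sum(estimate))

# Same pipeline, but with the dot wrapped in parentheses so that
# (.) %>% mutate(...) is evaluated instead of being turned into a function.
parenthesised <- df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows((.) %>% mutate(name = "New England")) %>%
  group_by(name) %>%
  summarise(estimate = sum(estimate))

all.equal(nested, parenthesised)
```

If the two forms behave the same, all.equal() should report no differences, and both results should contain the six states plus one New England total.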

Here's a suggestion that avoids the problem altogether (it stops before the group_by() and summarise() steps, so the result has 12 rows):

df %>%
  # arbitrary piped operation
  mutate(name = str_to_lower(name)) %>%
  replicate(2, ., simplify = FALSE) %>%
  map_at(2, mutate_at, "name", ~"New England") %>%
  bind_rows()

# # A tibble: 12 x 2
#    name        estimate
#    <chr>          <dbl>
#  1 ct            501074
#  2 ma           1057316
#  3 me             47369
#  4 nh             76630
#  5 ri            141206
#  6 vt             27464
#  7 New England   501074
#  8 New England  1057316
#  9 New England    47369
# 10 New England    76630
# 11 New England   141206
# 12 New England    27464
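To get from those 12 rows to the per-state values plus the regional total asked for in the question, the same group_by() and summarise() steps can be appended. This is a sketch that assumes dplyr, purrr and stringr are installed and that replicate() sees the piped value through the dot as in the answer; the name totals is illustrative.

```r
library(dplyr)
library(purrr)
library(stringr)

df <- tibble(
  name = c("CT", "MA", "ME", "NH", "RI", "VT"),
  estimate = c(501074, 1057316, 47369, 76630, 141206, 27464)
)

totals <- df %>%
  mutate(name = str_to_lower(name)) %>%
  # Two copies of the piped data frame in a list.
  replicate(2, ., simplify = FALSE) %>%
  # Relabel every row of the second copy.
  map_at(2, mutate_at, "name", ~"New England") %>%
  bind_rows() %>%
  # Collapse to one row per name, as in the question.
  group_by(name) %>%
  summarise(estimate = sum(estimate))

totals
```

The New England row should then hold the sum of the six state estimates, 1851059, matching the output shown in the question.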
