Major dplyr functions in a function


Problem description


I have seen a couple of posts on how to write one's own function with dplyr functions. For example, this post (https://stackoverflow.com/questions/21815060/dplyr-how-to-use-group-by-inside-a-function) shows how to use group_by (regroup) and summarise inside a function. I thought it would be interesting to see whether I could write a function using the major dplyr functions. My hope is that we can further understand how to write functions using dplyr functions.

DATA

country <- rep(c("UK", "France"), each = 5)
id <- rep(letters[1:5], times = 2)
value <- runif(10, 50, 100)
foo <- data.frame(country, id, value, stringsAsFactors = FALSE)

GOAL

I wanted to write the following process in a function.

foo %>%
    mutate(new = ifelse(value > 60, 1, 0)) %>%
    filter(id %in% c("a", "b", "d")) %>%
    group_by(country) %>%
    summarize(whatever = sum(value))

TRY

### Here is a function which does the same process

myFun <- function(x, ana, bob, cathy) x %>%
    mutate(new = ifelse(ana > 60, 1, 0)) %>%
    filter(bob %in% c("a", "b", "d")) %>%
    regroup(as.list(cathy)) %>%
    summarize(whatever = sum(ana))

myFun(foo, value, id, "country")

Source: local data frame [2 x 2]

  country whatever
1  France 233.1384
2      UK 245.5400

You may notice that arrange() is not there. This is the one I am struggling with. Here are two observations. The first experiment was successful: the order of the countries changed from UK-France to France-UK. But the second experiment was not successful.

### Experiment 1: This works for arrange()

myFun <- function(x, ana) x %>%
         arrange(ana)

myFun(foo, country)

   country id    value
1   France  a 90.12723
2   France  b 86.64229
3   France  c 74.93320
4   France  d 80.69495
5   France  e 72.60077
6       UK  a 84.28033
7       UK  b 67.01209
8       UK  c 94.24756
9       UK  d 79.49848
10      UK  e 63.51265


### Experiment 2: This was not successful.

myFun <- function(x, ana, bob) x %>%
         filter(ana %in% c("a", "b", "d")) %>%
         arrange(bob)

myFun(foo, id, country)

Error: incorrect size (10), expecting :6

### This works, by the way.
foo %>%
filter(id %in% c("a", "b", "d")) %>%
arrange(country)

Given that the first experiment was successful, I have a hard time understanding why the second experiment failed. There may be something one has to do in the second experiment. Does anybody have an idea? Thank you for taking your time.

Solution

Actually, none of your experiments work; you will have scoping problems with all of them. They look like they are working because you defined the vectors country, id, and value in the Global Environment and did not remove them. So when you call your functions, they use the vectors from the Global Environment rather than the columns of the data frame.

To show this, let's remove those vectors before calling your functions:

Creating the vectors and data.frame:

library(dplyr)
country <- rep(c("UK", "France"), each = 5)
id <- rep(letters[1:5], times = 2)
value <- runif(10, 50, 100)
foo <- data.frame(country, id, value, stringsAsFactors = FALSE)

Defining your first function:

myFun <- function(x, ana, bob, cathy) x %>%
  mutate(new = ifelse(ana > 60, 1, 0)) %>%
  filter(bob %in% c("a", "b", "d")) %>%
  regroup(as.list(cathy)) %>%
  summarize(whatever = sum(ana))

Calling without removing the vectors (it will look like it works, but it is actually using the vectors from the global env):

myFun(foo, value, id, "country")

Source: local data frame [2 x 2]

  country whatever
1  France 208.1008
2      UK 192.4287

Now removing the vectors and calling your function (and now it does not work, for it can't find the vectors):

rm(country, id, value)
myFun(foo, value, id, "country")

Error in mutate_impl(.data, named_dots(...), environment()) :
object 'value' not found

So that explains why your arrange example did not work while the others appeared to. The vector your second experiment was using was the vector country in the Global Environment, which has 10 elements. But arrange() was expecting only 6 elements, which is the number of rows left after filtering.
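The size mismatch can be checked directly. The sketch below rebuilds the data from the question (the set.seed() call and the name filtered are additions for illustration only) and compares the number of rows left after filtering with the length of the global vector:

```r
library(dplyr)

set.seed(1)  # added for reproducibility; not in the original post
country <- rep(c("UK", "France"), each = 5)
id <- rep(letters[1:5], times = 2)
value <- runif(10, 50, 100)
foo <- data.frame(country, id, value, stringsAsFactors = FALSE)

# Rows that survive the filter: 3 ids in each of 2 countries
filtered <- foo %>% filter(id %in% c("a", "b", "d"))

nrow(filtered)   # 6  (what arrange() sees inside the pipeline)
length(country)  # 10 (the global vector the function argument resolves to)
```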

There are different strategies to make your functions work. For example, take a look at this answer by G. Grothendieck for some insight into how to do it. Or just wait a little: as Hadley pointed out, programming with dplyr is a feature coming soon.
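As one possible strategy, here is a minimal sketch that assumes a later dplyr release than the one used in this post (1.0 or newer), where the embrace operator {{ }} from tidy evaluation is available. The name my_fun is hypothetical, and group_by() stands in for the older regroup(). The data frame is built without creating global vectors, so the column names can only be resolved inside foo:

```r
library(dplyr)

set.seed(42)  # added for reproducibility; not in the original post
foo <- data.frame(
  country = rep(c("UK", "France"), each = 5),
  id      = rep(letters[1:5], times = 2),
  value   = runif(10, 50, 100),
  stringsAsFactors = FALSE
)

# Hypothetical rewrite: {{ }} forwards each bare column name into the data frame
my_fun <- function(x, ana, bob, cathy) {
  x %>%
    mutate(new = ifelse({{ ana }} > 60, 1, 0)) %>%
    filter({{ bob }} %in% c("a", "b", "d")) %>%
    group_by({{ cathy }}) %>%
    summarise(whatever = sum({{ ana }})) %>%
    arrange({{ cathy }})
}

result <- my_fun(foo, value, id, country)
print(result)
```

Because the arguments are looked up as columns of x, the call no longer depends on vectors that happen to exist in the Global Environment, and arrange() sees the same number of rows as the rest of the pipeline.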
