How to pass a dataframe column as an argument in a function using piping?


Problem description

I'm experimenting with the economics dataset in R (it ships with ggplot2), and I'm trying to pass a dataframe column as an argument to a function that uses piping (dplyr, %>%). I'm running into some seemingly strange problems: I can't successfully pass a column name as an argument to top_n() within my custom function. Here's how I would subset the 5 rows with the largest population without a custom function:

Code 1:

library(dplyr)
library(ggplot2)  # provides the economics dataset

df_econ <- economics
df_top_5 <- df_econ %>% top_n(5, pop)
df_top_5

Output 1:

2014-12-01  12122.0 320201  5.0 12.6    8688
2015-01-01  12080.8 320367  5.5 13.4    8979
2015-02-01  12095.9 320534  5.7 13.1    8705
2015-03-01  12161.5 320707  5.2 12.2    8575
2015-04-01  12158.9 320887  5.6 11.7    8549

Wrapped into a custom function, it could look like this:

Code 2:

library(dplyr)
library(ggplot2)  # provides the economics dataset

# data
data(economics)
df_econ <- economics

# custom function
fxtop <- function(df, number, column){

  tops <- df %>% top_n(number, column)
  return(tops)
}

# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5

Output 2:

1967-07-01  507.4   198712  12.5    4.5 2944
1967-08-01  510.5   198911  12.5    4.7 2945
1967-09-01  516.3   199113  11.7    4.6 2958
1967-10-01  512.9   199311  12.5    4.9 3143
1967-11-01  518.1   199498  12.5    4.7 3066
1967-12-01  525.8   199657  12.1    4.8 3018
1968-01-01  531.5   199808  11.7    5.1 2878
1968-02-01  534.2   199920  12.2    4.5 3001
1968-03-01  544.9   200056  11.6    4.1 2877
1968-04-01  544.6   200208  12.2    4.6 2709

This output has 10 rows, not the 5 I expected. I suspect that the argument number=5 is simply ignored and that the number actually used defaults to 10. The data does not seem to be sorted by 'pop' either.

Here is what I have tried so far:

Attempt 1: hard-code pop and number within the custom function:

library(dplyr)
library(ggplot2)  # provides the economics dataset

# data
data(economics)
df_econ <- economics

# custom function
fxtop <- function(df, number, column){

  tops <- df %>% top_n(5, pop)
  return(tops)
}

# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5

Attempt 1 output:

2014-12-01  12122.0 320201  5.0 12.6    8688
2015-01-01  12080.8 320367  5.5 13.4    8979
2015-02-01  12095.9 320534  5.7 13.1    8705
2015-03-01  12161.5 320707  5.2 12.2    8575
2015-04-01  12158.9 320887  5.6 11.7    8549

Attempt 1 comment:

This is the desired output!

Let's see what happens when I pass the variables through the function.

Attempt 2: pass the variables as objects instead of strings:

library(dplyr)
library(ggplot2)  # provides the economics dataset

# data
data(economics)
df_econ <- economics

# custom function
fxtop <- function(df, number, column){

  tops <- df %>% top_n(5, column)
  return(tops)
}

# build a df using custom function
df_top_5 <- fxtop(df=df_econ, number=5, column='pop')
df_top_5

Attempt 2 output:

Now the output is the same as in the first custom-function example (Output 2). Both variables are seemingly ignored.

So, any suggestions?

Recommended answer

We can use non-standard evaluation with the curly-curly operator ({{ }}):

library(dplyr)
library(rlang)

fxtop <- function(df, number, column){
   tops <- df %>% top_n(number, {{column}})
   return(tops)
}

and pass unquoted variable names:

fxtop(df=df_econ, number=5, pop)

#   date        pce     pop psavert uempmed unemploy
#  <date>      <dbl>   <dbl>   <dbl>   <dbl>    <dbl>
#1 2014-12-01 12062  319746.     7.6    12.9     8717
#2 2015-01-01 12046  319929.     7.7    13.2     8903
#3 2015-02-01 12082. 320075.     7.9    12.9     8610
#4 2015-03-01 12158. 320231.     7.4    12       8504
#5 2015-04-01 12194. 320402.     7.6    11.5     8526
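As a side note, top_n() has been superseded by slice_max() since dplyr 1.0.0, and the same curly-curly pattern appears to carry over. The following is only a sketch under assumptions: the function name fxtop_slice and the small stand-in data frame are hypothetical and not part of the original answer, and slice_max() returns rows sorted in descending order, whereas top_n() keeps the original row order.

```r
library(dplyr)

# Small stand-in data frame (hypothetical values, not the economics data)
df <- tibble(id = 1:8, pop = c(3, 9, 1, 7, 5, 8, 2, 6))

# Hypothetical wrapper: embrace the column argument with {{ }}
# so slice_max() ranks on the column the caller names unquoted
fxtop_slice <- function(df, number, column) {
  df %>% slice_max(order_by = {{ column }}, n = number)
}

# Keep the 3 rows with the largest pop values
top3 <- fxtop_slice(df, 3, pop)
print(top3)
```

Like top_n(), slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when values are tied.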


If you would rather pass the column name as a string (quoted), convert it to a symbol with sym() and unquote it with !!:

fxtop <- function(df, number, column){
  tops <- df %>% top_n(number, !!sym(column))
  return(tops)
}
fxtop(df=df_econ, number=5, 'pop')
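As for why the original string-based attempt showed 10 unsorted rows: a likely explanation (my reading of how top_n() behaves, not something stated in the original answer) is that the string 'pop' is treated as a constant weight rather than as a column. Every row then ties for first place, top_n() keeps all of them, and the tibble print method simply displays the first 10. The sketch below compares row counts on a small stand-in data frame; the names fxtop_string and fxtop_embrace and the data values are hypothetical.

```r
library(dplyr)

# Small stand-in data frame (hypothetical values, not the economics data)
df <- tibble(id = 1:8, pop = c(3, 9, 1, 7, 5, 8, 2, 6))

# Same shape as the question's function: `column` arrives as a string
# and is handed to top_n() as-is
fxtop_string <- function(df, number, column) {
  df %>% top_n(number, column)
}

# Curly-curly version from the answer: `column` is an unquoted column name
fxtop_embrace <- function(df, number, column) {
  df %>% top_n(number, {{ column }})
}

# Count the rows each version keeps when asked for the top 3
n_string  <- nrow(fxtop_string(df, 3, "pop"))
n_embrace <- nrow(fxtop_embrace(df, 3, pop))

print(c(string = n_string, embrace = n_embrace))
```

If the explanation holds, the string version should keep every row of df, while the curly-curly version should keep only the requested number.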
