Using Apply or Vectorize to apply a custom function to a dataframe


Question

I am attempting to apply a custom function that uses components of a dataframe to do a calculation. I have made a trivial example below because my actual problem is very hard to turn into a reproducible example. In the example, I want the second and third columns (age and income) to be added together to create a new column containing their sum. Below is an example I found online that gets close to what I want:

celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45))
f=function(x,output){
  name=x[1]
  income=x[3]
  cat(name,income,"\n")
}
apply(celebrities,1,f)

But when I try to adapt it to apply a mathematical function, it doesn't work:

f2=function(x,output){
  age=x[2]
  income=x[3]
  sum(age,income)
}
apply(celebrities,1,f2)

In essence, what I need is for apply to take a dataset, go through every row of that dataset using the values in that row as inputs to the function, and add a new column to the dataset with the results of the function. Please let me know how I can clarify this question if needed. I have referred to the questions below, but they don't seem to work for me.

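Apply a function to every row of a matrix or a data frame

How to assign new values from lapply to new column in dataframes in list

Call apply-like function on each row of dataframe with multiple arguments from each row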

Recommended answer

For the particular task requested, it could be:

celebrities$newcol <- with(celebrities, age + income)

The + function is inherently vectorized, so it operates on whole columns at once. Using apply with sum is inefficient. The use of apply could have been greatly simplified by omitting the first column, because that avoids the coercion to a character matrix caused by the first column:

# MARGIN = 1 applies the function to each row
celebrities$newcol <- apply(celebrities[-1], 1, function(x) sum(x))

That way you avoid coercing the vectors to "character" and then having to coerce the formerly numeric columns back to numeric. Using sum inside apply does get around the fact that sum is not vectorized (it collapses all of its arguments into a single total), but it's an example of inefficient R coding.
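The coercion is easy to see directly. Here is a minimal sketch, using a smaller hypothetical data frame `df` rather than the original `celebrities`. It relies on the fact that apply converts a data frame to a matrix first, so a frame with mixed column types becomes a character matrix:

```r
# Hypothetical three-row data frame: one character column, two numeric columns
df <- data.frame(name = c("Andrew", "matt", "Dany"),
                 age = c(28, 23, 49),
                 income = c(25.2, 10.5, 11))

# apply() works on a matrix, so the mixed-type frame is coerced to character
m_all <- as.matrix(df)
typeof(m_all)   # "character"

# Dropping the character column keeps the matrix numeric
m_num <- as.matrix(df[-1])
typeof(m_num)   # "double"

# Row-wise sum over the numeric columns only
row_totals <- apply(df[-1], 1, sum)

# The vectorized equivalent gives the same values without a row loop
vec_totals <- with(df, age + income)
```

This is why calling sum on x[2] and x[3] inside apply over the full data frame fails: both values arrive as character strings.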

You get automatic vectorization if the "inner" algorithm can be constructed completely from vectorized functions; the Math and Ops groups are the usual components. See ?Ops. Otherwise, you may need to use mapply or Vectorize.
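As a rough illustration of the mapply and Vectorize route, here is a sketch built around a hypothetical function, `adjusted_total`, which is not part of the original question. It uses if(), which only accepts a single condition, so it cannot be called on whole columns directly:

```r
# Hypothetical function that is NOT vectorized: if() needs a single TRUE/FALSE
adjusted_total <- function(age, income) {
  if (age > 30) age + income else age + 2 * income
}

# Hypothetical data frame with the two numeric columns only
df <- data.frame(age = c(28, 23, 49), income = c(25.2, 10.5, 11))

# mapply() calls the function once per pair of elements
df$total_mapply <- mapply(adjusted_total, df$age, df$income)

# Vectorize() wraps the function so that it accepts whole vectors
v_adjusted_total <- Vectorize(adjusted_total)
df$total_vectorize <- v_adjusted_total(df$age, df$income)
```

Both columns hold the same values. When the logic can be expressed with vectorized tools such as ifelse(), that is usually the better choice, since mapply and Vectorize still loop over the elements one at a time.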
