Applying a function on each row of a data frame in R


Question

I would like to apply some function on each row of a data frame in R.

The function can return a single-row data frame or nothing (I guess 'return()' returns nothing?).

I would like to apply this function to each row of a given data frame and get the resulting data frame (which is possibly shorter, i.e. has fewer rows than the original one).

For example, if the original data frame is something like:

id size name
1  100  dave
2  200  sarah
3  50   ben

and the function I'm using takes a row of the data frame (i.e. a single-row data frame), returns it as-is if the name rhymes with "brave", and otherwise returns NULL, then the result should be:

id size name
1  100  dave

This example is really about filtering a data frame. I would love an answer specific to this kind of task, but also one for the more general case, where the result of the helper function (the one that operates on a single row) may be an arbitrary single-row data frame. Please note that even in the filtering case I would like to use some sophisticated logic: not something simple like $size>100, but a more complex condition checked by a function, say boo(single_row_df).

P.S. What I have done so far in these cases is to use apply(df, MARGIN=1) and then do.call(rbind ...), but I think this gives me trouble when my data frame has only a single row (I get Error in do.call(rbind, filterd) : second argument must be a list).

Update

Following Stephen's reply I did the following:

ranges.filter <- function(ranges,boo) {
    subset(x=ranges,subset=!any(boo[start:end]))
}

I then call ranges.filter with a ranges data frame that looks like this:

start end
100   200
250   400
698   1520
1988  2147
...

and some boolean vector:

(TRUE,FALSE,TRUE,TRUE,TRUE,...)

I want to filter out any range that contains a TRUE value in the boolean vector. For example, the first range 100 .. 200 will be left in the data frame if and only if the boolean vector is FALSE in positions 100 .. 200.

This seems to do the job, but I get a warning saying numerical expression has 53 elements: only the first used.

Recommended answer

You may have to use lapply instead of apply to force the result to be a list.

> rhymesWithBrave <- function(x) substring(x,nchar(x)-2) =="ave"
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+                      if(rhymesWithBrave(dfr[i,"name"])) dfr[i,] else NULL,
+                      dfr))
  id size name
1  1  100 dave
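
To illustrate why the result type matters, here is a minimal sketch. The sample data frame dfr is rebuilt from the table in the question, and the variable names viaApply, oneRow, viaLapply and combined are made up for this illustration. apply() converts the data frame to a matrix and simplifies equal-length results, so it does not hand do.call() a list, whereas lapply() over row indices always returns a list, even when there is only one row.

```r
# Sample data rebuilt from the question (an assumption about column types).
dfr <- data.frame(id = 1:3,
                  size = c(100, 200, 50),
                  name = c("dave", "sarah", "ben"),
                  stringsAsFactors = FALSE)

# apply() first converts the data frame to a matrix. Because one column is
# character, every row reaches the function as a character vector, and the
# equal-length results are simplified into a matrix rather than a list.
viaApply <- apply(dfr, 1, function(row) row)
is.list(viaApply)    # FALSE
is.matrix(viaApply)  # TRUE

# lapply() over row indices always returns a list, even for a single row,
# so do.call(rbind, ...) receives the list it expects.
oneRow <- dfr[1, ]
viaLapply <- lapply(seq_len(nrow(oneRow)), function(i) oneRow[i, ])
combined <- do.call(rbind, viaLapply)
```

Using seq_len(nrow(...)) instead of 1:nrow(...) is a small defensive choice here: it yields an empty index for a zero-row data frame rather than c(1, 0).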

But in this case, subset would be more appropriate:

> subset(dfr,rhymesWithBrave(name))
  id size name
1  1  100 dave
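
subset works here because rhymesWithBrave is vectorised over the whole name column. If the condition can only be evaluated on one row at a time, as with the boo(single_row_df) mentioned in the question, one possible sketch is to build a logical index row by row and then subset with it. The body of boo below is a hypothetical stand-in, and the sample data is rebuilt from the question.

```r
# Sample data rebuilt from the question (an assumption about column types).
dfr <- data.frame(id = 1:3,
                  size = c(100, 200, 50),
                  name = c("dave", "sarah", "ben"),
                  stringsAsFactors = FALSE)

# Hypothetical row-wise predicate: takes a single-row data frame and
# returns one TRUE or FALSE. The real boo() could hold any logic.
boo <- function(row) {
  nm <- row$name
  substring(nm, nchar(nm) - 2) == "ave" && row$size >= 100
}

# Call boo() once per row; vapply() checks that each call returns exactly
# one logical value. drop = FALSE keeps each row as a data frame.
keep <- vapply(seq_len(nrow(dfr)),
               function(i) boo(dfr[i, , drop = FALSE]),
               logical(1))

result <- dfr[keep, , drop = FALSE]
```

With this sample data only the dave row satisfies the predicate, so result has a single row.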

If you want to perform additional transformations before returning the result, you can go back to the lapply approach above:

> add100tosize <- function(x) within(x,size <- size+100)
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+                      if(rhymesWithBrave(dfr[i,"name"])) add100tosize(dfr[i,])
+                      else NULL,dfr))
  id size name
1  1  200 dave

Or, in this simple case, apply the function to the output of subset:

> add100tosize(subset(dfr,rhymesWithBrave(name)))
  id size name
1  1  200 dave

Update:

To select rows that do not fall between start and end, you might construct a different function (note: when summing the results of boolean/logical vectors, TRUE values are converted to 1s and FALSE values to 0s):

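# mapply builds a logical matrix with one row per element of x and one
# column per range (this needs length(x) > 1, otherwise the result is
# simplified to a vector). A row sum of 0 means the value is in no range.
# x is passed through MoreArgs; without it the inner function's x argument
# would be missing and the call would fail.
test <- function(x)
  rowSums(mapply(function(start,end,x) x >= start & x <= end,
                 start=c(100,250,698,1988),
                 end=c(200,400,1520,2147),
                 MoreArgs=list(x=x))) == 0

subset(dfr,test(size))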
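
The warning quoted in the question's update comes from start:end inside subset: there start and end are whole columns, and the : operator uses only the first element of each, so the condition is evaluated once instead of once per row. A minimal sketch of a per-row version of ranges.filter follows. The sample ranges and boo values are made up for illustration, and the helper variable hit is hypothetical.

```r
# Hypothetical sample data: 20 positions, TRUE only at positions 5 and 14.
boo <- rep(FALSE, 20)
boo[c(5, 14)] <- TRUE

ranges <- data.frame(start = c(1, 7, 12, 16),
                     end   = c(4, 10, 15, 20))

ranges.filter <- function(ranges, boo) {
  # mapply pairs start[i] with end[i], so s:e is a scalar-to-scalar
  # sequence and any() is evaluated separately for each range.
  hit <- mapply(function(s, e) any(boo[s:e]), ranges$start, ranges$end)
  # Keep only the ranges that contain no TRUE value.
  ranges[!hit, , drop = FALSE]
}

kept <- ranges.filter(ranges, boo)
```

In this sample the range 12 .. 15 covers position 14, so it is dropped and the other three ranges remain. The sketch assumes ranges has at least one row and that every end lies within the length of boo.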
