Iterating a function through different columns of a data.frame matching a pattern in the column names


Question


I want to iterate a function over several columns of a data.frame whose names share a common pattern. To subset the data.frame I use this code, which works:

df[,grep("abc", colnames(df))]

But I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.

The function I'm using is:

compress <- function(x) {
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}

Here df (the data frame) and Time could be passed in as arguments themselves, but for the moment I don't need that.

Thanks Giulia

Solution

You've basically got it. Just call apply on your subsetted data to run function f over its columns (the 2 in the second argument of apply means columns, as opposed to 1, which means rows):

apply( df[,grep("abc", colnames(df))] , 2 , f )

Or if you don't want to coerce your df to a matrix (which will happen with apply) you can use lapply as you suggest in much the same manner...

lapply( df[,grep("abc", colnames(df))] , f )

The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call with a data.frame, e.g. data.frame( lapply(...) )

Example

# This function just multiplies its argument by 2
f <- function(x) x * 2

df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )


apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801


data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801

# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix" "array"   (R >= 4.0; older versions print only "matrix")
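
Two details are worth checking on your own data. The sketch below uses made-up data (the column names abc_1, abc_2 and other are hypothetical, not from the question). It shows a for loop equivalent to the lapply call, what drop = FALSE changes when only one column matches, and how apply coerces mixed-type columns to character because it goes through as.matrix().

```r
# Hypothetical data: names chosen only for illustration
df <- data.frame(abc_1 = 1:3, abc_2 = c(10, 20, 30), other = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

f <- function(x) x * 2

# Select matching columns; drop = FALSE keeps a data.frame even for one match
cols <- grep("abc", colnames(df))
sub <- df[, cols, drop = FALSE]

# lapply works column by column and keeps the result as a data.frame
res_lapply <- data.frame(lapply(sub, f))

# Equivalent for loop, filling a list by column name
res_loop <- list()
for (nm in colnames(sub)) {
  res_loop[[nm]] <- f(sub[[nm]])
}
res_loop <- data.frame(res_loop)

# One match only: without drop = FALSE the result is a plain vector
one_vec <- df[, grep("abc_1", colnames(df))]
one_df  <- df[, grep("abc_1", colnames(df)), drop = FALSE]

# apply() goes through as.matrix(), so mixed types all become character
mixed <- apply(df, 2, class)
```

If the pattern can match a single column, the drop = FALSE form is the safer one to pass to lapply, since lapply over a plain vector would call f once per element rather than once per column.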

Second edit

For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input and a vector of column names that you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:

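compress <- function(df, cols) {
  # Returns a list with one aggregated data.frame per column name in cols.
  # dates() and hours() are not base R; presumably they come from the chron package.
  lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  })
}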

To run the function, just call it, passing the data.frame and a vector of column names:

compress( df , names(df)[ grep("abc", names(df) ) ] ) 
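
If you want to try the idea without the dates()/hours() helpers, here is a rough base R sketch. It assumes Time is a POSIXct column and swaps in format() to truncate each timestamp to its hour. The sample data, the compress_base name, the time_col argument and the abc_* column names are all made up for illustration.

```r
# Hypothetical sample data: four readings spread over two hours
df <- data.frame(
  Time  = as.POSIXct(c("2013-05-01 10:05:00", "2013-05-01 10:40:00",
                       "2013-05-01 11:15:00", "2013-05-01 11:50:00"), tz = "UTC"),
  abc_1 = c(1, 2, 3, NA),
  abc_2 = c(10, 20, 30, 40),
  other = c(5, 5, 5, 5)
)

# Base R variant of compress(): format() truncates each timestamp to its hour
compress_base <- function(df, cols, time_col = "Time") {
  hour <- format(df[[time_col]], "%Y-%m-%d %H:00:00")
  out <- lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  names(out) <- cols  # name each result after its source column
  out
}

# value = TRUE makes grep return the matching names instead of their positions
res <- compress_base(df, grep("abc", names(df), value = TRUE))
res$abc_1
```

Naming the list elements after their source columns lets you pull out one aggregated table with res$abc_1 rather than by position.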
