Iterating a function through different columns of a data.frame matching a pattern in the column names

Views: 146; Published: 2018/1/27 23:18:56; Tags: r, for-loop, lapply


Question

 
I want to apply a function to several columns of a data.frame whose names share a common pattern.
To subset the data.frame I use this code, which works:
df[,grep("abc", colnames(df))]
but I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.

The function I'm using is:
# dates() and hours() are presumably from the chron package
compress <- function(x) {
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}
where df (the data frame) and Time could be passed in as arguments themselves, but for the moment I don't need that.

Thanks
Giulia
Solution
You've basically got it. Use apply on the columns of your subsetted data to apply function f over columns (the 2 in the second argument of apply means columns; a 1 would mean rows):
apply( df[,grep("abc", colnames(df))] , 2 , f )
Or, if you don't want your df coerced to a matrix (which is what apply does), you can use lapply, as you suggest, in much the same way:
lapply( df[,grep("abc", colnames(df))] , f )
The return value from lapply is a list with one element per column. You can turn it back into a data.frame by wrapping the lapply call in data.frame(), e.g. data.frame( lapply(...) )
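
The question also asks about a for loop. A minimal sketch of that variant follows, using a toy function and data frame; the names cols and out are illustrative only and do not come from the question:

```r
# Toy inputs (illustrative; not the asker's real data)
f  <- function(x) x * 2
df <- data.frame(AB = c(1, 2, 3), AC = c(4, 5, 6), BB = c(7, 8, 9))

# Names of the columns that match the pattern
cols <- grep("A", colnames(df), value = TRUE)

# for-loop version: fill a named list, one element per matching column
out <- vector("list", length(cols))
names(out) <- cols
for (nm in cols) {
  out[[nm]] <- f(df[[nm]])
}

# Compare against the lapply() form; df[cols] is list-style indexing,
# which keeps the data.frame structure
identical(out, lapply(df[cols], f))
```

The loop and lapply do the same work here; lapply is shorter, while the loop can be easier to extend when each column needs extra bookkeeping.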

Example

# This function just multiplies its argument by 2
f <- function(x) x * 2

df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )  # random draws, so your values will differ


apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801


data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801

# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
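
One caveat worth checking: when the pattern matches exactly one column, df[, i] drops to a plain vector, so apply() fails and lapply() loops over individual values instead of columns. A small sketch of the difference, using a made-up data frame with a single matching column:

```r
f  <- function(x) x * 2
df <- data.frame(AB = c(1, 2, 3), BB = c(7, 8, 9))   # only one name contains "A"

idx <- grep("A", colnames(df))

# Matrix-style indexing drops a single column to a bare numeric vector
is.data.frame(df[, idx])                  # FALSE

# drop = FALSE, or list-style indexing, keeps the data.frame structure
is.data.frame(df[, idx, drop = FALSE])    # TRUE
is.data.frame(df[idx])                    # TRUE

# so lapply() still yields one element per matching column
res <- lapply(df[idx], f)
names(res)                                # "AB"
```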


Second edit

For the function you actually want to run, it may be easier to rewrite it so that it takes the df as input together with a vector of the column names you want to operate on. In this version the function returns a list, each element of which is an aggregated data.frame:
compress <- function(df, x) {
  # Aggregate each requested column separately: one data.frame per column
  lapply(x, function(col) {
    aggregate(df[, col, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  })
}
To run it, call it with the data.frame and a vector of column names:
compress( df , names(df)[ grep("abc", names(df) ) ] )
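
The function above relies on dates() and hours(), which look like they come from the chron package, and on a Time column that the question never shows. To try the idea end to end without either, here is a sketch built on two assumptions that are not in the original: Time is a POSIXct column, and the hour label is produced with base R's format(). The name compress_hourly and the toy data are hypothetical.

```r
# Hypothetical variant of compress(): sums each requested column per hour
compress_hourly <- function(df, cols) {
  # One label per row, truncated to the hour (assumes df$Time is POSIXct)
  hour <- format(df$Time, "%Y-%m-%d %H:00:00")
  res <- lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  names(res) <- cols   # name each list element after its column
  res
}

# Toy data: four readings, two in each of two hours
toy <- data.frame(
  Time  = as.POSIXct(c("2018-01-27 10:05:00", "2018-01-27 10:40:00",
                       "2018-01-27 11:10:00", "2018-01-27 11:55:00"),
                     tz = "UTC"),
  abc1  = c(1, 2, 3, 4),
  abc2  = c(10, NA, 30, 40),
  other = c(0, 0, 0, 0)
)

out <- compress_hourly(toy, grep("abc", names(toy), value = TRUE))
out$abc1   # hourly sums for the first matching column
```

Naming the list elements after their columns makes the result easier to index than the unnamed list returned by the version above.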


                        