Iterating a function through different columns of a data.frame matching a pattern in the column names
I want to iterate a function through different columns of a data.frame (those with a common pattern in their column names). For subsetting the data.frame I use this code, which works:
df[,grep("abc", colnames(df))]
but I don't know how to apply my function f(x) to all the columns that match this pattern, either using a for loop or lapply function.
The function I'm using is:
compress= function(x) {
  aggregate(df[,x,drop=FALSE],
            list(hour = with(df,paste(dates(Time),
                                      sprintf("%d:00:00",hours(Time))))),
            sum,na.rm=TRUE)
}
where df (the data frame) and Time could themselves be made arguments of the function, but I don't need that for the moment.
Thanks Giulia
You've basically got it. Just use apply on the columns of your subsetted data to apply function f over columns (the 2 in the second argument of apply indicates columns, as opposed to 1, which indicates applying over rows):
apply( df[,grep("abc", colnames(df))] , 2 , f )
Or, if you don't want to coerce your df to a matrix (which will happen with apply), you can use lapply, as you suggest, in much the same manner...
lapply( df[,grep("abc", colnames(df))] , f )
The return value from lapply will be a list, with one element for each column. You can turn this back into a data.frame by wrapping the lapply call in data.frame, e.g. data.frame( lapply(...) )
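Since the question also mentions a for loop, here is a minimal sketch of that variant next to the lapply call. The data, the column names and the function f are made up for illustration; they are not the asker's.

```r
# Hypothetical data and function, for illustration only
df <- data.frame(abc1 = 1:3, abc2 = 4:6, xyz = 7:9)
f <- function(x) x * 2

# Names of the columns matching the pattern
cols <- grep("abc", colnames(df), value = TRUE)

# Pre-allocate a named list, then fill it one column at a time
res_loop <- setNames(vector("list", length(cols)), cols)
for (nm in cols) {
  res_loop[[nm]] <- f(df[[nm]])
}

# The same computation with lapply; drop = FALSE keeps a data.frame
# even when only one column matches the pattern
res_lapply <- lapply(df[, cols, drop = FALSE], f)

# Check that both approaches produce the same list
identical(res_loop, res_lapply)
```

The loop and lapply do the same work here; lapply is just shorter and returns the named list directly.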
Example
# This function just multiplies its argument by 2
f <- function(x) x * 2
df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )
apply( df[,grep("A", colnames(df))] , 2 , f )
# AB AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801
data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
# AB AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801
# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix" "array"   (R >= 4.0.0; older versions print just "matrix")
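The coercion to a matrix matters most when the matching columns have different types. The sketch below uses made-up data (df2 and its columns are hypothetical) to show how apply turns every value into a string while lapply leaves each column as it is.

```r
# Hypothetical data: two numeric "A" columns and one character "A" column
df2 <- data.frame(AB = c(1, 2), AC = c(3, 4), AD = c("u", "v"),
                  stringsAsFactors = FALSE)

# drop = FALSE keeps a data.frame even if only one column matches
sel <- df2[, grep("A", colnames(df2)), drop = FALSE]

# apply() converts the data.frame with as.matrix() first, so the
# numeric columns are turned into character strings as well
m <- apply(sel, 2, function(x) x)
typeof(m)

# lapply() visits each column as-is, so the column types survive
col_classes <- sapply(lapply(sel, function(x) x), class)
col_classes
```

If all matching columns are numeric, as in the example above, the two approaches give the same numbers and differ only in the class of the result.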
Second edit
For the example function you want to run, it might be easier to rewrite it as a function that takes the df as input, along with a vector of the column names you want to operate on. In this example the function returns a list, with each element of that list containing an aggregated data.frame:
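compress= function( df , x ) {
  # x is a character vector of column names; each column is aggregated by hour.
  # dates() and hours() are assumed to come from the chron package.
  lapply( x , function(col){
    aggregate(df[,col,drop=FALSE],
              list(hour = with(df,paste(dates(Time),
                                        sprintf("%d:00:00",hours(Time))))),
              sum,na.rm=TRUE)
  })
}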
To run the function you then just call it, passing it the data.frame and a vector of colnames...
compress( df , names(df)[ grep("abc", names(df) ) ] )
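To try the idea without the asker's data, here is a self-contained sketch. It assumes Time is a POSIXct column and builds the hourly key with base R's format() instead of dates() and hours(); the data, the name compress_base and the setNames step are additions for illustration, not part of the answer above.

```r
# Hypothetical data: timestamps at 30-minute steps and two "abc" columns
df <- data.frame(
  Time  = as.POSIXct("2013-01-01 00:00:00", tz = "UTC") + 1800 * 0:5,
  abc1  = 1:6,
  abc2  = c(10, NA, 30, 40, 50, 60),
  other = letters[1:6]
)

# Same shape as compress() above, but the hourly key comes from format()
# on a POSIXct column (an assumption of this sketch)
compress_base <- function(df, x) {
  hour <- format(df$Time, "%Y-%m-%d %H:00:00", tz = "UTC")
  out <- lapply(x, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
  setNames(out, x)  # name each list element after its column
}

res <- compress_base(df, grep("abc", names(df), value = TRUE))
res$abc1
```

Each element of res is a data.frame with one row per hour; na.rm = TRUE is passed through to sum, so the missing value in abc2 is ignored rather than turning that hour's total into NA.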