Using Reduce to merge multiple data frames with passing arguments and without defining function outside the Reduce (syntax)
I have a list of data sets that corresponds to the object generated below:
data("AirPassengers"); data("mtcars")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)
I would like to merge the cars data sets on row.names
Results
The results should correspond to the data frame:
res <- merge(
x = merge(x = lstDta$dtaCars1, y = lstDta$dtaCars2, by = "row.names"),
y = lstDta$dtaCars3, by.y = "row.names",
by.x = "Row.names")
where the columns are joined using row.names (ideally, I would drop the Row.names variable, but this doesn't bother me):
> dim(res)
[1] 32 34
Problem
I want to achieve the same results using Reduce. In particular, I am interested in:
- Merging the data frames on the row.names
- Filtering the list. For example, I want to merge the cars data only and ignore the other data set
Additional requirements
A very useful answer suggests defining a function outside Reduce, along the lines of:
merge.all <- function(x, y) {
  merge(x, y, all = TRUE, by = "Sample")
}
output <- Reduce(merge.all, DataList)
I would like to avoid defining the function outside the Reduce syntax.
Attempt
As shown in the attempt below, I would like to cram everything inside the Reduce call:
dtaMrgd <- Reduce(f = function(x,y) {merge(x,y, by = "row.names")},
lapply(lstDta[grepl("Cars", names(lstDta)) == TRUE]))
so that the Reduce call does two things:
- Filters the passed list by matching a string against the element names
- Applies the merge function with the desired characteristics to the filtered list
Needless to say, the code above fails.
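One likely cause of the failure, offered as a sketch rather than a full diagnosis: lapply() is called with a list but no function, so R stops before Reduce() ever reaches merge(). The snippet below captures that error message and shows that the subsetting expression on its own already returns the filtered list. The names msg and carsOnly are introduced here for illustration only.

```r
data("mtcars")
data("AirPassengers")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

# lapply() requires a function as its second argument; calling it with only
# a list raises an error, which is captured here as a character string.
msg <- tryCatch(
  lapply(lstDta[grepl("Cars", names(lstDta))]),
  error = function(e) conditionMessage(e)
)
print(msg)

# The subsetting expression alone is enough to filter the list by name.
carsOnly <- lstDta[grepl("Cars", names(lstDta))]
print(names(carsOnly))
```

Comparing the grepl() result with == TRUE is redundant, since grepl() already returns a logical vector.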
Notes
I'm specifically interested in a solution of the form res <- Reduce( ... ). I'm not interested in creating additional objects or functions outside the Reduce() call.
I think this is one way to achieve what you want:
res2 <- Reduce(function(x, y) {
data.frame(merge(x, y, by = 0), row.names = row.names(x))[,-1]
}, lstDta[grep("Cars", names(lstDta))])
dim(res2)
#[1] 32 33
names(res2)
#[1] "mpg.x" "cyl.x" "disp.x" "hp.x" "drat.x" "wt.x" "qsec.x" "vs.x" "am.x" "gear.x" "carb.x"
#[12] "mpg.y" "cyl.y" "disp.y" "hp.y" "drat.y" "wt.y" "qsec.y" "vs.y" "am.y" "gear.y" "carb.y"
#[23] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
- To filter the input list, I'm using lstDta[grep("Cars", names(lstDta))]
- I drop the first column of each merged result (Row.names) with [,-1]
- You can use by = 0 as a synonym for row.names to avoid any problems trying to merge on Row.names and row.names
- Without the explicit data.frame(..., row.names = row.names(x)), merge will drop the original mtcars row names and replace them with the default 1:nrow(x). This would be problematic for ensuing merge calls.
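A possible variant, sketched under one assumption about merge(): with its default sort = TRUE the merged rows are ordered by the key, so their order may not match row.names(x). Taking the row names from the merged Row.names column itself avoids relying on row order. The names res3 and m are illustrative and not part of the original answer.

```r
data("mtcars")
data("AirPassengers")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

res3 <- Reduce(
  function(x, y) {
    # Join on row names; merge() adds the key as a "Row.names" column.
    m <- merge(x, y, by = 0)
    # Restore the row names from the key column, so each label stays
    # attached to its own row regardless of how merge() ordered the rows.
    rownames(m) <- as.character(m$Row.names)
    # Drop the "Row.names" column so the next merge starts clean.
    m[, -1]
  },
  lstDta[grepl("Cars", names(lstDta))]
)

dim(res3)
# Sanity check: columns from the first and last data frame should agree
# row by row, because all three inputs are the same mtcars data.
all(res3$mpg.x == res3$mpg)
```

Everything still lives inside a single Reduce() call, which keeps to the format requested in the question.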
From the help file ?merge:
Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
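The quoted passage can be checked with a small sketch: merging on by = 0 and on by = "row.names" should give the same data frame, with the key returned as a column named Row.names. The object names byZero and byName are illustrative only.

```r
data("mtcars")

# Merge a data frame with itself on row names, specified two ways.
byZero <- merge(mtcars, mtcars, by = 0)
byName <- merge(mtcars, mtcars, by = "row.names")

# Both calls should produce the same result.
print(identical(byZero, byName))

# The row names come back as the first column of the merged data frame.
print(names(byZero)[1])
```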