Using Reduce to merge multiple data frames with passing arguments and without defining function outside the Reduce (syntax)


Problem description


I have a list containing a number of data sets, corresponding to the object generated below:

data("AirPassengers"); data("mtcars")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

I would like to merge the cars data sets on row.names.

Results

The results should correspond to the data frame:

res <- merge(
    x = merge(x = lstDta$dtaCars1, y = lstDta$dtaCars2, by = "row.names"),
    y = lstDta$dtaCars3, by.y = "row.names",
    by.x = "Row.names")

where the columns are joined using row.names (ideally, I would drop the Row.names variable but this doesn't bother me):

> dim(res)
[1] 32 34 

Problem

I want to achieve the same results using Reduce. In particular, I am interested in:

  • Merging the data frames on the row.names
  • Filtering the list. For example, I want to merge the cars data only and ignore the other data set

Additional requirements

A very useful answer suggests defining a function outside Reduce, along the lines of:

merge.all <- function(x, y) {
    merge(x, y, all=TRUE, by="Sample")
}

output <- Reduce(merge.all, DataList)

I would like to avoid defining the function outside the Reduce syntax.

Attempt

As shown in the attempt below, I would like to cram everything inside the Reduce:

dtaMrgd <- Reduce(f = function(x,y) {merge(x,y, by = "row.names")},
              lapply(lstDta[grepl("Cars", names(lstDta)) == TRUE]))

so the Reduce does two things:

  1. Filters the passed list by matching a string against the element names
  2. Applies a merge function with the desired characteristics to the filtered list

Needless to say, the code above fails.
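One likely reason for the failure, sketched below under the assumption that the list is built exactly as above: lapply() is called without its FUN argument, so R raises an error before Reduce() ever receives a list to work on. Indexing the list with grepl() already returns the filtered list, so no lapply() wrapper is needed. The names err and carsOnly are illustrative only.

```r
data("AirPassengers"); data("mtcars")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

# lapply() has no default for FUN, so this call errors on its own;
# tryCatch() captures the message instead of stopping the script
err <- tryCatch(lapply(lstDta[grepl("Cars", names(lstDta))]),
                error = function(e) conditionMessage(e))
print(err)

# Logical indexing alone keeps only the elements whose names contain "Cars"
carsOnly <- lstDta[grepl("Cars", names(lstDta))]
print(names(carsOnly))
```

The "== TRUE" comparison in the attempt is harmless but redundant, because grepl() already returns a logical vector.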


Notes

I'm specifically interested in a solution of the form res <- Reduce( ... ). I'm not interested in creating any additional objects or functions outside the Reduce().

Solution

I think this is one way to achieve what you want:

res2 <- Reduce(function(x, y) {
  m <- merge(x, y, by = 0)  # merge() sorts rows by key, so take the names from m, not from x
  data.frame(m, row.names = m$Row.names)[,-1]
}, lstDta[grep("Cars", names(lstDta))])

dim(res2)
#[1] 32 33

names(res2)
#[1] "mpg.x"  "cyl.x"  "disp.x" "hp.x"   "drat.x" "wt.x"   "qsec.x" "vs.x"   "am.x"   "gear.x" "carb.x"
#[12] "mpg.y"  "cyl.y"  "disp.y" "hp.y"   "drat.y" "wt.y"   "qsec.y" "vs.y"   "am.y"   "gear.y" "carb.y"
#[23] "mpg"    "cyl"    "disp"   "hp"     "drat"   "wt"     "qsec"   "vs"     "am"     "gear"   "carb"


  • To filter the input list, I'm using lstDta[grep("Cars", names(lstDta))]
  • I drop the first column of each merged result (Row.names) with [,-1]
  • You can use by = 0 as a synonym for row.names to avoid any problems trying to merge on Row.names and row.names
  • Without the explicit data.frame(..., row.names = ...) call, the merged result loses the original mtcars row names and carries the default 1:nrow(x) instead. This would be problematic for the ensuing merge calls, which match on row names.
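A quick way to sanity-check a merge like this is sketched below. It relies on the fact that all three inputs are copies of mtcars, so every copy of a column should agree row by row. The sketch takes the row names from the merged Row.names column rather than from x, because merge() sorts its result by the key by default. The name res is illustrative only.

```r
data("AirPassengers"); data("mtcars")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

res <- Reduce(function(x, y) {
  m <- merge(x, y, by = 0)                       # adds a leading Row.names column
  data.frame(m, row.names = m$Row.names)[, -1]   # restore row names, drop the key column
}, lstDta[grep("Cars", names(lstDta))])

print(dim(res))

# The three copies of mpg come from identical inputs, so they should match
print(all(res$mpg.x == res$mpg.y, res$mpg.y == res$mpg))

# Each row should also match the mtcars row with the same name
print(identical(res$mpg, mtcars[row.names(res), "mpg"]))
```

If either check printed FALSE, the row names would no longer be labelling the rows they came from.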

From the help file ?merge:

Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
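The equivalence described in the help text can be checked with a small sketch. The object names a and b are illustrative only.

```r
data("mtcars")

# Merge on row names, specified once by number and once by name
a <- merge(mtcars, mtcars, by = 0)
b <- merge(mtcars, mtcars, by = "row.names")

print(identical(a, b))   # both spellings select the row names as the key
print(names(a)[1])       # the key is returned as a leading "Row.names" column
print(ncol(a))           # 1 key column + 11 ".x" columns + 11 ".y" columns
```

Because the key comes back as an ordinary column named Row.names and the result gets default row names, a follow-up merge on row names needs the names restored first.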
