Consolidating data frames in R


Question

Hi, I have a lot of CSV files to process. Each file is generated by one run of an algorithm. My data always has one key and one value, like this:

csv1:

      index value
1     1     1
2     2     1
3     3     1
4     4     1
5     5     1

csv2:

      index value
1     4     3
2     5     3
3     6     3
4     7     3
5     8     3

Now I want to aggregate the data from these CSV files as follows:

When both files contain the same key (e.g. 5), the resulting row should contain that shared key (5) and the mean of both values ((1+3)/2 = 2). If only one file contains a key (e.g. 2), that row is added to the result table unchanged (key = 2, value = 1).

      index value
1     1     1
2     2     1
3     3     1
4     4     2 (as (1+3)/2 = 2)
5     5     2 (as (1+3)/2 = 2)
6     6     3
7     7     3
8     8     3
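For the two sample files, the rule above can be checked directly in R. This sketch uses a different technique from the rbind()/aggregate() approach: a full outer join on index with merge(..., all = TRUE), followed by a row-wise mean that ignores missing values. The data frame names csv1 and csv2 are assumptions (the values are typed in from the tables above), and this form only compares two files at a time.

```r
# Sample data from the question, typed in by hand (assumed names csv1, csv2)
csv1 <- data.frame(index = 1:5, value = 1)
csv2 <- data.frame(index = 4:8, value = 3)

# Full outer join on the key: a key present in only one file
# gets NA in the other file's value column
joined <- merge(csv1, csv2, by = "index", all = TRUE,
                suffixes = c(".csv1", ".csv2"))

# Row-wise mean over both value columns, ignoring NA, so a key that
# appears in only one file keeps its original value
result <- data.frame(index = joined$index,
                     value = rowMeans(joined[, c("value.csv1", "value.csv2")],
                                      na.rm = TRUE))
print(result)
```

The result has the eight rows of the expected table: keys 4 and 5 become (1 + 3) / 2 = 2, and every other key keeps its single value.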

At first I thought rbind() would do the job, but it only concatenates the data without aggregating the values. How can I achieve this in R?
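To illustrate the point about rbind(): with the two sample files typed in as data frames (the names csv1 and csv2 are assumed), stacking them gives ten rows, and keys 4 and 5 simply appear twice instead of being averaged.

```r
csv1 <- data.frame(index = 1:5, value = 1)
csv2 <- data.frame(index = 4:8, value = 3)

# rbind() only stacks the rows; rows sharing a key are not combined
stacked <- rbind(csv1, csv2)
nrow(stacked)           # 10
table(stacked$index)    # keys 4 and 5 each occur twice
```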

Answer

Here is a solution. It builds on the excellent comments so far and, I hope, adds value by showing how to handle any number of files. I am assuming all your CSV files are in the same directory (my.csv.dir below).

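# locate the files (full.names = TRUE keeps the directory in each path,
# so read.csv() can find them regardless of the working directory)
files <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)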

# read the files into a list of data.frames
data.list <- lapply(files, read.csv)

# concatenate into one big data.frame
data.cat <- do.call(rbind, data.list)

# aggregate
data.agg <- aggregate(value ~ index, data.cat, mean)
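The four steps can be tried end to end without real data. This sketch writes the two sample files from the question into a fresh temporary directory that stands in for my.csv.dir (the file names csv1.csv and csv2.csv are made up for the demo) and then runs locate, read, concatenate and aggregate. aggregate() with mean averages every row that shares an index, whatever the number of files.

```r
# Hypothetical setup: a temporary directory playing the role of my.csv.dir
my.csv.dir <- tempfile("runs")
dir.create(my.csv.dir)
write.csv(data.frame(index = 1:5, value = 1),
          file.path(my.csv.dir, "csv1.csv"), row.names = FALSE)
write.csv(data.frame(index = 4:8, value = 3),
          file.path(my.csv.dir, "csv2.csv"), row.names = FALSE)

# locate (with full paths), read, concatenate, aggregate
files     <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
data.list <- lapply(files, read.csv)
data.cat  <- do.call(rbind, data.list)
data.agg  <- aggregate(value ~ index, data.cat, mean)
print(data.agg)
```

Because the mean is taken over all rows with the same key, a key present in three files would be averaged over three values, which extends the rule from the question beyond two files.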

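EDIT: to handle the updated question in your comment below. This version tags every row with its algorithm name, taken from the part of the file name before the first hyphen, and then averages per algorithm and index: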

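files     <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
# algorithm name = part of the file name before the first "-"
algo.name <- sub("-.*", "", basename(files))
data.list <- lapply(files, read.csv)
# add the algorithm name as a column to each data.frame
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat  <- do.call(rbind, data.list)
# average per (algorithm, index) pair
data.agg  <- aggregate(value ~ algorithm + index, data.cat, mean)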
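A self-contained run of the EDIT version, assuming (as sub("-.*", "", ...) implies) that every file name starts with the algorithm name followed by a hyphen. The temporary directory and the file names algoA-run1.csv, algoA-run2.csv and algoB-run1.csv are hypothetical, and the values are made up purely to exercise the code. Means are computed per (algorithm, index) pair, so keys shared between runs of the same algorithm are averaged while different algorithms stay separate.

```r
# Hypothetical files named "<algorithm>-<run>.csv" in a temporary directory
my.csv.dir <- tempfile("algoruns")
dir.create(my.csv.dir)
write.csv(data.frame(index = 1:3, value = 1),
          file.path(my.csv.dir, "algoA-run1.csv"), row.names = FALSE)
write.csv(data.frame(index = 2:4, value = 3),
          file.path(my.csv.dir, "algoA-run2.csv"), row.names = FALSE)
write.csv(data.frame(index = 1:2, value = 10),
          file.path(my.csv.dir, "algoB-run1.csv"), row.names = FALSE)

files     <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
# algorithm name = base file name up to (not including) the first "-"
algo.name <- sub("-.*", "", basename(files))
data.list <- lapply(files, read.csv)
# add a constant "algorithm" column to each data.frame
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat  <- do.call(rbind, data.list)
data.agg  <- aggregate(value ~ algorithm + index, data.cat, mean)
print(data.agg)
```

Here algoA at index 2 becomes (1 + 3) / 2 = 2, while algoB at index 2 keeps its single value 10.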
