Consolidating data frames in R
Question
Hi, I have a lot of CSV files to process. Each file is generated by one run of an algorithm. My data always has one key and one value, like this:
csv1:
  index value
1     1     1
2     2     1
3     3     1
4     4     1
5     5     1

csv2:
  index value
1     4     3
2     5     3
3     6     3
4     7     3
5     8     3
Now I want to aggregate these CSV data, like this:
When both files contain the same key, e.g. 5, the resulting row should contain that shared key (5) and the mean of the two values ((1+3)/2 = 2). If only one file contains a key (e.g. 2), its row is simply added to the result table (key = 2, value = 1).
  index value
1     1     1
2     2     1
3     3     1
4     4     2   (as (1+3)/2 = 2)
5     5     2   (as (1+3)/2 = 2)
6     6     3
7     7     3
8     8     3
At first I thought rbind() would do the job, but it only concatenates the data; it does not aggregate the values. How can I achieve this in R?
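To make the rbind() behaviour concrete, here is a small sketch that rebuilds the two example files as in-memory data frames (the names csv1, csv2 and stacked are illustrative only). Stacking them yields all 10 rows, with keys 4 and 5 each appearing twice, so a separate aggregation step is still needed.

```r
# The two example files from the question, rebuilt as in-memory data frames
csv1 <- data.frame(index = 1:5, value = 1)
csv2 <- data.frame(index = 4:8, value = 3)

# rbind() only stacks the rows; it does not combine rows that share a key
stacked <- rbind(csv1, csv2)
nrow(stacked)          # 10
table(stacked$index)   # indices 4 and 5 each occur twice
```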
Answer
Here is a solution. I am following all the excellent comments so far, and hopefully adding value by showing you how to handle any number of files. I am assuming all your CSV files are in the same directory (my.csv.dir below).
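# locate the CSV files (full paths, so read.csv() works from any working directory)
files <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)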
# read the files into a list of data.frames
data.list <- lapply(files, read.csv)
# concatenate into one big data.frame
data.cat <- do.call(rbind, data.list)
# aggregate
data.agg <- aggregate(value ~ index, data.cat, mean)
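The following self-contained sketch runs the same pipeline end to end. It assumes nothing about your real directory. It writes the two example files from the question into a temporary directory (the directory name "runs" and the file names run1.csv and run2.csv are hypothetical), then reads, stacks and averages them by index. Because aggregate() groups on index and takes the mean, shared keys (4 and 5) collapse to the mean of their values, and keys present in only one file keep their single value.

```r
# Hypothetical setup: write the two example files into a temporary directory
my.csv.dir <- file.path(tempdir(), "runs")
dir.create(my.csv.dir, showWarnings = FALSE)
write.csv(data.frame(index = 1:5, value = 1),
          file.path(my.csv.dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(index = 4:8, value = 3),
          file.path(my.csv.dir, "run2.csv"), row.names = FALSE)

# Read every CSV in the directory (full paths) into a list of data frames
files <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
data.list <- lapply(files, read.csv)

# Stack them, then average the values that share an index
data.cat <- do.call(rbind, data.list)
data.agg <- aggregate(value ~ index, data = data.cat, FUN = mean)
print(data.agg)
```

The printed result should match the desired table in the question: values 1, 1, 1, 2, 2, 3, 3, 3 for indices 1 to 8.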
EDIT: to handle the updated question from your comment below (one result per algorithm, where the algorithm name is the part of each file name before the first "-"):
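# locate the CSV files (full paths)
files <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
# algorithm name = file name (without directory) up to the first "-"
algo.name <- sub("-.*", "", basename(files))
data.list <- lapply(files, read.csv)
# tag each data.frame with the algorithm that produced it
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat <- do.call(rbind, data.list)
# mean value per (algorithm, index) pair
data.agg <- aggregate(value ~ algorithm + index, data.cat, mean)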
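As a sketch of how the per-algorithm version behaves, the example below writes three small files with hypothetical names (algoA-run1.csv, algoA-run2.csv, algoB-run1.csv) into a temporary directory and applies the same steps. The values are made up purely for illustration. Runs of the same algorithm are averaged per index, while runs of different algorithms stay separate.

```r
# Hypothetical files: the algorithm name is the part before the first "-"
my.csv.dir <- file.path(tempdir(), "algo_runs")
dir.create(my.csv.dir, showWarnings = FALSE)
write.csv(data.frame(index = 1:3, value = 1),
          file.path(my.csv.dir, "algoA-run1.csv"), row.names = FALSE)
write.csv(data.frame(index = 2:4, value = 3),
          file.path(my.csv.dir, "algoA-run2.csv"), row.names = FALSE)
write.csv(data.frame(index = 1:2, value = 10),
          file.path(my.csv.dir, "algoB-run1.csv"), row.names = FALSE)

files <- list.files(my.csv.dir, pattern = "\\.csv$", full.names = TRUE)
# basename() strips the directory before extracting the algorithm name
algo.name <- sub("-.*", "", basename(files))
data.list <- lapply(files, read.csv)
# Add an 'algorithm' column to each data frame
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat <- do.call(rbind, data.list)
# One mean per (algorithm, index) pair
data.agg <- aggregate(value ~ algorithm + index, data = data.cat, FUN = mean)
print(data.agg)
```

Here algoA gets values 1, 2, 2, 3 for indices 1 to 4 (index 2 and 3 are shared by its two runs), and algoB keeps 10 for indices 1 and 2.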