Summarize data from CSV using R
Question
I'm new to R, and I wrote some code to summarize data from a .csv file according to my needs.
Here is the code:
raw <- read.csv("trees.csv")
It looks like this:
SNAME CNAME FAMILY PLOT INDIVIDUAL CAP H
1 Alchornea triplinervia (Spreng.) M. Arg. Tainheiro Euphorbiaceae 5 176 15 9.5
2 Andira fraxinifolia Benth. Angelim Fabaceae 3 321 12 6.0
3 Andira fraxinifolia Benth. Angelim Fabaceae 3 326 14 7.0
4 Andira fraxinifolia Benth. Angelim Fabaceae 3 327 18 5.0
5 Andira fraxinifolia Benth. Angelim Fabaceae 3 328 12 6.0
6 Andira fraxinifolia Benth. Angelim Fabaceae 3 329 21 7.0
# add 2 new columns (VOLUME and BASALAREA), one row at a time
for (i in 1:nrow(raw)) {
raw$VOLUME[i] <- treeVolume(raw$CAP[i],raw$H[i])
raw$BASALAREA[i] <- treeBasalArea(raw$CAP[i])
}
I need a new data frame with the means of columns H and CAP and the sums of columns VOLUME and BASALAREA. It should be grouped by column SNAME and subgrouped by column PLOT:
plotSummary = merge(
  aggregate(CAP ~ SNAME + PLOT, raw, mean),
  aggregate(H ~ SNAME + PLOT, raw, mean))
plotSummary = merge(
  plotSummary,
  aggregate(VOLUME ~ SNAME + PLOT, raw, sum))
plotSummary = merge(
  plotSummary,
  aggregate(BASALAREA ~ SNAME + PLOT, raw, sum))
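The chained merge() calls above can also be written as a single fold. The following is a minimal sketch, not the asker's code. It uses a small hypothetical stand-in for `raw` with made-up values and the same column names. It builds the four aggregate() results with plain column names in each formula and lets base R's Reduce() apply merge() pairwise on the shared SNAME and PLOT columns.

```r
# Hypothetical toy data standing in for `raw` (made-up values, same column names).
raw <- data.frame(
  SNAME = c("Sp A", "Sp B", "Sp B", "Sp B"),
  PLOT = c(5, 3, 3, 4),
  CAP = c(15, 12, 14, 18),
  H = c(9.5, 6.0, 7.0, 5.0),
  VOLUME = c(0.1, 0.2, 0.3, 0.4),
  BASALAREA = c(0.01, 0.02, 0.03, 0.04)
)

# One aggregate() per statistic; plain names in the formula give the
# results matching SNAME, PLOT and value columns.
pieces <- list(
  aggregate(CAP ~ SNAME + PLOT, data = raw, FUN = mean),
  aggregate(H ~ SNAME + PLOT, data = raw, FUN = mean),
  aggregate(VOLUME ~ SNAME + PLOT, data = raw, FUN = sum),
  aggregate(BASALAREA ~ SNAME + PLOT, data = raw, FUN = sum)
)

# Reduce() folds merge() over the list: merge(merge(merge(p1, p2), p3), p4).
plotSummary <- Reduce(function(x, y) merge(x, y, by = c("SNAME", "PLOT")), pieces)
print(plotSummary)
```

Adding another statistic then only means adding one more element to `pieces`.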
The functions treeVolume and treeBasalArea just return numbers:
treeVolume <- function(radius, height) {
  return(0.000074230 * radius^1.707348 * height^1.16873)
}
treeBasalArea <- function(radius) {
  return((radius^2 * pi) / 40000)
}
I'm sure that there is a better way of doing this, but how?
Recommended answer
I can't manage to read your example data in, but I think I've made something that generally represents it, so give this a whirl. This answer builds on Greg's suggestion to look at plyr: use the function ddply to split your data.frame into groups, and numcolwise to calculate your statistics of interest on every numeric column.
#Sample data
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3],2), plot = rep(letters[1:3],2),
CAP = rnorm(6),
H = rlnorm(6),
VOLUME = runif(6),
BASALAREA = rlnorm(6)
)
#Calculate mean for all numeric columns, grouping by sname and plot
library(plyr)
ddply(dat, c("sname", "plot"), numcolwise(mean))
#-----
sname plot CAP H VOLUME BASALAREA
1 a a 0.4844135 1.182481 0.3248043 1.614668
2 b b 0.2565755 3.313614 0.6279025 1.397490
3 c c -0.8280485 1.627634 0.1768697 2.538273
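If you'd rather avoid the plyr dependency, base R's aggregate() can do the same grouping. Below is a minimal sketch that uses a hypothetical small data frame with fixed values, not the answer's random sample, so the result is easy to check. Unlike numcolwise(), the `.` formula does not skip non-numeric columns, so this assumes every non-grouping column is numeric.

```r
# Hypothetical data with the same layout as the sample above, but fixed values.
dat <- data.frame(
  sname = rep(letters[1:3], 2),
  plot = rep(letters[1:3], 2),
  CAP = c(1, 2, 3, 3, 4, 5),
  H = c(10, 20, 30, 30, 40, 50)
)

# `.` on the left-hand side means "every column not used for grouping",
# so this takes the mean of CAP and H within each sname/plot pair.
means <- aggregate(. ~ sname + plot, data = dat, FUN = mean)
print(means)
```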
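EDIT - in response to the updated question
OK, now that your question is more or less reproducible, here's how I'd approach it. First, you can take advantage of the fact that R is vectorized. This means you can calculate all of the values for VOLUME and BASALAREA in one pass without looping over each row. For that bit, I recommend the transform function: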
dat <- transform(dat, VOLUME = treeVolume(CAP, H), BASALAREA = treeBasalArea(CAP))
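To check that the vectorized call gives the same result as the question's row-by-row loop, here is a small sketch with three made-up rows. treeVolume() is copied from the question and written with `^` instead of `**` (R accepts both). Because it only uses arithmetic operators, it already works element-wise on whole columns.

```r
# treeVolume() as defined in the question.
treeVolume <- function(radius, height) {
  0.000074230 * radius^1.707348 * height^1.16873
}

# Made-up example rows.
dat <- data.frame(CAP = c(15, 12, 14), H = c(9.5, 6.0, 7.0))

# Row-by-row loop, as in the question.
loopVol <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  loopVol[i] <- treeVolume(dat$CAP[i], dat$H[i])
}

# Vectorized: one call computes the whole VOLUME column.
dat <- transform(dat, VOLUME = treeVolume(CAP, H))

stopifnot(isTRUE(all.equal(dat$VOLUME, loopVol)))
print(dat)
```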
Second, since you want different statistics for CAP and H (means) than for VOLUME and BASALAREA (sums), I recommend using the summarize function, like this:
ddply(dat, c("sname", "plot"), summarize,
meanCAP = mean(CAP),
meanH = mean(H),
sumVOLUME = sum(VOLUME),
sumBASAL = sum(BASALAREA)
)
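For comparison, here is a base-R sketch of the same summary without plyr. It assumes a tiny made-up `dat` with the same column names. It uses one aggregate() call per statistic, with cbind() on the left-hand side to summarize two columns at once. It then merges the results on the grouping columns and renames the columns to match the ddply output. merge() places the `by` columns first, followed by the remaining columns of `x` and then those of `y`, which is what the rename relies on.

```r
# Hypothetical data with the same column names as in the answer.
dat <- data.frame(
  sname = c("a", "a", "b"),
  plot = c("x", "x", "y"),
  CAP = c(1, 3, 5),
  H = c(2, 4, 6),
  VOLUME = c(0.5, 0.25, 1),
  BASALAREA = c(1, 2, 3)
)

# Means for CAP and H, sums for VOLUME and BASALAREA, grouped by sname and plot.
means <- aggregate(cbind(CAP, H) ~ sname + plot, data = dat, FUN = mean)
sums <- aggregate(cbind(VOLUME, BASALAREA) ~ sname + plot, data = dat, FUN = sum)

# Join on the grouping columns, then rename to match the ddply/summarize output.
result <- merge(means, sums, by = c("sname", "plot"))
names(result) <- c("sname", "plot", "meanCAP", "meanH", "sumVOLUME", "sumBASAL")
print(result)
```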
This gives you output that looks like:
sname plot meanCAP meanH sumVOLUME sumBASAL
1 a a 0.5868582 0.5032308 9.650184e-06 7.031954e-05
2 b b 0.2869029 0.4333862 9.219770e-06 1.407055e-05
3 c c 0.7356215 0.4028354 2.482775e-05 8.916350e-05
The help pages for ?ddply, ?transform, and ?summarize should be insightful.