Summarize data from csv using R

Views: 107 Published: 2017/3/26 2:20:07 Tags: r dataframe aggregate summary

Question

I'm new to R, and I wrote some code to summarize data from a .csv file according to my needs.

Here is the code.

raw <- read.csv("trees.csv")

It looks like this:

                                 SNAME     CNAME        FAMILY PLOT INDIVIDUAL CAP   H
1 Alchornea triplinervia (Spreng.) M. Arg. Tainheiro Euphorbiaceae    5        176  15 9.5
2               Andira fraxinifolia Benth.   Angelim      Fabaceae    3        321  12 6.0
3               Andira fraxinifolia Benth.   Angelim      Fabaceae    3        326  14 7.0
4               Andira fraxinifolia Benth.   Angelim      Fabaceae    3        327  18 5.0
5               Andira fraxinifolia Benth.   Angelim      Fabaceae    3        328  12 6.0
6               Andira fraxinifolia Benth.   Angelim      Fabaceae    3        329  21 7.0

# add two more columns
for (i in 1:nrow(raw)) {
  raw$VOLUME[i] <- treeVolume(raw$CAP[i],raw$H[i])  
  raw$BASALAREA[i] <- treeBasalArea(raw$CAP[i])
}

Here it comes. I need a new data frame with the means of columns H and CAP and the sums of columns VOLUME and BASALAREA. This data frame is grouped by column SNAME and subgrouped by column PLOT.

plotSummary = merge(
  aggregate(raw$CAP ~ raw$SNAME * raw$PLOT, raw, mean),
  aggregate(raw$H ~ raw$SNAME * raw$PLOT, raw, mean))

plotSummary = merge(
  plotSummary,
  aggregate(raw$VOLUME ~ raw$SNAME * raw$PLOT, raw, sum))


plotSummary = merge(
  plotSummary,
  aggregate(raw$BASALAREA ~ raw$SNAME * raw$PLOT, raw, sum))

The functions treeVolume and treeBasalArea just return numbers.

# Stem volume from CAP and height
treeVolume <- function(radius, height) {
  0.000074230 * radius^1.707348 * height^1.16873
}

# Basal area from CAP
treeBasalArea <- function(radius) {
  (radius^2 * pi) / 40000
}

I'm sure there is a better way of doing this, but how?

Recommended answer

I can't manage to read your example data in, but I think I've made something that generally represents it, so give this a whirl. This answer builds on Greg's suggestion to look at plyr, using ddply to group by segments of your data.frame and numcolwise to calculate your statistics of interest.

#Sample data
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3],2), plot = rep(letters[1:3],2), 
                  CAP = rnorm(6), 
                  H = rlnorm(6), 
                  VOLUME = runif(6),
                  BASALAREA = rlnorm(6)
                  )


#Calculate mean for all numeric columns, grouping by sname and plot
library(plyr)
ddply(dat, c("sname", "plot"), numcolwise(mean))
#-----
  sname plot        CAP        H    VOLUME BASALAREA
1     a    a  0.4844135 1.182481 0.3248043  1.614668
2     b    b  0.2565755 3.313614 0.6279025  1.397490
3     c    c -0.8280485 1.627634 0.1768697  2.538273
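
For readers without plyr installed, roughly the same grouped means can be sketched in base R. This is a swapped-in alternative, not what the answer itself uses: it relies on aggregate with a formula whose left-hand side `.` stands for every column not named on the right. The name base_means is introduced here for illustration only.

```r
# Same sample data as in the answer above
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3], 2), plot = rep(letters[1:3], 2),
                  CAP = rnorm(6),
                  H = rlnorm(6),
                  VOLUME = runif(6),
                  BASALAREA = rlnorm(6))

# Base R sketch: mean of every remaining column, grouped by sname and plot
# (base_means is a hypothetical name, not part of the original answer)
base_means <- aggregate(. ~ sname + plot, data = dat, FUN = mean)
print(base_means)
```

The formula interface drops rows with missing values by default, so results can differ from ddply when the data contains NAs.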

Edit - in response to the updated question

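OK - now that your question is more or less reproducible, here's how I'd approach it. First, you can take advantage of the fact that R is vectorized, meaning that you can calculate all of the values for VOLUME and BASALAREA in one pass, without looping through each row. For that bit, I recommend the transform function: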

dat <- transform(dat, VOLUME = treeVolume(CAP, H), BASALAREA = treeBasalArea(CAP))
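
As a quick check that the vectorized call computes the same values as the row-by-row loop from the question, here is a minimal sketch. The function definitions are copied from the question (written with ^ instead of **, which R treats the same way); the six CAP and H values are taken from the printed sample, and the names looped and vectorized are introduced here for illustration.

```r
# Function definitions as given in the question
treeVolume <- function(radius, height) 0.000074230 * radius^1.707348 * height^1.16873
treeBasalArea <- function(radius) (radius^2 * pi) / 40000

# Illustrative data: CAP and H from the six rows printed in the question
raw <- data.frame(CAP = c(15, 12, 14, 18, 12, 21),
                  H   = c(9.5, 6, 7, 5, 6, 7))

# Row-by-row version, as in the question
looped <- raw
for (i in seq_len(nrow(looped))) {
  looped$VOLUME[i]    <- treeVolume(looped$CAP[i], looped$H[i])
  looped$BASALAREA[i] <- treeBasalArea(looped$CAP[i])
}

# Vectorized version: each function receives whole columns at once
vectorized <- transform(raw,
                        VOLUME = treeVolume(CAP, H),
                        BASALAREA = treeBasalArea(CAP))

# Compare the two results column by column
print(all.equal(looped$VOLUME, vectorized$VOLUME))
print(all.equal(looped$BASALAREA, vectorized$BASALAREA))
```

This works because both functions use only arithmetic operators, which apply element-wise to vectors in R.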

Secondly, realizing that you intend to calculate different statistics for CAP & H than for VOLUME & BASALAREA, I recommend using the summarize function, like this:

ddply(dat, c("sname", "plot"), summarize,
  meanCAP = mean(CAP),
  meanH = mean(H),
  sumVOLUME = sum(VOLUME),
  sumBASAL = sum(BASALAREA)
  )

Which will give you output that looks like:

  sname plot   meanCAP     meanH    sumVOLUME     sumBASAL
1     a    a 0.5868582 0.5032308 9.650184e-06 7.031954e-05
2     b    b 0.2869029 0.4333862 9.219770e-06 1.407055e-05
3     c    c 0.7356215 0.4028354 2.482775e-05 8.916350e-05
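
If you would rather stay in base R, the four aggregate calls and three merges from the question can probably be reduced to two aggregate calls and one merge. The sketch below is an alternative to the plyr approach, not part of it. It assumes the question's column names, and the data frame is a small made-up stand-in built from values in the printed sample, with one row moved to plot 5 so that a species spans two plots.

```r
# Function definitions as given in the question
treeVolume <- function(radius, height) 0.000074230 * radius^1.707348 * height^1.16873
treeBasalArea <- function(radius) (radius^2 * pi) / 40000

# Made-up stand-in for trees.csv (values borrowed from the printed sample)
raw <- data.frame(
  SNAME = c("Andira fraxinifolia", "Andira fraxinifolia",
            "Andira fraxinifolia", "Alchornea triplinervia"),
  PLOT  = c(3, 3, 5, 5),
  CAP   = c(12, 14, 18, 15),
  H     = c(6, 7, 5, 9.5)
)
raw <- transform(raw, VOLUME = treeVolume(CAP, H), BASALAREA = treeBasalArea(CAP))

# cbind() on the left-hand side aggregates several columns in one call
means <- aggregate(cbind(CAP, H) ~ SNAME + PLOT, data = raw, FUN = mean)
sums  <- aggregate(cbind(VOLUME, BASALAREA) ~ SNAME + PLOT, data = raw, FUN = sum)

# One merge on the grouping columns combines means and sums
plotSummary <- merge(means, sums, by = c("SNAME", "PLOT"))
print(plotSummary)
```

Referring to bare column names with data = raw, rather than raw$CAP inside the formula, also keeps the output column names clean.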

The help pages for ?ddply, ?transform, and ?summarize should be insightful.
