Explain R tapply description

Question

I understand what tapply() does in R. However, I cannot parse this description of it from the documentation:



Apply a Function Over a "Ragged" Array

Description:

     Apply a function to each cell of a ragged array, that is to each
     (non-empty) group of values given by a unique combination of the
     levels of certain factors.

Usage:

     tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
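
A minimal sketch of the two Usage arguments beyond X, INDEX and FUN, using made-up data (x and g are hypothetical): whatever is passed in "..." is forwarded to FUN (here na.rm = TRUE for mean()), and simplify = FALSE keeps one list element per group instead of collapsing scalar results into a numeric array.

```r
# Made-up data: one missing value, two groups
x <- c(1, 2, NA, 4, 5)
g <- factor(c("a", "a", "b", "b", "b"))

# Arguments after FUN (the "..." in the Usage line) are passed on to FUN
m <- tapply(x, g, mean, na.rm = TRUE)
print(m)   # a = 1.5, b = 4.5

# simplify = FALSE keeps one list element per group instead of
# collapsing the scalar results into a numeric array
l <- tapply(x, g, mean, na.rm = TRUE, simplify = FALSE)
str(l)
```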

When I think of tapply, I think of GROUP BY in SQL. You group the values in X by the corresponding (parallel) factor levels in INDEX and apply FUN to each group. I have read the description of tapply 100 times and still can't figure out how what it says maps to how I understand tapply. Perhaps someone can help me parse it?
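
The GROUP BY reading can be made concrete with a tiny made-up example (the amount and region vectors are hypothetical): amount plays the role of X, region is the parallel factor passed as INDEX, and sum is FUN.

```r
# Made-up table; the SQL analogue would be
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
amount <- c(10, 20, 5, 15, 30)
region <- factor(c("east", "west", "east", "west", "west"))

# X = amount, INDEX = region (parallel to amount), FUN = sum
totals <- tapply(amount, region, sum)
print(totals)   # east = 15, west = 65
```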

Answer

Let's see what the R documentation says on the subject:


The combination of a vector and a labelling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same the indexing may be done implicitly and much more efficiently, as we see in the next section.
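
To see what "a vector plus a labelling factor" looks like, here is a small sketch with made-up values (values and label are hypothetical). split() lays out the ragged groups; the balanced case at the end is just one assumed illustration of implicit indexing, where equal-sized groups are addressed through a matrix layout rather than a factor.

```r
# Made-up vector and labelling factor with unequal group sizes
values <- c(3, 8, 1, 9, 4, 7)
label  <- factor(c("x", "y", "y", "z", "z", "z"))

# split() shows the ragged structure: one vector per level,
# with lengths 1, 2 and 3
pieces <- split(values, label)
print(lengths(pieces))

# If instead every group had the same size (here, a hypothetical grouping
# of three groups of two, stored column by column), a matrix layout
# makes the grouping implicit and no factor is needed
balanced <- matrix(values, nrow = 2)
print(colMeans(balanced))   # 5.5 5.0 5.5
```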

The factors you supply via INDEX together specify a collection of subsets of X, one per non-empty combination of their levels, and these subsets can have different lengths (hence the 'ragged' descriptor). FUN is then applied to each subset.
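
A small sketch with two hypothetical factors (sex and dose) and made-up values: each combination of levels defines one subset of y, the subsets have different sizes, and the combination with no observations comes back as NA because there is nothing to apply FUN to, which is what the "(non-empty)" qualifier in the help text refers to.

```r
# Made-up data: two grouping factors, each parallel to y
y    <- c(2, 4, 6, 8, 10)
sex  <- factor(c("F", "F", "M", "M", "M"))
dose <- factor(c("lo", "lo", "lo", "hi", "hi"))

# Cell sizes differ, and the F/hi cell is empty
print(table(sex, dose))

# mean() is applied to each non-empty cell; the empty cell is filled with NA
res <- tapply(y, list(sex, dose), mean)
print(res)   # F/lo = 3, M/lo = 6, M/hi = 9, F/hi = NA
```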

Edit: @Joris makes an excellent point in the comments. It may be helpful to think of tapply(X, Y, ...) as a wrapper for sapply(split(X, Y), ...): if Y is a list of grouping factors, it builds a single new grouping factor from the combinations of their levels, splits X accordingly, and applies FUN to each piece.
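
A quick check of this mental model with made-up data (x, g1 and g2 are hypothetical; this compares results and makes no claim about how tapply is implemented internally). For one factor both calls give the same sums. For a list of factors, split() combines them into one factor via interaction(), whose default level order (first factor varying fastest) lines up with the tapply() matrix read column by column.

```r
x  <- c(5, 1, 4, 2, 8, 6)
g1 <- factor(c("a", "a", "b", "b", "b", "a"))
g2 <- factor(c("u", "v", "u", "v", "u", "v"))

# Single factor: tapply() and sapply(split(...)) give the same sums
t1 <- tapply(x, g1, sum)              # a = 12, b = 14
s1 <- sapply(split(x, g1), sum)

# List of factors: split() builds one grouping factor from the level
# combinations (a.u, b.u, a.v, b.v) and splits x by it
s2 <- sapply(split(x, list(g1, g2)), sum)
t2 <- tapply(x, list(g1, g2), sum)    # 2 x 2 matrix, rows a/b, cols u/v

print(s2)
print(t2)
```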

Edit: Here is an example:

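library(lattice)   # provides the barley dataset
library(plyr)      # provides ddply()
library(reshape2)  # provides melt(); it is not part of plyr
set.seed(123)

# Make this example unbalanced: keep a random 50 of the 120 rows
dat <- barley[sample(1:120, 50), ]

# Suppose we want the average yield by year/site.
# Count the observations in each year/site combination:
table(dat$year, dat$site)

# That's what they mean by a 'ragged' array: there are different
# numbers of observations at each combination of levels.

# In plyr we could use ddply:
ddply(dat, .(year, site), .fun = function(x) { mean(x$yield) })

# Which gives the same result (listed in a different order) as:
melt(tapply(dat$yield, list(dat$year, dat$site), mean))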