Efficient method to filter and add based on certain conditions (3 conditions in this case)

Views: 125. Posted: 2017/3/12 10:23:34. Tags: r data.table plyr dplyr subset-sum


Question

I have a data frame which looks like this:

     a    b    c   d
     1    1    1   0
     1    1    1   200
     1    1    1   300
     1    1    2   0
     1    1    2   600
     1    2    3   0
     1    2    3   100
     1    2    3   200
     1    3    1   0

I would like to turn it into a data frame which looks like this:

     a    b    c   d
     1    1    1   250
     1    1    2   600
     1    2    3   150
     1    3    1   0

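I am currently doing it like this, in the body of a loop over i, j and k:

{
  # rows matching the current combination of a, b and c
  sum = subset(Wallmart, a == i & b == j & c == k)
  n = nrow(sum)
  # average of d, assuming each group contains exactly one zero row
  sum1 = append(sum1, sum(sum$d) / (n - 1))
}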

I would like to add up the 'd' column and take the average, dividing by the number of rows without counting the rows where d is 0. For example, the first row is (200 + 300) / 2 = 250. Currently I am building a list that stores the 'd' column, but ideally I want the result in the format above. For example, the first row would look like:

     a    b    c   d
     1    1    1   250

This is a very inefficient way to do this work. The code takes a long time to run in a loop, so any help that makes it run faster is appreciated. The original data frame has about a million rows.

Recommended answer

You can try aggregate:

aggregate(d ~ a + b + c, data = df, sum)
#   a b c   d
# 1 1 1 1 500
# 2 1 3 1   0
# 3 1 1 2 600
# 4 1 2 3 300

As noted by @Roland, for bigger data sets you may try data.table or dplyr instead, e.g.:

library(dplyr)
df %>%
  group_by(a, b, c) %>%
  summarise(
    sum_d = sum(d))

# Source: local data frame [4 x 4]
# Groups: a, b
# 
#   a b c sum_d
# 1 1 1 1   500
# 2 1 1 2   600
# 3 1 2 3   300
# 4 1 3 1     0
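
The answer names data.table as an option but only shows dplyr. A minimal sketch of the same grouped sum in data.table might look like the following. The data frame is rebuilt by hand from the table in the question, and the names dt and sum_dt are assumptions made for illustration.

```r
library(data.table)

# Example data copied from the table in the question
df <- data.frame(
  a = rep(1, 9),
  b = c(1, 1, 1, 1, 1, 2, 2, 2, 3),
  c = c(1, 1, 1, 2, 2, 3, 3, 3, 1),
  d = c(0, 200, 300, 0, 600, 0, 100, 200, 0)
)

# Convert to a data.table and sum d within each (a, b, c) group
dt <- as.data.table(df)
sum_dt <- dt[, .(sum_d = sum(d)), by = .(a, b, c)]
print(sum_dt)
```

Groups are returned in order of first appearance, so the sums here should come out as 500, 600, 300 and 0.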

Edit following the updated question. If you want to calculate the group-wise mean, excluding rows that are zero, you may try this:

aggregate(d ~ a + b + c, data = df, function(x) mean(x[x > 0]))
#   a b c   d
# 1 1 1 1 250
# 2 1 3 1 NaN
# 3 1 1 2 600
# 4 1 2 3 150

df %>%
  filter(d != 0) %>%
  group_by(a, b, c) %>%
  summarise(
    mean_d = mean(d))

#   a b c mean_d
# 1 1 1 1    250
# 2 1 1 2    600
# 3 1 2 3    150

However, because it seems that you wish to treat your zeros as missing values rather than numeric zeros, I think it would be better to convert them to NA when preparing your data set, before the calculations.

df$d[df$d == 0] <- NA
df %>%
  group_by(a, b, c) %>%
  summarise(
    mean_d = mean(d, na.rm = TRUE))

#   a b c mean_d
# 1 1 1 1    250
# 2 1 1 2    600
# 3 1 2 3    150
# 4 1 3 1    NaN
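
The desired output in the question shows 0, not NaN, for the group that contains only zeros. If that is what is wanted, one possible data.table sketch is given below. The fallback to 0 is an assumption read off the question's expected table, and the names dt and mean_dt are chosen only for illustration.

```r
library(data.table)

# Example data copied from the table in the question
dt <- data.table(
  a = rep(1, 9),
  b = c(1, 1, 1, 1, 1, 2, 2, 2, 3),
  c = c(1, 1, 1, 2, 2, 3, 3, 3, 1),
  d = c(0, 200, 300, 0, 600, 0, 100, 200, 0)
)

# Mean of the non-zero d values per group; a group with only zeros
# falls back to 0 (assumed from the desired output in the question)
mean_dt <- dt[, .(mean_d = if (any(d != 0)) mean(d[d != 0]) else 0),
              by = .(a, b, c)]
print(mean_dt)
```

With the example data this should give 250, 600, 150 and 0, matching the table the question asks for.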
