R: Sum complete.cases in one column grouped by (or sorted by) a value in another column


Question

I'm using the airquality data set available in R and attempting to count the number of rows that do not contain any NAs, aggregated by Month.

The data looks like this:

head(airquality)
#   Ozone Solar.R Wind Temp Month Day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

As you can see, I have NAs in the columns Ozone and Solar.R. I used the function complete.cases as follows:

x  <- airquality[,1] # for the Ozone
y  <- airquality[,2] # for the Solar.R
ok <- complete.cases(x,y)

Then, to check:

nrow(airquality)
# [1] 153
sum(!ok)
# [1] 42
sum(ok)
# [1] 111

That works great.

But now I'd like to break that data down by Month (column 5), and this is where I'm running into problems: trying to aggregate or sort by the value in column 5 (Month).

I was able to get the following to run. It doesn't sort by Month yet (I just wanted to make sure I could get the function to run):

aggregate(x = sum(complete.cases(airquality)), by= list(nrow(airquality)), FUN = sum)
#   Group.1   x
# 1     153 111

OK, so to sort it out: I am trying to use the by part of the aggregate function to sort. I tried many variations of column 5 within airquality:

- airquality[,5]
- airquality[,"Month"]

I get these errors:

aggregate(x = sum(complete.cases(airquality)), by= list(airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) : 
#   arguments must have same length

aggregate(x = sum(complete.cases(airquality)), by= 
      list(sum(complete.cases(airquality)),airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) : 
#   arguments must have same length

I tried to look further into the documentation at ?aggregate, namely the by part:


by - a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
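
To illustrate that excerpt: each element of by has to be as long as x. In the failing calls, x is the single number returned by sum(complete.cases(airquality)), while airquality[,5] has 153 elements, hence "arguments must have same length". A minimal sketch, assuming the goal is the per-month count of complete rows, passes the per-row logical vector as x and lets FUN = sum do the counting within each group (the names ok and res are only illustrative):

```r
# Per-row logical vector: TRUE where the row contains no NA (length 153)
ok <- complete.cases(airquality)

# x and the grouping vector now have the same length,
# so aggregate() can sum the TRUE values within each Month
res <- aggregate(x = ok, by = list(Month = airquality$Month), FUN = sum)
res
```

Here the column x of res holds the number of complete rows for each month.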

I looked up ?factor, but I can't see how to apply it (if it is even necessary in this case). I also tried putting break = into the call, but that didn't work.

None of the "Questions that may already have your answer" seem to apply; many of them give solutions in C# and SQL.

Edit: expected result

Count  Month
  24       5
   9       6
  26       7
  23       8
  29       9


Recommended answer

As an addition to the other answers, you could do it with dplyr:

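library(dplyr)

# Group the rows by Month, then count incomplete and complete rows
# (with respect to Ozone and Solar.R) within each group
airquality %>%
  group_by(Month) %>%
  summarize(incomplete = sum(!complete.cases(Ozone, Solar.R)),
            complete   = sum(complete.cases(Ozone, Solar.R)))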

#  Month incomplete complete
#1     5          7       24
#2     6         21        9
#3     7          5       26
#4     8          8       23
#5     9          1       29
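
For comparison, the same table can be sketched in base R without dplyr. This is only one possible approach, using tapply to sum a logical vector within each month; the names ok and result are illustrative and not part of the original answer:

```r
# TRUE where both Ozone and Solar.R are present in a row
ok <- complete.cases(airquality[, c("Ozone", "Solar.R")])

# Sum the logical values within each Month
complete_n   <- tapply(ok,  airquality$Month, sum)
incomplete_n <- tapply(!ok, airquality$Month, sum)

# Assemble a data frame laid out like the dplyr result above
result <- data.frame(
  Month      = as.integer(names(complete_n)),
  incomplete = as.vector(incomplete_n),
  complete   = as.vector(complete_n)
)
result
```

tapply returns a vector named by the grouping values, which is why the Month column is recovered from names(complete_n).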
