R: Sum complete.cases in one column grouped by (or sorted by) a value in another column
Question
I'm using the airquality data set available in R, and attempting to count the number of rows within the data that do not contain any NAs, while aggregating by Month.
The data looks like this:
head(airquality)
# Ozone Solar.R Wind Temp Month Day
# 1 41 190 7.4 67 5 1
# 2 36 118 8.0 72 5 2
# 3 12 149 12.6 74 5 3
# 4 18 313 11.5 62 5 4
# 5 NA NA 14.3 56 5 5
# 6 28 NA 14.9 66 5 6
As you can see, I have NAs in columns Ozone and Solar.R. I used the function complete.cases as follows:
x <- airquality[,1] # for the Ozone
y <- airquality[,2] # for the Solar.R
ok <- complete.cases(x,y)
Then, to check:
nrow(airquality)
# [1] 153
sum(!ok)
# [1] 42
sum(ok)
# [1] 111
Which is great.
But now, I'd like to pull that data apart to sort by Month (column 5), and this is where I'm running into problems: trying to aggregate or sort by the value in column 5 (Month).
I was able to get this to run, although it doesn't sort by Month yet (I just wanted to make sure I could get the function to run):
aggregate(x = sum(complete.cases(airquality)), by= list(nrow(airquality)), FUN = sum)
# Group.1 x
# 1 153 111
OK... so to sort it out. I am trying to use the by part of the aggregate function to sort. I tried many variations of column 5 within airquality:
- airquality[,5]
- airquality[,"Month"]
I get these errors:
aggregate(x = sum(complete.cases(airquality)), by= list(airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
aggregate(x = sum(complete.cases(airquality)), by=
list(sum(complete.cases(airquality)),airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
I tried to search further into the ?aggregate(x, ...) function, namely the by part...
by - a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
I looked up ?factor, but can't seem to see how to apply it (if it is even necessary in this case). I also tried putting break = into it, but that didn't work.
None of the "Questions that may already have your answer" seem to apply, many of which give solutions in C# and SQL.
Edit: expected outcome
Count Month
24 5
9 6
26 7
23 8
29 9
Recommended answer
As an addition to the other answers, you could do it with dplyr.
library(dplyr)
# Group rows by Month, then count rows with and without NAs in Ozone/Solar.R
airquality %>%
  group_by(Month) %>%
  summarize(incomplete = sum(!complete.cases(Ozone, Solar.R)),
            complete = sum(complete.cases(Ozone, Solar.R)))
# Month incomplete complete
#1 5 7 24
#2 6 21 9
#3 7 5 26
#4 8 8 23
#5 9 1 29
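If you would rather stay in base R, the errors in the question appear to come from passing sum(complete.cases(airquality)) as x: that is a single number, while the by element has 153 values, and aggregate needs both to be the same length so that FUN can be applied within each group. A minimal sketch under that reading (the names ok and counts and the Count column label are only illustrative, not from the question):

```r
# One TRUE/FALSE per row: TRUE when neither Ozone nor Solar.R is NA
ok <- complete.cases(airquality[, c("Ozone", "Solar.R")])

# x has 153 rows and the grouping vector has 153 elements, so lengths match;
# summing a logical vector within each Month counts the TRUE values
counts <- aggregate(data.frame(Count = ok),
                    by = list(Month = airquality$Month),
                    FUN = sum)
counts
```

The Count column here should line up with the expected outcome in the question and with the complete column above.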