Summarise data frame ignoring repetition

Question

I have a data frame in which one column contains repeated entries. I want to summarise the other columns based on that column, counting each unique entry once rather than counting every row. For example, in the data frame below, to answer the question "how many of the people surveyed are young, midage and old?", RefID 1-1 should contribute a count of 1 to ageclass = young, not a count of 5.

RefID   Altitude    Sex ageclass
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-1 Low F   young
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-2 Low F   midage
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-3 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-4 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-5 Low F   old
1-7 Low F   old
1-7 Low F   old
1-7 Low F   old
1-7 Low F   old
1-8 Low F   old
1-8 Low F   old
1-9 Low F   old
1-9 Low F   old
1-9 Low F   old

Thanks.

Recommended answer

To get the unique rows of a data frame, see ?unique:

Data <- unique(Mydata)
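To see what unique() does here, this is a rough sketch that rebuilds the table from the question. Mydata is the name used above; building it with rep() and the variables ids, reps and ages are my own assumptions for illustration.

```r
# Rebuild the question's data: each RefID repeated as often as it appears there
ids  <- c("1-1", "1-2", "1-3", "1-4", "1-5", "1-7", "1-8", "1-9")
reps <- c(5, 12, 18, 12, 7, 4, 2, 3)
ages <- c("young", "midage", rep("old", 6))

Mydata <- data.frame(
  RefID    = rep(ids, times = reps),
  Altitude = "Low",
  Sex      = "F",
  ageclass = rep(ages, times = reps),
  stringsAsFactors = FALSE
)

# Drop rows that are identical in every column
Data <- unique(Mydata)

nrow(Mydata)  # 63 rows, one per repeated observation
nrow(Data)    # 8 rows, one per RefID
```

Since every repeated row is identical in all four columns, one row per RefID is left.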

You can then use:

by(Data,Data$ageclass,summary)

See also ?summary to understand the output. If you are interested in counts, you can use table, e.g.:

table(Data$RefID,Data$ageclass)

or, for the totals per age class:

margin.table(table(Data$RefID,Data$ageclass),margin=2)
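As a small check of what margin.table() returns, here is a sketch on a hand-written Data with one row per RefID (an assumption standing in for the result of unique() on the question's data):

```r
# Assumed de-duplicated data: one row per RefID, as unique() would leave it
Data <- data.frame(
  RefID    = c("1-1", "1-2", "1-3", "1-4", "1-5", "1-7", "1-8", "1-9"),
  ageclass = c("young", "midage", "old", "old", "old", "old", "old", "old"),
  stringsAsFactors = FALSE
)

# Sum the RefID x ageclass table over RefID: number of people per age class
counts <- margin.table(table(Data$RefID, Data$ageclass), margin = 2)
print(counts)

# With exactly one row per RefID, a one-way table gives the same numbers
same <- all(as.vector(counts) == as.vector(table(Data$ageclass)))
```

Here counts holds 1 for midage, 6 for old and 1 for young, which is the per-person answer the question asks for.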

EDIT: you'll have to be a bit careful, as unique() returns the unique rows, comparing all columns. If RefID 1-1 appears as both a male and a female, it will still be counted twice. I presume that won't be the case in your data, but if you really want to make sure, you can do:

with(unique(Data[c(1,4)]),margin.table(table(RefID,ageclass),margin=2))
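The caveat is easier to see on a made-up example. The data below are hypothetical (RefID 1-1 deliberately recorded with two different Sex values), and I select the columns by name instead of by position, which does the same thing as c(1,4) for the question's column order:

```r
# Hypothetical data: RefID "1-1" was recorded as F twice and as M once
Data <- data.frame(
  RefID    = c("1-1", "1-1", "1-1", "1-2", "1-2"),
  Altitude = "Low",
  Sex      = c("F", "F", "M", "F", "F"),
  ageclass = c("young", "young", "young", "midage", "midage"),
  stringsAsFactors = FALSE
)

# unique() over all columns keeps both the F and the M row for "1-1",
# so "young" is counted twice
naive <- table(unique(Data)$ageclass)

# De-duplicating on RefID and ageclass only counts each person once
careful <- with(unique(Data[c("RefID", "ageclass")]),
                margin.table(table(RefID, ageclass), margin = 2))

print(naive)
print(careful)
```

Selecting by name is a little more robust if the column order ever changes.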

or use the plyr solution mentioned here.
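The linked plyr answer is not reproduced here. As a base R sketch in a similar split-and-count spirit (my own suggestion, not that answer), you can count the distinct RefIDs per age class directly on the raw data, without calling unique() on the data frame first. The small Mydata below is a hypothetical subset of the question's table:

```r
# Hypothetical raw data with repeated rows per RefID
Mydata <- data.frame(
  RefID    = rep(c("1-1", "1-2", "1-3", "1-4"), times = c(5, 12, 18, 12)),
  ageclass = rep(c("young", "midage", "old", "old"), times = c(5, 12, 18, 12)),
  stringsAsFactors = FALSE
)

# For each age class, count how many distinct RefIDs occur in it
n_people <- tapply(Mydata$RefID, Mydata$ageclass,
                   function(ids) length(unique(ids)))
print(n_people)
```

This gives 1 for midage, 2 for old and 1 for young, and it is not affected by differences in the other columns.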
