Calculating the occurrences of numbers in the subsets of a data.frame

Question

I have a data frame in R similar to the following. My real 'df' data frame is much bigger than this one, but I do not want to confuse anybody, so I have simplified things as much as possible.

Here is the data frame:

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)   
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)

df <-data.frame(id,a,b,c,d,e)
df

Basically, what I would like to do is get the number of occurrences of each value for each column (a, b, c, d, e) within each id group (1, 2, 3), where the groups are defined by my column 'id'.

So, for column 'a' and id group '1' (see column 'id'), the code would be something like this:

as.numeric(table(df[1:10,2]))

##The results are:
[1] 3 7

Just to briefly explain my results: in column 'a' (considering only those records which have the number '1' in column 'id'), the number '1' occurred 3 times and the number '3' occurred 7 times.

Here is another example, for column 'a' and id group '2' (again, see column 'id'):

as.numeric(table(df[11:20,2]))

##After running the code the results are:
[1] 4 3 3

To explain again: in column 'a' (considering only those observations which have the number '2' in column 'id'), the number '1' occurred 4 times, the number '2' occurred 3 times and the number '3' occurred 3 times.
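
One caveat with the two examples above: the rows are picked by hard-coded positions (1:10, 11:20), and table() silently drops values that never occur in a subset, which is why the first result has two counts and the second has three. A minimal sketch of a more robust variant follows, assuming the possible values are 1 to 3 as in the example data; the names a_group1 and counts are only illustrative.

```r
# Example data from the question, restricted to the columns used here.
id <- rep(1:3, each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
df <- data.frame(id, a)

# Select rows by the value of 'id' instead of by row position,
# so the code keeps working when the number of rows changes.
a_group1 <- df$a[df$id == 1]

# Plain table() omits values that do not occur (here: 2).
table(a_group1)

# Fixing the factor levels keeps an explicit zero for missing values.
# Assumption: the set of possible values is 1:3.
counts <- table(factor(a_group1, levels = 1:3))
counts  # counts for values 1, 2, 3 are 3, 0, 7
```

With fixed levels every subset yields a vector of the same length, which makes the counts easier to collect into one data frame later.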

So this is what I would like to do: calculate the occurrences of numbers for each custom-defined subset (and then collect these values into a data frame). I know it is not a difficult task, but the PROBLEM is that I will have to change the input 'df' data frame on a regular basis, so both the overall number of rows and the number of columns might change over time…

What I have done so far is split the 'df' data frame by columns, like this:

for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])

So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c, etc. But I'm really stuck now and I don't know how to move forward…

Is there a proper, "automatic" way to solve this problem?

Recommended answer

How about this:

> library(reshape)

> dftab <- table(melt(df,'id'))
> dftab
, , value = 1

   variable
id  a b c d e
  1 3 8 2 2 4
  2 4 6 3 2 4
  3 4 2 1 5 1

, , value = 2

   variable
id  a b c d e
  1 0 1 4 3 3
  2 3 3 3 6 2
  3 1 4 5 3 4

, , value = 3

   variable
id  a b c d e
  1 7 1 4 5 3
  2 3 1 4 2 4
  3 5 4 4 2 5
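
Since the question also asks for the counts to be collected into a data frame, here is a hedged sketch of one way to do that. To stay runnable without the reshape package, it rebuilds the same three-way table with base R's stack(), which is a substitution and not what the answer uses; the names long and counts_df and the column name "count" are illustrative choices.

```r
# Example data from the question.
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# Base R stand-in for melt(df, 'id'): stack every non-id column into
# long format and repeat the id column once per stacked column.
long <- data.frame(id = rep(df$id, times = ncol(df) - 1), stack(df[-1]))

# Three-way contingency table with dimensions id x variable x value.
dftab <- table(id = long$id, variable = long$ind, value = long$values)

# One row per (id, variable, value) combination, including zero counts.
counts_df <- as.data.frame(dftab, responseName = "count")
head(counts_df)
```

The result has one row for each combination of id, column and value, so it can be filtered or merged like any other data frame.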

The dimensions of dftab are ordered id, variable, value, so to get the number of '3's in column 'a' and group '1' you could just do

> dftab[1,'a',3]
[1] 7
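
If installing an extra package is not an option, a similar result can probably be reached in base R alone. The sketch below is an alternative under stated assumptions, not the answer's method: it keeps one id-by-value table per column in a named list instead of creating df.2, df.3, … with assign(), and it derives the set of values from the data so it adapts when rows or columns are added. The names value_levels and counts are hypothetical.

```r
# Example data from the question.
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id, a, b, c, d, e)

# All distinct values found in the non-id columns, in sorted order.
value_levels <- sort(unique(unlist(df[-1])))

# One id x value table per column; fixed levels keep zero counts.
counts <- lapply(df[-1], function(col) {
  table(id = df$id, value = factor(col, levels = value_levels))
})

counts$a            # 3 x 3 table for column 'a'
counts$a["1", "3"]  # number of 3s in column 'a' for id group 1
```

Indexing by the character labels "1" and "3" avoids mixing up positions with values when the ids or values are not simply 1, 2, 3.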
