R: counting the occurrences of similar rows of a data frame


Question

I have data in the following format called DF (this is just a made up simplified sample):

eval.num  eval.count  fitness  fitness.mean  green.h.0  green.v.0  offset.0  random
1         1           1500     1500          100        120        40        232342
2         2           1000     1250          100        120        40        11843
3         3           1250     1250          100        120        40        981340234
4         4           1000     1187.5        100        120        40        4363453
5         1           2000     2000          200        100        40        345902
6         1           3000     3000          150        90         10        943
7         1           2000     2000          90         90         100       9304358
8         2           1800     1900          90         90         100       284333

However, the eval.count column is incorrect and I need to fix it. It should report the number of rows with the same values for (green.h.0, green.v.0, and offset.0) by only looking at the previous rows.

The example above uses the expected values, but assume they are incorrect.

How can I add a new column (say "count") which will count all previous rows which have the same values of the specified variables?

I have gotten help on a similar problem of just selecting all rows with the same values for specified columns, so I suppose I could just write a loop around that, but it seems inefficient to me.

Recommended answer

Ok, let's first do it in the easy case where you just have one column.

> data <- rep(sample(1000, 5),
              sample(5, 5))
> head(data)
[1] 435 435 435 278 278 278

Then you can just use rle to figure out the contiguous sequences:

> sequence(rle(data)$lengths)
[1] 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1

Or, side by side with the data:

> head(cbind(data, sequence(rle(data)$lengths)))
[1,]  435 1
[2,]  435 2
[3,]  435 3
[4,]  278 1
[5,]  278 2
[6,]  278 3
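
To see why this works, it can help to look at what rle returns on a small fixed vector. The sketch below uses a made-up vector x (not the random data above): rle splits it into runs of equal adjacent values, and sequence turns each run length into 1, 2, ..., n.

```r
# Hypothetical fixed input, chosen so the result is easy to check by hand.
x <- c(435, 435, 435, 278, 278)

runs <- rle(x)            # run-length encoding of adjacent equal values
run_values  <- runs$values    # 435 278
run_lengths <- runs$lengths   # 3 2

# sequence(c(3, 2)) concatenates 1:3 and 1:2
run_index <- sequence(run_lengths)
print(run_index)          # 1 2 3 1 2
```

Note that rle only looks at adjacent elements, so a value that reappears later in the vector starts a new run at 1.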


For your case with multiple columns, there are probably a bunch of ways of applying this solution. Easiest might be to just paste the columns you care about together to form a single vector.
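
As a rough sketch of that idea, the code below rebuilds only the three key columns from the question's sample, pastes them into a single key, and applies the same rle trick. The names key, count, and count.any are illustrative and not from the original post. Because rle only counts contiguous runs, a second variant using ave is included on the assumption that matching rows might not always be adjacent.

```r
# Key columns copied from the sample data in the question.
DF <- data.frame(
  green.h.0 = c(100, 100, 100, 100, 200, 150, 90, 90),
  green.v.0 = c(120, 120, 120, 120, 100, 90, 90, 90),
  offset.0  = c(40, 40, 40, 40, 40, 10, 100, 100)
)

# Collapse the columns of interest into one string per row.
key <- paste(DF$green.h.0, DF$green.v.0, DF$offset.0, sep = "_")

# Running count within each contiguous run of identical keys.
DF$count <- sequence(rle(key)$lengths)

# Assumed alternative: running count per key regardless of adjacency.
DF$count.any <- ave(seq_along(key), key, FUN = seq_along)

print(DF)
```

On this sample both columns come out as 1 2 3 4 1 1 1 2, matching the expected eval.count values; they would differ only if rows with the same key were separated by other rows.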
