How to add a counting column based on values in other columns in R

Problem description

I have a relatively large dataset (16,000+ rows by ~31 columns), large enough that I don't want to manipulate it line by line in Excel. The data is in this form:

block  site     day  X1   X2
1      1        1    0.4  5.1 
1      1        2    0.8  1.1
1      1        3    1.1  4.2
1      2        1    ...  ...
1      2        2
1      2        3
2      3        1
2      3        2
2      3        3
2      4        1
2      4        2
2      4        3

As you can see, the site count is continuous but I would like a column where the site number resets with each block. For example, I would like something like this below:

block  site     day  X1   X2    site2
1      1        1    0.4  5.1   1
1      1        2    0.8  1.1   1
1      1        3    1.1  4.2   1
1      2        1    ...  ...   2
1      2        2               2
1      2        3               2
2      3        1               1
2      3        2               1
2      3        3               1
2      4        1               2
2      4        2               2
2      4        3               2

I was thinking about using the R function rle but am not sure if it will work because of complications with day. Otherwise, I would try something like:

Data$site2 <- sequence(rle(Data$block)$lengths)

Does anyone have any suggestions for adding a column counting (sequence) the number of sites within each block? If it helps, there are the same number of days (263) recorded for each site but there are a different number of sites per block.

Recommended answer

Here's a slightly clumsy solution using plyr and ddply:

library(plyr)
# Within each block, number the runs of consecutive site values 1, 2, ...
ddply(df, .(block), transform,
      site1 = rep(seq_along(unique(site)),
                  times = rle(site)$lengths))

Or, a more concise version:

ddply(df,.(block),transform,site1 = as.integer(as.factor(site)))

There may be a clever way of doing this directly, though, using the various seq, sequence and rle functions, but my brain is a bit hazy at the moment. If you leave this open for a bit someone will likely come along with a slick non-plyr solution.
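For what it's worth, one possible base-R route (not from the original answer, just a sketch) is ave() combined with match(). The data frame below is a toy stand-in for the layout in the question, with X1 and X2 left out; the name site2 follows the question's desired output. It assumes each site belongs to a single block.

```r
# Toy data mirroring the layout in the question (X1 and X2 omitted).
df <- data.frame(
  block = rep(c(1, 2), each = 6),
  site  = rep(1:4, each = 3),
  day   = rep(1:3, times = 4)
)

# Within each block, replace every site id by the position of its
# first appearance among that block's distinct sites.
df$site2 <- ave(df$site, df$block,
                FUN = function(s) match(s, unique(s)))

print(df)
```

Unlike the rle() version, match(s, unique(s)) does not require the rows of a site to be contiguous, so it should still work if the data is later re-sorted by day.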
