Create a group number for each consecutive sequence
Question
I have the data.frame below. I want to add a column 'g' that classifies my data according to consecutive sequences in column h_no. That is, the first sequence of h_no (1, 2, 3, 4) is group 1, the second sequence of h_no (1 to 7) is group 2, and so on, as indicated in the last column 'g'.
h_no h_freq h_freqsq g
1 0.09091 0.008264628 1
2 0.00000 0.000000000 1
3 0.04545 0.002065702 1
4 0.00000 0.000000000 1
1 0.13636 0.018594050 2
2 0.00000 0.000000000 2
3 0.00000 0.000000000 2
4 0.04545 0.002065702 2
5 0.31818 0.101238512 2
6 0.00000 0.000000000 2
7 0.50000 0.250000000 2
1 0.13636 0.018594050 3
2 0.09091 0.008264628 3
3 0.40909 0.167354628 3
4 0.04545 0.002065702 3
Recommended answer
You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.
Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list.
my.dataframe["new.col"] <- a.vector
my.dataframe[["new.col"]] <- a.vector
The data.frame method for $ treats x as a list.
my.dataframe$new.col <- a.vector
When [ and [[ are used with two indices (x[i, j] and x[[i, j]]), they act like indexing a matrix.
my.dataframe[ , "new.col"] <- a.vector
If you don't specify whether you're working with columns or rows, the method for data.frame assumes you mean columns.
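As a quick check of the indexing modes quoted above, the following sketch adds the same vector with each form and confirms that the results agree. The names df and a.vector are made up for illustration and do not come from the question.

```r
# hypothetical toy data, not from the question
df <- data.frame(x = 1:3)
a.vector <- c(10, 20, 30)

d1 <- df; d1["new.col"] <- a.vector     # list-style, single bracket
d2 <- df; d2[["new.col"]] <- a.vector   # list-style, double bracket
d3 <- df; d3$new.col <- a.vector        # $ method
d4 <- df; d4[, "new.col"] <- a.vector   # matrix-style, column index

# all four forms should give the same data frame
stopifnot(isTRUE(all.equal(d1, d2)),
          isTRUE(all.equal(d2, d3)),
          isTRUE(all.equal(d3, d4)))

# a single index without a comma selects a column, not a row
names(df[1])
```

Which form to use is mostly a matter of taste; the $ form is the shortest when the column name is a fixed, syntactically valid name.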
For your example, this should work:
# make some fake data
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))
# find the rows where a new sequence starts (no == 1)
from <- which(your.df$no == 1)
to <- c((from-1)[-1], nrow(your.df)) # up to which point the sequence runs
# for each sequence, compute its length (len) and repeat its group number len times
# SIMPLIFY = FALSE always returns a list, even if all sequences have equal length
get.seq <- mapply(from, to, seq_along(from), FUN = function(x, y, z) {
  len <- length(seq(from = x[1], to = y[1]))
  return(rep(z, times = len))
}, SIMPLIFY = FALSE)
# unlist to get a single vector and append it to your original data.frame
your.df$group <- unlist(get.seq)
# since this is designating a group, it makes sense to make it a factor
your.df$group <- as.factor(your.df$group)
no h_freq h_freqsq group
1 1 0.40998238 0.06463876 1
2 2 0.98086928 0.33093795 1
3 3 0.28908651 0.74077119 1
4 4 0.10476768 0.56784786 1
5 1 0.75478995 0.60479945 2
6 2 0.26974011 0.95231761 2
7 3 0.53676266 0.74370154 2
8 4 0.99784066 0.37499294 2
9 5 0.89771767 0.83467805 2
10 6 0.05363139 0.32066178 2
11 7 0.71741529 0.84572717 2
12 1 0.10654430 0.32917711 3
13 2 0.41971959 0.87155514 3
14 3 0.32432646 0.65789294 3
15 4 0.77896780 0.27599187 3
16 5 0.06100008 0.55399326 3
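A shorter alternative, which is not part of the answer above, is sketched here. It assumes that every sequence restarts at 1, as in the question's data, so a running count of the rows where no == 1 gives the group number directly. The variant based on diff() instead starts a new group whenever a value is not the previous value plus one; the name group.diff is hypothetical, and which rule fits depends on what "consecutive sequence" means for your data.

```r
# same shape as the example data above; the frequency values are random
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

# assumption: each sequence starts at 1, so counting the 1s seen so far gives the group
your.df$group <- cumsum(your.df$no == 1)

# variant: start a new group whenever the value is not previous + 1
group.diff <- cumsum(c(TRUE, diff(your.df$no) != 1))

# as before, store the group as a factor
your.df$group <- as.factor(your.df$group)
```

For this data both rules produce the same grouping, with group sizes 4, 7 and 5.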