Add column to dataframe depending on specific row values (2)
Problem description
I have to adjust some code that works perfectly on a different data.frame with similar conditions.
Here is an example of my data.frame:
df <- read.table(text = 'ID Day Count
33012 9526 4
35004 9526 4
37006 9526 4
37008 9526 4
21009 1913 3
24005 1913 3
25009 1913 3
22317 2286 2
37612 2286 2
25009 14329 1
48007 9527 0
88662 9528 0
1845 9528 0
8872 2287 0
49002 1914 0
1664 1915 0', header = TRUE)
I need to add a new column (new_col) to my data.frame containing values from 1 to 4. Each new_col value must cover day x, day x+1 and day x+2, where x = 9526, 1913, 2286, 14329 (column Day).
My output should be the following:
ID Day Count new_col
33012 9526 4 1
35004 9526 4 1
37006 9526 4 1
37008 9526 4 1
21009 1913 3 2
24005 1913 3 2
25009 1913 3 2
22317 2286 2 3
37612 2286 2 3
25009 14329 1 4
48007 9527 0 1
88662 9528 0 1
1845 9528 0 1
8872 2287 0 3
49002 1914 0 2
1664 1915 0 2
Ordered by new_col, the data.frame would then be:
ID Day Count new_col
33012 9526 4 1
35004 9526 4 1
37006 9526 4 1
37008 9526 4 1
48007 9527 0 1
88662 9528 0 1
1845 9528 0 1
21009 1913 3 2
24005 1913 3 2
25009 1913 3 2
49002 1914 0 2
1664 1915 0 2
22317 2286 2 3
37612 2286 2 3
8872 2287 0 3
25009 14329 1 4
My real data.frame is more complex than the example (i.e. more columns and more values in the Count column).
The code that @mrbrick suggested in my previous question (Add column to dataframe depending on specific row values) is the following:
x <- c(1913, 2286, 9526, 14329)
df$new_col <- cut(df$Day, c(-Inf, x, Inf))
df$new_col <- as.numeric(factor(df$new_col, levels=unique(df$new_col)))
But it only works for day x, day x-1 and day x-2.
Any suggestion will be really helpful.
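The behaviour described above comes from cut(), whose intervals are closed on the right, (a, b], by default, so each x lands in the same bin as the days just below it. As a minimal sketch (a variant of the previous code, not part of the answer below), and assuming x holds the first day of every sequence, passing right = FALSE makes the intervals closed on the left, [a, b), which groups x with x+1 and x+2. The numbering step uses match() against the bins in order of first appearance, which should be equivalent to the factor()/as.numeric() line above.

```r
# Example data from the question
df <- data.frame(
  ID    = c(33012, 35004, 37006, 37008, 21009, 24005, 25009, 22317,
            37612, 25009, 48007, 88662, 1845, 8872, 49002, 1664),
  Day   = c(9526, 9526, 9526, 9526, 1913, 1913, 1913, 2286,
            2286, 14329, 9527, 9528, 9528, 2287, 1914, 1915),
  Count = c(4, 4, 4, 4, 3, 3, 3, 2, 2, 1, 0, 0, 0, 0, 0, 0)
)

# First day of each sequence (assumed known, as in the question)
x <- c(1913, 2286, 9526, 14329)

# right = FALSE builds left-closed bins [x_i, x_(i+1)), so day x shares
# a bin with x + 1 and x + 2 instead of with x - 1 and x - 2
bins <- cut(df$Day, c(-Inf, x, Inf), right = FALSE)

# Number the bins by order of first appearance in the data
df$new_col <- match(bins, unique(bins))
print(df)
```

Like the original code, this still needs the start days listed in x; the answer below avoids that requirement.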
Answer
Assuming that the Day values in the different sequential groups are such that dropping the last two digits of Day identifies each group, convert what is left to a factor with sequence numbers as labels. No packages are used.
g <- df$Day %/% 100
u <- unique(g)
transform(df, new_col = factor(g, levels = u, labels = seq_along(u)))
giving:
ID Day Count new_col
1 33012 9526 4 1
2 35004 9526 4 1
3 37006 9526 4 1
4 37008 9526 4 1
5 21009 1913 3 2
6 24005 1913 3 2
7 25009 1913 3 2
8 22317 2286 2 3
9 37612 2286 2 3
10 25009 14329 1 4
11 48007 9527 0 1
12 88662 9528 0 1
13 1845 9528 0 1
14 8872 2287 0 3
15 49002 1914 0 2
16 1664 1915 0 2
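The question also asks for the rows ordered by new_col. A minimal sketch that reuses the g/u lines above on the example data and then sorts with order(); since order() leaves tied values in their original order, the rows within each group keep their input order, which reproduces the ordered table in the question. The last line illustrates the assumption the approach rests on: a hypothetical run that crosses a hundreds boundary, such as 9599, 9600, 9601, would be split across two groups.

```r
# Example data from the question
df <- data.frame(
  ID    = c(33012, 35004, 37006, 37008, 21009, 24005, 25009, 22317,
            37612, 25009, 48007, 88662, 1845, 8872, 49002, 1664),
  Day   = c(9526, 9526, 9526, 9526, 1913, 1913, 1913, 2286,
            2286, 14329, 9527, 9528, 9528, 2287, 1914, 1915),
  Count = c(4, 4, 4, 4, 3, 3, 3, 2, 2, 1, 0, 0, 0, 0, 0, 0)
)

g <- df$Day %/% 100   # drop the last two digits of Day
u <- unique(g)        # group keys in order of first appearance
res <- transform(df, new_col = factor(g, levels = u, labels = seq_along(u)))

# Sort by new_col; ties keep their original row order
res_sorted <- res[order(res$new_col), ]
print(res_sorted, row.names = FALSE)

# Caveat: a hypothetical run crossing a hundreds boundary is split
print(c(9599, 9600, 9601) %/% 100)   # 95 96 96
```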
Another possibility is to replace the g <- ... line with one of the following:
(a) Known number of groups: use kmeans with the appropriate number of clusters:
g <- kmeans(df$Day, 4)$cluster
(b) Known sequences: manually set the centers and use them to initialize kmeans:
centers <- c(1913, 2286, 9526, 14329) + 1
g <- kmeans(df$Day, centers)$cluster
(c) Derive the centers by checking x-1 and x-2. If for a day x there is neither an x-1 nor an x-2, then x must be the first in its sequence, so we pick out such values and add 1 to get the centers. Unlike (a), which requires that we know the number of clusters, and (b), which requires that we know the actual sequences, this one requires neither.
centers <- with(df, unique(Day[ ! ((Day-1) %in% Day) & ! ((Day-2) %in% Day) ]) + 1)
g <- kmeans(df$Day, centers)$cluster
(d) Simplification of (c). If we are guaranteed that x, x+1 and x+2 all appear whenever x is the first in a sequence, then x must be the first in its sequence if there is no x-1, so we can simplify (c) to:
# assumes x, x+1, x+2 all appear for each sequence
centers <- with(df, unique(Day[ ! (Day-1) %in% Day ]) + 1)
g <- kmeans(df$Day, centers)$cluster
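To make the difference between (c) and (d) concrete, here is a small sketch with two hypothetical helpers, starts_c() and starts_d(), that apply each rule to a vector of days. On the Day column from the question both rules find the same starts; on a made-up vector where the run beginning at 100 is missing day 101, rule (d) also reports 102 as a start, which is exactly the case its extra assumption rules out.

```r
# Hypothetical helpers applying the two rules for detecting sequence starts
# Rule (c): x starts a sequence if neither x - 1 nor x - 2 is present
starts_c <- function(day) unique(day[!((day - 1) %in% day) & !((day - 2) %in% day)])
# Rule (d): x starts a sequence if x - 1 is absent
starts_d <- function(day) unique(day[!(day - 1) %in% day])

# Day column from the question: both rules agree
day_q <- c(9526, 9526, 9526, 9526, 1913, 1913, 1913, 2286,
           2286, 14329, 9527, 9528, 9528, 2287, 1914, 1915)
print(starts_c(day_q))    # 9526 1913 2286 14329
print(starts_d(day_q))    # 9526 1913 2286 14329

# Made-up days where the run starting at 100 lacks 101
day_gap <- c(100, 102, 500, 501, 502)
print(starts_c(day_gap))  # 100 500
print(starts_d(day_gap))  # 100 102 500
```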
The kmeans solutions should work if the groups are sufficiently separated, and based on the data shown in the question it seems that they are.
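A minimal end-to-end sketch of the kmeans route, using the centers from (d) on the example data, followed by a cheap sanity check of the separation assumption: since each sequence spans at most x to x+2, a cluster whose days are more than 2 apart would suggest that kmeans merged neighbouring sequences. When explicit centers are given, kmeans() starts from them rather than from random rows, so the result is reproducible; for option (a), the nstart argument of kmeans() can be used to keep the best of several random starts.

```r
# Example data from the question
df <- data.frame(
  ID    = c(33012, 35004, 37006, 37008, 21009, 24005, 25009, 22317,
            37612, 25009, 48007, 88662, 1845, 8872, 49002, 1664),
  Day   = c(9526, 9526, 9526, 9526, 1913, 1913, 1913, 2286,
            2286, 14329, 9527, 9528, 9528, 2287, 1914, 1915),
  Count = c(4, 4, 4, 4, 3, 3, 3, 2, 2, 1, 0, 0, 0, 0, 0, 0)
)

# Centers from (d): days with no day - 1, shifted to the middle of the run
centers <- with(df, unique(Day[!(Day - 1) %in% Day]) + 1)
g <- kmeans(df$Day, centers)$cluster

# Renumber clusters by order of first appearance, as in the answer
u <- unique(g)
res <- transform(df, new_col = factor(g, levels = u, labels = seq_along(u)))

# Sanity check: every cluster should span at most 2 days (x to x + 2)
span <- tapply(df$Day, g, function(d) max(d) - min(d))
print(span)
stopifnot(all(span <= 2))
print(res)
```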