Replacing row elements in a dataframe based on values from another dataframe
Question
I'm fairly new to R, so I hope somebody can help me. One of my scripts outputs the table averagetable below, which shows the proportion of the event Standing in three different clusters:
> print(averagetable)
Group.1 Standing
1 cluster1 0.5642857
2 cluster2 0.7795848
3 cluster3 0.7922980
Note that R can assign different cluster names (cluster1, cluster2 or cluster3) to the values in averagetable$Standing each time I run the script. Another possible output is:
> print(averagetable)
Group.1 Standing
1 cluster1 0.7795848
2 cluster2 0.5642857
3 cluster3 0.7922980
On the other hand, my script produces the tableresults dataframe. A head() sample is shown below:
> head(tableresults)
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 cluster3
2 20 14 24 cluster3
3 34 35 49 cluster3
4 18 5 19 cluster2
5 23 27 35 cluster3
6 33 20 39 cluster3
My question is fairly simple. I would like to transform the data in tableresults by changing the string in the column winning_cluster according to three rules:
1) Write Standing in tableresults$winning_cluster in place of the cluster name that has the highest Standing value in averagetable.
2) Write Moving/Feeding in tableresults$winning_cluster in place of the cluster name that has the second highest Standing value in averagetable.
3) Write Feeding/Moving in tableresults$winning_cluster in place of the cluster name that has the third highest Standing value in averagetable.
In other words, this is the desired output:
> head(tableresults_output)
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 Standing
2 20 14 24 Standing
3 34 35 49 Standing
4 18 5 19 Moving/Feeding
5 23 27 35 Standing
6 33 20 39 Standing
Note that it is very important to have a value-based, hierarchical component that assigns condition 1), 2) or 3) depending on the values in averagetable. This is not solved by using:
averagetable$classification <- factor(x = as.character(sort(averagetable$Standing)),
labels = c('Feeding/Moving', 'Moving/Feeding','Standing'))
With this command, each label is always linked to the same cluster, because the values are sorted before being written back and the labels therefore follow row order: Feeding/Moving goes to cluster1, Moving/Feeding to cluster2 and Standing to cluster3. That is not necessarily true when averagetable is regenerated.
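The problem described above can be reproduced with a small sketch. The data frames run1 and run2 and the helper classify() are hypothetical names introduced only for this illustration; the values are the two versions of averagetable shown earlier.

```r
# Two runs of the script: same Standing values, different cluster names
# (run1, run2 and classify are hypothetical names used for illustration)
run1 <- data.frame(Group.1 = c("cluster1", "cluster2", "cluster3"),
                   Standing = c(0.5642857, 0.7795848, 0.7922980),
                   stringsAsFactors = FALSE)
run2 <- data.frame(Group.1 = c("cluster1", "cluster2", "cluster3"),
                   Standing = c(0.7795848, 0.5642857, 0.7922980),
                   stringsAsFactors = FALSE)

# The factor()-based approach from the question, wrapped in a function
classify <- function(tbl) {
  factor(x = as.character(sort(tbl$Standing)),
         labels = c("Feeding/Moving", "Moving/Feeding", "Standing"))
}

run1$classification <- classify(run1)
run2$classification <- classify(run2)

# sort() discards the link between a value and its row, so both runs
# receive the same labels in the same row order
same_labels <- identical(as.character(run1$classification),
                         as.character(run2$classification))
print(run1)
print(run2)
print(same_labels)
```

Because the labels are attached after sorting, cluster1 receives the same label in both runs even though its Standing value differs between them.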
Anyway, any help is appreciated, and I hope my question is interesting enough for the forum.
Recommended answer
Here's a stab at it:
tableresults <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 cluster3
2 20 14 24 cluster3
3 34 35 49 cluster3
4 18 5 19 cluster2
5 23 27 35 cluster3
6 33 20 39 cluster3")
averagetable <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Group.1 Standing
1 cluster1 0.5642857
2 cluster2 0.7795848
3 cluster3 0.7922980")
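# rank(-x) is 1 for the highest Standing value, 2 for the next, and so on;
# use that rank to pick the label for each cluster
averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ rank(-averagetable$Standing) ]
# Join the labels onto tableresults by cluster name
# (merge() sorts the result by the key column by default, so row order changes)
merge(tableresults, averagetable[,c(1,3)], by.x="winning_cluster", by.y="Group.1")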
# winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z x
# 1 cluster2 18 5 19 Moving/Feeding
# 2 cluster3 19 21 28 Standing
# 3 cluster3 20 14 24 Standing
# 4 cluster3 34 35 49 Standing
# 5 cluster3 23 27 35 Standing
# 6 cluster3 33 20 39 Standing
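The merge() result above appends the label as a new column x and reorders the rows by cluster name. If the goal is to overwrite winning_cluster in place and keep the original row order, as in the desired output from the question, one possible variant uses match() instead. This is a sketch rather than part of the original answer: the names activity_labels, label and idx are hypothetical, and ties.method = "first" is an assumption added so that tied Standing values still yield whole-number ranks.

```r
# Same sample data as in the answer above
tableresults <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 cluster3
2 20 14 24 cluster3
3 34 35 49 cluster3
4 18 5 19 cluster2
5 23 27 35 cluster3
6 33 20 39 cluster3")
averagetable <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Group.1 Standing
1 cluster1 0.5642857
2 cluster2 0.7795848
3 cluster3 0.7922980")

# Labels ordered from highest to lowest Standing value (hypothetical name)
activity_labels <- c("Standing", "Moving/Feeding", "Feeding/Moving")

# Rank 1 = highest Standing; ties.method = "first" avoids fractional ranks
averagetable$label <- activity_labels[rank(-averagetable$Standing,
                                           ties.method = "first")]

# For each row of tableresults, find the matching row of averagetable
idx <- match(tableresults$winning_cluster, averagetable$Group.1)

# Overwrite the cluster name with its label; row order is left untouched
tableresults_output <- tableresults
tableresults_output$winning_cluster <- averagetable$label[idx]

head(tableresults_output)
```

A cluster name in tableresults that does not appear in averagetable would come out as NA here, which makes mismatches easy to spot.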