Replacing row elements in a dataframe based on values from another dataframe


Question

I'm fairly new to R, so I hope somebody can help me. One of my scripts outputs the table averagetable below, which shows the proportion of the event Standing in each of three clusters:

> print(averagetable)
   Group.1  Standing
1 cluster1  0.5642857
2 cluster2  0.7795848
3 cluster3  0.7922980

Note that R can assign different cluster names (cluster1, cluster2 or cluster3) to the values in averagetable$Standing each time I run the script. Another output could be:

> print(averagetable)
   Group.1 Standing
1 cluster1 0.7795848
2 cluster2 0.5642857
3 cluster3 0.7922980

My script also produces the tableresults dataframe. A head() sample is shown below:

> head(tableresults)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        cluster3
2         20         14         24        cluster3
3         34         35         49        cluster3
4         18          5         19        cluster2
5         23         27         35        cluster3
6         33         20         39        cluster3

My question is fairly simple. I would like to transform tableresults by changing the string in the winning_cluster column according to three rules:

1) Write Standing in tableresults$winning_cluster in place of the cluster name that has the highest Standing value in averagetable.

2) Write Moving/Feeding in tableresults$winning_cluster in place of the cluster name that has the second highest Standing value in averagetable.

3) Write Feeding/Moving in tableresults$winning_cluster in place of the cluster name that has the third highest Standing value in averagetable.

In other words, this is the desired output:

> head(tableresults_output)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        Standing
2         20         14         24        Standing
3         34         35         49        Standing
4         18          5         19  Moving/Feeding
5         23         27         35        Standing
6         33         20         39        Standing

Note that it is very important that the mapping be value-based and hierarchical: rules 1), 2) and 3) must be assigned according to the values in averagetable. This is not solved by using:

averagetable$classification <- factor(x = as.character(sort(averagetable$Standing)),
                labels = c('Feeding/Moving', 'Moving/Feeding','Standing'))

With this command the labels are tied to row position rather than to the values: sort() discards the link between each value and its cluster, so cluster1 is always labelled Feeding/Moving, cluster2 Moving/Feeding and cluster3 Standing, which is not necessarily correct when averagetable is regenerated.
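A quick check of this point, as a minimal sketch: it applies the factor() call above to the two averagetable outputs shown earlier and compares the labels each row receives. The helper name label_by_factor and the data frames run_a and run_b are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper (not from the original post): applies the factor() call
# from the question and returns the label given to each row.
label_by_factor <- function(standing) {
  as.character(factor(x = as.character(sort(standing)),
                      labels = c("Feeding/Moving", "Moving/Feeding", "Standing")))
}

# The two averagetable outputs shown earlier: same values, different clusters.
run_a <- data.frame(Group.1 = c("cluster1", "cluster2", "cluster3"),
                    Standing = c(0.5642857, 0.7795848, 0.7922980))
run_b <- data.frame(Group.1 = c("cluster1", "cluster2", "cluster3"),
                    Standing = c(0.7795848, 0.5642857, 0.7922980))

labels_a <- label_by_factor(run_a$Standing)
labels_b <- label_by_factor(run_b$Standing)

# TRUE: each row gets the same label in both runs, even though
# cluster1 and cluster2 swapped their Standing values.
identical(labels_a, labels_b)
```

Because the values are sorted before the labels are attached, the result depends only on row order, not on which cluster holds which value.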

Anyway, any help is appreciated, and I hope my question is interesting enough for the forum.

Recommended answer

Here's a stab at it:


tableresults <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        cluster3
2         20         14         24        cluster3
3         34         35         49        cluster3
4         18          5         19        cluster2
5         23         27         35        cluster3
6         33         20         39        cluster3")

averagetable <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
   Group.1  Standing
1 cluster1  0.5642857
2 cluster2  0.7795848
3 cluster3  0.7922980")

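# rank(-Standing) is 1 for the highest Standing value, 2 for the second highest, and so on,
# so indexing the label vector with it gives each cluster the label for its rank.
averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ rank(-averagetable$Standing) ]
# Join the labels onto tableresults by cluster name; note that merge() sorts the rows by the key.
merge(tableresults, averagetable[,c(1,3)], by.x="winning_cluster", by.y="Group.1")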
#   winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z              x
# 1        cluster2         18          5         19 Moving/Feeding
# 2        cluster3         19         21         28       Standing
# 3        cluster3         20         14         24       Standing
# 4        cluster3         34         35         49       Standing
# 5        cluster3         23         27         35       Standing
# 6        cluster3         33         20         39       Standing
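
The merge() result above adds the label as a new column x and reorders the rows by cluster name. If the goal is to overwrite winning_cluster in place while keeping the original row order, one possible variation is a named lookup vector. This is only a sketch: the names labels, cluster_rank, lookup and tableresults_output are chosen here for illustration, and ties.method = "first" is an assumption made so that ranks stay whole numbers if two clusters happen to share the same Standing value.

```r
tableresults <- data.frame(
  ACTIVITY_X = c(19, 20, 34, 18, 23, 33),
  ACTIVITY_Y = c(21, 14, 35, 5, 27, 20),
  ACTIVITY_Z = c(28, 24, 49, 19, 35, 39),
  winning_cluster = c("cluster3", "cluster3", "cluster3",
                      "cluster2", "cluster3", "cluster3"),
  stringsAsFactors = FALSE
)

averagetable <- data.frame(
  Group.1 = c("cluster1", "cluster2", "cluster3"),
  Standing = c(0.5642857, 0.7795848, 0.7922980),
  stringsAsFactors = FALSE
)

# Labels in order of decreasing Standing value (rules 1, 2 and 3).
labels <- c("Standing", "Moving/Feeding", "Feeding/Moving")

# Rank 1 = highest Standing. ties.method = "first" is an assumption that
# keeps the ranks integer-valued when two clusters tie.
cluster_rank <- rank(-averagetable$Standing, ties.method = "first")

# Named vector mapping each cluster name to its label.
lookup <- setNames(labels[cluster_rank], averagetable$Group.1)

# Replace the cluster names in place; row order is left untouched.
tableresults_output <- tableresults
tableresults_output$winning_cluster <- unname(lookup[tableresults$winning_cluster])

head(tableresults_output)
```

Indexing the named vector by tableresults$winning_cluster looks up each row's label directly, so no join or re-sorting is involved.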
