For each row, extract the value in the column whose name matches another value in the row


Problem description

I have a problem that can easily be solved with a for-loop. However, my dataframe has hundreds of thousands of rows, so a loop would take a very long time to run, and I am looking for a quick and smart solution.

For each row in my dataframe, I would like to extract the value of the cell whose column name matches the value in the first column (INDEX).

The dataframe looks like this:

> mydata
  INDEX    1   2    3   4    5   6
1     2 18.9 9.5 22.6 4.7 16.2 7.4
2     2 18.9 9.5 22.6 4.7 16.2 7.4
3     2 18.9 9.5 22.6 4.7 16.2 7.4
4     4 18.9 9.5 22.6 4.7 16.2 7.4
5     4 18.9 9.5 22.6 4.7 16.2 7.4
6     5 18.9 9.5 22.6 4.7 16.2 7.4

Here's the code for reproducing it:

mydata <- data.frame(INDEX=c(2,2,2,4,4,5), ONE=(rep(18.9,6)), TWO=(rep(9.5,6)), 
                     THREE=(rep(22.6,6)), FOUR=(rep(4.7,6)), FIVE=(rep(16.2,6)), SIX=(rep(7.4,6)))
colnames(mydata) <- c("INDEX",1,2,3,4,5,6)

And this is the new dataframe with the newly calculated variable:

> new_mydf
  INDEX    1   2    3   4    5   6 VARIABLE
3     2 18.9 9.5 22.6 4.7 16.2 7.4      9.5
2     2 18.9 9.5 22.6 4.7 16.2 7.4      9.5
1     2 18.9 9.5 22.6 4.7 16.2 7.4      9.5
5     4 18.9 9.5 22.6 4.7 16.2 7.4      4.7
4     4 18.9 9.5 22.6 4.7 16.2 7.4      4.7
6     5 18.9 9.5 22.6 4.7 16.2 7.4     16.2

I solved it using the for-loop below, but, as I wrote above, I am looking for a more straightforward solution (maybe using packages like dplyr, or other functions?), as the loop is too slow for my full dataset:

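id <- mydata$INDEX
new_mydf <- data.frame()
for (i in seq_along(id)) {
  mydata_row <- mydata[i, ]
  value <- mydata_row$INDEX
  # Copy the cell whose column name equals this row's INDEX value
  mydata_row["VARIABLE"] <- mydata_row[, names(mydata_row) == value]
  # Growing the data frame with rbind() on every iteration is costly
  new_mydf <- rbind(mydata_row, new_mydf)
}
# Sort the rows by INDEX
new_mydf <- new_mydf[order(new_mydf[, 1]), ]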

Solution

Based on your loop, using apply with an anonymous function may be faster (with your initial definition of mydata):

mydata$VARIABLE<-apply(mydata, 1, function(x) { x[names(x)==x[names(x)=="INDEX"]] })
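
As a possible alternative to the row-wise apply call, the lookup can also be written without any per-row function call by using base R matrix indexing. This is only a sketch and not part of the original answer; the helper names vals and col_pos are made up for illustration, and it assumes every INDEX value corresponds to one of the value column names.

```r
# Rebuild the example data from the question.
mydata <- data.frame(INDEX = c(2, 2, 2, 4, 4, 5),
                     ONE = rep(18.9, 6), TWO = rep(9.5, 6), THREE = rep(22.6, 6),
                     FOUR = rep(4.7, 6), FIVE = rep(16.2, 6), SIX = rep(7.4, 6))
colnames(mydata) <- c("INDEX", 1, 2, 3, 4, 5, 6)

# Numeric matrix of the value columns only (INDEX is dropped),
# so the extracted values stay numeric.
vals <- as.matrix(mydata[, -1])

# For each row, the position of the value column named by INDEX.
col_pos <- match(as.character(mydata$INDEX), colnames(vals))

# A two-column (row, column) index matrix picks exactly one cell per row.
mydata$VARIABLE <- vals[cbind(seq_len(nrow(vals)), col_pos)]
```

Because match and the matrix index work on whole vectors at once, this form may scale better than apply on a dataframe with hundreds of thousands of rows, though that is worth timing on the real data.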

Edit: it also works when INDEX contains characters:

mydata <- data.frame(INDEX=c("B","B","B","D","D","E"), "A"=(rep(18.9,6)), "B"=(rep(9.5,6)), 
                 "C"=(rep(22.6,6)), "D"=(rep(4.7,6)), "E"=(rep(16.2,6)), "F"=(rep(7.4,6)))

mydata$VARIABLE<-apply(mydata, 1, function(x) { x[names(x)==x[names(x)=="INDEX"]] })

> mydata
  INDEX    A   B    C   D    E   F VARIABLE
1     B 18.9 9.5 22.6 4.7 16.2 7.4      9.5
2     B 18.9 9.5 22.6 4.7 16.2 7.4      9.5
3     B 18.9 9.5 22.6 4.7 16.2 7.4      9.5
4     D 18.9 9.5 22.6 4.7 16.2 7.4      4.7
5     D 18.9 9.5 22.6 4.7 16.2 7.4      4.7
6     E 18.9 9.5 22.6 4.7 16.2 7.4     16.2
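
One caveat with the character version is that apply first converts the dataframe to a matrix. A matrix holds a single type, so once INDEX is a character column the extracted values come back as character strings rather than numbers. The sketch below shows this and converts the result back with as.numeric; the name raw_result is made up for illustration.

```r
# Character-INDEX example data, as in the edit above.
mydata <- data.frame(INDEX = c("B", "B", "B", "D", "D", "E"),
                     A = rep(18.9, 6), B = rep(9.5, 6), C = rep(22.6, 6),
                     D = rep(4.7, 6), E = rep(16.2, 6), F = rep(7.4, 6))

# Same lookup as in the answer; each row arrives as a character vector.
raw_result <- apply(mydata, 1, function(x) x[names(x) == x["INDEX"]])

# The extracted values are strings at this point.
is.character(raw_result)

# Convert back to numbers before doing any arithmetic on the new column.
mydata$VARIABLE <- as.numeric(raw_result)
```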
