R: How to offset and match within a dataframe?

Question

I would like to use something similar to Excel's OFFSET and MATCH functions. Here is an example data set, data:

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |
Test2      |FALSE   |25%    |0      |
Test1      |TRUE    |16%    |0      |
Test3      |FALSE   |12%    |1      |

The RESULT column should then read:

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |TRUE
Test2      |FALSE   |25%    |0      |25%
Test1      |TRUE    |16%    |0      |TRUE
Test3      |FALSE   |12%    |1      |1

In the final RESULT column I would like the result of whichever test is named in the Which Test? column. In this example the RESULT column could therefore return either numbers or strings. In Excel the formula would be:

=OFFSET($A$1, ROW()-1,MATCH(A2,$B$1:$D$1,0))

So far I have tried to list the tests using sapply and pass the result to another function such as which(colnames..., and this is where I am stuck.

Many thanks!

Recommended answer

I'll go with sapply:

data <- read.table(text="Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |
Test2      |FALSE   |25%    |0      |
Test1      |TRUE    |16%    |0      |
Test3      |FALSE   |12%    |1      |", 
 header=TRUE,
 sep="|",
 stringsAsFactors=FALSE,
 strip.white=TRUE)

data$Result <- sapply(seq_len(nrow(data)), function(x) data[x, data[x, 1]])

For each row x, the inner access data[x,1] gives the name of the target column, and the outer access data[x,...] then takes the value of that column in row x.

Output:

> data
  Which.Test. Test1 Test2 Test3 RESULT Result
1       Test1  TRUE   80%     0     NA   TRUE
2       Test2 FALSE   25%     0     NA    25%
3       Test1  TRUE   16%     0     NA   TRUE
4       Test3 FALSE   12%     1     NA      1
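
Note that the new column comes back as character, because sapply simplifies the mixed logical, character and integer cells into a single vector. A minimal sketch that makes this explicit follows. It rebuilds the example data inline so that it runs on its own, and the use of vapply with as.character is my own variation rather than part of the answer above.

```r
# Rebuild the example data inline so this sketch runs on its own.
data <- data.frame(
  Which.Test. = c("Test1", "Test2", "Test1", "Test3"),
  Test1 = c(TRUE, FALSE, TRUE, FALSE),
  Test2 = c("80%", "25%", "16%", "12%"),
  Test3 = c(0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# vapply with character(1) states the return type up front; as.character()
# makes the coercion of the logical and integer cells explicit.
result <- vapply(
  seq_len(nrow(data)),
  function(x) as.character(data[x, data[x, 1]]),
  character(1)
)

result
```

If the original types matter, the values would have to be kept in a list rather than an atomic vector, since a single vector can hold only one type.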

Written with two variables, the function passed to sapply would be:

function(x) {
 tcol <- data[x,1] # First column value of row x
 data[x,tcol] # Get the value at row x and column tcol
}

An approach using Map/mapply is to supply the row index 'i' (seq(nrow(data))) and the column index 'j' (match(data$Which.Test., names(data))) and use [ to extract the elements from 'data'. We wrap 'data' in list so that it is passed as a single data.frame and recycled along the lengths of 'i' and 'j'.

 mapply(`[`, list(data), seq(nrow(data)), match(data$Which.Test., names(data) ) )
 #[1] "TRUE" "25%"  "TRUE" "1"   


A possible vectorized approach, though, would be simply:

data[cbind(1:nrow(data), match(data$Which.Test., names(data)))]
## [1] " TRUE" "25%"   " TRUE" "1"  

This matches the values in Which.Test. against the column names of data and returns the index of each matched column. We then pick one of those columns per row by combining the indices with 1:nrow(data) using cbind and subsetting data with the resulting matrix.
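
To see the indexing rule in isolation, here is a small sketch on a toy matrix. The names m, idx and picked are made up for illustration. Each row of the two-column index matrix is read as one (row, column) pair, so the result has one element per row of idx.

```r
# A small matrix with named columns, purely for illustration.
# Filled column by column: a = 1..4, b = 5..8, c = 9..12.
m <- matrix(1:12, nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))

# One (row, column) pair per row of the index matrix.
idx <- cbind(1:4, c(1, 3, 2, 3))

# Equivalent to c(m[1, 1], m[2, 3], m[3, 2], m[4, 3]).
picked <- m[idx]
picked
```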

A more detailed explanation of @DavidArenburg's solution above (as I had to spend some time to understand it fully):

The subset operator accepts a matrix as its index, so we do:

  1. 1:nrow(data) is the easy part: it gives the vector [1] 1 2 3 4, one entry per row of our dataset.
  2. match(data$Which.Test., names(data)) gives the column index of each matching test: [1] 2 3 2 4
  3. cbind(..,..) binds the two preceding results into a matrix:

     [,1] [,2]
[1,]    1    2
[2,]    2    3
[3,]    3    2
[4,]    4    4

We see that, for each row, this matrix pairs the row number with the column whose value we want. So when this matrix is used as the selector on our dataset we get the correct results. We can then assign them to a new variable or to a new column of the data frame.
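
The three steps can be sketched end to end as follows. The data frame is rebuilt inline so the block runs on its own, and the variable names rows, cols and idx are mine. The trimws call is also my addition: indexing a data frame with a matrix goes through as.matrix, which for mixed column types returns character strings that may carry padding (as in the " TRUE" shown earlier).

```r
# Rebuild the example data inline so this sketch runs on its own.
data <- data.frame(
  Which.Test. = c("Test1", "Test2", "Test1", "Test3"),
  Test1 = c(TRUE, FALSE, TRUE, FALSE),
  Test2 = c("80%", "25%", "16%", "12%"),
  Test3 = c(0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

rows <- seq_len(nrow(data))                   # step 1: row numbers
cols <- match(data$Which.Test., names(data))  # step 2: matching column indices
idx  <- cbind(rows, cols)                     # step 3: one (row, column) pair per row

# Select one cell per row and store it as a new column.
# trimws() strips any padding introduced by the conversion to a character matrix.
data$RESULT <- trimws(data[idx])
data
```

A value in Which.Test. that matches no column name would make match return NA for that row, so checking anyNA(cols) before indexing is a reasonable precaution.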
