Select N rows above and below match


Problem description

I would like to select N rows above and below a match.

I'm trying the command:

mtcars[which(mtcars$vs == 1) + c(-1:1), ]

It returns the following warning:

Warning message:
In which(mtcars$vs == 1) + c(-1:1) :
  longer object length is not a multiple of shorter object length

Solution

This looks like a simple question, but it is less trivial than it first appears.

The issue is that which(mtcars$vs == 1) returns a vector rather than a single value:

[1]  3  4  6  8  9 10 11 18 19 20 21 26 28 32

If another vector -1:1 (which is c(-1L, 0L, 1L)) is added to it, the normal R rules for operations on vectors of unequal lengths apply. The recycling rule says:

Any short vector operands are extended by recycling their values until they match the size of any other operands.

Therefore the shorter vector -1:1 will be recycled to the length of which(mtcars$vs == 1), i.e.,

rep(-1:1, length.out = length(which(mtcars$vs == 1)))

 [1] -1  0  1 -1  0  1 -1  0  1 -1  0  1 -1  0

Therefore, the result of

which(mtcars$vs == 1) + -1:1

is the element-wise sum of the elements of both vectors where the shorter vector has been recycled to match the length of the longer vector.

 [1]  2  4  7  7  9 11 10 18 20 19 21 27 27 32

which is probably not what the OP expected.

In addition, we get the warning

Warning message:
In which(mtcars$vs == 1) + -1:1 :
longer object length is not a multiple of shorter object length

because which(mtcars$vs == 1) has length 14 and -1:1 has length 3.
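The recycling behaviour described above can be reproduced on a small scale. The following is only an illustrative sketch; the names x, idx and warned are introduced here and are not part of the original answer. It shows that recycling is silent when the longer length is a multiple of the shorter one, and that the warning appears otherwise.

```r
# Silent recycling: length 6 is a multiple of length 3
x <- 1:6
x + c(10, 20, 30)

# The case from the question: 14 is not a multiple of 3
idx <- which(mtcars$vs == 1)
length(idx) %% length(-1:1)   # a non-zero remainder means a warning follows

# Capture the warning instead of printing it
warned <- tryCatch({ idx + -1:1; FALSE }, warning = function(w) TRUE)
warned
```

A non-zero remainder is what triggers the warning. The addition itself is still carried out with the partially recycled vector.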

Solution using outer()

In order to select the N rows above and below each matching row, we need to add -N:N to each row number returned by which(mtcars$vs == 1):

outer(which(mtcars$vs == 1), -1:1, `+`)

      [,1] [,2] [,3]
 [1,]    2    3    4
 [2,]    3    4    5
 [3,]    5    6    7
 [4,]    7    8    9
 [5,]    8    9   10
 [6,]    9   10   11
 [7,]   10   11   12
 [8,]   17   18   19
 [9,]   18   19   20
[10,]   19   20   21
[11,]   20   21   22
[12,]   25   26   27
[13,]   27   28   29
[14,]   31   32   33

Now we have an array of all row numbers. Unfortunately, it cannot be used directly for subsetting, because it contains duplicates as well as row numbers that do not exist in mtcars. So the result has to be "post-processed" before it can be used for subsetting.
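Both problems can be checked directly before any post-processing is done. This is a small sketch; the variable name idx is introduced here for illustration only.

```r
# Flatten the matrix of candidate row numbers (column by column)
idx <- as.vector(outer(which(mtcars$vs == 1), -1:1, `+`))

length(idx)               # 14 matches times 3 offsets
anyDuplicated(idx) > 0    # TRUE: neighbouring matches share rows
max(idx) > nrow(mtcars)   # TRUE: the last match is the last row, so +1 overshoots
min(idx) < 1              # FALSE here, but possible if row 1 were a match
```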

library(magrittr) # piping used for clarity
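rn <- outer(which(mtcars$vs == 1), -1:1, `+`) %>% 
  as.vector() %>%   # flatten the matrix into a plain vector
  unique() %>%      # drop duplicated row numbers
  sort() %>%        # restore the original row order
  Filter(function(x) 1 <= x && x <= nrow(mtcars), .)  # keep only existing rows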

rn

 [1]  2  3  4  5  6  7  8  9 10 11 12 17 18 19 20 21 22 25 26 27 28 29 31 32

mtcars[rn, ]

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
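For an arbitrary N, the same steps can be wrapped in a small helper that uses base R only. This is a minimal sketch, not part of the original answer: the function name rows_around and its arguments df, hits and n are hypothetical, and it assumes hits is an integer vector of row positions such as the one returned by which().

```r
# Hypothetical helper: return the rows of `df` within `n` rows of any hit
rows_around <- function(df, hits, n = 1L) {
  idx <- as.vector(outer(hits, -n:n, `+`))  # every hit plus every offset
  idx <- sort(unique(idx))                  # drop duplicates, keep row order
  idx <- idx[idx >= 1L & idx <= nrow(df)]   # discard non-existent rows
  df[idx, , drop = FALSE]
}

res <- rows_around(mtcars, which(mtcars$vs == 1), n = 1)
nrow(res)  # 24, the same rows as mtcars[rn, ] above
```

With n = 0 the helper returns just the matching rows, and an empty hits vector yields a data frame with zero rows.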
