Select N rows above and below match


Problem description


I would like to select N rows above and below a match.

I'm trying the command:

mtcars[which(mtcars$vs == 1) + c(-1:1), ]

It returns the following warning:

Warning message:
In which(mtcars$vs == 1) + c(-1:1) :
longer object length is not a multiple of shorter object length

Solution

This seems to be a simple question, but it is less trivial than it looks.

The issue is that which(mtcars$vs == 1) returns a vector rather than a single value:

[1]  3  4  6  8  9 10 11 18 19 20 21 26 28 32

If another vector -1:1 (which is c(-1L, 0L, 1L)) is added to it, the normal R rules for operations on vectors of unequal length apply. The recycling rule says:

Any short vector operands are extended by recycling their values until they match the size of any other operands.

Therefore the shorter vector -1:1 will be recycled to the length of which(mtcars$vs == 1), i.e.,

rep(-1:1, length.out = length(which(mtcars$vs == 1)))

 [1] -1  0  1 -1  0  1 -1  0  1 -1  0  1 -1  0

Therefore, the result of

which(mtcars$vs == 1) + -1:1

is the element-wise sum of both vectors, where the shorter vector has been recycled to match the length of the longer one:

 [1]  2  4  7  7  9 11 10 18 20 19 21 27 27 32

which is probably not what the OP expected.

In addition, we get the

Warning message:
In which(mtcars$vs == 1) + -1:1 :
longer object length is not a multiple of shorter object length

because which(mtcars$vs == 1) has length 14 and -1:1 has length 3.
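
The recycling behaviour can be reproduced on a smaller scale. The following is a minimal sketch; the vectors matches and offsets are hypothetical values chosen for illustration and are not part of the original question.

```r
# Hypothetical small vectors, chosen only to illustrate recycling
matches <- c(3L, 4L, 6L, 8L)   # length 4
offsets <- -1:1                # length 3

# Explicitly recycled offsets: -1 0 1 -1
recycled <- rep(offsets, length.out = length(matches))

# R pairs the vectors position by position; it would warn here because
# 4 is not a multiple of 3, so the warning is suppressed for the demo
summed <- suppressWarnings(matches + offsets)

# The implicit and the explicit recycling give the same result
identical(summed, matches + recycled)   # TRUE
```

Each match receives only one offset, so no match gets its full -1:1 neighbourhood.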

Solution using outer()

In order to select the N rows above and below each matching row, we need to add -N:N to each row number returned by which(mtcars$vs == 1):

outer(which(mtcars$vs == 1), -1:1, `+`)

      [,1] [,2] [,3]
 [1,]    2    3    4
 [2,]    3    4    5
 [3,]    5    6    7
 [4,]    7    8    9
 [5,]    8    9   10
 [6,]    9   10   11
 [7,]   10   11   12
 [8,]   17   18   19
 [9,]   18   19   20
[10,]   19   20   21
[11,]   20   21   22
[12,]   25   26   27
[13,]   27   28   29
[14,]   31   32   33

Now we have an array of all candidate row numbers. Unfortunately, it cannot be used directly for subsetting because it contains duplicates as well as a row number (33) that does not exist in mtcars, which has only 32 rows. So the result has to be "post-processed" before it can be used for subsetting.
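
To see why the post-processing matters, here is a quick check, assuming the built-in mtcars dataset with its 32 rows. The names idx and picked are introduced only for this illustration.

```r
# All candidate row numbers, flattened column by column
idx <- as.vector(outer(which(mtcars$vs == 1), -1:1, `+`))

anyDuplicated(idx) > 0     # TRUE: e.g. row 3 appears more than once
max(idx) > nrow(mtcars)    # TRUE: 33 exceeds the 32 rows of mtcars

# Subsetting with idx as-is repeats rows and adds an all-NA row for 33
picked <- mtcars[idx, ]
nrow(picked)               # 42, one row per element of idx
```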

library(magrittr) # piping used for clarity
rn <- outer(which(mtcars$vs == 1), -1:1, `+`) %>% 
  as.vector() %>% 
  unique() %>% 
  sort() %>%                                         # restore ascending row order
  Filter(function(x) 1 <= x & x <= nrow(mtcars), .)  # keep only row numbers that exist

rn

 [1]  2  3  4  5  6  7  8  9 10 11 12 17 18 19 20 21 22 25 26 27 28 29 31 32

mtcars[rn, ]

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128          32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
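
For an arbitrary N, the same steps can be wrapped in a small base R function without the magrittr dependency. This is only a sketch: rows_around and its arguments hits, n and n_rows are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: row numbers within n rows of any match,
# clipped to the valid range 1..n_rows
rows_around <- function(hits, n, n_rows) {
  # every match combined with every offset from -n to n
  candidates <- as.vector(outer(hits, -n:n, `+`))
  # drop duplicates and restore ascending order
  candidates <- sort(unique(candidates))
  # keep only row numbers that actually exist
  candidates[candidates >= 1 & candidates <= n_rows]
}

rn <- rows_around(which(mtcars$vs == 1), n = 1, n_rows = nrow(mtcars))
mtcars[rn, ]
```

With n = 1 this should select the same 24 rows as above; a larger n widens the window around each match.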
