Using regexp to select rows in R dataframe
Question
I'm trying to select rows in a dataframe where the string contained in a column matches either a regular expression or a substring:
dataframe:
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD2920 2011-01-27_R6 AB CG -0.183802 9.766334
AX-11086564 D04_CD5950 2011-02-09_R9 AB CG 0.162586 10.165051
AX-11086564 D07_CD6025 2011-02-10_R10 AB CG -0.397097 9.940238
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
AX-11086564 A01_CD2588 2011-01-27_R5 BB GG -1.208094 9.239801
For example, I want a dataframe containing only the rows that contain ADN in column bName. Secondarily, I would like all rows that contain ADN in column bName and that match 2011-02-10_R2 in column pName.
I tried using the functions grep(), agrep() and more, but without success...
Recommended answer
subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2" )
Note the use of "&" (not "&&", which is not vectorized) and of "==" (not "=", which is assignment).
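To make the vectorized behaviour concrete, here is a minimal sketch that rebuilds part of the sample data from the question and applies both requested filters. The name dat follows the answer; the intermediate names (has_adn, adn_rows, both) are only illustrative, and only three of the original columns are reproduced.

```r
# Rebuild the sample data from the question (three columns are enough here)
dat <- data.frame(
  aName = "AX-11086564",
  bName = c("F08_ADN103", "A01_CD1919", "B05_CD2920", "D04_CD5950",
            "D07_CD6025", "B05_CD3630", "D04_ADN103", "A01_CD2588"),
  pName = c("2011-02-10_R10", "2011-02-24_R11", "2011-01-27_R6",
            "2011-02-09_R9", "2011-02-10_R10", "2011-02-02_R7",
            "2011-02-10_R2", "2011-01-27_R5"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row, so the result can select rows
has_adn <- grepl("ADN", dat$bName)

# First request: rows whose bName contains "ADN"
adn_rows <- subset(dat, grepl("ADN", bName))

# Second request: also require an exact match on pName;
# "&" combines the two logical vectors element by element
both <- subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

print(adn_rows)
print(both)
```

If the pattern is meant as a plain substring rather than a regular expression, grepl("ADN", bName, fixed = TRUE) treats it literally, which matters once the pattern contains characters such as "." or "(".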
Note that you could have used:
dat[ with(dat, grepl("ADN", bName) & pName == "2011-02-10_R2" ) , ]
... and that might be preferable when used inside functions. However, it will return rows of NA values for any rows where dat$pName is NA. That defect (which some regard as a feature) could be removed by adding & !is.na(dat$pName) to the logical expression.
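The difference is easy to see on a small example. The sketch below uses a made-up three-row data frame (not the question's data) in which one pName is missing; the variable names are illustrative only. In this sketch the NA row shows up where the grepl() condition is TRUE and pName is NA, because TRUE & NA evaluates to NA, whereas FALSE & NA is simply FALSE.

```r
# Hypothetical data: the second row matches "ADN" but has a missing pName
dat <- data.frame(
  bName = c("F08_ADN103", "D04_ADN103", "A01_CD1919"),
  pName = c("2011-02-10_R2", NA, "2011-02-10_R2"),
  stringsAsFactors = FALSE
)

# Logical index as in the answer; the second element is NA, not FALSE
keep <- with(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

# "[" keeps a row of NAs for every NA in the index
with_na <- dat[keep, ]

# Adding !is.na() turns the NA into FALSE, so the NA row disappears
cleaned <- dat[keep & !is.na(dat$pName), ]

# subset() already treats NA in the condition as FALSE
via_subset <- subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

print(with_na)
print(cleaned)
```

Whether the NA row is a defect or a useful warning depends on the use case; the explicit !is.na() form makes the choice visible in the code.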