Selecting data frame rows based on partial string match in a column
Question
I want to select rows from a data frame based on a partial match of a string in a column, e.g. column 'x' contains the string "hsa". Using sqldf, if it had a like syntax, I would do something like:
select * from <> where x like 'hsa'
Unfortunately, sqldf does not support that syntax.
Or similarly:
selectedRows <- df[ , df$x %like% "hsa-"]
But this, of course, doesn't work.
Can somebody help me with this?
Recommended answer
I notice that you mention a function %like% in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can definitely use it as follows.
Note that the object does not have to be a data.table (but also remember that subsetting approaches for data.frames and data.tables are not identical):
library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]
If that is what you had, then perhaps you had just mixed up the row and column positions when subsetting the data.
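To illustrate the row versus column mix-up, here is a minimal base R sketch. The data frame `df` and its contents are made up for illustration (the question does not show the real data), and `grepl()` is used here instead of %like% so that no package is needed:

```r
# Hypothetical toy data; the column name "x" follows the question's wording
df <- data.frame(
  x = c("hsa-let-7a", "mmu-let-7a", "hsa-miR-21"),
  score = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does x contain the literal text "hsa-"?
keep <- grepl("hsa-", df$x, fixed = TRUE)

# The condition goes BEFORE the comma, so it selects rows
selectedRows <- df[keep, ]

# Writing df[, keep] instead would place the condition in the column
# position, which is the mistake in the question's attempt
print(selectedRows)
```

The same layout applies to %like%: the logical condition belongs in the first (row) position of the brackets.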
If you don't want to load a package, you can try using grep() to search for the string you're matching. Here's an example with the mtcars dataset, where we match all rows whose row names include "Merc":
mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
And here is another example, using the iris dataset and searching for the string osa:
irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
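Both examples work because grep() returns the integer positions of the matching elements, and those positions are then used as row indices. The short sketch below makes that intermediate step visible using the built-in mtcars data, and compares it with grepl(), which returns a logical vector instead. The variable names are only illustrative:

```r
# Integer positions of the row names that contain "Merc"
idx <- grep("Merc", rownames(mtcars))
print(idx)

# grepl() gives one TRUE/FALSE per row instead of positions
flag <- grepl("Merc", rownames(mtcars))

# Either form can be used in the row position of the brackets
byIndex <- mtcars[idx, ]
byFlag <- mtcars[flag, ]
print(identical(byIndex, byFlag))
```

For plain positive selection like this, the two forms select the same rows, so the choice is mostly a matter of taste.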
For your problem, try:
selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
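One possible refinement, which goes beyond the original answer: grep() treats its pattern as a regular expression and matches it anywhere in the string. If "hsa-" is meant to be a prefix, the pattern can be anchored with `^`, or base R's startsWith() (available from R 3.3.0) can be used. The sketch below uses a made-up stand-in for `conservedData`, since the real data is not shown:

```r
# Hypothetical stand-in for conservedData; values are invented for illustration
conservedData <- data.frame(
  miRNA = c("hsa-let-7a", "mmu-miR-21", "hsa-miR-155", "xhsa-fake"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Literal "hsa-" anywhere in the string (this also keeps "xhsa-fake")
anywhere <- conservedData[grepl("hsa-", conservedData$miRNA, fixed = TRUE), ]

# Only values that begin with "hsa-" (regular expression anchored with ^)
prefixOnly <- conservedData[grepl("^hsa-", conservedData$miRNA), ]

# The same prefix test without regular expressions
prefixOnly2 <- conservedData[startsWith(conservedData$miRNA, "hsa-"), ]

print(prefixOnly)
```

Note that startsWith() expects a character vector, so a factor column would need as.character() first.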