Selecting data frame rows based on partial string match in a column

Question

I want to select rows from a data frame based on a partial match of a string in a column, e.g. column 'x' contains the string "hsa". Using sqldf - if it had a like syntax - I would do something like:

select * from <> where x like 'hsa'.

Unfortunately, sqldf does not support that syntax.

Or something similar:

selectedRows <- df[ , df$x %like% "hsa-"]

This of course doesn't work.

Can somebody please help me with this?

Recommended answer

I notice that you mention a function %like% in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can definitely use it as follows.

Note that the object does not have to be a data.table (but also remember that subsetting approaches for data.frames and data.tables are not identical):

library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]

If that is what you had, then perhaps you had just mixed up the row and column positions when subsetting the data.
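To make the row/column mix-up concrete, here is a minimal base R sketch. The data frame df and its values are hypothetical stand-ins for the asker's data, and grepl() is used in place of %like% so that no package is needed.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(
  x = c("hsa-miR-1", "mmu-miR-2", "hsa-let-7"),
  score = c(0.9, 0.4, 0.7),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does x contain "hsa-"?
keep <- grepl("hsa-", df$x)

# Condition BEFORE the comma: it filters rows, which is what the question asks for
rows_selected <- df[keep, ]

# Condition AFTER the comma: it is read as a column selector. Here the
# vector has three entries but df has only two columns, so R raises an error.
cols_attempt <- tryCatch(df[, keep], error = function(e) "error")
```

In this sketch rows_selected holds the two "hsa-" rows, while the second form fails because the third entry of keep points at a column that does not exist.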

If you don't want to load a package, you can try using grep() to search for the string you're matching. Here's an example with the mtcars dataset, where we match all rows whose row names include "Merc":

mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
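As a possible variation on the grep() call above, grepl() returns a logical vector rather than matching positions. The sketch below uses only base R and the built-in mtcars data; the variable names are illustrative.

```r
# grepl() returns TRUE/FALSE per element instead of the matching positions
is_merc <- grepl("Merc", rownames(mtcars))
merc_rows <- mtcars[is_merc, ]

# A logical vector can be negated to keep the non-matching rows.
# The grep() equivalent, mtcars[-grep(...), ], would return zero rows
# if the pattern matched nothing, because -integer(0) selects nothing.
other_rows <- mtcars[!is_merc, ]
```

Both forms select the same rows for a plain "keep the matches" filter; the logical form mainly helps when the selection needs to be inverted or combined with other conditions.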

And, another example, using the iris dataset searching for the string osa:

irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

For your problem, try:

selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
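Note that grep() treats its pattern as a regular expression and matches it anywhere in the string. The sketch below shows two possible refinements; the contents of conservedData here are made up purely for illustration, and which variant fits depends on the real data.

```r
# Hypothetical stand-in for the asker's data; the values are made up
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a", "ptr-hsa-like"),
  count = c(10, 4, 7, 1),
  stringsAsFactors = FALSE
)

# Literal substring match anywhere in the value (fixed = TRUE turns off regex)
anywhere <- conservedData[grepl("hsa-", conservedData$miRNA, fixed = TRUE), ]

# Regex anchored with ^ so that only values beginning with "hsa-" match
at_start <- conservedData[grepl("^hsa-", conservedData$miRNA), ]
```

With this sample data the unanchored match also picks up "ptr-hsa-like", so anchoring may be preferable if "hsa-" is meant as a prefix.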
