Using grep to help subset a data frame in R
I am having trouble subsetting my data. I want to subset the data on column x, keeping only the rows where the value begins with G45.
My data frame:
x <- c("G448", "G459", "G479", "G406")
y <- 1:4
My.Data <- data.frame(x, y)
I have tried:
subset(My.Data, x == "G45*")
But I am unsure how to use wildcards. I have also tried grep() to find the indices:
grep("G45*", My.Data$x)
but it returns all 4 rows rather than just those beginning with G45, probably because I am unsure how to use wildcards.
It's pretty straightforward using [ to extract.
grep will give you the positions at which your search pattern matched (unless you use value = TRUE, in which case it returns the matching values themselves).
grep("^G45", My.Data$x)
# [1] 2
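As a rough illustration of why the original attempt matched every row: grep treats its pattern as a regular expression, not a shell-style wildcard, so "5*" means "zero or more 5s" and "G45*" matches anything containing "G4". The sketch below compares the two patterns on the question's data. The call to glob2rx (from the utils package) is an addition not mentioned in the original answer; it converts a wildcard pattern into an anchored regular expression.

```r
x <- c("G448", "G459", "G479", "G406")

# "5*" means "zero or more 5s", so "G45*" matches any string containing "G4".
grep("G45*", x)
# [1] 1 2 3 4

# "^" anchors the match to the start of the string.
grep("^G45", x)
# [1] 2

# value = TRUE returns the matching elements instead of their positions.
grep("^G45", x, value = TRUE)
# [1] "G459"

# glob2rx() translates a wildcard pattern into an anchored regular expression.
grep(glob2rx("G45*"), x)
# [1] 2
```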
Since you're searching within the values of a single column, that position corresponds to the row index. So, use it with [ (where you would use My.Data[rows, cols] to get specific rows and columns).
My.Data[grep("^G45", My.Data$x), ]
#      x y
# 2 G459 2
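A logical index from grepl can be used with [ in the same way, and it may be the safer choice when you want the rows that do not match. The following is a minimal sketch on the question's data; the variable names keep and none are only illustrative.

```r
My.Data <- data.frame(x = c("G448", "G459", "G479", "G406"), y = 1:4)

# grepl() returns one TRUE/FALSE per row rather than positions.
keep <- grepl("^G45", My.Data$x)

My.Data[keep, ]    # rows whose x starts with "G45"
My.Data[!keep, ]   # all other rows

# With no matches, a logical index still negates cleanly,
# whereas -grep(...) becomes -integer(0) and selects zero rows.
none <- grepl("^Z", My.Data$x)
nrow(My.Data[!none, ])
# [1] 4
nrow(My.Data[-grep("^Z", My.Data$x), ])
# [1] 0
```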
The help page for subset shows how you can use grep and grepl with subset if you prefer using this function over [. Here's an example.
subset(My.Data, grepl("^G45", My.Data$x))
#      x y
# 2 G459 2
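Two small variations may be worth knowing. Because subset evaluates its condition inside the data frame, the column can be referred to as x rather than My.Data$x. And for a literal prefix such as this one, startsWith (available in base R 3.3.0 and later) avoids regular expressions altogether. Neither variation comes from the original answer; the sketch below assumes the same data as in the question.

```r
My.Data <- data.frame(x = c("G448", "G459", "G479", "G406"), y = 1:4)

# subset() evaluates its condition inside the data frame, so x can be used directly.
by_regex <- subset(My.Data, grepl("^G45", x))

# startsWith() tests a literal prefix without a regular expression.
# It requires a character vector, hence as.character() in case x is a factor.
by_prefix <- subset(My.Data, startsWith(as.character(x), "G45"))

identical(by_regex, by_prefix)
# [1] TRUE
```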