Subset a data.frame with multiple conditions
Question
Suppose my data looks like this:
2372 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.3 05/07/2006
9104 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.34 07/23/2006
9212 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.33 02/11/2007
2094 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.4 05/06/2007
16763 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.61 05/11/2009
1076 Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.48 05/12/2002
1077 Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.3 05/07/2006
I want to subset by Analyte and by a partial match on the date (namely, I just want a single year). I have been trying this, but I know it isn't quite right:
data[data$Analyte=="ATRAZINE" & grep("2006",as.character(data$Date)),]
Any suggestions?
Recommended answer
For this problem I would go with the approach in Apprentice Queue's answer of extracting the year from the date rather than doing generic string matching. I would suggest:
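# $year counts years since 1900, so 106 means 2006.
# The trailing comma makes this a row selection rather than a column selection.
data[data$Analyte == "ATRAZINE"
     & as.POSIXlt(data$Date, format = "%m/%d/%Y")$year == 106, ]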
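The year comparison can also be written against a plain four-digit year, which avoids the years-since-1900 offset. The following is only a sketch on a toy data frame: Analyte and Date are the column names used in the question, while Result is a hypothetical name for the measurement column, and the dates are assumed to be stored as mm/dd/yyyy strings.

```r
# Toy data; Result is a hypothetical column name, the values come from the question.
data <- data.frame(
  Analyte = c("ATRAZINE", "ATRAZINE", "ATRAZINE", "METOLACHLOR", "METOLACHLOR"),
  Result  = c(1.3, 0.34, 0.33, 0.48, 0.3),
  Date    = c("05/07/2006", "07/23/2006", "02/11/2007", "05/12/2002", "05/07/2006"),
  stringsAsFactors = FALSE
)

# Parse the strings once, then extract the calendar year as an integer.
parsed <- as.Date(data$Date, format = "%m/%d/%Y")
year   <- as.integer(format(parsed, "%Y"))

# Both conditions are logical vectors with one entry per row,
# so & combines them element by element.
atrazine_2006 <- data[data$Analyte == "ATRAZINE" & year == 2006, ]
print(atrazine_2006)
```

Parsing into a separate vector first also makes it easy to check for NA values, which as.Date returns for strings that do not match the format.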
But if you really had to do regexp matching, you could use grepl, which returns a logical vector, rather than grep, which returns a vector of indices.
data[data$Analyte=="ATRAZINE" & grepl("2006",as.character(data$Date)),]
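To see why the grep version in the question misbehaves, it may help to compare the two return values on a few dates. This is a small illustration with made-up vectors (dates and analyte are hypothetical names, not columns of the original data frame):

```r
dates   <- c("05/07/2006", "07/23/2006", "02/11/2007", "05/12/2002")
analyte <- c("ATRAZINE", "METOLACHLOR", "ATRAZINE", "ATRAZINE")

idx  <- grep("2006", dates)    # integer positions of the matching elements
mask <- grepl("2006", dates)   # one TRUE/FALSE per element

# Safe: both operands are logical and have the same length.
both <- analyte == "ATRAZINE" & mask

# Unsafe: the indices are coerced to logical (any nonzero value is TRUE)
# and recycled, so the date condition no longer filters anything.
wrong <- analyte == "ATRAZINE" & idx

print(both)
print(wrong)
```

Here both keeps only the first element, while wrong also keeps the 2007 and 2002 entries, because the index vector carries no per-row information once it is coerced to logical.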