R partial match in data frame
Problem description
How can I handle partial matches in a data frame? Let's say this is my data frame df:
V1 V2 V3 V4
1 ABC 1.2 4.3 A
2 CFS 2.3 1.7 A
3 dgf 1.3 4.4 A
I want to add a column V5 that contains the number 111 only if the value in V1 contains an "f", and the number 222 only if the value in V1 contains "gf". Will I run into problems because several values contain an "f", or will the order in which I enter the commands take care of it?
I tried something like:
df$V5<- ifelse(df$V1 = c("*f","*gf"),c=(111,222) )
but it does not work.
The main problem is: how can I tell R to look for a partial match?
Thanks a million for your help!
Besides the solution of setting the values in sequence (first for "f", then for "gf", ...), it is worth having a look at the zero-width lookahead / lookbehind capability of regular expressions.
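The sequential approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the data frame is reconstructed from the question, and initializing V5 with NA is an assumption about what unmatched rows should hold. Because the "gf" assignment runs last, it overrides the 111 written by the more general "f" match.

```r
# Sample data reconstructed from the question
df <- data.frame(
  V1 = c("ABC", "CFS", "dgf"),
  V2 = c(1.2, 2.3, 1.3),
  V3 = c(4.3, 1.7, 4.4),
  V4 = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# Assumption: rows that match neither pattern stay NA
df$V5 <- NA_real_

# General pattern first, more specific pattern last so that it wins.
# fixed = TRUE treats the pattern as a literal string; matching is case-sensitive,
# so the uppercase "F" in "CFS" does not count.
df$V5[grepl("f",  df$V1, fixed = TRUE)] <- 111
df$V5[grepl("gf", df$V1, fixed = TRUE)] <- 222

df$V5
```

Logical indexing with grepl() is used here instead of grep() indices so that the assignment is simply a no-op when nothing matches.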
If you want to grep all elements which contain an "f" that is not part of "gf" (an "f" not directly preceded by "g"), you can use a negative lookbehind:
v1 <- c("abc", "f", "gf" )
grep( "(?<![g])f" , v1, perl= TRUE )
[1] 2
And if you want to grep only those which contain an "f" that is not part of "fg" (an "f" not directly followed by "g"), you can use a negative lookahead:
v2 <- c("abc", "f", "fg")
grep( "f(?![g])" , v2, perl= TRUE )
[1] 2
And of course you can combine the two:
v3 <- c("abc", "f", "fg", "gf")
grep( "(?<![g])f(?![g])" , v3, perl= TRUE )
[1] 2
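As a small complement to the grep() calls above, the same combined pattern can be used with grepl() to get a logical vector, or with value = TRUE to get the matching strings instead of their positions. The variable name pattern is only introduced here for illustration, and the character class [g] is written as a plain g, which is equivalent for a single character.

```r
v3 <- c("abc", "f", "fg", "gf")

# "f" that is neither preceded nor followed by "g"
pattern <- "(?<!g)f(?!g)"

# One TRUE/FALSE per element, handy for subsetting or ifelse()
is_match <- grepl(pattern, v3, perl = TRUE)

# The matching strings themselves rather than their indices
matched <- grep(pattern, v3, perl = TRUE, value = TRUE)

is_match
matched
```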
So in your case you can do:
df[ grep( "(?<![g])f" , df$V1, perl= TRUE ), "V5" ] <- 111
df[ grep( "gf" , df$V1, perl= TRUE ), "V5" ] <- 222
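A variant of the same idea that builds V5 in one step with grepl() and nested ifelse() might look like the sketch below. It is only an illustration under stated assumptions: the extra value "xfy" is hypothetical (added so that the 111 branch is exercised, since no V1 value in the question has an "f" without a preceding "g"), and NA for non-matching rows is an assumption. Building the whole column at once does not depend on whether any row matches a given pattern.

```r
# V1 values from the question plus one hypothetical value, "xfy"
df <- data.frame(V1 = c("ABC", "CFS", "dgf", "xfy"), stringsAsFactors = FALSE)

has_gf      <- grepl("gf", df$V1, fixed = TRUE)      # contains "gf"
has_plain_f <- grepl("(?<!g)f", df$V1, perl = TRUE)  # "f" not preceded by "g"

# Check "gf" first, then the plain "f"; everything else becomes NA
df$V5 <- ifelse(has_gf, 222, ifelse(has_plain_f, 111, NA))

df
```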