Negative lookbehind in R with multi-word separation

Views: 117 | Published: 2018/5/28 19:44:05 | Tags: r regex grep lookbehind


Problem description


I'm using R to do some string processing, and would like to identify the strings that have a certain word root that are not preceded by another word of a certain word root.

Here is a simple toy example. Say I would like to identify the strings that have the word "cat/s" not preceded by "dog/s" anywhere in the string.

    tests = c(
      "dog cat",
      "dogs and cats",
      "dog and cat",
      "dog and fluffy cats",
      "cats and dogs",
      "cat and dog",
      "fluffy cats and fluffy dogs")

Using this pattern, I can pull the strings that do have dog before cat:

    pattern = "(dog(s|).*)(cat(s|))"
    grep(pattern, tests, perl = TRUE, value = TRUE)

    [1] "dog cat" "dogs and cats" "dog and cat" "dog and fluffy cats"

My negative lookbehind is having problems:

    neg_pattern = "(?<!dog(s|).*)(cat(s|))"
    grep(neg_pattern, tests, perl = TRUE, value = TRUE)

    Error in grep(neg_pattern, tests, perl = TRUE, value = TRUE) :
      invalid regular expression
    In addition: Warning message:
    In grep(neg_pattern, tests, perl = TRUE, value = TRUE) :
      PCRE pattern compilation error
            'lookbehind assertion is not fixed length'
            at ')(cat(s|))'

I understand that .* is not fixed length, so how can I reject strings that have "dog" before "cat" separated by any number of other words?
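
To make the constraint concrete, here is a minimal sketch (not part of the original question) contrasting a fixed-length lookbehind with the variable-length one above. The names `fixed_hits` and `variable_ok` are made up for illustration. The fixed-length version compiles, but it only rejects a "cat" that directly follows "dog ", so it does not solve the multi-word case.

```r
tests <- c("dog cat", "dogs and cats", "dog and cat", "dog and fluffy cats",
           "cats and dogs", "cat and dog", "fluffy cats and fluffy dogs")

# Fixed-length lookbehind: PCRE accepts it, but it only looks at the four
# characters immediately before "cat", so only "dog cat" is rejected.
fixed_hits <- grep("(?<!dog )cats?", tests, perl = TRUE, value = TRUE)

# Variable-length lookbehind: the unbounded ".*" makes PCRE refuse to compile
# the pattern, so grepl() raises an error, which is caught here.
variable_ok <- tryCatch({
  suppressWarnings(grepl("(?<!dogs?.*)cats?", tests, perl = TRUE))
  TRUE
}, error = function(e) FALSE)
```

After running this, `fixed_hits` still contains strings such as "dogs and cats", which is why a different approach is needed.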

Solution

I hope that this can help:

    tests = c(
      "dog cat",
      "dogs and cats",
      "dog and cat",
      "dog and fluffy cats",
      "cats and dogs",
      "cat and dog",
      "fluffy cats and fluffy dogs"
    )

    # remove strings that have cats after dogs
    tests = tests[-grep(pattern = "dog(?:s|).*cat(?:s|)", x = tests)]

    # select only strings that contain cats
    tests = tests[grep(pattern = "cat(?:s|)", x = tests)]

    tests
    [1] "cats and dogs"               "cat and dog"
    [3] "fluffy cats and fluffy dogs"

I'm not sure if you wanted to do this with one expression, but Regex can still be very useful when applied iteratively.
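
If a single expression is preferred, one possible sketch (an alternative, not the answerer's method) replaces the lookbehind with a negative lookahead anchored at the start of the string. Lookaheads have no fixed-length restriction in PCRE, so `.*` is allowed inside them. The names `single_pattern`, `one_pass`, and `two_step` are made up for illustration. The second half shows the two-step filter written with `grepl()` and logical indexing, which avoids the edge case where `tests[-grep(...)]` returns an empty vector because nothing matched.

```r
tests <- c("dog cat", "dogs and cats", "dog and cat", "dog and fluffy cats",
           "cats and dogs", "cat and dog", "fluffy cats and fluffy dogs")

# One-pass variant: the lookahead at "^" scans the whole string for
# "dog ... cat" and fails the match if that sequence exists anywhere.
single_pattern <- "^(?!.*dogs?.*cats?).*cats?"
one_pass <- grep(single_pattern, tests, perl = TRUE, value = TRUE)

# Two-step variant with logical vectors instead of negative integer indices.
dog_then_cat <- grepl("dogs?.*cats?", tests)
has_cat <- grepl("cats?", tests)
two_step <- tests[has_cat & !dog_then_cat]
```

Both variants mirror the answer's filter: a string is dropped as soon as any "dog" appears before any "cat", so a string such as "cat dog cat" would also be rejected. If such strings should be kept, the pattern would need to change.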
