Exclude elements from vector based on regular expression pattern

Question

I have some data which I want to clean up using a regular expression in R.

It is easy to find how to get elements that contain certain patterns, or that do not contain certain words (strings), but I can't find out how to exclude elements that contain a pattern.

How could I use a general function to only keep those elements from a vector which do not contain PATTERN?

I prefer not to give an example, as this might lead people to answer using other (though usually nice) ways than the intended one: excluding based on a regular expression. Here goes anyway:

How to exclude all the elements that contain any of the following characters: 'pyfgcrl

vector <- c("Cecilia", "Cecily", "Cecily's", "Cedric", "Cedric's", "Celebes", 
            "Celebes's", "Celeste", "Celeste's", "Celia", "Celia's", "Celina")

The result would be an empty vector in this case.

Recommended answer

From the comments, and with a little testing, one would find that my suggestion wasn't correct.

Here are two correct solutions:

vector[!grepl("['pyfgcrl]", vector)]                    ## kohske
grep("['pyfgcrl]", vector, value = TRUE, invert = TRUE) ## flodel

If either of them wants to re-post and accept credit for their answer, I'm more than happy to delete mine here.

The general function that you are looking for is grepl. From the help file for grepl:

grepl returns a logical vector (match or not for each element of x).
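
As a small illustration of that logical vector, here is a minimal sketch. The vector `x` is made up for this example (it is not the asker's data) so that some elements match and some do not:

```r
# Hypothetical sample data, chosen so that only some elements match
x <- c("Anna", "Cecily", "Bob's", "Tom")

# TRUE where the element contains any of the characters ' p y f g c r l
hits <- grepl("['pyfgcrl]", x)
print(hits)
# [1] FALSE  TRUE  TRUE FALSE
```

The result has one entry per element of `x`, which is what makes it usable as a subsetting index.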

Additionally, you should read the help page for regex, which describes what character classes are. In this case, you create a character class ['pyfgcrl], which matches any single character listed inside the square brackets. You can then negate the logical vector that grepl returns with !.

So, up to this point, we have something that looks like:

!grepl("['pyfgcrl]", vector)

To get what you are looking for, you subset as usual.

vector[!grepl("['pyfgcrl]", vector)]
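
Since the question asks for a general function, the same idea can be wrapped up as follows. This is only a sketch: `drop_matching` is a hypothetical helper name introduced here, not a base R function, and the second pattern (`'s$`) is an extra example that does not appear in the question.

```r
# drop_matching() is a hypothetical helper, not part of base R:
# it keeps only the elements of x that do NOT match the pattern
drop_matching <- function(x, pattern) {
  x[!grepl(pattern, x)]
}

vector <- c("Cecilia", "Cecily", "Cecily's", "Cedric", "Cedric's", "Celebes",
            "Celebes's", "Celeste", "Celeste's", "Celia", "Celia's", "Celina")

# Every name contains at least one of ' p y f g c r l, so nothing is left
all_dropped <- drop_matching(vector, "['pyfgcrl]")
print(all_dropped)
# character(0)

# Illustrative second pattern: drop only the possessive forms ending in 's
no_possessives <- drop_matching(vector, "'s$")
print(no_possessives)
# [1] "Cecilia" "Cecily"  "Cedric"  "Celebes" "Celeste" "Celia"   "Celina"
```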

For the second solution, offered by @flodel: by default grep returns the positions (indices) of the elements where a match is made, and the value = TRUE argument makes it return the actual string values instead. invert = TRUE flips the selection, so the elements that were not matched are returned.
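
To see the effect of each argument side by side, here is a short sketch using made-up sample data (the vector `x` is an assumption for illustration, not the asker's data):

```r
# Hypothetical sample data
x <- c("Anna", "Cecily", "Bob's", "Tom")

# Default: indices of the matching elements
idx <- grep("['pyfgcrl]", x)
print(idx)
# [1] 2 3

# value = TRUE: the matching strings themselves
matched <- grep("['pyfgcrl]", x, value = TRUE)
print(matched)
# [1] "Cecily" "Bob's"

# value = TRUE plus invert = TRUE: the strings that do NOT match
kept <- grep("['pyfgcrl]", x, value = TRUE, invert = TRUE)
print(kept)
# [1] "Anna" "Tom"
```

On this data the last call gives the same result as `x[!grepl("['pyfgcrl]", x)]`.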
