Prevent grep in R from treating "." as a letter

Problem description

I have a character vector that contains text similar to the following:

text <- c("ABc.def.xYz", "ge", "lmo.qrstu")

I would like to remove everything up to and including the last . in each element:

> "xYz" "ge" "qrstu"

However, the grep function seems to be treating . as a letter:

pattern <- "([A-Z]|[a-z])+$"

grep(pattern, text, value = TRUE)

> "ABc.def.xYz" "ge"          "lmo.qrstu" 

The pattern works elsewhere, such as on regexpal.

How can I get grep to behave as expected?

Solution

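grep is for finding a pattern: it returns the indices of the elements of the vector that match it or, if value = TRUE is specified, the matching elements themselves. It is not treating . as a letter here. The pattern is anchored only at the end, so all three elements match (each one ends in letters), and grep returns every matching element in full rather than the part that matched. From the description, it seems that you want to remove a substring rather than return a subset of the initial vector.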
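To make the difference concrete, here is a small sketch using the vector and pattern from the question. The variable names idx, vals and matched are only illustrative; regmatches() with regexpr() is shown as one way to see which part of each element the pattern actually matched.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")
pattern <- "([A-Z]|[a-z])+$"

# grep() reports which elements contain a match (indices by default)
idx <- grep(pattern, text)
print(idx)

# value = TRUE returns those elements in full, not the matched part
vals <- grep(pattern, text, value = TRUE)
print(vals)

# regmatches() + regexpr() extract only the portion that matched
matched <- regmatches(text, regexpr(pattern, text))
print(matched)
```

The pattern itself only ever matches the trailing letters; grep simply reports the whole element whenever a match exists anywhere in it.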

If you need to remove the substring, you can use sub:

 sub('.*\\.', '', text)
 #[1] "xYz"   "ge"    "qrstu"

The first argument is the pattern, '.*\\.'. It matches zero or more characters (.*) followed by a dot (\\.). The \\ is needed to escape the . so that it is treated as a literal dot instead of as any character. Because .* is greedy, the match runs up to the last . in the string. We replace the matched text with '' (the replacement argument), which removes that substring. Elements that contain no ., such as "ge", do not match and are returned unchanged.
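The effect of the escaping and of the greedy .* can be checked with a few variations on the call above. This is only a sketch: the variable names are illustrative, and the [.] and perl = TRUE variants are alternatives added here for comparison, not part of the original answer.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")

# Greedy .* runs to the last dot, so everything through it is dropped
after_last <- sub(".*\\.", "", text)
print(after_last)

# A character class is another way to write a literal dot
after_last_cls <- sub(".*[.]", "", text)
print(after_last_cls)

# Unescaped, "." matches any character, so the whole string is consumed
unescaped <- sub(".*.", "", text)
print(unescaped)

# A lazy .*? (with perl = TRUE) stops at the first dot instead
after_first <- sub(".*?\\.", "", text, perl = TRUE)
print(after_first)
```

The unescaped version shows the behaviour the question was worried about: there the dot really does stand for any character, which is why the escape (or the character class) is needed.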
