How to do str_extract with base R?

Question

I am juggling several versions of R and want to change which R libraries are loaded depending on the R version and operating system I'm using. For that reason, I want to stick with base R functions.

I was reading this page to see what the base R equivalent of stringr::str_extract is:

http://stat545.com/block022_regular-expression.html

It suggested I could replicate this functionality with grep. However, I haven't been able to get grep to do more than return the whole string when there is a match. Is this possible with grep alone, or do I need to combine it with another function? In my case I'm trying to distinguish between CentOS versions 6 and 7.

grep(pattern = "release ([0-9]+)", x = readLines("/etc/system-release"), value = TRUE)

Recommended answer

1) strcapture If you want to extract a string of digits and dots from "release 1.2.3" using base R, then:

x <- "release 1.2.3"
strcapture("([0-9.]+)", x, data.frame(version = character(0)))
##   version
## 1   1.2.3
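
As a small extension of the strcapture call above, here is a sketch with several capture groups. The column names major, minor and patch are my own illustrative choices, not anything required by R. As far as I know strcapture was added in R 3.4.0, so it may not be available on the oldest installations you are supporting.

```r
x <- "release 1.2.3"

# The prototype's column names and types decide how each capture group is
# named and converted; major/minor/patch are illustrative names only.
proto <- data.frame(major = integer(), minor = integer(), patch = integer())
parts <- strcapture("([0-9]+)\\.([0-9]+)\\.([0-9]+)", x, proto)

parts$major
## [1] 1
```

Because the prototype columns are integer, the pieces come back as numbers, which makes a comparison such as parts$major >= 7 straightforward.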

2) regexec/regmatches There are also regmatches and regexec, but those have already been covered in another answer.
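
Since that other answer is not reproduced here, a minimal sketch of the regmatches approach may help. It reuses the x from above; the patterns are just the ones used elsewhere in this answer.

```r
x <- "release 1.2.3"

# regexpr() locates the first match; regmatches() returns the matched text.
m <- regexpr("[0-9.]+", x)
first_match <- regmatches(x, m)
first_match
## [1] "1.2.3"

# regexec() also records capture groups: element 1 is the whole match,
# element 2 is the first parenthesised group.
groups <- regmatches(x, regexec("release ([0-9.]+)", x))[[1]]
version <- groups[2]
version
## [1] "1.2.3"
```

The regexpr form is probably the closest base R analogue to str_extract, while the regexec form is the one to reach for when only a capture group is wanted.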

3) sub It is also often possible to use sub:

sub(".* ([0-9.]+).*", "\\1", x)
## [1] "1.2.3"

3a) If you know the match is at the beginning or end, then delete everything after or before it:

sub(".* ", "", x)
## [1] "1.2.3"
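
One caveat with the sub approaches: when the pattern does not match, sub returns the input unchanged, whereas str_extract returns NA. If that difference matters, a small wrapper around regexpr and regmatches can mimic the NA behaviour. The name extract_first below is hypothetical, not a base R function, and this is only a sketch.

```r
# Hypothetical helper (not part of base R): first match of `pattern` in each
# element of `x`, or NA where there is no match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  ok <- !is.na(m) & m != -1          # elements that actually matched
  out <- rep(NA_character_, length(x))
  out[ok] <- regmatches(x, m)        # regmatches() returns matched elements only
  out
}

y <- c("release 1.2.3", "no version here")

unchanged <- sub(".* ([0-9.]+).*", "\\1", y)
# unchanged is c("1.2.3", "no version here"): the non-match passes through

with_na <- extract_first(y, "[0-9.]+")
# with_na is c("1.2.3", NA)
```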

4) gsub Sometimes we know that the field to be extracted consists of certain characters that do not appear elsewhere in the string. In that case, simply delete every occurrence of every character that cannot be in the field:

gsub("[^0-9.]", "", x)
## [1] "1.2.3"
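
To make that assumption concrete, here is a quick sketch of what happens when it does not hold. The second input string is made up for illustration.

```r
# Works when digits and dots appear only in the field we want.
clean <- gsub("[^0-9.]", "", "release 1.2.3")
clean
## [1] "1.2.3"

# A stray digit elsewhere in the (made-up) string gets glued onto the result.
glued <- gsub("[^0-9.]", "", "v2 release 1.2.3")
glued
## [1] "21.2.3"
```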

5) read.table One can often decompose the input into fields and then pick out the desired one by position or via grep. strsplit, read.table or scan can be used:

read.table(text = x, as.is = TRUE)[[2]]
## [1] "1.2.3"

5a) grep/scan

grep("^[0-9.]+$", scan(textConnection(x), what = "", quiet = TRUE), value = TRUE)
## [1] "1.2.3"

5b) grep/strsplit

grep("^[0-9.]+$", strsplit(x, " ")[[1]], value = TRUE)
## [1] "1.2.3"

6) substring If we know the character position of the field, we can use substring like this:

substring(x, 9)
## [1] "1.2.3"

6a) substring/regexpr Or we may be able to use regexpr to locate the character position for us:

substring(x, regexpr("\\d", x))
## [1] "1.2.3"
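
Note that substring(x, start) runs to the end of the string, which is fine here because the version is the last thing in x. If text follows the field, the match.length attribute that regexpr returns can supply the end position. A sketch, using a made-up input with a trailing "(Final)":

```r
z <- "release 1.2.3 (Final)"   # made-up input with trailing text
m <- regexpr("[0-9.]+", z)

# Start position only: everything from the match to the end of the string.
to_end <- substring(z, m)
to_end
## [1] "1.2.3 (Final)"

# Start and end position: just the matched field.
field <- substring(z, m, m + attr(m, "match.length") - 1)
field
## [1] "1.2.3"
```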

7) read.dcf Sometimes it is possible to convert the input to dcf form, in which case it can be read with read.dcf. Such data is of the form name: value

read.dcf(textConnection(sub(" ", ": ", x)))
##      release
## [1,] "1.2.3"
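
Coming back to the CentOS case in the question, something along these lines might work. The two sample lines are assumptions standing in for the contents of /etc/system-release on a CentOS 6 and a CentOS 7 machine; on a real system you would read the file with readLines as in the question.

```r
# Illustrative sample lines (assumed); on a real system use
# readLines("/etc/system-release") instead.
release_lines <- c("CentOS release 6.10 (Final)",
                   "CentOS Linux release 7.9.2009 (Core)")

# Keep only the digits captured immediately after "release ".
major <- sub(".*release ([0-9]+).*", "\\1", release_lines)
# major is c("6", "7")

# TRUE for version 7 or later, FALSE otherwise.
is_7_or_later <- as.integer(major) >= 7
# is_7_or_later is c(FALSE, TRUE)
```

This uses approach 3 (sub) because it needs nothing beyond base R and works on old R versions; if a line does not contain "release <digits>", sub returns it unchanged and as.integer will yield NA with a warning.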
