Regular expression that returns numbers following a specific letter until the next letter
Question
I need a regular expression that returns a specific letter and the following (one or two) digits until the next letter. For example, I would like to use a regular expression in R to extract how many carbons (C) are in each of these formulas:
strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
I need an expression that returns the number of carbons, which can be one or two digits, and that does not return the number after chlorine (Cl). This is what I tried:
substr(strings,regexpr("C[0-9]+",strings) + 1, regexpr("[ABDEFGHIJKLMNOPQRSTUVWXYZ]+",strings) -1)
[1] "16" "C" ""
But the result I want is:
"16","1","0"
Moreover, I would like the regular expression to locate the next letter automatically and stop before it, rather than my having to specify the end position as any letter other than C.
Answer
The makeup function in the CHNOSZ package parses a chemical formula into named element counts. Here are some alternatives that use it:
1) Create a list L of fully parsed formulas, then for each one check whether it has a "C" component and return its value, or 0 if there is none:
library(CHNOSZ)
L <- Map(makeup, strings)
sapply(L, function(x) if ("C" %in% names(x)) x[["C"]] else 0)
## C16H4ClNO2       CH8O       F2Ni
##         16          1          0
Note that L is a list of the fully parsed formulas, in case you have other requirements:
> L
$C16H4ClNO2
 C  H Cl  N  O
16  4  1  1  2

$CH8O
C H O
1 8 1

$F2Ni
 F Ni
 2  1
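To give a rough idea of the kind of parsing makeup performs, here is a minimal base-R sketch. It is a hypothetical helper, parse_simple_formula, and not CHNOSZ's implementation: it assumes flat formulas (no parentheses, charges or hydrate dots), splits them into element tokens with a regular expression, and sums repeated elements, which is also why appending "C0" in (2) below adds nothing to an existing carbon count.

```r
# Hypothetical helper, NOT part of CHNOSZ: a rough base-R stand-in for makeup()
# that only handles flat formulas (no parentheses, charges or hydrate dots).
parse_simple_formula <- function(formula) {
  # Each token is an uppercase letter, an optional lowercase letter, then digits.
  tokens   <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)     # strip the trailing count
  counts   <- sub("^[A-Za-z]+", "", tokens)  # keep only the trailing count
  counts[counts == ""] <- "1"                # a bare symbol means one atom
  counts   <- as.numeric(counts)
  # Sum repeated elements (e.g. CH3COOH), keeping first-appearance order.
  groups <- factor(elements, levels = unique(elements))
  vapply(split(counts, groups), sum, numeric(1))
}

lapply(c("C16H4ClNO2", "CH8O", "F2Ni"), parse_simple_formula)
```

Unlike makeup, this sketch does not validate element symbols, so treat it as an illustration rather than a replacement.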
1a) By appending c(C = 0) to each list component, we avoid having to test for the presence of carbon, which gives the following shorter version of the sapply line in (1):
sapply(lapply(L, c, c(C = 0)), "[[", "C")
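The appended zero never overrides a real carbon count because [[ with a name returns the first element carrying that name. The small base-R illustration below uses hand-written named vectors as stand-ins for makeup output (an assumption based on the printed L above), so it runs without CHNOSZ.

```r
# Stand-ins for makeup() output, with c(C = 0) appended as in (1a).
with_carbon    <- c(c(C = 16, H = 4, Cl = 1, N = 1, O = 2), c(C = 0))
without_carbon <- c(c(F = 2, Ni = 1), c(C = 0))

# [[ with a name returns the first element with that name,
# so an existing carbon count wins over the appended zero.
with_carbon[["C"]]
## [1] 16
without_carbon[["C"]]
## [1] 0
```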
2) This one-line variation of (1) gives the same answer as (1), except that the result is unnamed. It appends "C0" to each formula to avoid having to test for the presence of carbon:
sapply(lapply(paste0(strings, "C0"), makeup), "[[", "C")
## [1] 16 1 0
2a) Here is a variation of (2) that eliminates the lapply by using the fact that makeup will accept a matrix:
sapply(makeup(as.matrix(paste0(strings, "C0"))), "[[", "C")
## [1] 16 1 0
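If installing CHNOSZ is not an option, the regular expression the question asks for can be sketched in base R. This is a minimal sketch, not part of the answer above: the helper name carbon_count is hypothetical, and it assumes carbon is written at most once per flat formula (as in the question's strings). It matches a "C" that is not followed by a lowercase letter, so Cl, Ca, Cu and so on are skipped, and captures whatever digits follow it, which automatically stops at the next letter.

```r
# Hypothetical helper: count carbons with a base-R regular expression.
# Assumption: carbon appears at most once per formula, with no parentheses.
carbon_count <- function(formulas) {
  # "C" not followed by a lowercase letter, digits captured in group 1.
  # perl = TRUE is needed for the (?!...) negative lookahead.
  pattern <- "C(?![a-z])([0-9]*)"
  matches <- regmatches(formulas, regexec(pattern, formulas, perl = TRUE))
  vapply(matches, function(m) {
    if (length(m) == 0) return(0L)  # no carbon at all
    if (m[2] == "") return(1L)      # bare "C" means one carbon
    as.integer(m[2])                # explicit count of one or more digits
  }, integer(1), USE.NAMES = FALSE)
}

strings <- c("C16H4ClNO2", "CH8O", "F2Ni")
carbon_count(strings)
## [1] 16  1  0
```

Because only the first standalone "C" is read, a formula such as "CH3COOH" would return 1 here, whereas makeup sums repeated elements; for anything beyond simple formulas, the makeup-based approaches above are the safer choice.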