How can I extract numbers from a string in R?


Question

names(score)
 [1] "(Intercept)"              "aado2_calc(20,180]"       "aado2_calc(360,460]"     
 [4] "aado2_calc(460,629]"      "albumin[1,1.8]"           "albumin(1.8,2.2]"        
 [7] "albumin(2.2,2.8]"         "aniongap(15,18]"          "aniongap(18,20]"         
[10] "aniongap(20,22]"          "aniongap(22,25]"          "aniongap(25,49]"    

I want to extract the two numbers inside the brackets (numbers outside the brackets are not needed); the opening bracket may be either "(" or "[". The first number should be assigned to an object "low" and the second to "high".

Recommended answer

scorenames <- c(
  "(Intercept)"              ,"aado2_calc(20,180]"       ,"aado2_calc(360,460]"     
 ,"aado2_calc(460,629]"      ,"albumin[1,1.8]"           ,"albumin(1.8,2.2]"        
 ,"albumin(2.2,2.8]"         ,"aniongap(15,18]"          ,"aniongap(18,20]"         
 ,"aniongap(20,22]"          ,"aniongap(22,25]"          ,"aniongap(25,49]"
)

The first step might be to extract every number that sits between the "parens"-delimiters, i.e. a run of digits and dots bounded by (, ), [, ], or the comma.

mat <- regmatches(scorenames,
                  gregexpr("(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])", scorenames, perl = TRUE))
str(mat)
# List of 12
#  $ : chr(0) 
#  $ : chr [1:2] "20" "180"
#  $ : chr [1:2] "360" "460"
#  $ : chr [1:2] "460" "629"
#  $ : chr [1:2] "1" "1.8"
#  $ : chr [1:2] "1.8" "2.2"
#  $ : chr [1:2] "2.2" "2.8"
#  $ : chr [1:2] "15" "18"
#  $ : chr [1:2] "18" "20"
#  $ : chr [1:2] "20" "22"
#  $ : chr [1:2] "22" "25"
#  $ : chr [1:2] "25" "49"

From here, we can see that (1) the first one is problematic: "(Intercept)" contains no numbers, so it yields an empty character vector (no surprise, you need to figure out what you want here), and (2) the rest look about right.
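To see why the digit outside the brackets is ignored, it may help to run the same pattern on just two of the names. This is only a small illustration; the variable names pattern, x, and hits are introduced here and are not part of the original answer.

```r
# Same pattern as above: a run of digits/dots that is preceded by "[", "(" or ","
# and followed by "]", ")" or ",".
pattern <- "(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])"

x <- c("aado2_calc(20,180]", "(Intercept)")
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

hits[[1]]          # "20" "180"; the "2" in "aado2_calc" follows "o", so it is skipped
length(hits[[2]])  # 0; "(Intercept)" has no digits to extract
```

The lookbehind and lookahead only assert what surrounds the number, so the delimiters themselves never end up in the result.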

Here's one rough way to process this list. This is very trusting and naive: you should probably add checks to ensure each element has length 2, that everything converts correctly (perhaps in a tryCatch), etc.

newnames <- lapply(mat, function(m) {
  if (! length(m)) return(list(low = NA, high = NA))
  setNames(as.list(as.numeric(m)), nm = c("low", "high"))
})
str(newnames)
# List of 12
#  $ :List of 2
#   ..$ low : logi NA
#   ..$ high: logi NA
#  $ :List of 2
#   ..$ low : num 20
#   ..$ high: num 180
#  $ :List of 2
#   ..$ low : num 360
#   ..$ high: num 460
# ...snip...
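The checks mentioned above could look roughly like the following. This is a sketch under assumptions rather than the original author's code: parse_bounds is a hypothetical helper name, and treating any malformed element as a pair of NAs is one possible policy among several.

```r
# parse_bounds() is a hypothetical helper, not part of the original answer.
parse_bounds <- function(m) {
  empty <- list(low = NA_real_, high = NA_real_)
  # Anything other than exactly two matches is treated as "no interval".
  if (length(m) != 2L) return(empty)
  # as.numeric() warns (rather than errors) on malformed input such as "1.2.3",
  # which the pattern [0-9.]+ can match; tryCatch() maps that warning to NAs.
  tryCatch(
    setNames(as.list(as.numeric(m)), c("low", "high")),
    warning = function(w) empty
  )
}

str(parse_bounds(c("20", "180")))   # low = 20, high = 180
str(parse_bounds(character(0)))     # both NA
str(parse_bounds(c("1.2.3", "4")))  # both NA, because "1.2.3" is not a number
```

It can be dropped in as lapply(mat, parse_bounds). Using NA_real_ instead of a plain NA keeps both fields numeric even for the empty case.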

You can turn this into a data.frame with:

head(do.call(rbind.data.frame, newnames))
#     low  high
# 1    NA    NA
# 2  20.0 180.0
# 3 360.0 460.0
# 4 460.0 629.0
# 5   1.0   1.8
# 6   1.8   2.2
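Since the question asks for two objects named "low" and "high", another option is to skip the intermediate list and build two numeric vectors directly. The sketch below uses a shortened scorenames for brevity and assumes that names without an interval should simply become NA.

```r
# Shortened input for illustration; the pattern is the one used above.
scorenames <- c("(Intercept)", "aado2_calc(20,180]", "albumin[1,1.8]")
pattern <- "(?<=[\\[\\(,])[0-9.]+(?=[\\]\\),])"
mat <- regmatches(scorenames, gregexpr(pattern, scorenames, perl = TRUE))

# Take the first and second match of each element. Indexing past the end of a
# character vector yields NA, so "(Intercept)" becomes NA in both vectors.
low  <- as.numeric(vapply(mat, function(m) m[1], character(1)))
high <- as.numeric(vapply(mat, function(m) m[2], character(1)))

low   # NA 20 1
high  # NA 180 1.8
```

Both vectors stay aligned with scorenames, so they can be attached as columns to an existing table if that is more convenient.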
