How can I extract numbers separated by a forward slash in R?


Problem description

I am trying to extract page numbers of the form 1/7, 2/7, ... from a string. In R, I'd like to use the following input:

input <- "Some text 7/8\n"

and extract the output "7/8" or, even better, just the number 7. I'm not a regular regex user and would therefore very much appreciate your help.

Recommended answer

In regex, \d means a digit and + means "one or more", so the pattern \d+ matches one or more digits. We can use stringr::str_extract with this pattern to extract a number; it returns the first match of the pattern in the string, which is what we want here. In an R string literal, each \ in the pattern must be escaped with a second \:

library(stringr)  # provides str_extract()
str_extract("Some text 7/8\n", "\\d+")
#[1] "7"
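
The question also asks for the full "7/8" token. As a minimal sketch, assuming the page number always has the shape digits/digits, the same idea can be written with base R's regexpr and regmatches (no extra packages). The variable names page, parts, numerator and denominator are illustrative only:

```r
# Base R sketch: extract the first "digits/digits" token from the string.
input <- "Some text 7/8\n"

# regexpr() locates the first match; regmatches() returns the matched text.
page <- regmatches(input, regexpr("\\d+/\\d+", input))

# Split on the literal slash to get the two numbers separately.
parts <- strsplit(page, "/", fixed = TRUE)[[1]]
numerator <- as.integer(parts[1])
denominator <- as.integer(parts[2])
```

Here page holds the whole token as a string, while numerator and denominator are integers that can be used in arithmetic.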

If the preceding text may itself contain numbers, I'd recommend a two-stage process: first extract the number followed by a / (just add the / to the end of the regex pattern), then replace the extracted / with an empty string.

result <- str_extract("Some 2879 numbery 8972 text 7/8\n", "\\d+/")
result <- str_replace(result, pattern = "/", replacement = "")
result
#[1] "7"
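
The two stages can probably be collapsed into one with a lookahead, which requires a / after the digits without including it in the match. This is a sketch of an alternative, not part of the original answer; it uses base R with perl = TRUE, and the same pattern should also work when passed to stringr::str_extract:

```r
# One-step sketch: match digits only when they are immediately followed by "/".
text <- "Some 2879 numbery 8972 text 7/8\n"

# (?=/) is a lookahead: it checks for the slash but leaves it out of the match.
# perl = TRUE switches regexpr() to the PCRE engine, which supports lookaheads.
numerator <- regmatches(text, regexpr("\\d+(?=/)", text, perl = TRUE))
numerator
```

The earlier numbers 2879 and 8972 are skipped because neither is directly followed by a slash.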

If the preceding text might also contain a fraction, we'll need to think harder about how to pull out the correct numerator. If it's always the last fraction that needs to be extracted, we could use stringi::stri_extract_last_regex instead of stringr::str_extract. If it isn't consistently the last one, you'll need to work out some logic to decide which match to use.
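
For the "always the last fraction" case, a minimal sketch might look like the following. The sample string is made up for illustration, and the code assumes the stringi package is installed:

```r
library(stringi)

# Hypothetical input with an earlier fraction before the page number.
text <- "Add 1/2 cup of sugar, page 7/8\n"

# Take the last "digits/digits" match in the string.
last_fraction <- stri_extract_last_regex(text, "\\d+/\\d+")

# The numerator is the first run of digits inside that match.
numerator <- stri_extract_first_regex(last_fraction, "\\d+")
```

Matching the full digits/digits shape first, and only then taking the numerator, avoids picking up a stray number that merely happens to appear late in the text.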
