Extract last 4-digit number from a series in R using stringr

Problem description

I would like to flatten lists extracted from HTML tables. A minimal working example follows; it depends on the stringr package in R. The first example shows the desired behavior.

library(stringr)

years <- c("2005-", "2003-")
unlist(str_extract_all(years,"[[:digit:]]{4}"))

[1] "2005" "2003"

The example below produces an undesirable result when I try to match the last 4-digit number in a series of other numbers.

years1 <- c("2005-", "2003-", "1984-1992, 1996-")
unlist(str_extract_all(years1,"[[:digit:]]{4}$"))

character(0)

As I understand the documentation, I should include $ at the end of the pattern to request a match at the end of the string. From the second example, I would like to match the numbers "2005", "2003", and "1996".

Recommended answer

The stringi package has convenient functions that operate on specific parts of a string, so you can find the last occurrence of four consecutive digits with the following.

library(stringi)

x <- c("2005-", "2003-", "1984-1992, 1996-")

stri_extract_last_regex(x, "\\d{4}")
# [1] "2005" "2003" "1996"
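
The pattern in the question returns nothing because every string ends in "-", so four digits immediately followed by the end of the string never occur. If you would rather stay within stringr, the following is a minimal sketch of two possible workarounds; the variable names are illustrative and not part of the original question or answer.

```r
library(stringr)

years1 <- c("2005-", "2003-", "1984-1992, 1996-")

# Each string ends in "-", so digits anchored directly to `$` cannot match.
ends_with_digits <- str_detect(years1, "[[:digit:]]{4}$")

# Option 1: keep the `$` anchor, but use a lookahead that permits
# trailing non-digit characters between the digits and the end.
last_by_lookahead <- str_extract(years1, "\\d{4}(?=\\D*$)")

# Option 2: collect every 4-digit match, then keep the last one per string
# (NA when a string contains no 4-digit run).
all_matches <- str_extract_all(years1, "\\d{4}")
last_by_tail <- vapply(
  all_matches,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)

last_by_lookahead
last_by_tail
```

Both options should give "2005", "2003", "1996" for this input. Note that the lookahead version only considers the final digit run in each string, so the two options can differ on strings whose last run of digits is shorter than four.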

Other ways to get the same result are:

stri_sub(x, stri_locate_last_regex(x, "\\d{4}"))
# [1] "2005" "2003" "1996"

## or, since these count as words
stri_extract_last_words(x)
# [1] "2005" "2003" "1996"

## or if you prefer a matrix result
stri_match_last_regex(x, "\\d{4}")
#      [,1]  
# [1,] "2005"
# [2,] "2003"
# [3,] "1996"
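
If adding a package is not an option, the same idea (find all 4-digit runs, keep the last) can be sketched in base R. This is an illustrative alternative, not part of the original answer, and the names `matches` and `last_year` are made up for the example.

```r
x <- c("2005-", "2003-", "1984-1992, 1996-")

# gregexpr() locates every 4-digit run; regmatches() returns them as a list
# with one character vector per input string.
matches <- regmatches(x, gregexpr("[0-9]{4}", x))

# Keep the last match per string, or NA when a string has no match.
last_year <- vapply(
  matches,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)

last_year
```

For the input above this should return "2005", "2003", "1996", matching the stringi results.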
