Extract last 4-digit number from a series in R using stringr
Question
I would like to flatten lists extracted from HTML tables. A minimal working example is presented below. The example depends on the stringr package in R. The first example exhibits the desired behavior.
library(stringr)
years <- c("2005-", "2003-")
unlist(str_extract_all(years, "[[:digit:]]{4}"))
[1] "2005" "2003"
The example below produces an undesirable result when I try to match the last 4-digit number in a series of other numbers.
years1 <- c("2005-", "2003-", "1984-1992, 1996-")
unlist(str_extract_all(years1,"[[:digit:]]{4}$"))
character(0)
As I understand the documentation, I should include $ at the end of the pattern in order to request the match at the end of the string. From the second example, I would like to match the numbers "2005", "2003", and "1996".
Recommended answer
The stringi package has convenient functions that operate on specific parts of a string. So you can find the last occurrence of four consecutive digits with the following.
library(stringi)
x <- c("2005-", "2003-", "1984-1992, 1996-")
stri_extract_last_regex(x, "\\d{4}")
# [1] "2005" "2003" "1996"
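For readers who want to stay within stringr, as the question title asks, the following is a minimal sketch rather than part of the original answer. It assumes the input strings from the question. The anchored pattern returned character(0) because every string ends in a hyphen, so four digits never sit directly before the end of the string. A lookahead that tolerates trailing non-digit characters is one way around that.

```r
library(stringr)

years1 <- c("2005-", "2003-", "1984-1992, 1996-")

# Every string ends in "-", so "[[:digit:]]{4}$" has nothing to match
str_sub(years1, -1)
# [1] "-" "-" "-"

# Four digits followed only by non-digits up to the end of the string
last_year <- str_extract(years1, "\\d{4}(?=\\D*$)")
last_year
# [1] "2005" "2003" "1996"
```

The lookahead `(?=\\D*$)` checks what follows without consuming it, so only the four digits are returned.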
Other ways to get the same result are
stri_sub(x, stri_locate_last_regex(x, "\\d{4}"))
# [1] "2005" "2003" "1996"
## or, since these count as words
stri_extract_last_words(x)
# [1] "2005" "2003" "1996"
## or if you prefer a matrix result
stri_match_last_regex(x, "\\d{4}")
#      [,1]
# [1,] "2005"
# [2,] "2003"
# [3,] "1996"
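Another possible approach that uses only stringr and base R is to extract every four-digit match and keep the final one per string. This is a sketch under the assumption that a string with no match should yield NA; the helper name last_of is hypothetical and not part of either package.

```r
library(stringr)

x <- c("2005-", "2003-", "1984-1992, 1996-")

# One character vector of matches per input string
all_years <- str_extract_all(x, "\\d{4}")

# Hypothetical helper: final match of a vector, or NA if there is none
last_of <- function(m) if (length(m) > 0) m[length(m)] else NA_character_

result <- vapply(all_years, last_of, character(1))
result
# [1] "2005" "2003" "1996"
```

Using vapply with character(1) keeps the result a plain character vector of the same length as the input, which avoids the flattening that unlist would do when some strings have several matches.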